Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA benchmarks.

https://arxiv.org/abs/2004.04906

(DeepL) Open-domain question answering relies on efficient passage retrieval to select candidate contexts, and traditional sparse vector space models such as TF-IDF and BM25 are the de facto methods. In this study, we show that learning embeddings from a small number of questions and passages via a simple dual-encoder framework allows practical implementation of retrieval using only dense representations. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms the strong Lucene-BM25 system by 9%-19% absolute in top-20 passage retrieval accuracy, and it helps our end-to-end QA system establish a new state of the art on multiple open-domain QA benchmarks.
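To make "retrieval using dense representations" and "top-20 passage retrieval accuracy" concrete, here is a minimal sketch, not the paper's implementation. It assumes the two encoders have already produced one vector per question and per passage, and it scores each pair with an inner product. The function names, the hand-made 3-dimensional vectors, and the `gold` mapping are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product, used here as the question-passage similarity score."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(question_vec: Vector,
                   passage_vecs: Dict[str, Vector],
                   k: int) -> List[str]:
    """Return the ids of the k passages with the highest similarity score."""
    ranked = sorted(
        passage_vecs,
        key=lambda pid: dot(question_vec, passage_vecs[pid]),
        reverse=True,
    )
    return ranked[:k]


def top_k_accuracy(question_vecs: Dict[str, Vector],
                   passage_vecs: Dict[str, Vector],
                   gold: Dict[str, Set[str]],
                   k: int) -> float:
    """Fraction of questions with at least one gold passage in the top k."""
    hits = 0
    for qid, q_vec in question_vecs.items():
        retrieved = retrieve_top_k(q_vec, passage_vecs, k)
        if gold[qid] & set(retrieved):
            hits += 1
    return hits / len(question_vecs)


# Hypothetical toy vectors standing in for the outputs of the two encoders.
passage_vecs = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.1, 0.8, 0.1],
    "p3": [0.0, 0.2, 0.9],
}
question_vecs = {
    "q1": [1.0, 0.0, 0.0],
    "q2": [0.0, 0.1, 1.0],
}
# Hypothetical relevance labels: which passages contain the answer.
gold = {"q1": {"p1"}, "q2": {"p2"}}

print(top_k_accuracy(question_vecs, passage_vecs, gold, k=1))  # 0.5
print(top_k_accuracy(question_vecs, passage_vecs, gold, k=2))  # 1.0
```

With k=1 only q1 finds its gold passage; with k=2 both do. The sketch sorts every passage for every question, which is fine for a toy example. A real system would use an approximate nearest-neighbor index for this step.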

DPR: Dense Passage Retrieval / Dense Passage Retriever

Validation of DPR in Open Domain QA (PDF) [AI Shift Advent Calendar 2022: Presentation at the 13th Symposium on Interactive Systems | AI Shift Inc.]

Two-Tower model https://www.youtube.com/watch?v=3giqIW2pIW4 [Report: The "Two-Tower" recommendation model behind Google and vector nearest-neighbor search technology, GoogleCloudDay | DevelopersIO]


This page is auto-translated from [/nishio/Dense Passage Retrieval for Open-Domain Question Answering](https://scrapbox.io/nishio/Dense Passage Retrieval for Open-Domain Question Answering) using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to share my thoughts with non-Japanese readers.