AI King - Japan's No.1 Quiz AI Competition and won 3rd place | PKSHA Delta 2023-01-04
-
-
Notes on combining search methods.
- I'm combining DPR and BM25.
- Roughly speaking, DPR is vector search and BM25 is a "modern TF-IDF".
- Vector search is weak on low-frequency words (proper nouns, technical terms, product names), so I combine it with ordinary keyword search.
-
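The note does not say how the two result lists get merged. One common option is reciprocal rank fusion (RRF), so the sketch below assumes that. The document IDs are made up, and `k=60` is just the conventional default, not a value from the source.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default (an assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 results from each retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dpr_hits = ["doc_d", "doc_b", "doc_e"]

fused = reciprocal_rank_fusion([bm25_hits, dpr_hits])
print(fused)
```

A document that both retrievers return (here `doc_b`) rises to the top, while documents found by only one retriever still stay in the list.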
BM25: based on lexical matching
-
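To make "lexical matching" concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. The whitespace tokenizer, the toy documents, and the parameters `k1=1.5` and `b=0.75` are assumptions for illustration; a real system would use a proper tokenizer (especially for Japanese) and an index.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # lexical match only: absent terms contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (hypothetical), tokenized by whitespace.
docs = [
    "pksha took third place in the ai king quiz competition".split(),
    "a quiz competition for question answering systems".split(),
    "dense retrieval uses vectors from a pre-trained model".split(),
]
scores = bm25_scores(["pksha", "quiz"], docs)
print(scores)
```

The rare term "pksha" gets a higher IDF weight than "quiz", which is why BM25 does well on proper nouns, and the third document scores zero because it shares no query term.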
Dense Passage Retriever (DPR): dense-vector based, using pre-trained models
- Strong at semantic similarity
- Misses low-frequency words / performance degrades significantly on out-of-distribution (OOD) data
-
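The retrieval step of a dense retriever can be sketched as ranking passages by the inner product between a query embedding and passage embeddings. The 3-dimensional vectors below are hand-written stand-ins (an assumption); in practice they come from trained query and passage encoders and have hundreds of dimensions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dense_search(query_vec, passage_vecs, top_k=2):
    """Rank passage IDs by inner product with the query embedding."""
    scored = [(dot(query_vec, vec), pid) for pid, vec in passage_vecs.items()]
    scored.sort(key=lambda x: (-x[0], x[1]))  # best score first, ID as tie-break
    return [pid for _, pid in scored[:top_k]]


# Hypothetical embeddings: the first axis loosely means "about cats".
passage_vecs = {
    "p_cat": [0.9, 0.1, 0.0],
    "p_kitten": [0.8, 0.2, 0.1],
    "p_stock": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]  # stand-in for an encoded query such as "feline"

top = dense_search(query_vec, passage_vecs)
print(top)
```

Nothing here depends on the query and passage sharing a word, which is the strength (semantic similarity) and also the weakness: a rare proper noun the encoder never learned has no reliable direction in the vector space.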
The search results of DPR and BM25 overlap very little
-
→ Combining them takes the good points of both.
-
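A quick way to check the "small overlap" observation on your own data is to measure how many of the top-k results the two retrievers share, then pool both lists as candidates. The result lists below are made up for illustration.

```python
def overlap_at_k(results_a, results_b, k):
    """Fraction of the top-k slots shared by both result lists."""
    top_a, top_b = set(results_a[:k]), set(results_b[:k])
    return len(top_a & top_b) / k


def union_candidates(results_a, results_b, k):
    """Pool both top-k lists, keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys(results_a[:k] + results_b[:k]))


# Hypothetical top-5 results from each retriever.
bm25_hits = ["d1", "d2", "d3", "d4", "d5"]
dpr_hits = ["d6", "d2", "d7", "d8", "d9"]

overlap = overlap_at_k(bm25_hits, dpr_hits, k=5)
pool = union_candidates(bm25_hits, dpr_hits, k=5)
print(overlap, len(pool))
```

When the overlap is low, the pooled candidate set is nearly twice the size of either list, so each retriever is contributing passages the other one missed.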
All teams used a Retriever-Reader architecture
- Adopted Fusion-in-Decoder (FiD) as the Reader
- Top 100 items
-
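The Retriever-Reader shape can be sketched as two pluggable functions. This is only a structural sketch: `toy_retrieve` and `toy_read` are hypothetical stand-ins so the code runs, not DPR, BM25 or Fusion-in-Decoder, and the `top_k=100` default assumes that "Top 100 items" means 100 retrieved passages are handed to the Reader.

```python
from typing import Callable, List

Retriever = Callable[[str, int], List[str]]
Reader = Callable[[str, List[str]], str]


def answer(question: str, retrieve: Retriever, read: Reader, top_k: int = 100) -> str:
    """Retriever-Reader pipeline: fetch top_k passages, then read them jointly."""
    passages = retrieve(question, top_k)
    return read(question, passages)


# Toy corpus and components (assumptions for illustration only).
CORPUS = [
    "Mount Fuji is the highest mountain in Japan.",
    "Lake Biwa is the largest lake in Japan.",
]


def toy_retrieve(question: str, top_k: int) -> List[str]:
    """Rank passages by how many words they share with the question."""
    q_words = set(question.lower().rstrip("?").split())

    def shared(passage: str) -> int:
        return len(q_words & set(passage.lower().rstrip(".").split()))

    return sorted(CORPUS, key=lambda p: -shared(p))[:top_k]


def toy_read(question: str, passages: List[str]) -> str:
    """A real Reader generates an answer from all passages; this stub
    just returns the subject of the best-ranked passage."""
    return passages[0].split(" is ")[0]


result = answer("What is the highest mountain in Japan?", toy_retrieve, toy_read)
print(result)
```

Keeping the two stages behind plain function signatures makes it easy to swap the retriever (BM25, DPR, or a combination) without touching the Reader.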
Surveying information-retrieval papers, we found several studies on this, and seq-to-seq based approaches were more accurate than classifiers built on BERT
- Trained a Reranker (rerank)
- Adopted Fusion in Decoder as Reader
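The source does not say which seq-to-seq reranker was used. One known approach of this kind has the model generate a "true" or "false" token for each query-passage pair and uses the probability of "true" as the relevance score, so the sketch below assumes that scheme. The logits are hard-coded, hypothetical values rather than real model output.

```python
import math


def true_probability(true_logit, false_logit):
    """Softmax over the two target-token logits; returns P("true")."""
    m = max(true_logit, false_logit)  # subtract the max for numerical stability
    e_true = math.exp(true_logit - m)
    e_false = math.exp(false_logit - m)
    return e_true / (e_true + e_false)


def rerank(candidates):
    """Sort (passage_id, true_logit, false_logit) triples by P("true"), best first."""
    scored = [(true_probability(t, f), pid) for pid, t, f in candidates]
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [pid for _, pid in scored]


# Hypothetical logits, as if a seq-to-seq model had scored each query-passage pair.
candidates = [("p1", 0.2, 1.5), ("p2", 2.0, -1.0), ("p3", 0.5, 0.5)]

order = rerank(candidates)
print(order)
```

The reranker only reorders passages the retrievers already found, so it improves what reaches the Reader but cannot recover a passage that neither BM25 nor DPR returned.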
This page is auto-translated from /nishio/検索を組み合わせる using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.