Won 3rd place in AI King, Japan's No. 1 quiz AI competition｜PKSHA Delta 2023-01-04

  • A talk about combining search methods.

    • DPR and BM25 are combined.
      • Roughly speaking, DPR is vector search and BM25 is a "modern TF-IDF".
      • Vector search is weak on low-frequency words (proper nouns, technical terms, product names), so it is combined with ordinary lexical search.
  • BM25: based on lexical matching

  • Dense Passage Retriever (DPR): dense-vector-based, using pre-trained models

    • Strong at semantic similarity
    • Misses low-frequency words / performance degrades significantly out of distribution (OOD)
  • The search results of DPR and BM25 overlap very little

  • → Combining them takes the good points of both.
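The notes do not say how the two result lists were merged. As one possible illustration, here is a minimal sketch using reciprocal rank fusion (RRF), a common way to combine rankings whose scores are not comparable. The function name, the constant `k=60`, and the passage IDs are assumptions made for this example, not details from the talk.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse several ranked lists of passage IDs into one list.

    Each passage gets 1 / (k + rank) from every list it appears in,
    so a passage found by only one retriever still contributes.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    ordered = sorted(fused, key=lambda d: (-fused[d], d))
    return ordered[:top_n]


# Hypothetical result lists with little overlap, as described above.
bm25_hits = ["p3", "p7", "p1"]   # lexical matches (assumed IDs)
dpr_hits = ["p9", "p3", "p4"]    # dense-vector matches (assumed IDs)

print(set(bm25_hits) & set(dpr_hits))                # -> {'p3'}
print(reciprocal_rank_fusion([bm25_hits, dpr_hits]))
# -> ['p3', 'p9', 'p7', 'p1', 'p4']
```

Because RRF uses only ranks, it avoids having to normalize BM25 scores against vector similarities. When the overlap is small, the fused list is mostly an interleaving of the two retrievers' top hits, with shared passages promoted to the front.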

  • All teams used a Retriever-Reader architecture

    • Fusion-in-Decoder was adopted as the Reader
      • Top 100 items
    • A survey of information-retrieval papers turned up several studies in this context, and seq-to-seq-based readers are more accurate than classifiers built on BERT
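To make the Reader side concrete, here is a rough sketch of how Fusion-in-Decoder arranges its inputs. Each retrieved passage is paired with the question and encoded on its own, and the encoder outputs are then concatenated so the decoder can attend over all passages at once. `toy_encode` is a hypothetical stand-in for a real seq-to-seq encoder, and the passage dictionaries are invented for illustration. Only the data flow is meant to be representative.

```python
def build_fid_inputs(question, passages, top_k=100):
    """Make one encoder input per retrieved passage, each paired with the question."""
    return [
        f"question: {question} title: {p['title']} context: {p['text']}"
        for p in passages[:top_k]
    ]


def toy_encode(text, max_len=16):
    """Hypothetical encoder stand-in: returns one small vector per token."""
    tokens = text.split()[:max_len]
    return [[float(len(tok)), float(i)] for i, tok in enumerate(tokens)]


def fuse_for_decoder(encoded_passages):
    """Concatenate per-passage encoder states along the sequence axis."""
    return [state for passage in encoded_passages for state in passage]


# Invented passages for illustration only.
passages = [
    {"title": "T1", "text": "first retrieved passage"},
    {"title": "T2", "text": "second retrieved passage"},
]
inputs = build_fid_inputs("example question?", passages, top_k=100)
encoded = [toy_encode(s) for s in inputs]   # passages are encoded independently
memory = fuse_for_decoder(encoded)          # the decoder would attend over this
print(len(inputs), len(memory))
```

Encoding passages independently keeps the encoder cost linear in the number of passages, which is what makes reading on the order of 100 items practical. The answer is then generated by the decoder rather than selected by a classifier.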

PKSHA LLM


This page is auto-translated from /nishio/検索を組み合わせる using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.