technique

  • An approach that does not use linguistic knowledge
    ‱ Simple word frequency. Needs a stopword list.
      ‱ Discards word-order information.
        ‱ The “General Manager’s Association” problem: a multi-word expression gets split into separate words.
      ‱ Synonyms are counted as different words.
      ‱ Co-occurrence
      ‱ Collocation
        • N-grams, etc.
        ‱ Co-occurrence within a window
      ‱ Co-occurrence within a document - concentration (of one’s attention)
    • tf-idf
      ‱ Assigns each word a real-valued score, whereas a stopword list is a binary (0/1) decision.
      ‱ ‘The less frequently a word appears in other texts, the better it characterizes this text.’
      ‱ A word may be frequent on its own, yet sometimes it forms an important key phrase as part of a multi-word expression.
    • RAKE
  ‱ Graph-based
    ‱ Build a graph of word adjacencies and choose the words with the highest rank.
    • Use PageRank
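
The tf-idf and graph-based items above can be sketched in a few lines of plain Python. This is only a rough illustration under stated assumptions, not a reference implementation: the stopword list, the function names (tokenize, tfidf, graph_rank), the tf * log(N / df) weighting, and the window, damping and iterations parameters are all choices made for this example.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "is", "are", "and", "to", "in", "on"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords (the binary 0/1 filter)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf(docs):
    """Return one {word: score} dict per document, scoring tf * log(N / df)."""
    token_lists = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each word appears.
    df = Counter(w for tokens in token_lists for w in set(tokens))
    n = len(docs)
    scores = []
    for tokens in token_lists:
        tf = Counter(tokens)
        scores.append(
            {w: (c / len(tokens)) * math.log(n / df[w]) for w, c in tf.items()}
        )
    return scores


def graph_rank(text, window=2, damping=0.85, iterations=50):
    """PageRank on an undirected graph of words that co-occur within `window` tokens.

    With window=2 only directly adjacent tokens are linked. Stopwords are
    removed first, so words separated only by a stopword count as adjacent.
    """
    tokens = tokenize(text)
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if v != w:
                neighbors[w].add(v)
                neighbors[v].add(w)
    nodes = list(neighbors)
    if not nodes:
        return {}
    rank = {w: 1.0 / len(nodes) for w in nodes}
    for _ in range(iterations):
        # Each word receives an equal share of the rank of each neighbor.
        rank = {
            w: (1 - damping) / len(nodes)
            + damping * sum(rank[v] / len(neighbors[v]) for v in neighbors[w])
            for w in nodes
        }
    return rank
```

Sorting either result by score in descending order gives keyword candidates. Note that both functions still score single words, so the split-phrase problem noted above remains.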

This page is auto-translated from /nishio/ă‚­ăƒŒăƒŻăƒŒăƒ‰æŠœć‡ș using DeepL. If something looks interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.