image

  • When a series has a score of 0 or 1, if you consider the ranking of “the product of the scores for a subsequence”, it includes all the subsequences of the interval that are 1.

  • If you interpret the algorithm in RAKE to mean that the columns from which stopwords are removed are candidates for key phrases, with a score of 0 for stopwords and 1 for everything else, this approach is incorrect because it also includes sub-columns.

  • If we consider that the outer scores of the subcolumns are multiplied by the score subtracted from 1, only the extreme columns will have a score of 1

    • This fits well with the concept of the RAKE stop list generation algorithm, which counts the number of adjacent occurrences of a key phrase
  • I think this all comes down to the hidden Markov model.

    • Hidden state is either 2 states of “being a keyword, not a keyword” or 4 states of “not a keyword, before a keyword, within a keyword, after a keyword”.
  • If it can be attributed to hidden Markov, then it can also be attributed to conditional probability field. - series labeling


This page is auto-translated from /nishio/極大列だけ取り出すスコア計算 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thought to non-Japanese readers.