Score calculation that extracts only the maximal sequences

When every element of a sequence has a score of 0 or 1, ranking contiguous subsequences by “the product of the scores in the subsequence” gives a score of 1 to every subsequence of a run of 1s, not only to the run itself.
Suppose you read the RAKE algorithm as “the sequences left after removing stopwords are the key phrase candidates”, with a score of 0 for stopwords and 1 for everything else. The product score does not reproduce this, because it also gives a score of 1 to the sub-sequences of each candidate.
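The problem can be seen in a minimal sketch. The function name `product_scores` and the example scores are hypothetical, chosen only for illustration; spans are half-open index pairs `(start, end)`.

```python
from math import prod


def product_scores(scores):
    """Map every contiguous span (start, end) to the product of scores[start:end]."""
    n = len(scores)
    return {
        (i, j): prod(scores[i:j])
        for i in range(n)
        for j in range(i + 1, n + 1)
    }


# Hypothetical input: 0 marks a stopword, 1 marks any other word.
scores = [1, 1, 0, 1]

# Spans whose product is 1. The run (0, 2) is found, but so are its
# sub-sequences (0, 1) and (1, 2).
naive_hits = [span for span, s in product_scores(scores).items() if s == 1]
print(naive_hits)
```

Here the only maximal runs are `(0, 2)` and `(3, 4)`, yet the product alone cannot tell them apart from the shorter spans inside `(0, 2)`.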
If the product is additionally multiplied by (1 - score) for the element just outside each end of the subsequence, only the maximal sequences get a score of 1.
- This fits well with the idea behind the RAKE stop list generation algorithm, which counts how often a word occurs adjacent to a key phrase
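A minimal sketch of the boundary-corrected score follows. The names `maximal_score` and `maximal_spans` are hypothetical, and treating the position beyond either end of the sequence as having score 0 (so its factor is 1) is an assumption the note does not spell out.

```python
from math import prod


def maximal_score(scores, start, end):
    """Score of scores[start:end], multiplied by (1 - score) of each outer neighbor."""
    # Assumption: beyond the ends of the sequence the neighbor score is 0,
    # so the boundary factor there is 1.
    left = 1 - scores[start - 1] if start > 0 else 1
    right = 1 - scores[end] if end < len(scores) else 1
    return left * prod(scores[start:end]) * right


def maximal_spans(scores):
    """Return all spans (start, end) whose boundary-corrected score is 1."""
    n = len(scores)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n + 1)
        if maximal_score(scores, i, j) == 1
    ]


# Hypothetical input: 0 marks a stopword, 1 marks any other word.
scores = [1, 1, 0, 1]
spans = maximal_spans(scores)
print(spans)
```

For this input the sub-spans `(0, 1)` and `(1, 2)` are multiplied by a boundary factor of 0, so only `(0, 2)` and `(3, 4)` remain.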
I think this all reduces to a hidden Markov model.
- The hidden state is either one of 2 states (“keyword”, “not a keyword”) or one of 4 states (“not a keyword”, “before a keyword”, “within a keyword”, “after a keyword”).
If it can be reduced to a hidden Markov model, then it can also be reduced to a conditional random field.
- sequence labeling

This page is auto-translated from /nishio/極大列だけ取り出すスコア計算 using DeepL. If something looks interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.
