Adaptation in English
- Empirical estimates of adaptation: the chance of two Noriegas is closer to p/2 than p^2
- The “Noriega” in the title is the name of a politician, much like “Kennedy.”
- When the probability of a word appearing is p, we tend to assume that the probability of it appearing twice is p^2.
- But in reality, once a word has appeared in a document, it is much more likely to appear again.
- How much more likely?
- Surprisingly, the conditional probability that a word appears k+1 times, given that it has already appeared k times, is roughly independent of its probability of appearing once.
- For example, among words with the same probability of occurrence, “Kennedy” shows a stronger concentration of occurrences (adaptation) than “except.”
- In Development of a system for extracting keywords in unexplored text information, we compared the distribution of substrings in the papers with the distribution of strings that humans chose as keywords, and showed that the keywords follow a distribution that, as in the story here, does not depend on the probability of occurrence.
- Given what is described here, ordinary words on their own already seem to have this kind of distribution, with keywords sitting at the upper end of the scale. I would like to compare the distribution of words with the distribution of keywords.
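- Conversely, for any given string, DF2/DF (where DF is the number of documents containing the string and DF2 the number containing it at least twice) serves as an indicator of “keywordiness” that is independent of its frequency of occurrence.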
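A minimal sketch of how these quantities could be estimated from a corpus, assuming each document is a plain string tokenized on whitespace. The toy corpus and the helper names `doc_counts`, `adaptation_report`, `rank_by_keywordiness`, and `min_df` are hypothetical, chosen only for illustration. It uses the decomposition Pr(k≥2) = Pr(k≥1) × Pr(k≥2 | k≥1): with p = DF/N, the naive estimate for two occurrences is p^2, whereas the observed value is DF2/N, and the ratio DF2/DF estimates the conditional probability. The title's p/2 corresponds to that ratio being around 1/2.

```python
from collections import Counter

def doc_counts(docs):
    """Count, per token, the documents containing it at least once (DF)
    and at least twice (DF2). Whitespace tokenization is an assumption."""
    df, df2 = Counter(), Counter()
    for doc in docs:
        tf = Counter(doc.split())  # term frequency within this document
        for word, n in tf.items():
            df[word] += 1
            if n >= 2:
                df2[word] += 1
    return df, df2

def adaptation_report(docs):
    """Return {word: (p, naive p^2, observed Pr(k>=2), DF2/DF)}."""
    n_docs = len(docs)
    df, df2 = doc_counts(docs)
    report = {}
    for word in df:
        p = df[word] / n_docs            # Pr(k >= 1)
        naive = p * p                    # independence assumption
        observed = df2[word] / n_docs    # empirical Pr(k >= 2)
        adapt = df2[word] / df[word]     # Pr(k >= 2 | k >= 1) = DF2/DF
        report[word] = (p, naive, observed, adapt)
    return report

def rank_by_keywordiness(docs, min_df=2):
    """Rank words by DF2/DF. The hypothetical min_df cutoff drops words
    seen in too few documents, where the ratio is too noisy to trust."""
    df, df2 = doc_counts(docs)
    scores = {w: df2[w] / df[w] for w in df if df[w] >= min_df}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy corpus: "kennedy" and "except" each occur in 2 of 6
# documents (same p), but only "kennedy" repeats within a document.
docs = [
    "kennedy spoke and kennedy left",
    "kennedy kennedy kennedy",
    "we went except monday",
    "all except one",
    "nothing here",
    "more text",
]

if __name__ == "__main__":
    report = adaptation_report(docs)
    for w in ("kennedy", "except"):
        p, naive, observed, adapt = report[w]
        print(w, round(p, 3), round(naive, 3), round(observed, 3), adapt)
    print(rank_by_keywordiness(docs))
```

In this toy corpus both words share the same p, yet "kennedy" gets DF2/DF = 1.0 and "except" gets 0.0, mirroring the Kennedy vs. except contrast above. On real text one would expect far less extreme values, and the tokenization and `min_df` choice would need tuning.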
This page is auto-translated from /nishio/出現集中 using DeepL. If something looks interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to share my thoughts with non-Japanese readers.