Adaptation in English

  • Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to p/2 than p^2
  • The "Noriega" in the title is the name of a politician, much like "Kennedy."
  • When the probability of a word appearing is p, we tend to think that the probability of it appearing twice is p^2.
  • In reality, once a word has appeared, it is much more likely to appear again.
  • How much more likely?
  • Surprisingly, the conditional probability of a word appearing k+1 times, given that it has already appeared k times,
    • does not depend on the probability of a single occurrence.
    • image
  • Among words with the same probability of occurrence, "Kennedy," for example, shows a higher concentration of occurrences than "except."
    • image
    • From the discussion here, it seems that ordinary words on their own follow that kind of distribution, while keywords sit at the upper end of the scale. I would like to compare the distribution of ordinary words with the distribution of keywords.
    • Conversely, for any given string, DF2/DF is an indicator of "keywordiness" that is independent of its frequency of occurrence.
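
The comparison above can be sketched in a few lines of Python. This is only a minimal sketch under assumptions: the page does not define DF or DF2, so here DF is taken to be the number of documents containing the string at least once and DF2 the number containing it at least twice. The function name `adaptation_stats` and the toy corpus are hypothetical and made up for illustration.

```python
from collections import Counter


def adaptation_stats(docs, term):
    """Return (p, p squared, DF2/DF) for `term` over a list of tokenized documents.

    Assumption: DF  = number of documents with at least one occurrence,
                DF2 = number of documents with at least two occurrences.
    """
    counts = [Counter(doc)[term] for doc in docs]
    df = sum(1 for c in counts if c >= 1)
    df2 = sum(1 for c in counts if c >= 2)
    p = df / len(docs)                    # chance that a document contains the term
    ratio = df2 / df if df else 0.0       # chance of a second occurrence given a first
    return p, p * p, ratio


# Toy corpus, invented for this example only.
docs = [
    "kennedy spoke and kennedy left".split(),
    "nobody came except the mayor".split(),
    "kennedy kennedy kennedy".split(),
    "everyone except me agreed".split(),
    "the weather was fine".split(),
    "no news today".split(),
    "markets were flat".split(),
    "rain is expected".split(),
]

for term in ("kennedy", "except"):
    p, p_squared, ratio = adaptation_stats(docs, term)
    print(f"{term}: p={p}, p^2={p_squared}, DF2/DF={ratio}")
```

In this toy corpus both strings appear in the same number of documents, so p is identical, yet DF2/DF differs sharply: the name repeats whenever it appears, while "except" never does. Real corpora would of course give less extreme values.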

This page is auto-translated from /nishio/出現集中 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to share my thoughts with non-Japanese readers.