Simpson’s paradox, or the Yule-Simpson effect, is a statistical paradox described by E. H. Simpson in 1951. The correlation in a population as a whole and the correlation within the subgroups obtained by dividing that population may differ. In other words, a hypothesis may hold in each group when the population is divided into two groups, while the opposite hypothesis holds for the population as a whole. [Simpson’s paradox - Wikipedia https://ja.wikipedia.org/wiki/%E3%82%B7%E3%83%B3%E3%83%97%E3%82%BD%E3%83%B3%E3%81%AE%E3%83%91%E3%83%A9%E3%83%89%E3%83%83%E3%82%AF%E3%82%B9]

but can also be .

  • Example 1
    • Compared subgroup by subgroup, the average scores are 100 > 90 and 10 > 0, but the overall averages are reversed because most people (9 of 10) sit in the “average score of 10” subgroup on one side and in the “average score of 90” subgroup on the other.
    • (100 * 1) / 1 > (90 * 9) / 9
    • (10 * 9) / 9 > (0 * 1) / 1
    • 190 / 10 < 810 / 10
  • Example 2
    • Both sides are 2/4 overall, but if the two are split differently, one side can win every split.
    • 2 / 4 = 2 / 4
    • 1 / 1 > 2 / 3
    • 1 / 3 > 0 / 1
      • Similar to [Sun Yat-sen’s carriage]
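
The arithmetic in the two examples above can be checked with a minimal sketch like the following. The labels "a" and "b" for the two sides, and the helper names `pooled_mean` and `pooled_rate`, are assumptions introduced here for illustration; the note itself does not name them. Each line of the examples is read as a per-subgroup average or rate, and the last line of Example 1 as the pooled average.

```python
from fractions import Fraction


def pooled_mean(subgroups):
    """Overall mean from (mean_score, head_count) pairs."""
    total = sum(score * n for score, n in subgroups)
    people = sum(n for _, n in subgroups)
    return Fraction(total, people)


def pooled_rate(splits):
    """Overall rate from (successes, trials) pairs."""
    wins = sum(w for w, _ in splits)
    trials = sum(t for _, t in splits)
    return Fraction(wins, trials)


# Example 1: (average score, number of people) per subgroup.
# "a" and "b" are hypothetical labels for the two sides being compared.
a_scores = [(100, 1), (10, 9)]
b_scores = [(90, 9), (0, 1)]

# Side a has the higher average in every subgroup (100 > 90, 10 > 0) ...
a_wins_each_subgroup = all(
    sa > sb for (sa, _), (sb, _) in zip(a_scores, b_scores)
)
# ... yet its pooled average is lower: 190 / 10 < 810 / 10.
a_overall = pooled_mean(a_scores)
b_overall = pooled_mean(b_scores)

# Example 2: (successes, trials) per split; both sides are 2/4 when pooled.
a_splits = [(1, 1), (1, 3)]
b_splits = [(2, 3), (0, 1)]

# Side a wins each split (1/1 > 2/3, 1/3 > 0/1) although the totals tie.
a_wins_each_split = all(
    Fraction(wa, ta) > Fraction(wb, tb)
    for (wa, ta), (wb, tb) in zip(a_splits, b_splits)
)

print(a_wins_each_subgroup, a_overall, b_overall)
print(a_wins_each_split, pooled_rate(a_splits), pooled_rate(b_splits))
```

Exact fractions are used instead of floats so that the comparisons and the tie in Example 2 are not affected by rounding.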

Mystery of Data Analysis, Simpson’s Paradox from Statistical Causal Inference - Unboundedly

  • An explanation of why we should not try to interpret causality in a purely data-driven way.
  • Even when the data is exactly the same, whether the correct comparison is the split one or the unsplit one differs from case to case.

[/okumura/Simpson’s Paradox](https://scrapbox.io/okumura/Simpson’s Paradox).

Unexpected Phenomenon

This page is auto-translated from /nishio/シンプソンのパラドックス using DeepL. If something looks interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.