-
2014 Nuzzo, R. (2014). Scientific method: statistical errors. Nature, 506, 150–152
- https://www.nature.com/news/scientific-method-statistical-errors-1.14700
- p-values were introduced by Fisher as an informal measure of evidence, not as part of a formal accept/reject testing procedure.
- The hypothesis-testing framework (accept/reject decisions) was devised by Neyman and Pearson.
- While Fisher and Neyman were criticizing each other's approaches, later authors blended their arguments, and the now-familiar method of "compute a p-value and compare it to a threshold such as 0.05" was born.
- How the probability that an effect is real is updated after observing a result that is significant at the 0.05 level (a Bayesian interpretation):
- If the prior probability of a real effect is 5%, a significant result raises it only to about 11%.
- If the prior is 50-50, a significant result raises it to about 71%.
- The common interpretation "if a result is significant at the 5% level, there is a 95% probability that the effect is real" is therefore incorrect.
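The updates above are just Bayes' rule applied to odds. A minimal sketch, assuming (as in the illustration accompanying Nuzzo's article) that a result with p ≈ 0.05 corresponds to a Bayes factor of roughly 2.5 in favour of a real effect; the factor 2.5 is an approximation, not an exact correspondence:

```python
def posterior_prob(prior, bayes_factor=2.5):
    """Update the prior probability that an effect is real, given a result
    whose evidential strength is summarized as a Bayes factor.
    2.5 is roughly the Bayes factor for p = 0.05 (an approximation)."""
    prior_odds = prior / (1 - prior)      # convert probability to odds
    post_odds = prior_odds * bayes_factor # Bayes' rule in odds form
    return post_odds / (1 + post_odds)    # back to probability

print(f"{posterior_prob(0.05):.0%}")  # → 12% (the article reports ~11%)
print(f"{posterior_prob(0.50):.0%}")  # → 71%
```

With a skeptical 5% prior, even a "significant" result leaves the effect more likely false than real, which is the article's point.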
-
2015 Basic and Applied Social Psychology (BASP)
- The journal banned null-hypothesis significance testing procedures, including p-values, from its articles.
-
2016 American Statistical Association (ASA)
- Statement on p-values
- The statement makes many points; a few highlights:
- The p-value is not a measure of the probability that the hypothesis is true or that the data were generated by chance
- Scientific conclusions and business or policy decisions should not depend solely on whether the p-value passes a particular threshold
- The p-value or statistical significance is not a measure of the size of the effect or the importance of the result
- A p-value by itself does not provide good evidence for a model or hypothesis
- ASA (American Statistical Association), annoyed by misuse of p-values, issues statement | SciStat
-
2016 “Recent Discussions on p-values” (Japanese commentary)
memo
-
http://smrmkt.hatenablog.jp/entry/2015/04/14/234856
- Optimizely
- In A/B testing the sample size grows over time, and it is unrealistic to avoid looking at the data until a predetermined sample size is reached.
- But repeatedly checking for significance (peeking) inflates the false positive rate.
- Optimizely's approach is based on sequential testing.
- Shifting from controlling the false positive rate to controlling the [false discovery rate]
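The peeking problem is easy to demonstrate by simulation. A sketch of an A/A test under the null hypothesis (true effect zero): checking a z test only once at a fixed horizon holds the false positive rate near 5%, while checking at every intermediate peek inflates it well above 5%. The sample sizes and peeking schedule here are illustrative choices, not Optimizely's actual parameters:

```python
import math
import random

def run_experiment(n_samples, check_every, rng, z_crit=1.96):
    """One experiment under the null (samples ~ N(0, 1)).
    Returns (significant at any peek, significant at the fixed horizon)."""
    total = 0.0
    ever_significant = False
    for i in range(1, n_samples + 1):
        total += rng.gauss(0, 1)
        if i % check_every == 0:
            z = total / math.sqrt(i)  # z statistic for mean = 0, known sigma
            if abs(z) > z_crit:
                ever_significant = True
    final_z = total / math.sqrt(n_samples)
    return ever_significant, abs(final_z) > z_crit

rng = random.Random(42)
trials = 2000
peek_hits = end_hits = 0
for _ in range(trials):
    peeked, fixed = run_experiment(1000, check_every=50, rng=rng)
    peek_hits += peeked
    end_hits += fixed

print(f"fixed-horizon false positive rate: {end_hits / trials:.3f}")  # ~0.05
print(f"peeking every 50 samples:          {peek_hits / trials:.3f}")  # well above 0.05
```

Sequential tests are designed so that this "stop as soon as it looks significant" behaviour still controls the error rate.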
-
It is difficult to say definitively what one should do. But since confidence intervals are more informative than p-values, and the Clopper–Pearson method for obtaining confidence intervals is already standard, how about computing confidence intervals with that method and simply not reporting p-values, letting readers judge for themselves whether the interval contains the null value?
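For a binomial proportion (e.g. a conversion rate), the Clopper–Pearson interval can be computed from quantiles of the beta distribution. A minimal sketch using `scipy.stats.beta`; the 60-out-of-100 example is hypothetical:

```python
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper–Pearson) confidence interval for a binomial
    proportion: k successes out of n trials, coverage 1 - alpha."""
    lo = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    hi = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return lo, hi

# e.g. 60 conversions out of 100 visitors (illustrative numbers)
lo, hi = clopper_pearson(60, 100)
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")  # an interval around 0.6
```

A reader can then check directly whether the interval contains the value implied by the null hypothesis (e.g. the baseline conversion rate), with no p-value reported.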
-
This page is auto-translated from /nishio/p値の批判 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.