American Journal of Epidemiology Advance Access originally published online on July 4, 2008
American Journal of Epidemiology 2008 168(4):389-390; doi:10.1093/aje/kwn152
The Author Responds to "Evaluating p Values and Bayes Factors"
1 Clinical and Molecular Epidemiology Unit, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece
2 Biomedical Research Institute, Foundation for Research and Technology-Hellas, Ioannina, Greece
3 Department of Medicine, Tufts University School of Medicine, Boston, MA
Correspondence to Dr. John P. A. Ioannidis, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, University Campus, Ioannina 45110, Greece (e-mail: jioannid{at}cc.uoi.gr).
Received for publication March 26, 2008. Accepted for publication May 8, 2008.
Katki (1) offers a very insightful commentary on various Bayesian options for making inferences. For graphical clarification, the prior I proposed (2) has a spike-and-smear configuration (figure 1); this is computationally equivalent to what Katki shows in his figure 1 for support of observed data (after we see the data). Note that the amount of prior mass placed at 0 does not influence the Bayes factor.
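The point about the mass at 0 can be illustrated with a minimal sketch. It is not the exact formulation of reference 2: it assumes a normally distributed effect estimate (e.g., a log relative risk) with known standard error, a normal "smear" centered at 0, and a Bayes factor defined as P(data | null) / P(data | alternative). The function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, var):
    """Density at x of a zero-mean normal distribution with variance var."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def bayes_factor_null_vs_alt(estimate, se, smear_sd):
    """Bayes factor P(data | null) / P(data | alternative).

    Assumption: estimate ~ Normal(effect, se**2); under the alternative the
    effect is drawn from the smear, Normal(0, smear_sd**2). The spike mass
    does not appear anywhere in this calculation.
    """
    marginal_null = normal_pdf(estimate, se ** 2)
    marginal_alt = normal_pdf(estimate, se ** 2 + smear_sd ** 2)
    return marginal_null / marginal_alt


def posterior_prob_null(estimate, se, smear_sd, spike_mass):
    """Posterior probability of the null; this is where the spike mass enters."""
    bf = bayes_factor_null_vs_alt(estimate, se, smear_sd)
    prior_odds_null = spike_mass / (1.0 - spike_mass)
    posterior_odds_null = prior_odds_null * bf
    return posterior_odds_null / (1.0 + posterior_odds_null)


# Hypothetical inputs: log relative risk 0.2, standard error 0.075, smear SD 0.2.
estimate, se, smear_sd = 0.2, 0.075, 0.2
bf = bayes_factor_null_vs_alt(estimate, se, smear_sd)
for spike_mass in (0.5, 0.9, 0.999):
    post = posterior_prob_null(estimate, se, smear_sd, spike_mass)
    print(f"spike mass {spike_mass}: Bayes factor {bf:.4f}, P(null | data) {post:.4f}")
```

Under these assumptions the Bayes factor is the same in every row, whereas the posterior probability of the null changes with the spike mass, because the spike mass acts only through the prior odds.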
Three- to sixfold variability in Bayes factors with different formulations is probably tolerable. I would not worry even with more sizeable differences. The Bayes factors encountered currently in empirical studies span quadrillion-fold differences (3). Comparative empirical studies on Bayes factors are welcome. I suspect differences in performance would be subtle. If anything, I favor more conservative Bayes factor estimates, because typical Bayes factor calculations do not explicitly account for bias. Alternatively, one could use any Bayes factor and then try to model separately the potential impact of superimposed bias (3, 4).
The main problem is not the subtle divergence of Bayesian approaches; it is the fact that researchers and clinicians instinctively think as Bayesians ("Is this true, yes or no?") (5) but mistakenly treat p values as posterior inference tools. In effect, they equate p values less than 0.05 with posterior odds greater than 1 (i.e., >50 percent chance that something is true). This would be fairly reasonable only if the prior odds are already high and the study finds what we already believe; that is, if we believe that the truth is exactly what the study finds. This is closer to dogma than to critical scientific inference-making. Examples: It is so obvious that estrogens should decrease Alzheimer's disease risk in women (relative risk = 0.4, p = 0.01) (6, 7). It is so obvious that some polymorphism rs7566605 in our genome can increase the risk of obesity (relative risk = 1.22, p = 0.008) (8, 9). It is so obvious that consumption of yogurt specifically on the evening of March 18, 2010 (as recorded by electronic cameras in a cohort of electronically followed participants) decreases the risk of lung cancer 10 years later (relative risk = 0.2, p = 10^-10). It is so obvious, now that we can measure the luminosity of walls at 0.0000000001-mm resolution (in the year 2020), that the luminosity 12.2765984917 mm away on the horizontal and 17.5490200115 mm away on the vertical from the lower left corner of the mirror in one's bathroom is related to the risk of having a heart attack (relative risk = 2.2, p = 10^-20). I would question (more or less) all of these obvious assumptions of the past, present, and tentative future research agenda. Regardless, the Bayesian approach allows one to see whether it makes a difference when various priors are considered. As measurement options become astronomically multiplied, increasingly telescoped p values are misleading. On average, a p value of 0.0001 currently or 10^-20 in 2020 may be as misleading as a p value of 0.01–0.05 has been in the recent past.
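The dependence of the conclusion on the prior can be made concrete with a rough sketch. It does not use the spike-and-smear Bayes factor of reference 2; instead it assumes the minimum Bayes factor exp(-z^2/2) for a two-sided p value, which is the value most favorable to the alternative, so the resulting posterior probability is an upper bound under that assumption. The function names and the prior probabilities are hypothetical.

```python
import math
from statistics import NormalDist


def min_bayes_factor(p_value):
    """Minimum Bayes factor (null vs. alternative), exp(-z**2 / 2).

    z is the standard normal quantile matching a two-sided p value. The lower
    tail is used so that very small p values (e.g., 1e-20) remain computable.
    """
    z = NormalDist().inv_cdf(p_value / 2.0)
    return math.exp(-0.5 * z * z)


def max_posterior_prob_true(p_value, prior_prob_true):
    """Upper bound on P(association is true | data) under the assumption above."""
    prior_odds_null = (1.0 - prior_prob_true) / prior_prob_true
    posterior_odds_null = prior_odds_null * min_bayes_factor(p_value)
    return 1.0 / (1.0 + posterior_odds_null)


# Hypothetical prior probabilities, from an even bet to an agnostic search
# across an astronomically large number of candidate measurements.
for p_value in (0.05, 1e-4, 1e-10, 1e-20):
    for prior in (0.5, 1e-2, 1e-6, 1e-12):
        bound = max_posterior_prob_true(p_value, prior)
        print(f"p = {p_value:g}, prior = {prior:g}: P(true | data) <= {bound:.3g}")
```

Even with this most optimistic Bayes factor, p = 0.05 yields a posterior probability above 50 percent only when the prior probability is already substantial, and progressively smaller p values are needed to reach the same posterior as the prior shrinks.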
Empirical studies can help us build more reasonable assumptions about prior distributions under the alternative, especially when we accumulate evidence with extensive replication for many association effects in a specific field and we also allow for the inflation inherent in discovered effects (10). We can use this information to corroborate the credibility of new associations. This is a trial-and-error process that hopefully becomes more accurate with more experience. Past evidence informs priors and priors inform new evidence. However, this corrective iteration needs a commitment to think in Bayesian terms, while the entrenched dominance of p values makes our Bayesian instincts atrophic and subconscious. Making the reasoning process more explicit and transparent may help us understand what we understand and can allow us to examine whether we all understand the same thing.
ACKNOWLEDGMENTS
Conflict of interest: none declared.
REFERENCES
1. Katki HA. Invited commentary: evidence-based evaluation of p values and Bayes factors. Am J Epidemiol (2008) 168:384–88.
2. Ioannidis JPA. Effect of formal statistical significance on the credibility of observational associations. Am J Epidemiol (2008) 168:374–83.
3. Ioannidis JPA. Calibration of credibility of agnostic genome-wide associations. Am J Med Genet B Neuropsychiatr Genet (2008) Mar 24 [Epub ahead of print].
4. Ioannidis JPA. Why most published research findings are false. PLoS Med (2005) 2:e124. (Electronic article).
5. Gill CJ, Sabin L, Schmid CH. Why clinicians are natural Bayesians. BMJ (2005) 330:1080–3.
6. Tatsioni A, Bonitsis NG, Ioannidis JP. Persistence of contradicted claims in the literature. JAMA (2007) 298:2517–26.
7. Tang MX, Jacobs D, Stern Y, et al. Effect of oestrogen during menopause on risk and age at onset of Alzheimer's disease. Lancet (1996) 348:429–32.
8. Herbert A, Gerry NP, McQueen MB, et al. A common genetic variant is associated with adult and childhood obesity. Science (2006) 312:279–83.
9. Ioannidis JP. Non-replication and inconsistency in the genome-wide association setting. Hum Hered (2007) 64:203–13.
10. Zollner S, Pritchard JK. Overcoming the winner's curse: estimating penetrance parameters from case-control data. Am J Hum Genet (2007) 80:605–15.
