American Journal of Epidemiology Advance Access published online on July 15, 2008
American Journal of Epidemiology, doi:10.1093/aje/kwn183
A Cautionary Note on the Evaluation of Biomarkers of Subtypes of a Single Disease
1 Merck Research Laboratories, North Wales, PA
2 Department of Public Health Sciences, School of Public Health, University of Alberta, Edmonton, Alberta, Canada
3 Fred Hutchinson Cancer Research Center, Seattle, WA
4 Biomedical Life Sciences Group, Intel Corporation, Santa Clara, CA
Correspondence to Dr. Yutaka Yasui, Department of Public Health Sciences, School of Public Health, University of Alberta, 13-103 Clinical Sciences Building, Edmonton, Alberta T6G 2G3, Canada (e-mail: yyasui@ualberta.ca).
Received for publication December 7, 2007. Accepted for publication May 22, 2008.
Heterogeneity in the molecular characteristics of a disease presents a challenge to investigators attempting to identify biomarkers of the disease. Preceding the biomarker discovery effort with stratification within a heterogeneous disease group, which amounts to grouping disease cases into more homogeneous subtypes, seems to be a natural strategy for discovering subtype-specific biomarkers. This is because biologically more homogeneous subgroups are presumably easier to distinguish from nondiseased subjects than the entire heterogeneous disease group is. The apparent, but misleading, benefits of this two-step approach are illustrated using an example from a protein biomarker discovery project for breast cancer. A potential analytical pitfall in this framework is explained using a conditional probability argument.
Key words: biological markers; classification problem; conditional probability; cross-validation; mass spectrometry; misclassification
Abbreviations: MALDI-TOF, matrix-assisted laser desorption/ionization time-of-flight
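A minimal simulation sketch can make the pitfall concrete. It is not the authors' analysis and uses no study data; it assumes, hypothetically, a single marker X with an identical standard normal distribution in cases and controls, and "subtypes" formed by splitting the cases at their median marker value, so the subtyping is driven by the marker itself. The function names `auc` and `simulate` are illustrative. Under these assumptions the marker is useless overall (area under the ROC curve near 0.5), yet conditioning on the data-driven subtype gives P(X_case > X_control | X_case > 0) = E[Φ(X) | X > 0] = 3/4, so the "high" subtype appears well separated from controls.

```python
import bisect
import random
import statistics


def auc(cases, controls):
    """Empirical AUC = P(case value > control value), ties counted as 1/2."""
    ctrl = sorted(controls)
    wins = 0.0
    for x in cases:
        below = bisect.bisect_left(ctrl, x)          # controls strictly below x
        ties = bisect.bisect_right(ctrl, x) - below  # controls equal to x
        wins += below + 0.5 * ties
    return wins / (len(cases) * len(ctrl))


def simulate(n=2000, seed=1):
    """Return (AUC for all cases, AUC for the 'high' subtype) versus controls."""
    rng = random.Random(seed)
    # Null scenario (assumption): the marker has the SAME distribution in
    # cases and controls, so it carries no information about disease status.
    cases = [rng.gauss(0.0, 1.0) for _ in range(n)]
    controls = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Step 1 (hypothetical data-driven subtyping): split the cases at their
    # median marker value, i.e. the subtypes are defined by the marker itself.
    cut = statistics.median(cases)
    subtype_high = [x for x in cases if x > cut]

    # Step 2: evaluate the same marker for the "high" subtype versus ALL controls.
    return auc(cases, controls), auc(subtype_high, controls)


if __name__ == "__main__":
    auc_all, auc_sub = simulate()
    print(f"all cases vs controls:      AUC = {auc_all:.3f}")
    print(f"'high' subtype vs controls: AUC = {auc_sub:.3f}")
```

Defining subtypes with the evaluated marker itself is deliberately extreme; any subtyping that is correlated with the marker under evaluation would be expected to produce a milder version of the same optimistic bias, because the marker's distribution within a selected subtype no longer matches its distribution among controls.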