American Journal of Epidemiology Advance Access originally published online on June 29, 2006
American Journal of Epidemiology 2006 164(4):400-401; doi:10.1093/aje/kwj235

American Journal of Epidemiology Copyright © 2006 by the Johns Hopkins Bloomberg School of Public Health All rights reserved; printed in U.S.A.

Letter to the Editor

RE: "APPLYING RECURSIVE PARTITIONING TO A PROSPECTIVE STUDY OF FACTORS ASSOCIATED WITH ADHERENCE TO MAMMOGRAPHY SCREENING GUIDELINES"

Martin Radespiel-Tröger1,2, Torsten Hothorn1, Annette B. Pfahlberg1 and Olaf Gefeller1

1 Department of Medical Informatics, Biometry and Epidemiology, University of Erlangen-Nuremberg, D-91054 Erlangen, Germany
2 Population-Based Cancer Registry Bavaria, University of Erlangen-Nuremberg, D-91054 Erlangen, Germany

(e-mail: gefeller@rzmail.uni-erlangen.de)

In a recent Journal article, Calvocoressi et al. (1) presented an application of the recursive partitioning methodology to identify factors associated with adherence to mammography screening guidelines. Since mammography screening promises to be a valuable way to reduce breast cancer mortality, provided that a high proportion of women are screened, it appears to be extremely useful to characterize subgroups of women who differ from others with regard to screening participation, so that specific campaigns can be targeted to these groups to raise compliance. Recursive partitioning has been established as a means to split a sample into subgroups that ought to be homogeneous within the group but heterogeneous between the groups. However, a number of pitfalls have been identified in connection with such tree-based approaches, and we are concerned that some of these methodological problems might affect the interpretations drawn from these authors' analysis.

Most importantly, the authors do not report an unbiased estimate of the overall misclassification error, that is, the proportion of women who would be misclassified in a practical setting by the tree shown in figure 1 (1, p. 1220). Fortunately, the reader is able to estimate this proportion (34 percent) on the basis of figure 1. However, it should be kept in mind that any misclassification rate estimated from the sample used to construct the tree (the "learning sample") tends to be biased downward (2). Therefore, as long as no validation sample is available, such an estimate should be based on some form of resampling technique, either the bootstrap or cross-validation (2). Only then can one discuss whether a classification scheme, such as the one shown in figure 1, is helpful to plan campaigns and to predict the potential benefits from a public health care perspective.
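The downward bias of the learning-sample estimate can be illustrated with a small simulation. The following Python sketch is not a reanalysis of the mammography data. It assumes synthetic data in which only the first of four hypothetical covariates carries signal, grows a small Gini-based tree as a simplified stand-in for a CART-type algorithm, and compares the resubstitution error with a 5-fold cross-validated estimate. All function names, the sample size, and the tree depth are illustrative assumptions.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows):
    """Exhaustive search: (covariate index, cutpoint) with lowest weighted Gini."""
    n = len(rows)
    best, best_score = None, gini([y for _, y in rows])
    for j in range(len(rows[0][0])):
        values = sorted({x[j] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for x, y in rows if x[j] <= t]
            right = [y for x, y in rows if x[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best  # None if no cutpoint reduces impurity

def grow(rows, depth):
    """Grow a tree; leaves are 0/1 majority votes, inner nodes are tuples."""
    labels = [y for _, y in rows]
    majority = int(2 * sum(labels) >= len(labels))
    split = best_split(rows) if depth > 0 and len(rows) >= 2 else None
    if split is None:
        return majority
    j, t = split
    left = [r for r in rows if r[0][j] <= t]
    right = [r for r in rows if r[0][j] > t]
    return (j, t, grow(left, depth - 1), grow(right, depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

def error_rate(tree, rows):
    return sum(predict(tree, x) != y for x, y in rows) / len(rows)

def cross_validated_error(rows, k, depth, rng):
    """k-fold cross-validation: each case is predicted by a tree grown without it."""
    rows = rows[:]
    rng.shuffle(rows)
    wrong = 0
    for fold in range(k):
        test = rows[fold::k]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        tree = grow(train, depth)
        wrong += sum(predict(tree, x) != y for x, y in test)
    return wrong / len(rows)

def simulate(n, rng):
    """Assumed data: P(y = 1) is 0.7 if the first covariate exceeds 0.5, else 0.3."""
    rows = []
    for _ in range(n):
        x = tuple(rng.random() for _ in range(4))
        y = int(rng.random() < (0.7 if x[0] > 0.5 else 0.3))
        rows.append((x, y))
    return rows

rng = random.Random(2006)
sample = simulate(200, rng)
resub_error = error_rate(grow(sample, depth=4), sample)
cv_error = cross_validated_error(sample, k=5, depth=4, rng=rng)
print("learning-sample (resubstitution) error:", round(resub_error, 3))
print("5-fold cross-validated error:", round(cv_error, 3))
```

In simulations of this kind, the resubstitution error is typically well below the cross-validated error, which is the optimism described above. The .632+ bootstrap (2) would be an alternative to the cross-validation used in this sketch.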

Virtually every research paper dealing with recursive partitioning advocates its application because of the nice "interpretability" of plots depicting the tree structure. Nevertheless, we should keep in mind that trees are built upon locally optimal decisions (the breakpoint estimation in every node) and that we might end up with many different trees that share the same global properties, that is, prediction errors. Furthermore, the specific algorithm applied by Calvocoressi et al. (1) is known to favor covariates that offer many candidate splits, such as continuous covariates, over covariates that offer few, such as binary covariates (3, 4). The authors circumvent this problem somewhat by precategorizing continuous covariates (such as age or annual family income); however, choosing an alternative categorization scheme would almost surely lead to a completely different tree.
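The preference for covariates with many candidate splits can be sketched with a null simulation. The Python code below is a minimal illustration under stated assumptions, not the software used by Calvocoressi et al.: the outcome is pure noise, one hypothetical covariate is continuous, the other is binary, and the covariate with the larger maximal Gini decrease is selected, as in an exhaustive-search tree algorithm. The names `best_gain` and `share_first_selected` and all parameter values are invented for this example.

```python
import random

def gini(pos, n):
    """Gini impurity of a node with `pos` positives among `n` cases."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)

def best_gain(xs, ys):
    """Largest Gini decrease over all cutpoints of a single covariate."""
    n, total = len(ys), sum(ys)
    pairs = sorted(zip(xs, ys))
    parent = gini(total, n)
    best, left_pos = 0.0, 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cutpoint exists between tied values
        child = (i * gini(left_pos, i)
                 + (n - i) * gini(total - left_pos, n - i)) / n
        best = max(best, parent - child)
    return best

def share_first_selected(n=100, reps=300, seed=0, dichotomize=False):
    """Share of null data sets in which the first covariate wins the root split.

    Without selection bias, this share would be about one half, because
    neither covariate is related to the outcome.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        y = [rng.randint(0, 1) for _ in range(n)]
        x_many = [rng.random() for _ in range(n)]        # up to n - 1 cutpoints
        if dichotomize:
            x_many = [int(v > 0.5) for v in x_many]      # precategorized: 1 cutpoint
        x_binary = [rng.randint(0, 1) for _ in range(n)]  # 1 cutpoint
        if best_gain(x_many, y) > best_gain(x_binary, y):
            wins += 1
    return wins / reps

print("continuous vs. binary:", share_first_selected())
print("dichotomized vs. binary:", share_first_selected(dichotomize=True))
```

The `dichotomize` option mimics precategorization: it removes the advantage of the continuous covariate in this null setting, but the result then depends on the chosen cutpoint (0.5 here, an arbitrary assumption).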

Furthermore, it has been learned from resampling-based empirical studies that tree structures are not as stable as one might hope (5–7). The covariate (and its breakpoint) selected for splitting a node may have a strong competitor that fails to be selected by only a very small margin, thereby rendering the tree structure somewhat arbitrary. If a strong competing split exists, trees tend to change their appearance markedly if only a small number of individuals are added to or withdrawn from the sample. Thus, it is worthwhile to examine competing splits, which was not done by Calvocoressi et al. (1).
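One way to examine competing splits is to rank the covariates by their best achievable split at a node and to check, over bootstrap resamples, how often each covariate would be chosen. The following Python sketch assumes synthetic data with two hypothetical covariates, `x1` and `x2`, that are noisy measurements of the same underlying factor and are therefore close competitors. It is an illustration of the idea only, not a procedure taken from the cited studies.

```python
import random
from collections import Counter

def gini(pos, n):
    """Gini impurity of a node with `pos` positives among `n` cases."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)

def best_split(xs, ys):
    """Return (largest Gini decrease, cutpoint) for a single covariate."""
    n, total = len(ys), sum(ys)
    pairs = sorted(zip(xs, ys))
    parent = gini(total, n)
    best_gain, best_cut, left_pos = 0.0, None, 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # tied values (common in bootstrap samples) allow no cut
        child = (i * gini(left_pos, i)
                 + (n - i) * gini(total - left_pos, n - i)) / n
        if parent - child > best_gain:
            best_gain, best_cut = parent - child, (lo + hi) / 2.0
    return best_gain, best_cut

def root_competitors(data, names):
    """Rank covariates by their best Gini decrease at the root, best first."""
    ys = [row[-1] for row in data]
    ranked = []
    for j, name in enumerate(names):
        gain, cut = best_split([row[j] for row in data], ys)
        ranked.append((gain, name, cut))
    return sorted(ranked, reverse=True)

def bootstrap_root_selection(data, names, reps=200, seed=0):
    """Count how often each covariate wins the root split across resamples."""
    rng = random.Random(seed)
    counts = Counter({name: 0 for name in names})
    for _ in range(reps):
        resample = [rng.choice(data) for _ in data]
        counts[root_competitors(resample, names)[0][1]] += 1
    return counts

def simulate(n=150, seed=1):
    """Assumed data: x1 and x2 both measure a latent factor z with error."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.random()
        x1 = z + rng.uniform(-0.2, 0.2)
        x2 = z + rng.uniform(-0.2, 0.2)
        y = int(rng.random() < (0.7 if z > 0.5 else 0.3))
        data.append((x1, x2, y))
    return data

names = ["x1", "x2"]
data = simulate()
for gain, name, cut in root_competitors(data, names):
    print(name, "best Gini decrease:", round(gain, 4), "cutpoint:", cut)
print("root split chosen in bootstrap resamples:",
      dict(bootstrap_root_selection(data, names)))
```

If the selection counts are split between the covariates rather than concentrated on one, the root split of the original tree should be read as one of several nearly equivalent choices.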

In summary, we think that the analysis presented by Calvocoressi et al. (1) provides only a first step on the long road of model selection and evaluation to identify the true set of factors and their interrelation to predict screening compliance. The authors' interpretations are compromised by methodological problems, more or less inherent in every recursive partitioning algorithm. Classical classification and regression trees, as well as unbiased modern variants (3–9) and logistic regression models, need to be compared in order to find a stable and practically helpful model. In our opinion, the value of the statistical analysis of Calvocoressi et al. can be substantially enhanced if one goes beyond the application of one prepackaged recursive partitioning algorithm (10) and deals adequately with the above-mentioned methodological issues.

ACKNOWLEDGMENTS

Conflict of interest: none declared.

References

  1. Calvocoressi L, Stolar M, Kasl SV, et al. Applying recursive partitioning to a prospective study of factors associated with adherence to mammography screening guidelines. Am J Epidemiol 2005;162:1215–24.
  2. Efron B, Tibshirani R. Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc 1997;92:548–60.
  3. Hothorn T, Lausen B. On the exact distribution of maximally selected rank statistics. Computat Stat Data Anal 2003;43:121–37.
  4. Lausen B, Hothorn T, Bretz F, et al. Assessment of optimal selected prognostic factors. Biomet J 2004;46:364–74.
  5. Radespiel-Tröger M, Rabenstein T, Schneider HT, et al. Comparison of tree-based methods for prognostic stratification of survival data. Artif Intell Med 2003;28:323–41.
  6. Radespiel-Tröger M, Gefeller O, Rabenstein T, et al. Association between split selection instability and predictive error in survival trees. Methods Inf Med (in press).
  7. Hothorn T, Hornik K, Zeileis A. Unbiased recursive partitioning: a conditional inference framework. J Comput Graphical Stat (in press).
  8. Hothorn T, Hornik K, Zeileis A. Party: a laboratory for recursive part(y)itioning. R package version 0.3-8, 2006. (http://cran.r-project.org/).
  9. Chan KY, Loh WY. LOTUS: an algorithm for building accurate and comprehensible logistic regression trees. J Comput Graphical Stat 2004;13:826–52.
  10. Steinberg D, Colla P. CART: classification and regression trees. San Diego, CA: Salford Systems, 1997.
