Copyright © 2004 by the Johns Hopkins Bloomberg School of Public Health
LETTERS TO THE EDITOR
THE FIRST AUTHOR REPLIES
Department of Biostatistics, School of Public Health, University of North Carolina, Chapel Hill, NC 27599-7420
I thank Chaix et al. (1) for their observations on issues discussed in our article (2). I agree with them that multilevel logistic models have an advantage over generalized estimating equation methods for modeling variations in health or health-related behavior between areas: they provide estimates of the extent to which area-level variations can be explained by a given set of individual- and area-level factors. Consideration of the "interpretability of the different model-based indexes of clustering" (1, p. 505) is indeed important, but some distinctions attributed to specific measures of association are better understood as distinctions between multilevel and population-averaged models generally, that is, irrespective of the association measure used.
Indeed, there are problems with the intraclass correlation coefficient from multilevel logistic models. Although it may be generally true that the several methods for estimating the intraclass correlation coefficient "are in reasonable agreement" (1, p. 505), this may not be the case when predicted probabilities are close to zero or one, as in the surveillance study of green tobacco sickness (2, 3). In addition, we agree that the main limitation of multilevel models is that the value of the intraclass correlation coefficient "depends on the prevalence of the phenomenon in the sample" (1, p. 505). In contrast, as has been noted, the alternating logistic regression pairwise odds ratio has the advantage over the intraclass correlation coefficient of the multilevel model that its value is not constrained by the prevalence. More broadly, this is a relative advantage of generalized estimating equation methods (including alternating logistic regression) over the multilevel (random effects) approach. To underscore that the distinction is broader than that of intraclass correlation coefficient versus pairwise odds ratio, we note that the former can also be modeled with generalized estimating equation methods (4).
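The dependence of the binary-scale intraclass correlation coefficient on prevalence can be illustrated with a minimal sketch. It assumes a random-intercept logistic model with a single normally distributed cluster effect and no covariates; the function names, the quadrature grid, and the parameter values are hypothetical choices for illustration and are not taken from the article or the letter.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_moments(beta0, sigma, n=4001, width=8.0):
    """E[p(u)] and E[p(u)^2] for p(u) = expit(beta0 + u), u ~ N(0, sigma^2).

    Uses a simple equally spaced grid over the standardized random effect,
    weighted by the normal density (an assumed, not an optimal, quadrature).
    """
    step = 2.0 * width / (n - 1)
    total_w = m1 = m2 = 0.0
    for k in range(n):
        z = -width + k * step          # standardized cluster effect
        w = math.exp(-0.5 * z * z)     # unnormalized normal density
        p = expit(beta0 + sigma * z)   # conditional probability given u
        total_w += w
        m1 += w * p
        m2 += w * p * p
    return m1 / total_w, m2 / total_w


def binary_icc(beta0, sigma):
    """Correlation between two binary outcomes in the same cluster."""
    mean_p, mean_p2 = marginal_moments(beta0, sigma)
    return (mean_p2 - mean_p ** 2) / (mean_p * (1.0 - mean_p))


def latent_icc(sigma):
    """Latent-variable ICC, which does not involve the intercept."""
    return sigma ** 2 / (sigma ** 2 + math.pi ** 2 / 3.0)


if __name__ == "__main__":
    sigma = 1.0  # assumed random-intercept standard deviation
    for beta0 in (0.0, -2.0, -4.0):
        prevalence, _ = marginal_moments(beta0, sigma)
        print(f"beta0={beta0:5.1f}  prevalence={prevalence:.4f}  "
              f"binary ICC={binary_icc(beta0, sigma):.4f}  "
              f"latent ICC={latent_icc(sigma):.4f}")
```

Under these assumptions the cluster variance is held fixed while the intercept moves the prevalence toward zero; the binary-scale coefficient shrinks even though the latent-scale coefficient stays the same.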
Indeed, despite problems with the intraclass correlation coefficient of multilevel logistic models, "it may be useful for some analysis cases to compute this index to obtain information on the relative weight of the variations at each level" (1, p. 505). Although this can be helpful, interpretation may be complicated by an implicit feature of the approach: given the structure of a specific multilevel model, the intraclass correlation coefficient on the binary scale of the outcome may vary markedly across within-cluster observation pairs over an entire data set or design space of interest. The pattern among the collection of intraclass correlation coefficients is not controlled directly in the multilevel analysis; rather, it is a byproduct of covariate values and estimates of parameters, including variances, that are explicitly represented in the multilevel model. Simulation is one way to obtain the intraclass correlation coefficients of interest, thereby circumventing the complex relation of model parameters and covariates to the intraclass correlation coefficients (3). Generally, however, while the multilevel model approach provides estimates of the intraclass correlation coefficient, it is not well suited to direct modeling of the intraclass correlation coefficient.
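The variation of the coefficient across observation pairs, and the use of simulation to obtain it, can be sketched as follows. This is an illustration in the spirit of a simulation approach, not a reproduction of the procedure in reference 3. It assumes a random-intercept logistic model in which two observations in the same cluster have linear predictors eta1 and eta2 (hypothetical names standing for the fixed-effect part of the model at given covariate values) and are independent given the cluster effect.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def pair_correlation(eta1, eta2, sigma, draws=20000, seed=1):
    """Monte Carlo estimate of the correlation between two binary outcomes
    in the same cluster, given their fixed-effect linear predictors.

    Cluster effects u are drawn from N(0, sigma^2). Because the outcomes are
    assumed independent given u, E[Y1 * Y2] = E[p1(u) * p2(u)], so only the
    conditional probabilities need to be averaged.
    """
    rng = random.Random(seed)
    s1 = s2 = s12 = 0.0
    for _ in range(draws):
        u = rng.gauss(0.0, sigma)
        p1 = expit(eta1 + u)
        p2 = expit(eta2 + u)
        s1 += p1
        s2 += p2
        s12 += p1 * p2
    m1, m2, m12 = s1 / draws, s2 / draws, s12 / draws
    covariance = m12 - m1 * m2
    return covariance / math.sqrt(m1 * (1.0 - m1) * m2 * (1.0 - m2))


if __name__ == "__main__":
    sigma = 1.0  # assumed random-intercept standard deviation
    # Hypothetical pairs of linear predictors, e.g. from differing covariates.
    for eta1, eta2 in ((0.0, 0.0), (0.0, -4.0), (-4.0, -4.0)):
        r = pair_correlation(eta1, eta2, sigma)
        print(f"eta1={eta1:5.1f} eta2={eta2:5.1f}  correlation={r:.4f}")
```

With a single variance parameter held fixed, the estimated correlation differs from pair to pair only because the linear predictors differ, which is the sense in which the pattern of coefficients is a byproduct of the model rather than something specified directly.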
The advantage of generalized estimating equation methods is that association measures, such as the pairwise odds ratio and the intraclass correlation coefficient, can be modeled without such severe constraints. This is because models for prevalences are separate from models for associations in the generalized estimating equation approach, whereas the model for the association is implicit in the model for the mean in the multilevel approach. Therefore, we contend that such generalized estimating equation approaches as alternating logistic regression are useful for providing detail about association structure, not just "general information" (1, p. 505). This is because they permit direct control over model specification for pairwise odds ratios or intraclass correlation coefficients with the capacity to achieve variance reduction through modeling, which simulation does not provide. The trade-off (and reason why multilevel models are preferred in many cases) is that separate modeling of prevalence and association in the generalized estimating equation approach forgoes the ability to partition and explain variance, often a primary goal in "contextual analyses." Table 1 summarizes some key features of multilevel versus generalized estimating equation-based logistic models.
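The separation of the prevalence model from the association model can be illustrated with a small sketch. It assumes a single pair of binary outcomes with marginal prevalences mu1 and mu2 and a pairwise odds ratio psi, and recovers the joint probability with the standard closed-form (Plackett-type) solution of the 2 x 2 table; the function names and numerical values are hypothetical and are not drawn from the article.

```python
import math


def joint_prob(mu1, mu2, psi):
    """P(Y1 = 1, Y2 = 1) implied by marginals mu1, mu2 and odds ratio psi."""
    if abs(psi - 1.0) < 1e-12:
        return mu1 * mu2  # independence
    a = 1.0 + (mu1 + mu2) * (psi - 1.0)
    disc = a * a - 4.0 * psi * (psi - 1.0) * mu1 * mu2
    return (a - math.sqrt(disc)) / (2.0 * (psi - 1.0))


def odds_ratio(mu1, mu2, p11):
    """Pairwise odds ratio of the 2 x 2 table with the given cell p11."""
    p10 = mu1 - p11
    p01 = mu2 - p11
    p00 = 1.0 - mu1 - mu2 + p11
    return (p11 * p00) / (p10 * p01)


def implied_correlation(mu1, mu2, psi):
    """Correlation implied by the same marginals and odds ratio."""
    p11 = joint_prob(mu1, mu2, psi)
    return (p11 - mu1 * mu2) / math.sqrt(
        mu1 * (1.0 - mu1) * mu2 * (1.0 - mu2))


if __name__ == "__main__":
    psi = 3.0  # assumed pairwise odds ratio, held fixed across prevalences
    for mu1, mu2 in ((0.5, 0.5), (0.1, 0.3), (0.01, 0.02)):
        p11 = joint_prob(mu1, mu2, psi)
        print(f"mu1={mu1:.2f} mu2={mu2:.2f}  p11={p11:.5f}  "
              f"odds ratio={odds_ratio(mu1, mu2, p11):.3f}  "
              f"correlation={implied_correlation(mu1, mu2, psi):.4f}")
```

In this sketch the same odds ratio yields a valid joint distribution at every pair of prevalences tried, whereas the correlation it implies changes with the prevalences.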
REFERENCES
1. Chaix B, Bobashev G, Merlo J, et al. Re: "Detecting patterns of occupational illness clustering with alternating logistic regressions applied to longitudinal data." (Letter). Am J Epidemiol 2004;160:505–6.
2. Preisser JS, Arcury TA, Quandt SA. Detecting patterns of occupational illness clustering with alternating logistic regressions applied to longitudinal data. Am J Epidemiol 2003;158:495–501.
3. Goldstein H, Browne W, Rasbash J. Partitioning variation in multilevel models. London, United Kingdom: Institute of Education, 2002. (http://www.mlwin.com/hgpersonal/Variance-partitioning.pdf).
4. Prentice RL. Correlated binary regression with covariates specific to each binary observation. Biometrics 1988;44:1033–48.