American Journal of Epidemiology Advance Access originally published online on February 12, 2008
American Journal of Epidemiology 2008 167(8):908-916; doi:10.1093/aje/kwm386
PRACTICE OF EPIDEMIOLOGY
Overcoming Ecologic Bias using the Two-Phase Study Design
1 Departments of Statistics and Biostatistics, University of Washington, Seattle, WA
2 Center for Health Studies, Group Health Cooperative of Puget Sound, Seattle, WA
Correspondence to Dr. Jon Wakefield, Box 357232, Department of Biostatistics, University of Washington, Seattle, WA 98195-7232 (e-mail: jonno{at}u.washington.edu).
Received for publication February 1, 2007. Accepted for publication December 7, 2007.
| ABSTRACT |
|---|
Ecologic (aggregate) data are widely available and widely utilized in epidemiologic studies. However, ecologic bias, which arises because aggregate data cannot characterize within-group variability in exposure and confounder variables, can only be removed by supplementing ecologic data with individual-level data. Here the authors describe the two-phase study design as a framework for achieving this objective. In phase 1, outcomes are stratified by any combination of area, confounders, and error-prone (or discretized) versions of exposures of interest. Phase 2 data, sampled within each phase 1 stratum, provide accurate measures of exposure and possibly of additional confounders. The phase 1 aggregate-level data provide a high level of statistical power and a cross-classification by which individuals may be efficiently sampled in phase 2. The phase 2 individual-level data then provide a control for ecologic bias by characterizing the within-area variability in exposures and confounders. In this paper, the authors illustrate the two-phase study design by estimating the association between infant mortality and birth weight in several regions of North Carolina for 2000–2004, controlling for gender and race. This example shows that the two-phase design removes ecologic bias and produces gains in efficiency over the use of case-control data alone. The authors discuss the advantages and disadvantages of the approach.
bias (epidemiology); case-control studies; confounding factors (epidemiology); data interpretation, statistical; research design; sampling studies
| INTRODUCTION |
|---|
Epidemiologists continue to use ecologic and aggregate data. Despite their known drawbacks, these data, often aggregated across geographic areas, offer the advantages of widespread availability and gains in statistical power from large populations and increased exposure ranges. Data availability and exposure variability often determine the scale of examination and the suitability of a study. Exposures arising from a point or line source offer exposure contrasts on small scales, requiring small-area data; in contrast, dietary variables show little variation across small scales, and consequently international studies are used (1, 2).
In addition to the usual biases that may arise in observational studies, ecologic studies suffer from several biases unique to their design. The primary challenge is that ecologic data alone are generally insufficient to characterize within-area variability in exposures and confounding variables. The collective impact of the various biases that result is often referred to under the umbrella term ecologic bias. When ecologic bias causes a mismatch between conclusions concerning individual-level associations drawn from aggregate and individual-level data, this is known as the ecological fallacy. Many authors have examined the various aspects of ecologic bias (3–9). The only reliable way to characterize within-area variation in exposures and confounders, and hence control ecologic bias, is to collect and incorporate individual-level data. To help epidemiologists achieve this goal, in this paper we describe the use of the two-phase design in an ecologic setting.
To implement a two-phase design, an initial phase 1 cross-classification by the binary disease outcome and stratification variables is required; in phase 2, samples of individuals are drawn from each of the cross-classification cells, with data on additional variables being drawn from the subsamples of individuals (10, 11). Intuitively, the stratified sampling is focused on informative cells, and estimation methods use both phases of data for efficiency and to acknowledge the outcome-dependent sampling. In the simplest ecologic setting, the cross-classification is by outcome and area only, and if area is a surrogate for important risk factors this design will be efficient. We are particularly interested in situations where an initial classification is available by outcome, area, and confounders such as age and gender—this is the case in a semi-ecologic study. Phase 2 may then provide detailed exposure information on a subset of the phase 1 individuals.
To illustrate these methods, we consider infant mortality in the state of North Carolina. For this example, we have access to complete individual-level data, permitting a "gold standard" individual-level analysis. For these data, we construct an ecologic study, implement the two-phase approach, and compare the results with those of the full individual-level analysis.
| MATERIALS AND METHODS |
|---|
Infant mortality data
The North Carolina State Center for Health Statistics provides information on vital statistics for all North Carolina residents. Data are available from the Odum Institute for Research in Social Science at the University of North Carolina at Chapel Hill (http://www.irss.unc.edu/). We considered data from all 100 counties in North Carolina for the period 2000–2004. Over these 5 years, 699,035 infants were born and 5,854 died; across counties, the number of births ranged between 267 and 70,590, and the number of deaths ranged between 2 and 510.
The primary scientific goal of this illustrative study is to estimate the association between infant mortality and birth weight, controlling for gender and race. Of particular interest is potential effect modification of the infant mortality–birth weight association by race. Table 1 provides a cross-tabulation of the data, collapsed across counties, by outcome, gender, race, and low birth weight status (<2,500 g).
Individual-level analysis
Consider the following logistic regression model:
|
| (1) |
Table 2 provides estimates from model 1, based on the complete individual-level data. The main effect indicates a strong association between infant mortality and low birth weight among the White babies. Based on the estimate for the interaction term, there is modest evidence of an additional 11 percent increase in risk for non-White low birth weight babies.
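For orientation, one form of model 1 that is consistent with the description above (main effects for gender, race, and low birth weight, plus a race by low birth weight interaction) is sketched below. The 0/1 covariate coding (with White as the reference level for race) and the symbols β0, βg, and βwr are assumptions made for illustration; βw and βr are the names used later in the text.

```latex
\begin{aligned}
\operatorname{logit} P(Y_i = 1 \mid G_i, R_i, W_i)
  &= \beta_0 + \beta_g G_i + \beta_r R_i + \beta_w W_i + \beta_{wr} W_i R_i, \\
\text{odds ratio for low birth weight, White babies}
  &= \exp(\beta_w), \\
\text{odds ratio for low birth weight, non-White babies}
  &= \exp(\beta_w + \beta_{wr}) = \exp(\beta_w)\,\exp(\beta_{wr}).
\end{aligned}
```

Under this assumed coding, the "additional 11 percent" quoted above would correspond to exp(βwr) of approximately 1.11, a multiplicative factor applied on top of the main effect exp(βw).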
A simulated ecologic study
When individual-level data are unavailable, investigators may resort to ecologic data, which come in a variety of forms. For example, a purely ecologic study would consist of marginal death counts across the 100 counties as well as the marginal proportions of babies who were born male, non-White, and of low birth weight. Alternatively, a semi-ecologic study might consist of the numbers of births and infant deaths cross-classified by gender and race, along with the proportion of babies that were of low birth weight, in each county.
For the North Carolina infant mortality data, we collapsed the counts within the 100 counties to mimic a purely ecologic study. Figures 1 and 2 provide histograms and maps, respectively, of the ecologic data. As expected, the proportion male is tightly clustered around 0.5 across counties, while the proportion non-White varies between 0.002 and 0.380 and the proportion with low birth weight varies between 0.06 and 0.14.
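The collapsing step can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the analysis: the record keys (`county`, `death`, `male`, `nonwhite`, `lbw`) and the output labels `Q_G`, `Q_R`, and `Q_W` (proportions male, non-White, and low birth weight) are hypothetical names, and the four toy records are invented.

```python
from collections import defaultdict

def collapse_to_ecologic(records):
    """Collapse individual birth records into marginal summaries per county.

    Each record is a dict with hypothetical keys 'county', 'death', 'male',
    'nonwhite', and 'lbw'; all but 'county' are coded 0/1.
    """
    sums = defaultdict(lambda: {"N": 0, "Y": 0, "male": 0, "nonwhite": 0, "lbw": 0})
    for rec in records:
        s = sums[rec["county"]]
        s["N"] += 1                      # births
        s["Y"] += rec["death"]           # infant deaths
        s["male"] += rec["male"]
        s["nonwhite"] += rec["nonwhite"]
        s["lbw"] += rec["lbw"]
    ecologic = {}
    for county, s in sums.items():
        n = s["N"]
        # Only marginal proportions survive; the joint distribution is lost.
        ecologic[county] = {
            "N": n,
            "Y": s["Y"],
            "Q_G": s["male"] / n,
            "Q_R": s["nonwhite"] / n,
            "Q_W": s["lbw"] / n,
        }
    return ecologic

# Invented toy data: two counties with two births each.
toy_records = [
    {"county": "A", "death": 0, "male": 1, "nonwhite": 0, "lbw": 0},
    {"county": "A", "death": 1, "male": 0, "nonwhite": 1, "lbw": 1},
    {"county": "B", "death": 0, "male": 1, "nonwhite": 1, "lbw": 0},
    {"county": "B", "death": 0, "male": 0, "nonwhite": 0, "lbw": 0},
]
eco = collapse_to_ecologic(toy_records)
```

The point of the sketch is what is discarded: after collapsing, it is no longer possible to tell which of the non-White babies in a county were also of low birth weight.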
To illustrate a typical ecologic analysis, let Yk and Nk denote the observed numbers of deaths and births in county k (k = 1, ..., 100). Further, let QWk denote the proportion of low birth weight babies in county k, and let QGk and QRk denote the corresponding proportions of babies that are male and non-White. A typical ecologic analysis might fit the log-linear model:
|
| (2) |
In model 2, the exponentiated coefficient of QWk is the relative risk associated with an all-White area whose newborns are all low birth weight as compared with an all-White area containing no low birth weight infants, with both areas having the same proportion of male births. Hence, the interpretation of this ecologic coefficient resembles more closely that of a contextual effect, and it is therefore not comparable to the individual-level effect exp(βw). The outcome, Yk, is a count, and to allow for extra-Poisson variability, we fit model 2 using quasi-likelihood (12). Table 2 shows results from a fit in which each of the proportions has been multiplied by 10. That is, each relative risk estimate compares two areas that differ in the corresponding proportion by 0.10 (10 percentage points). We see that the ecologic relative risk associated with low birth weight is not comparable with the individual-level coefficient. Furthermore, non-White race now appears protective, rather than detrimental as the individual-level analysis suggests. This spurious result provides an example of the ecological fallacy, in which conclusions (here regarding race) based on ecologic data are opposite to those drawn on the basis of individual-level data.
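The effect of multiplying a proportion by 10 before fitting can be checked with a short calculation. The sketch below assumes a log-linear model in which a proportion enters linearly; the coefficient value is hypothetical and is not taken from table 2.

```python
import math

def rr_for_difference(beta_per_unit_proportion, delta):
    """Relative risk comparing two areas whose proportion differs by `delta`."""
    return math.exp(beta_per_unit_proportion * delta)

def coefficient_after_scaling(beta_per_unit_proportion, scale):
    """Coefficient obtained when the covariate is multiplied by `scale` before fitting."""
    return beta_per_unit_proportion / scale

beta = 2.0  # hypothetical coefficient for a proportion measured on the 0-1 scale
beta_scaled = coefficient_after_scaling(beta, 10)

# exp(beta_scaled) is the relative risk for a 0.10 difference in the proportion.
rr_scaled = math.exp(beta_scaled)
rr_direct = rr_for_difference(beta, 0.10)
```

Both routes give the same number, which is why a relative risk from the rescaled fit refers to a difference of 10 percentage points rather than to the contrast between an area with proportion 0 and an area with proportion 1.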
The inherent difficulty in estimating individual-level associations from ecologic data can be illustrated by examining the induced aggregate model. Let Nkgrw denote the number of children in county k and in gender, race, and birth weight categories g, r, and w, respectively. For a rare outcome, model 1 may be approximated by a log-linear model, and aggregation within area k yields
![]() | (3) |
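The consequence of aggregation can be illustrated numerically. The sketch below assumes that the rare-outcome approximation gives individual risk exp(β0 + βg g + βr r + βw w + βwr w r), so that the expected number of deaths in county k is the sum over cells of Nkgrw times that risk. The coefficient values and the two counties are invented for illustration; they are not the North Carolina estimates.

```python
import math

# Hypothetical log-linear coefficients, chosen only for illustration.
BETA = {"0": math.log(0.005), "g": 0.2, "r": 0.3, "w": 3.0, "wr": 0.1}

def individual_risk(g, r, w, beta=BETA):
    """Rare-outcome (log-linear) approximation to the individual-level risk."""
    linear = (beta["0"] + beta["g"] * g + beta["r"] * r
              + beta["w"] * w + beta["wr"] * w * r)
    return math.exp(linear)

def expected_deaths(cell_counts, beta=BETA):
    """Sum N_kgrw * risk(g, r, w) over the cells of one county."""
    return sum(n * individual_risk(g, r, w, beta)
               for (g, r, w), n in cell_counts.items())

def marginal_proportions(cell_counts):
    """Return (N, proportion male, proportion non-White, proportion low birth weight)."""
    total = sum(cell_counts.values())
    male = sum(n for (g, r, w), n in cell_counts.items() if g == 1)
    nonwhite = sum(n for (g, r, w), n in cell_counts.items() if r == 1)
    lbw = sum(n for (g, r, w), n in cell_counts.items() if w == 1)
    return total, male / total, nonwhite / total, lbw / total

def county(race_by_weight):
    """Split each (race, weight) count evenly between the two genders."""
    return {(g, r, w): n // 2
            for (r, w), n in race_by_weight.items() for g in (0, 1)}

# Same margins (1,000 births, 30% non-White, 10% low birth weight),
# but low birth weight falls on different racial groups.
county_a = county({(0, 0): 700, (0, 1): 0, (1, 0): 200, (1, 1): 100})
county_b = county({(0, 0): 600, (0, 1): 100, (1, 0): 300, (1, 1): 0})
```

The two counties are indistinguishable in a purely ecologic data set, yet `expected_deaths(county_a)` exceeds `expected_deaths(county_b)`. Marginal proportions alone therefore cannot identify the individual-level coefficients without knowledge of the within-area joint distribution.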
Collecting individual-level data
It is generally well recognized that for ecologic data to provide reliable inferences, they need to be supplemented with individual-level data. With a rare outcome, the aggregate data design of Prentice and Sheppard (2) collects supplemental individual-level survey information on exposures and confounders; intuitively, these provide estimates of the Nkgrw in model 3. A related approach, which uses similar data but assumes a parametric form for the within-area distributions and then fits the implied disease risk model, has also been suggested (13–15). While these approaches can overcome ecologic bias, they are still ecologic in nature, since there is no linkage between outcome and exposures or confounders at the level of the individual (16).
In the setting of a nonrare outcome, a scheme for combining ecologic data with a series of 2 x 2 tables with simple random samples has been outlined (17). More recently, the parametric aggregate data design has been extended to incorporate prospectively collected information on individuals (18). In this paper, we focus on studies of rare outcomes, and we therefore consider outcome-dependent sampling. Previously we considered case-control sampling within areas (19, 20); here we consider the use of two-phase sampling to obtain individual-level data jointly on both outcomes and exposures/confounders.
Ecologic two-phase studies
Two-phase study designs are a generalization of matched case-control designs in which, initially, the entire sampling population is cross-classified according to case/control status and some stratification variable, S. The latter depends on covariates observed in all individuals and may include exposures of interest, proxy exposure measures, or potential confounders. In settings like those we are considering, such as environmental epidemiology, S may also depend on geographic area, which can act as a surrogate for the totality of confounders associated with each area, as well as provide a well-defined sampling frame for the controls. For example, Flick et al. (21) recently reported results from a case-control study of the association between nonsteroidal antiinflammatory drugs and non-Hodgkin's lymphoma in which the data were matched by county.
Let us assume that S takes on J levels. After the initial cross-classification, the phase 1 data consist of 2J counts, Nij, with i = 0/1 (corresponding to noncase/case status) and with j indexing stratum (j = 1, ..., J). In phase 2, samples of size nij are taken within each of the phase 1 strata, and individual-level measurements, xijk, are taken on these individuals (k = 1, ..., nij; i = 0, 1; j = 1, ..., J). Such individual-level data may include additional covariates not readily available on all subjects and/or accurate measurements for proxy exposures available on all individuals in phase 1 but subject to measurement error or misclassification. We note that the traditional case-control design corresponds to an initial classification based solely on case/control status (and so does not involve S), while a matched case-control design classifies additionally on confounders. Whereas case-control designs ignore the phase 1 data, in a two-phase approach these data are exploited to provide efficiency gains and to enable the estimation of intercepts and interactions, including those involving phase 1 stratification variables (11).
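The two phases can be sketched as two small functions: one that cross-classifies the population into the counts Nij, and one that draws nij individuals within each cell. This is an illustrative sketch under simple assumptions; the field names (`y`, `lbw`, `weight_g`), the toy population, and the choice of five draws per cell are all hypothetical.

```python
import random
from collections import defaultdict

def phase1_counts(population, stratum_of):
    """Cross-classify by outcome i (0/1) and stratum j.

    Returns the counts N_ij and, for sampling, the member indices of each cell.
    """
    cells = defaultdict(list)
    for idx, person in enumerate(population):
        cells[(person["y"], stratum_of(person))].append(idx)
    counts = {cell: len(members) for cell, members in cells.items()}
    return counts, cells

def phase2_sample(cells, n_ij, rng):
    """Draw n_ij individuals without replacement in each cell, capped at the cell size."""
    sample = {}
    for cell, members in cells.items():
        k = min(n_ij.get(cell, 0), len(members))
        sample[cell] = rng.sample(members, k)
    return sample

# Toy population: 'lbw' is known for everyone (phase 1), while
# 'weight_g' would only be measured on the phase 2 subsample.
population = [
    {"y": int(i % 10 == 0), "lbw": int(i % 4 == 0), "weight_g": 2000 + 50 * (i % 40)}
    for i in range(200)
]
counts, cells = phase1_counts(population, stratum_of=lambda p: p["lbw"])
rng = random.Random(0)
sample = phase2_sample(cells, {cell: 5 for cell in cells}, rng)
phase2_weights = {cell: [population[i]["weight_g"] for i in idx]
                  for cell, idx in sample.items()}
```

The counts in `counts` play the role of the phase 1 data, and `phase2_weights` plays the role of the individual-level measurements xijk. An analysis would need both, together with a method that accounts for the outcome-dependent sampling.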
The outcome-dependent nature of the phase 2 sampling must be accounted for when analyzing two-phase data; a number of approaches have been developed (22–27). Software for implementing the methods in an ecologic context (along with the North Carolina data) is available from the first author (http://faculty.washington.edu/jonno/cv.html). Unless otherwise stated, all of the analyses presented here implement full maximum likelihood estimation, which, under correct model specification, provides the most efficient estimates (25). We emphasize that both the phase 1 and phase 2 data are exploited for estimation; further details are provided in the Appendix.
| RESULTS |
|---|
Simulation study
To implement a two-phase design, one must initially specify the variables upon which the phase 1 stratification is based and, in particular, define S. In the North Carolina example, we could stratify on area, gender, race, low birth weight, or some combination. Caution is required, though, as too fine a stratification may leave some cells empty, leading to a breakdown of the analysis method. In phase 2, one must decide how to allocate the individual-level samples across these strata (i.e., the nij sample sizes). One recommended choice is that of a balanced scheme, where equal numbers of individuals are sampled across phase 1 strata (23). An alternative is to consider optimal sampling strategies for two-phase sampling, typically tailored to the specific setting (28).
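One way to operationalize a balanced scheme is to spread the available phase 2 draws as evenly as possible over strata while never asking for more individuals than a stratum contains. The function below is a sketch of that idea with hypothetical stratum labels; it is not taken from the cited references.

```python
def balanced_allocation(available, total):
    """Spread `total` draws as evenly as possible over strata.

    `available` maps stratum -> number of individuals in that stratum.
    No stratum is allocated more than it contains; any shortfall is
    redistributed over the strata that still have room.
    """
    alloc = {s: 0 for s in available}
    remaining = min(total, sum(available.values()))
    open_strata = [s for s in available if available[s] > 0]
    while remaining > 0 and open_strata:
        share = max(remaining // len(open_strata), 1)
        for s in list(open_strata):
            if remaining == 0:
                break
            take = min(share, available[s] - alloc[s], remaining)
            alloc[s] += take
            remaining -= take
            if alloc[s] == available[s]:
                open_strata.remove(s)  # stratum exhausted
    return alloc

small_stratum = balanced_allocation({"a": 10, "b": 100, "c": 100}, 90)
even_case = balanced_allocation({"a": 50, "b": 50}, 40)
```

In the first call, stratum "a" can supply only 10 individuals, so the remaining 80 draws are split equally between "b" and "c". Empty strata receive nothing, which is one reason an overly fine phase 1 stratification is problematic.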
Below we report on a simulation study for which the results were based on 10,000 simulated data sets. In each case, we generated complete individual-level data using the parameter estimates from the full data (table 2). This provided the basis for the phase 1 stratification, from which cases and controls were sampled in phase 2.
Infant mortality data
Discrete birth weight status
We implemented schemes with eight different phase 1 stratifications, as outlined in table 3. With 100 counties, it is not possible to use county (with 100 levels) as a phase 1 stratification variable, since some counties will contain very few cases; further cross-classification will result in zero entries, leading to estimation difficulties. Hence, we constructed 10 regions based on contiguous counties, shown in figure 3, and matched on this new variable.
Under each scheme, we sampled 500 cases and 500 controls. We adopted a balanced design in which equally sized samples were taken, where possible, across the J strata; when sufficient numbers of cases were not available, noncases were sampled instead. Table 3 gives the percent bias across simulations as compared with the fit of the individual-level model to the complete data, as well as the relative efficiency, defined as the ratio of the variance (across simulations) of the two-phase relative risk estimator to that of the case-control estimator. The only difference between conventional case-control sampling and two-phase sampling with phase 1 stratified by outcome only is in the estimation of the intercept, which may be estimated under the two-phase approach; inference for the relative risks is identical. The results in table 3 are presented in terms of relative risks; thus, for example, exp(βr) is the relative risk corresponding to race.
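The two summary measures can be computed from a set of simulated estimates as follows. This is a sketch of the definitions as stated in the text, with the variance ratio taken as two-phase over case-control; whether bias was evaluated on the relative risk or the log relative risk scale is not specified here, and the numbers below are invented.

```python
from statistics import mean, variance

def percent_bias(estimates, reference):
    """Percent difference between the mean simulated estimate and a reference value."""
    return 100.0 * (mean(estimates) - reference) / reference

def relative_efficiency(two_phase_estimates, case_control_estimates):
    """Ratio of across-simulation variances: two-phase over case-control."""
    return variance(two_phase_estimates) / variance(case_control_estimates)

# Invented relative risk estimates from five hypothetical simulated data sets.
two_phase = [1.10, 1.12, 1.08, 1.11, 1.09]
case_control = [1.30, 0.90, 1.50, 0.80, 1.00]
full_data_value = 1.10  # hypothetical reference from the complete-data fit

bias = percent_bias(two_phase, full_data_value)
efficiency = relative_efficiency(two_phase, case_control)
```

With this definition, values of `efficiency` below 1 indicate that the two-phase estimator is less variable than the case-control estimator.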
It is apparent from table 3 that since region is only weakly associated with outcome, little is gained by stratifying on region alone. When we stratify on gender, race, or low birth weight, estimation of the corresponding relative risks (including the interaction) improves correspondingly. When we stratify on low birth weight in phase 1, efficiency improves markedly over case-control sampling. For example, the standard error of the relative risk for the interaction, exp(βwr), is 0.53 under case-control sampling and 0.08 under two-phase sampling; analysis of the individual data gives a standard error of 0.07, so the two-phase analysis is almost as efficient as the analysis using the full data, even though it is based on only 500 cases and 500 controls. When we use low birth weight in phase 1, the parameter estimates are unbiased; in particular, the ecologic bias evident in table 2 is eliminated. There is some finite sample bias when we do not use low birth weight in phase 1.
Continuous birth weight
The two-phase ecologic design is particularly useful for investigating the association between health outcomes and environmental exposures. In this context, it may be possible to obtain an estimate of the proportion exposed to high concentrations within each area, perhaps by confounder strata such as age, gender, race, and socioeconomic indices. This may be achieved by first modeling a concentration surface (29) and then using finer geographic information within each area (e.g., census blocks) to estimate the fractions of different demographic groups who are above or below a concentration threshold. In phase 2, one may then sample individuals to obtain more accurate exposure measures at residential addresses. An important aspect is that the proportions exposed (in the phase 1 data) can be error-prone; the benefits of two-phase sampling when a surrogate exposure is available have been demonstrated in other contexts (30). To summarize the approach, we assume the existence of a discrete exposure and stratify by this variable in phase 1, before measuring a continuous version in phase 2.
We consider two situations: the first in which the phase 1 binary exposure is accurate and the second in which it is subject to measurement error. For the latter, we consider a hypothetical situation in which measurement error is added to the low birth weight classification that is used in the phase 1 classification. In particular, we let
|
|
Returning to the North Carolina example, we assume that individual-level associations are again given by model 1, but with Wi now a continuous measure of birth weight. Table 2 provides estimates for this model based on the complete individual-level data.
Again using 500 cases and 500 controls in phase 2, we summarize the percent bias and efficiency over various phase 1 stratifications (with equal samples across the phase 1 stratification) and in situations where both accurate and error-prone measures of low birth weight are available. Results are shown in table 4. The benefits of the two-phase design are again evident; the use of the binary low birth weight information clearly allows efficient estimation by sampling of informative individuals. There is some loss of efficiency when the error-prone phase 1 classification is used, as compared with the error-free classification. However, in this setting no bias is introduced, and it is still clearly worthwhile to use the error-prone version.
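To make the idea of an error-prone phase 1 classification concrete, the sketch below flips a true binary status at random. It assumes nondifferential misclassification governed by a sensitivity and a specificity; this is one common error model and is an assumption here, not necessarily the mechanism used in the simulations, and the numerical values are hypothetical.

```python
import random

def misclassify(true_status, sensitivity, specificity, rng):
    """Return an error-prone copy of a list of 0/1 statuses.

    A true 1 is recorded as 1 with probability `sensitivity`;
    a true 0 is recorded as 0 with probability `specificity`.
    """
    observed = []
    for w in true_status:
        if w == 1:
            observed.append(1 if rng.random() < sensitivity else 0)
        else:
            observed.append(0 if rng.random() < specificity else 1)
    return observed

rng = random.Random(1)
true_lbw = [1] * 100 + [0] * 900          # invented: 10% truly low birth weight
noisy_lbw = misclassify(true_lbw, sensitivity=0.9, specificity=0.95, rng=rng)

# Limiting cases used as sanity checks.
perfect = misclassify(true_lbw, 1.0, 1.0, rng)
inverted = misclassify(true_lbw, 0.0, 0.0, rng)
```

In a two-phase design, `noisy_lbw` would define the phase 1 strata, while the accurate continuous birth weight would be recorded only for the individuals sampled in phase 2.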
Further design considerations
While the results of tables 3 and 4 focus on alternative schemes for defining the phase 1 stratification, a variety of other design considerations can be investigated. Table 5 considers two extensions. Scheme A examines how reducing the phase 2 samples from 500 cases and controls to 200 cases and controls affects efficiency. The reductions are not substantial, which suggests that the two-phase design has benefits even when resources for obtaining individual-level data are limited. In scheme B, we return to case/control sizes of 500, but we now sample individuals in proportion to the phase 1 stratum sizes, rather than taking equal numbers (the standard errors for the "Y only" stratification are equal in table 4 and table 5, scheme B). The efficiencies are very similar to those in table 4, suggesting that, in this setting, efficiency is driven primarily by the choice of the variables used to define S, rather than by the specific allocation of samples in phase 2.
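Sampling in proportion to the phase 1 stratum sizes, as in scheme B, requires turning fractional shares into whole numbers of individuals. A largest-remainder rule is one simple way to do this; the sketch below uses it as an assumption, with invented stratum sizes, and does not claim to reproduce the allocation used for table 5.

```python
def proportional_allocation(stratum_sizes, total):
    """Allocate `total` draws in proportion to stratum sizes (largest-remainder rule).

    Assumes at least one stratum is non-empty. The allocation is not capped
    at the stratum size, so `total` should not exceed the population.
    """
    population = sum(stratum_sizes.values())
    exact = {s: total * n / population for s, n in stratum_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}      # round down first
    leftover = total - sum(alloc.values())
    # Hand the remaining draws to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc

scheme_b = proportional_allocation({"a": 600, "b": 300, "c": 100}, 500)
thirds = proportional_allocation({"a": 1, "b": 1, "c": 1}, 10)
```

Compared with a balanced scheme, proportional allocation places most of the phase 2 sample in the largest strata, so small but informative cells receive few individuals.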
| DISCUSSION |
|---|
In this paper, we have described the two-phase study design as a means of avoiding the numerous and often severe pitfalls associated with the analysis of ecologic and/or aggregate data. The results of our simulation studies point to the benefits associated with combining the two sources of data, in terms of both bias and efficiency. Rather than supplement an ecologic study with individual-level data, it may be of interest to combine existing individual-level data with external group-level data. Strategies that combine both types of data have been shown to alleviate participation bias and improve efficiency in case-control studies with missing data (31).
When designing a two-phase study, a variety of choices must be made, including the variables which form the basis of the phase 1 stratification, the total numbers of cases and controls sampled in phase 2, and the way in which resources are allocated across phase 1 strata. It is clear that important variables should be used as a basis for the phase 1 stratification. Typically, however, one will not know the appropriate individual-level model and an educated guess will be required. While choosing a nonoptimal set of stratification variables reduces efficiency, the ability of the two-phase design to help overcome ecologic bias is not affected.
An example of an ecologic study for which we believe two-phase sampling could be particularly useful is the study by Maheswaran et al. (32) of the association between death from myocardial infarction and magnesium in domestic water in northwestern England. In that study, ecologic-level magnesium concentrations were measured in the domestic water supply, with an average of six measurements being taken per water zone (each zone containing up to 50,000 people). The study did not provide evidence to support the hypothesis that magnesium is protective. The main ecologic-bias difficulties arising here were due to the within-zone variability in magnesium levels and confounding factors, particularly socioeconomic status and the water constituents fluoride, calcium, and lead, and the inability to characterize the within-area distribution of magnesium levels across all age, gender, and socioeconomic status strata. In general, the relative risks due to environmental exposures will be in the range of 1.2–1.5 (33), making control for confounding particularly important. For the magnesium example, a two-phase study would sample individual cases and noncases, with potential stratification variables being water zone, gender, age, socioeconomic status, and a categorical version of exposure based on measurements taken initially (or on historic data). Magnesium concentrations could be sampled at selected case/noncase residences and be augmented with information on confounding water constituents and individual-level confounders such as smoking. Within the two-phase framework, information on multiple exposures could be collected in phase 2 and incorporated into a single disease model. To fully characterize the joint distribution of exposures and confounders, larger phase 2 sample sizes will be required, particularly if the exposures are highly correlated.
A number of methods have been proposed for combining ecologic- and individual-level data, and our method builds on these approaches. The aggregate data method (2) does not stratify in phase 1, either by outcome or by stratum, although the latter would be possible via the inclusion of stratum-specific intercepts. In a semi-ecologic study, an ecologic exposure is combined with individual-level outcomes and confounders. A two-phase approach is particularly useful for such a study, with the phase 2 data corresponding to stratified sampling of individual exposures.
We have presented the two-phase approach from the perspective of supplementing available ecologic data with individual-level data, but it is also feasible to start with population-based matched case-control data and then add ecologic data, perhaps from the Census Bureau and a disease registry. Thinking of the design in this way emphasizes that the phase 1 and phase 2 data must be comparable; this is straightforward to think about statistically, but in any application it will be complex and require great care. Clearly a population-based case-control study is more amenable to the two-phase design than is a hospital-based study, since the geographic catchment area of the latter will be difficult to determine. An existing cohort provides an alternative sampling frame. It is now common practice to embed case-control studies within a larger cohort. For example, the multicountry European Prospective Investigation into Cancer and Nutrition (1) has provided a population from which numerous case-control studies have been constructed. With two-phase methodology, it is possible to use the data from the complete cohort to inform confounder relations, particularly confounding by geographic area.
| APPENDIX |
|---|
The likelihood for the two-phase design consists of two components for the phase 1 and phase 2 data, respectively. Following the notation of Breslow and Holubkov (25), the likelihood may be written as
![]() |
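As a reading aid, a generic factorization with the two components just described can be written using the phase 1 counts Nij and the phase 2 measurements xijk defined earlier. This is a sketch of the general structure of a two-phase likelihood, stated under the assumption that phase 2 individuals are sampled at random within each phase 1 cell; the exact parametrization used by Breslow and Holubkov (25) may differ.

```latex
L \;=\; \prod_{i=0}^{1} \prod_{j=1}^{J}
\left\{
  \underbrace{P(Y = i,\, S = j)^{\,N_{ij}}}_{\text{phase 1}}
  \;\times\;
  \underbrace{\prod_{k=1}^{n_{ij}} P\!\left(x_{ijk} \mid Y = i,\, S = j\right)}_{\text{phase 2}}
\right\}
```

The first factor is the multinomial contribution of the phase 1 cross-classification, and the second is the contribution of the covariates measured on the phase 2 subsample, conditional on the cell from which each individual was drawn.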
| ACKNOWLEDGMENTS |
|---|
This research was supported by grants R01 CA095994 and R01 CA125081 from the National Institutes of Health.
Conflict of interest: none declared.
| References |
|---|
1. Riboli E. Nutrition and cancer: background and rationale of the European Prospective Investigation into Cancer and Nutrition (EPIC). Ann Oncol (1992) 3:783–91.
2. Prentice RL, Sheppard L. Aggregate data studies of disease risk factors. Biometrika (1995) 82:113–25.
3. Morgenstern H. Ecologic study. In: Armitage P, Colton T, eds. Encyclopedia of biostatistics. (1998) 2. New York, NY: John Wiley and Sons, Inc. 1255–76.
4. Piantadosi S, Byar DP, Green SB. The ecological fallacy. Am J Epidemiol (1988) 127:893–904.
5. Greenland S, Morgenstern H. Ecological bias, confounding and effect modification. Int J Epidemiol (1989) 18:269–74.
6. Greenland S. Divergent biases in ecologic and individual level studies. Stat Med (1992) 11:1209–23.
7. Greenland S, Robins J. Invited commentary: ecologic studies: biases, misconceptions and counterexamples. Am J Epidemiol (1994) 139:747–60.
8. Richardson S, Montfort C. Ecological correlation studies. In: Elliott P, Wakefield JC, Best NG, et al, eds. Spatial epidemiology: methods and applications. (2000) New York, NY: Oxford University Press. 205–20.
9. Wakefield JC. Sensitivity analyses for ecological regression. Biometrics (2003) 59:9–17.
10. White JE. A two stage design for the study of the relationship between a rare exposure and a rare disease. Am J Epidemiol (1982) 115:119–28.
11. Weinberg CR, Wacholder S. The design and analysis of case-control studies with biased sampling. Biometrics (1990) 46:963–75.
12. McCullagh P, Nelder JA. Generalized linear models. (1989) 2nd ed. London, United Kingdom: Chapman and Hall Ltd.
13. Richardson S, Stucker I, Hémon D. Comparison of relative risks obtained in ecological and individual studies: some methodological considerations. Int J Epidemiol (1987) 16:111–20.
14. Wakefield JC, Salway RE. A statistical framework for ecological and aggregate studies. J R Stat Soc Ser A (2001) 164:119–37.
15. Best N, Cockings S, Bennett J, et al. Ecological regression analysis of environmental benzene exposure and childhood leukemia: sensitivity to data inaccuracies, geographical scale and ecological bias. J R Stat Soc Ser A (2001) 164:155–74.
16. Sheppard L. Insights on bias and information in group-level studies. Biostatistics (2003) 4:265–78.
17. Wakefield J. Ecological inference for 2 x 2 tables (with discussion). J R Stat Soc A (2004) 167:385–445.
18. Jackson S, Best N, Richardson S. Improving ecological inference using individual-level data. Stat Med (2006) 25:2136–59.
19. Haneuse SJ, Wakefield J. Hierarchical models for combining ecological and case-control data. Biometrics (2007) 63:128–36.
20. Haneuse SJ, Wakefield J. The combination of ecological and case-control data. J R Stat Soc B (2007) 70:73–93.
21. Flick ED, Chan KA, Bracci PM, et al. Use of nonsteroidal antiinflammatory drugs and non-Hodgkin lymphoma: a population-based case-control study. Am J Epidemiol (2006) 164:497–504.
22. Cain KC, Breslow NE. Logistic regression analysis and efficient design for two-stage studies. Am J Epidemiol (1988) 128:1198–206.
23. Breslow NE, Cain KC. Logistic regression for two-stage case-control data. Biometrika (1988) 75:11–20.
24. Flanders W, Greenland S. Analytic methods for two-stage case-control studies and other stratified designs. Stat Med (1991) 10:739–47.
25. Breslow NE, Holubkov R. Maximum likelihood estimation of logistic regression parameters under two-phase, outcome-dependent sampling. J R Stat Soc Ser B (1997) 59:447–61.
26. Breslow NE, Holubkov R. Weighted likelihood, pseudo likelihood and maximum likelihood methods for logistic regression analysis of two-stage data. Stat Med (1997) 16:103–16.
27. Scott AJ, Wild CJ. Fitting regression models to case-control data by maximum likelihood. Biometrika (1997) 51:54–71.
28. Reilly M. Optimal sampling strategies for two-stage studies. Am J Epidemiol (1996) 143:92–100.
29. Jerrett M, Arain A, Kanaroglou P, et al. A review and evaluation of intraurban air pollution exposure. J Expo Anal Environ Epidemiol (2005) 15:185–204.
30. Breslow NE, Chatterjee N. Design and analysis of two-phase studies with binary outcome applied to Wilms tumor prognosis. Appl Stat (1999) 48:457–68.
31. Stromberg U, Bjork J. Incorporating group-level exposure information in case-control studies with missing data on dichotomous exposures. Epidemiology (2004) 15:494–503.
32. Maheswaran R, Morris S, Falconer S, et al. Magnesium in drinking water supplies and mortality from acute myocardial infarction in north west England. Heart (1999) 82:455–60.
33. Pekkanen J, Pearce N. Environmental epidemiology: challenges and opportunities. Environ Health Perspect (2001) 109:1–5.