

American Journal of Epidemiology Advance Access originally published online on October 3, 2008
American Journal of Epidemiology 2008 168(10):1204-1210; doi:10.1093/aje/kwn236

American Journal of Epidemiology © The Author 2008. Published by the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

PRACTICE OF EPIDEMIOLOGY

Combining Matched and Unmatched Control Groups in Case-Control Studies

Saskia le Cessie, Nico Nagelkerke, Frits R. Rosendaal, Karlijn J. van Stralen, Elisabeth R. Pomp and Hans C. van Houwelingen

Correspondence to Dr. Saskia le Cessie, Department of Medical Statistics and Bioinformatics, S5-P, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, the Netherlands (e-mail: cessie{at}lumc.nl).

Received for publication February 8, 2008. Accepted for publication July 14, 2008.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 POOLING TWO CORRELATED ESTIMATES
 ESTIMATING THE CORRELATION...
 TESTING WHETHER THE ODDS...
 SIMULATION
 EXAMPLE: THE MEGA STUDY
 DISCUSSION
 References
 
Multiple control groups in case-control studies are used to control for different sources of confounding. For example, cases can be contrasted with matched controls to adjust for multiple genetic or unknown lifestyle factors and simultaneously contrasted with an unmatched population-based control group. Inclusion of different control groups for a single exposure analysis yields several estimates of the odds ratio, all using only part of the data. Here the authors introduce an easy way to combine odds ratios from several case-control analyses with the same cases. The approach is based upon methods used for meta-analysis but takes into account the fact that the same cases are used and that the estimated odds ratios are therefore correlated. Two ways of estimating this correlation are discussed: sandwich methodology and the bootstrap. Confidence intervals for the pooled estimates and a test for checking whether the odds ratios in the separate case-control studies differ significantly are derived. The performance of the method is studied by simulation and by applying the methods to a large study on risk factors for thrombosis, the MEGA Study (1999–2004), wherein cases with first venous thrombosis were included with a matched control group of partners and an unmatched population-based control group.

bootstrap; case-control studies; control groups; matching; sandwich estimator; venous thrombosis


Abbreviations: MEGA, Multiple Environmental and Genetic Assessment of Risk Factors for Venous Thrombosis


    INTRODUCTION
 
In case-control studies, it is not uncommon to include cases and several control groups. In this way, the effect of risk factors can be studied, controlling for different sources of possible bias. For example, cases can be compared with family controls, to adjust for unmeasured genetic factors; with household controls, to reduce the confounding effect of unidentified environmental factors; and also with a random sample of the population. Each control group yields an odds ratio estimate, and these may vary due to chance or due to some uncontrolled confounding variable(s). For instance, observing a different effect using family controls suggests hidden genetic confounding variables.

Stratified or matched control groups are useful if potential confounders are difficult to measure (e.g., genome, lifestyle) or if there are many different strata (e.g., professions). If controls are matched to cases, it is necessary to analyze these data using stratified analytical methods like Mantel-Haenszel estimation or conditional logistic regression. Because of the matching, the exposure distribution in the controls can differ from the exposure in the population. If the matching is associated with the exposure, ignoring the matching generally induces biased estimates even if the matching factors are unrelated to the outcome (see Breslow and Day (1), chapter 7, and Rothman and Greenland (2), chapter 10). For example, smoking is a risk factor for thrombosis, and partners often have similar smoking habits. Therefore, the prevalence of smoking in the matched control group is higher than that in the unmatched control group, yielding biased odds ratios if matching is ignored. Hence, it is incorrect to pool data from several control groups together in 1 large data set and analyze these data with unconditional logistic regression.

An example of a study with 2 control groups is the Multiple Environmental and Genetic Assessment of Risk Factors for Venous Thrombosis (MEGA) Study (3). In this study, the cases were patients with a first diagnosis of venous thrombosis who were enrolled consecutively between March 1999 and September 2004. Two control groups were selected. Partners of the patients were used as an individually matched control group. A second population-based control group with the same age and sex distribution as the cases was acquired using random digit dialing.

In the MEGA Study, 2 separate analyses were performed to relate the occurrence of thrombosis to risk factors, like smoking (3). The cases were compared with their partners in a matched case-control analysis using conditional logistic regression. In the second analysis the cases were compared with the population-based controls, using unconditional logistic regression with adjustment for possible confounders, including age and sex.

In both analyses, odds ratios were estimated using only part of the data. Conditional logistic regression ignores the unmatched cases and controls, while the matched partners are not used in unconditional logistic regression. If the 2 sets of odds ratios measure the same quantity—that is, given "infinite" samples, they would be identical—we could combine all data and obtain 1 overall, more efficient estimate.

The need to combine data from different sources arises in many different contexts. Specific settings have been addressed previously—for example, pooling studies with unrelated cases and controls (4), conducting matched case-control studies with missing exposure data (5), and combining matched and unmatched controls to estimate gene-environment interactions (6, 7).

Combining all data and performing 1 overall analysis could be done by writing down the full likelihood of the data. However, this is complicated by the different sampling mechanisms used for the matched and unmatched subjects. Instead, a simple approach is introduced, where the estimates of the odds ratios from the analyses are pooled. This can be seen as performing a small meta-analysis, combining the 2 estimates from the separate analyses. A complicating factor, however, is that the same cases are used twice, which means that the estimated odds ratios in the separate analyses are correlated.

Below, we show how this correlation can be estimated and how an efficient combined estimate of the odds ratio can be obtained.


    POOLING TWO CORRELATED ESTIMATES
 
We consider the situation with 2 correlated estimates of the same parameter which we want to combine into 1 overall estimate. This is the situation in the MEGA Study, where 1 set of log odds ratios is obtained by conditional logistic regression using the matched pairs, while another set of log odds ratios is obtained from the unconditional logistic regression analysis. Here we assume that the correlation between the estimates is known; in the next section, we discuss how this correlation can be estimated.

The simplest case is 1 risk factor, with both models estimating the same odds ratio, without any confounders—for example, when the effect of a genetic variant is assessed. Here the estimated log odds ratios can be simply averaged. If one of the log odds ratios is more precisely estimated—for example, because of a larger sample size for one of the control groups—a more accurate pooled estimate is obtained by taking a weighted average. The weighting then should take the standard errors of the 2 estimates into account. This can be done as follows.

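Let β be the logarithm of the odds ratio. Let $\hat{\beta}_1$ and $\hat{\beta}_2$ be the 2 estimates of β, with standard errors $\mathrm{SE}_1$ and $\mathrm{SE}_2$, and with ρ the correlation between the 2 estimates. The weighted average of the 2 estimates can be written as

$$\hat{\beta}_{pooled} = w\hat{\beta}_1 + (1 - w)\hat{\beta}_2. \qquad (1)$$
The optimal choice for the weight w is the one which minimizes the standard error of the pooled estimate $\hat{\beta}_{pooled}$. The standard error of this estimate is

$$\mathrm{SE}(\hat{\beta}_{pooled}) = \sqrt{w^2\,\mathrm{SE}_1^2 + (1 - w)^2\,\mathrm{SE}_2^2 + 2w(1 - w)\rho\,\mathrm{SE}_1\mathrm{SE}_2}. \qquad (2)$$
This expression can be minimized with respect to w by setting the first derivative of equation 2 to 0. This yields

$$w = \frac{\mathrm{SE}_2^2 - \rho\,\mathrm{SE}_1\mathrm{SE}_2}{\mathrm{SE}_1^2 + \mathrm{SE}_2^2 - 2\rho\,\mathrm{SE}_1\mathrm{SE}_2} \qquad (3)$$
as the most efficient choice of w.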
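The scalar pooling rule can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' SAS macro; the function name `pool_two_estimates` and the input values in the usage note are hypothetical.

```python
import math


def pool_two_estimates(b1, se1, b2, se2, rho):
    """Pool two correlated estimates of the same log odds ratio.

    b1, b2   : the two estimated log odds ratios
    se1, se2 : their standard errors
    rho      : correlation between the two estimates (assumed known here)
    Returns (pooled estimate, its standard error, weight w on b1).
    """
    cov12 = rho * se1 * se2
    denom = se1 ** 2 + se2 ** 2 - 2.0 * cov12
    if denom <= 0.0:
        # Only happens when se1 == se2 and rho == 1: the estimates are
        # then perfectly redundant and no unique weight exists.
        raise ValueError("weight is undefined for these inputs")
    # Weight that minimizes the standard error of the weighted average
    w = (se2 ** 2 - cov12) / denom
    pooled = w * b1 + (1.0 - w) * b2
    var = (w ** 2 * se1 ** 2 + (1.0 - w) ** 2 * se2 ** 2
           + 2.0 * w * (1.0 - w) * cov12)
    return pooled, math.sqrt(var), w


# Hypothetical inputs, for illustration only
pooled, se, w = pool_two_estimates(b1=1.0, se1=0.3, b2=0.8, se2=0.2, rho=0.5)
odds_ratio = math.exp(pooled)
ci_95 = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
```

With ρ = 0 the weight reduces to the familiar inverse-variance weight of ordinary meta-analysis; a positive correlation shifts weight further toward the more precise estimate.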

Often there are confounding variables. These confounders and their effects can differ in the separate analyses. For example, in the MEGA Study, where the investigators’ interest was in the effect of smoking, partners are more similar regarding lifestyles than unmatched controls, which could result in different effects of lifestyle confounders in the matched analysis. The effect of the risk factor of interest after adjustment for the confounders is assumed to be equal in the 2 analyses.

This situation can be written down formally. To simplify the formulas, we use matrix notation. Let $\theta_1$ be the vector of log odds ratios of the first case-control analysis. This vector can be written as $\theta_1 = (\alpha_1, \beta)$, with $\alpha_1$ being the vector containing the log odds ratios of the confounding variables and β being the log odds ratio of interest. Similarly, $\theta_2 = (\alpha_2, \beta)$ is the vector of log odds ratios in the second analysis. The log odds ratio of interest β is shared by both models, while the log odds ratios for the confounders $\alpha_1$ and $\alpha_2$ are not shared. Only the estimates of β should be pooled. This pooling and calculation of standard errors can be done in the same way as described without confounders.

So far, we have considered only 1 exposure variable of interest, with 1 pooled odds ratio. We now consider the situation where more odds ratios are pooled simultaneously. This can occur if the exposure is categorical and several dummy variables are used to model its effect, or when the joint effect of 2 or more exposures is studied. Now β is a vector of log odds ratios. Let $\hat{\beta}_1$ be the vector of estimates from the first analysis and $\hat{\beta}_2$ be the vector of estimates from the second analysis. Suppose for a moment that we know the precision of the estimates, with $\mathrm{cov}(\hat{\beta}_1) = C_{11}$ and $\mathrm{cov}(\hat{\beta}_2) = C_{22}$. We also assume that the covariance matrix between the 2 estimated vectors of parameters is known: $\mathrm{cov}(\hat{\beta}_1, \hat{\beta}_2) = C_{12}$.

Again the estimates should be pooled in the most efficient way. In the situation with 1 β, weights minimizing the standard error of the pooled estimate were used. One can show that this is equivalent to minimizing the weighted squared distance between the observed $(\hat{\beta}_1, \hat{\beta}_2)$ and the pooled β, weighting by the inverse of the covariance matrix of $\hat{\beta}_1$ and $\hat{\beta}_2$. This can be readily extended to more β’s, considering the following regression model:

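$$\begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \begin{pmatrix} I_k \\ I_k \end{pmatrix} \beta + \varepsilon, \qquad \mathrm{cov}(\varepsilon) = C,$$
with $I_k$ being the k-dimensional identity matrix and k being the dimension of β. The most efficient estimate of β is the weighted least-squares estimate:

$$\hat{\beta}_{pooled} = \left[ \begin{pmatrix} I_k & I_k \end{pmatrix} C^{-1} \begin{pmatrix} I_k \\ I_k \end{pmatrix} \right]^{-1} \begin{pmatrix} I_k & I_k \end{pmatrix} C^{-1} \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix}$$
with $C = \begin{pmatrix} C_{11} & C_{12} \\ C_{12}^T & C_{22} \end{pmatrix}$. If β is 1-dimensional, $C_{11} = \mathrm{SE}_1^2$, $C_{22} = \mathrm{SE}_2^2$, $C_{12} = \rho\,\mathrm{SE}_1\mathrm{SE}_2$, and $I_k = 1$. Standard matrix calculus shows that this expression for $\hat{\beta}_{pooled}$ then equals equation 1.

Standard errors and confidence intervals for the pooled β’s can be derived from the covariance matrix of $\hat{\beta}_{pooled}$. Standard calculations show that this matrix equals

$$\mathrm{cov}(\hat{\beta}_{pooled}) = \left[ \begin{pmatrix} I_k & I_k \end{pmatrix} C^{-1} \begin{pmatrix} I_k \\ I_k \end{pmatrix} \right]^{-1}.$$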
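The weighted least-squares pooling of 2 estimated vectors could be implemented along the following lines with NumPy. This is a sketch under the assumption that the blocks C11, C22, and C12 are supplied by the user; the function name `pool_vector_estimates` is hypothetical and not part of any published software.

```python
import numpy as np


def pool_vector_estimates(b1, b2, C11, C22, C12):
    """Weighted least-squares pooling of two correlated estimate vectors.

    b1, b2   : length-k estimates of the same vector beta
    C11, C22 : (k, k) covariance matrices of b1 and b2
    C12      : (k, k) covariance matrix between b1 and b2
    Returns (pooled estimate, covariance matrix of the pooled estimate).
    """
    b1 = np.atleast_1d(np.asarray(b1, dtype=float))
    b2 = np.atleast_1d(np.asarray(b2, dtype=float))
    k = b1.size
    C11 = np.asarray(C11, dtype=float).reshape(k, k)
    C22 = np.asarray(C22, dtype=float).reshape(k, k)
    C12 = np.asarray(C12, dtype=float).reshape(k, k)

    # Joint covariance matrix of the stacked vector (b1, b2)
    C = np.block([[C11, C12], [C12.T, C22]])
    # Design matrix of the regression model: two stacked identity matrices
    X = np.vstack([np.eye(k), np.eye(k)])

    Cinv_X = np.linalg.solve(C, X)              # C^{-1} X
    cov_pooled = np.linalg.inv(X.T @ Cinv_X)    # (X^T C^{-1} X)^{-1}
    # Cinv_X.T equals X^T C^{-1} because C is symmetric
    pooled = cov_pooled @ (Cinv_X.T @ np.concatenate([b1, b2]))
    return pooled, cov_pooled
```

For k = 1 this reproduces the scalar weighted average of equation 1 with the weight of equation 3; pooled standard errors are the square roots of the diagonal of the returned covariance matrix.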

In practice, the matrices C11, C12, and C22 are unknown. Maximum likelihood estimation automatically yields estimates for C11 and C22, and most software packages calculate them routinely. The estimation of C12 is discussed below.


    ESTIMATING THE CORRELATION BETWEEN THE ESTIMATES
 
The correlation between the estimated log odds ratios $\hat{\beta}_1$ and $\hat{\beta}_2$ is needed to obtain the optimal pooled estimate with correct standard errors and confidence intervals. Since the joint distribution of the obtained data is unknown, robust methods are needed to estimate the correlation. We consider 2 ways to obtain robust estimates. The first is the "sandwich" estimator, which requires only that the mean response be correctly specified. It is often used in the field of longitudinal data when parameters are estimated with generalized estimating equations (8). An overview is given by Freedman (9).

Bootstrapping is another robust way to estimate the correlation. Here the joint distribution of $\hat{\beta}_1$ and $\hat{\beta}_2$ is estimated by repeatedly drawing samples with replacement from the original data set. Each resampled data set yields new estimates for $\beta_1$ and $\beta_2$. The correlation between these bootstrapped estimates is then an (asymptotically unbiased) estimate of the correlation between $\hat{\beta}_1$ and $\hat{\beta}_2$ in the original data set, if the mean response is modeled correctly. Below we provide details on both methods.

Sandwich estimate
To employ the sandwich method in our situation, we assume that in both analyses maximum likelihood is used to estimate the log odds ratios. Let $l_1(\theta)$ and $l_2(\theta)$ be the log-likelihood functions for both analyses, and let $\hat{\theta}_1$ and $\hat{\theta}_2$ be the corresponding maximum likelihood estimates. The log-likelihood functions can be written as the sum of independent parts: $l_1(\theta) = \sum_i l_{1i}(\theta)$ and $l_2(\theta) = \sum_j l_{2j}(\theta)$. In the MEGA Study, conditional logistic regression is used in the first case-control analysis, with summation over the matched pairs, while for unconditional logistic regression, summing is done over all cases and unmatched controls. Let $U_1(\theta) = \partial l_1(\theta)/\partial\theta$ be the score function of the first likelihood (that is, the first derivative of the log likelihood with respect to θ), and let $U_2(\theta) = \partial l_2(\theta)/\partial\theta$ be the score function of the second likelihood function. We can write $U_1(\theta) = \sum_i U_{1i}(\theta)$, and similarly for $U_2(\theta)$. Furthermore, let $I_1(\theta)$ and $I_2(\theta)$ be the negative Hessian matrices, with $I_1(\theta) = -\partial^2 l_1(\theta)/\partial\theta^2$ and $I_2(\theta) = -\partial^2 l_2(\theta)/\partial\theta^2$. Details for the MEGA Study are given in the Web Appendix, which is posted on the Journal’s website (http://aje.oxfordjournals.org/).

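By Taylor expansion, one can approximate $\hat{\theta}_1 - \theta$ and $\hat{\theta}_2 - \theta$ by $I_1(\theta)^{-1}U_1(\theta)$ and $I_2(\theta)^{-1}U_2(\theta)$, respectively. We use this to estimate the covariance between $\hat{\theta}_1$ and $\hat{\theta}_2$:

$$\mathrm{cov}(\hat{\theta}_1, \hat{\theta}_2) \approx I_1(\theta)^{-1}\, E\left[U_1(\theta)U_2(\theta)^T\right] I_2(\theta)^{-1} = I_1(\theta)^{-1} \left( \sum_i \sum_j E\left[U_{1i}(\theta)U_{2j}(\theta)^T\right] \right) I_2(\theta)^{-1}.$$
The cross-product $E[U_{1i}(\theta)U_{2j}(\theta)^T]$ is 0 for any i and j, unless terms of the same subject are in $U_{1i}(\theta)$ and $U_{2j}(\theta)$. In our situation, only the matched cases are used in both the matched and unmatched analysis and occur in both $U_{1i}(\theta)$ and $U_{2j}(\theta)$. Hence,

$$\mathrm{cov}(\hat{\theta}_1, \hat{\theta}_2) \approx I_1(\theta)^{-1} \left( \sum_{i \in M} E\left[U_{1i}(\theta)U_{2i}(\theta)^T\right] \right) I_2(\theta)^{-1},$$
where M is the set of matched cases.

We estimate this covariance matrix by means of a sandwich estimator, replacing the expected score products by the observed ones. Hence, $E[U_{1i}(\theta)U_{2i}(\theta)^T]$ is replaced by the observed cross-product between the score terms, $U_{1i}(\hat{\theta}_1)U_{2i}(\hat{\theta}_2)^T$, and for $I_1(\theta)$ and $I_2(\theta)$, the observed negative Hessian matrix in $\hat{\theta}_1$ and $\hat{\theta}_2$, respectively, is used. This yields a sandwich estimate of the covariance matrix between the estimates:

$$\widehat{\mathrm{cov}}(\hat{\theta}_1, \hat{\theta}_2) = I_1(\hat{\theta}_1)^{-1} \left( \sum_{i \in M} U_{1i}(\hat{\theta}_1)U_{2i}(\hat{\theta}_2)^T \right) I_2(\hat{\theta}_2)^{-1}. \qquad (4)$$
From this the covariance between $\hat{\beta}_1$ and $\hat{\beta}_2$ is directly estimated.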
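A minimal NumPy sketch of the sandwich cross-covariance is given below. It assumes 1:1 matched pairs and uses the standard score expressions for conditional and unconditional logistic regression, which may differ in form from the Web Appendix; all function names are hypothetical. The caller is assumed to supply the observed negative Hessians and to align the rows of the 2 score matrices so that row i refers to the same matched case in both.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def pair_scores(d, theta):
    """Per-pair score terms of 1:1 conditional logistic regression.

    d : (m, p) covariate differences, case minus matched control.
    The pair log likelihood is log(expit(d @ theta)).
    """
    return d * (1.0 - expit(d @ theta))[:, None]


def logistic_scores(X, y, theta):
    """Per-subject score terms of unconditional logistic regression.

    X : (n, q) design matrix including an intercept column, y : (n,) 0/1.
    """
    return X * (y - expit(X @ theta))[:, None]


def sandwich_cross_cov(I1, I2, U1_matched, U2_matched):
    """Sandwich estimate of cov(theta1_hat, theta2_hat).

    I1, I2     : observed negative Hessians, shapes (p, p) and (q, q)
    U1_matched : (m, p) score terms of the matched pairs at theta1_hat
    U2_matched : (m, q) score terms of the same m cases at theta2_hat,
                 in the same row order as U1_matched
    """
    meat = U1_matched.T @ U2_matched            # sum over i of U1i U2i^T
    return np.linalg.solve(I1, meat) @ np.linalg.inv(I2)
```

The block of the returned (p, q) matrix that corresponds to the positions of β in $\theta_1$ and $\theta_2$ is the estimate of $C_{12}$.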

Bootstrap
Bootstrapping is an alternative way to estimate the correlation between estimates (10). By repeatedly sampling with replacement from the original data set, a collection of new data sets is obtained. In each of the resampled data sets, both case-control analyses can be repeated, yielding 2 sets of estimates. The correlation between these bootstrapped estimates is then an approximation of the correlation between $\hat{\beta}_1$ and $\hat{\beta}_2$ in the original data set.

The manner of resampling should mimic the selection of the controls in the original data set. In the MEGA Study, the total number of cases was fixed, with matched controls being available for part of the cases, and a fixed number of unmatched controls were selected separately. Therefore, separate resamples are drawn from the cases and from the unmatched controls, and a matched control is sampled only when the corresponding case is sampled. This means that the fraction of matched pairs varies among bootstrap samples.
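The resampling scheme could be sketched as follows. To keep the example self-contained, it assumes a single binary exposure and replaces the 2 regression fits with closed-form log odds ratios (discordant-pair ratio for the matched analysis, 2 x 2 table for the unmatched analysis), each with a 0.5 continuity correction; these simplifications, the toy data, and all names are assumptions made for illustration and are not taken from the MEGA analysis.

```python
import numpy as np


def bootstrap_correlation(n_cases, n_controls, fit_matched, fit_unmatched,
                          n_boot=500, seed=0):
    """Bootstrap the correlation between the matched and unmatched estimates.

    Cases and unmatched controls are resampled separately; a matched
    control enters a resample only through its resampled case.
    """
    rng = np.random.default_rng(seed)
    est = np.empty((n_boot, 2))
    for b in range(n_boot):
        case_idx = rng.integers(0, n_cases, size=n_cases)
        ctrl_idx = rng.integers(0, n_controls, size=n_controls)
        est[b, 0] = fit_matched(case_idx)
        est[b, 1] = fit_unmatched(case_idx, ctrl_idx)
    return np.corrcoef(est, rowvar=False)[0, 1], est


# Toy data (hypothetical): 200 cases, the first 120 with a matched partner
rng = np.random.default_rng(42)
n_cases, n_ctrl = 200, 150
case_x = rng.binomial(1, 0.5, n_cases)        # exposure of cases
has_partner = np.arange(n_cases) < 120
partner_x = rng.binomial(1, 0.4, n_cases)     # used only where has_partner
ctrl_x = rng.binomial(1, 0.35, n_ctrl)        # exposure of unmatched controls


def fit_matched(case_idx):
    # Keep only resampled cases that have a partner; pairs stay intact
    idx = case_idx[has_partner[case_idx]]
    n10 = np.sum((case_x[idx] == 1) & (partner_x[idx] == 0))
    n01 = np.sum((case_x[idx] == 0) & (partner_x[idx] == 1))
    return np.log((n10 + 0.5) / (n01 + 0.5))


def fit_unmatched(case_idx, ctrl_idx):
    a = case_x[case_idx].sum()
    b = case_idx.size - a
    c = ctrl_x[ctrl_idx].sum()
    d = ctrl_idx.size - c
    return np.log((a + 0.5) * (d + 0.5) / ((b + 0.5) * (c + 0.5)))


r, est = bootstrap_correlation(n_cases, n_ctrl, fit_matched, fit_unmatched,
                               n_boot=200)
```

Because only the cases with a partner are carried into the matched analysis, the number of matched pairs differs from one resample to the next, as described above. In a real application the 2 callables would refit the conditional and unconditional logistic regression models.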

Some technical aspects
The covariance matrix $C_{12}$ between $\hat{\beta}_1$ and $\hat{\beta}_2$ is estimated with sandwich or bootstrap methods, while the matrices $C_{11}$ and $C_{22}$ are estimated using standard maximum likelihood methods. Therefore, the estimate of

$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{12}^T & C_{22} \end{pmatrix}$$
may occasionally not be a valid covariance matrix, because it is not guaranteed to be positive-definite. In this case, the pooled estimate of β can be grossly wrong. A solution is to also estimate $C_{11}$ and $C_{22}$ using the sandwich or bootstrap method. This yields a positive-definite estimate of C, but the obtained standard errors of $\hat{\beta}_1$ and $\hat{\beta}_2$ can differ from those of the separate analyses. An alternative estimate is $USU^T$, where S is the sandwich or bootstrap estimate of the total matrix C and U is a scaling matrix constructed from $\hat{C}_{11}$ and $\hat{C}_{22}$, the estimated matrices from the separate analyses. This matrix is always positive-definite and yields the same standard errors as the separate analyses. Checking whether the estimate of C is positive-definite can be done by means of its eigenvalues. If any of the eigenvalues are negative, the matrix $USU^T$ can be used instead to obtain a positive-definite covariance matrix.
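The eigenvalue check and a rescaling of the form $USU^T$ might look as follows in NumPy. The exact definition of U is not legible in this copy of the text, so the block-diagonal choice below, built from symmetric matrix square roots, is an assumption: it is one construction that keeps the result positive-definite and reproduces the diagonal blocks from the separate analyses. Function names are hypothetical.

```python
import numpy as np


def is_positive_definite(C):
    """True if all eigenvalues of the symmetrized matrix are positive."""
    eig = np.linalg.eigvalsh((C + C.T) / 2.0)
    return bool(eig.min() > 0.0)


def sym_sqrt(A):
    """Symmetric square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(vals)) @ vecs.T


def rescale_joint_cov(S, C11_hat, C22_hat):
    """Return U S U^T with diagonal blocks equal to C11_hat and C22_hat.

    S        : (2k, 2k) sandwich or bootstrap estimate of the full matrix C
    C11_hat,
    C22_hat  : (k, k) covariance matrices from the separate analyses
    Assumed form: U = blockdiag(C11_hat^(1/2) S11^(-1/2),
                                C22_hat^(1/2) S22^(-1/2)).
    """
    k = C11_hat.shape[0]
    S11, S22 = S[:k, :k], S[k:, k:]
    U1 = sym_sqrt(C11_hat) @ np.linalg.inv(sym_sqrt(S11))
    U2 = sym_sqrt(C22_hat) @ np.linalg.inv(sym_sqrt(S22))
    zeros = np.zeros((k, k))
    U = np.block([[U1, zeros], [zeros, U2]])
    return U @ S @ U.T
```

A typical use would be to assemble C from the maximum likelihood blocks and the estimated $C_{12}$, call `is_positive_definite`, and fall back to `rescale_joint_cov` only when the check fails.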


    TESTING WHETHER THE ODDS RATIOS FROM BOTH ANALYSES ARE EQUAL
 
In our analysis, we pool $\hat{\beta}_1$ and $\hat{\beta}_2$. This is appropriate if the same odds ratio is estimated in both analyses. In the MEGA Study, the partner controls are matched on several factors, including environmental and lifestyle factors. In the unconditional logistic regression, some of these factors can be included as covariates. When it is possible to include all matching factors, unconditional and conditional logistic regression estimate the same odds ratio (1). Systematic differences indicate that there are unmeasured risk factors that are implicitly controlled in the matched analysis but not in the unconditional logistic regression.

We can calculate confidence intervals for the difference between the odds ratios on the log scale and transform them back to the original scale. The observed difference between the 2 estimates, $\Delta = \hat{\beta}_1 - \hat{\beta}_2$, has variance equal to $\mathrm{var}(\Delta) = C_{11} - C_{12} - C_{12}^T + C_{22}$. This can be used to construct confidence intervals for $\beta_1 - \beta_2$. To formally test the hypothesis of no systematic difference ($\beta_1 = \beta_2$), the test statistic $\Delta^T\,\mathrm{var}(\Delta)^{-1}\,\Delta$ can be used, the distribution of which under the null hypothesis can be approximated by a chi-squared distribution with k df. One could also use this to determine specifically whether one of the components of $\beta_1 - \beta_2$ differed from 0.
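For a single log odds ratio (k = 1), the test and the confidence interval for the difference can be sketched with the Python standard library alone. The function name `compare_log_odds_ratios` and the numbers in the usage line are hypothetical; the P value uses the fact that a chi-squared variable with 1 df is the square of a standard normal variable.

```python
import math


def compare_log_odds_ratios(b1, se1, b2, se2, rho, z=1.96):
    """Test whether two correlated log odds ratios differ (k = 1).

    Returns the difference, its standard error, the chi-squared statistic
    (1 df), the P value, and a confidence interval for the ratio of the
    two odds ratios (the difference transformed back from the log scale).
    """
    delta = b1 - b2
    # var(delta) = C11 - 2*C12 + C22 in the scalar case
    var = se1 ** 2 - 2.0 * rho * se1 * se2 + se2 ** 2
    if var <= 0.0:
        raise ValueError("variance of the difference must be positive")
    se = math.sqrt(var)
    chi2 = delta ** 2 / var
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    ratio_ci = (math.exp(delta - z * se), math.exp(delta + z * se))
    return {"delta": delta, "se": se, "chi2": chi2,
            "p_value": p_value, "ratio_ci": ratio_ci}


# Hypothetical inputs, for illustration only
result = compare_log_odds_ratios(b1=1.0, se1=0.3, b2=0.8, se2=0.2, rho=0.5)
```

A positive correlation between the estimates shrinks the variance of the difference, so ignoring it would make the test conservative.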


    SIMULATION
 
To evaluate the performance of the pooling procedure, we conducted a simulation study. We simulated a large cohort from a logistic model with 2 covariates. The first variable, x1, was a categorical covariate indicating whether an individual was exposed (x1 = 1) or not (x1 = 0). The exposure probability was set to 0.5. A second variable, x2, was drawn from a uniform distribution and was used as a matching variable. Pairs of observations with the same value of x2 were generated.

The outcome y was generated from a logistic model $\mathrm{logit}(\Pr(Y = 1)) = \alpha + \beta x_1 + \gamma x_2$, with α = –7, β = 1, and γ = 2. This corresponds to logits between –7 and –4 and probabilities of Y = 1 between $9 \times 10^{-4}$ and 0.017. From the large cohort, we drew 100 pairs with discordant outcomes: the matched case-control sample. A random set of 100 controls (observations with y = 0) was also drawn.

In each of 1,000 simulations, the cases were compared with the matched controls using conditional logistic regression and with the unmatched controls using unconditional logistic regression. In the matched analysis, only x1 was used as a covariate, because x2 is constant within each pair and γ is not estimable. The unconditional logistic regression analysis used x1 and x2 as covariates. The covariance between $\hat{\beta}_1$ and $\hat{\beta}_2$ was estimated using the sandwich estimator and by bootstrapping with 500 bootstrap resamples. Table 1 shows the results.
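The data-generating step of one simulation run might be sketched as follows. Several details are assumptions not stated in the text: x2 is taken to be uniform on (0, 1), which is the range implied by logits between –7 and –4; the cohort size of 50,000 pairs is chosen only so that enough discordant pairs arise; and the unmatched controls are drawn from all cohort members with y = 0, without excluding partners of selected cases. The function name is hypothetical.

```python
import numpy as np


def simulate_samples(n_pairs_cohort=50_000, n_select=100,
                     alpha=-7.0, beta=1.0, gamma=2.0, seed=1):
    """Generate one matched sample and one unmatched control sample."""
    rng = np.random.default_rng(seed)
    # x2 is shared within a pair (matching variable); x1 is individual
    x2 = rng.uniform(0.0, 1.0, size=n_pairs_cohort)
    x1 = rng.binomial(1, 0.5, size=(n_pairs_cohort, 2))
    logit = alpha + beta * x1 + gamma * x2[:, None]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

    # Matched sample: pairs with exactly one case (discordant outcomes)
    discordant = np.flatnonzero(y.sum(axis=1) == 1)
    if discordant.size < n_select:
        raise RuntimeError("cohort too small; increase n_pairs_cohort")
    pairs = rng.choice(discordant, size=n_select, replace=False)
    case_col = y[pairs].argmax(axis=1)        # column that holds the case

    # Unmatched controls: random cohort members with y == 0
    ctrl_flat = np.flatnonzero(y.ravel() == 0)
    chosen = rng.choice(ctrl_flat, size=n_select, replace=False)

    return {
        "case_x1": x1[pairs, case_col],
        "partner_x1": x1[pairs, 1 - case_col],
        "pair_x2": x2[pairs],
        "ctrl_x1": x1.ravel()[chosen],
        "ctrl_x2": x2[chosen // 2],           # flat index f lies in pair f // 2
    }


sample = simulate_samples()
```

Each simulated sample would then be passed to the conditional and unconditional logistic regression fits, and the pooling, sandwich, and bootstrap steps repeated 1,000 times.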



 
Table 1. Results From a Simulation Study With 100 Matched Pairs and 100 Unmatched Controls

 
Both the sandwich and the bootstrap correlation were often close to the observed correlation of 0.501. The sandwich correlation was slightly more variable (SE = 0.084 vs. SE = 0.069), but this did not influence the estimates of the pooled β, which were nearly always very similar, with means close to the true value 1. The last 2 columns of Table 1 show that the standard error of $\hat{\beta}_{pooled}$ was estimated well. Calculated 95% confidence intervals contained the true parameter in 95.6% of the cases with the sandwich method and 95.7% of the cases with the bootstrap method. Ignoring the correlation between the estimates yielded standard errors that were too small in almost all simulations (median SE, 0.225; p5 = 0.208, p95 = 0.251).

In 3 of the 1,000 simulations, the bootstrap yielded an estimate of

$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{12}^T & C_{22} \end{pmatrix}$$
which was not positive-definite. On these occasions, the pooled $\hat{\beta}$ was widely off the true value 1, yielding estimates of 2.11, 2.13, and 2.83. All estimates using the sandwich estimator were positive-definite, and the largest estimated pooled $\hat{\beta}$ was 1.82.


    EXAMPLE: THE MEGA STUDY
 
We now consider data from the MEGA Study (3). In this example, we study the relation between smoking, coded as current or past smoker versus nonsmoker, and the occurrence of thrombosis. Age (continuous), sex, body mass index (continuous), and current pregnancy were considered as possible confounding variables. Leaving out subjects with missing values yielded a data set with 3,986 patients. Of these patients, 2,286 had a matched control. The unmatched control group consisted of 2,612 subjects. Table 2 shows the results of the analyses carried out without correction for confounders. Four estimates of the odds ratio are given: 1) an estimate obtained using only the matched pairs, 2) an estimate obtained using only cases and unmatched controls, 3) the pooled estimate obtained using the sandwich correlation estimate, and 4) the pooled estimate obtained using the bootstrap correlation estimate.



 
Table 2. Estimated Log Odds Ratios ($\hat{\beta}$) and Odds Ratios for the Relation Between Smoking and Thrombosis in Matched, Unmatched, and Pooled Analyses, Using Models Without Any Putative Confounders

 
For both pooling methods, the pooled coefficient is between the estimated coefficients of the matched and unmatched analyses. It is closer to the unmatched estimate because this was estimated with greater precision. The combined estimate had a smaller standard error and 95% confidence interval. The results derived using the bootstrap estimator and the sandwich estimator are very similar. The effects in the matched analysis are smaller than those in the unmatched analysis, suggesting that part of the relation between smoking and thrombosis could be due to confounding factors, which implicitly are adjusted for in the matched analysis but not in the unmatched analysis.

Table 3 shows results obtained after adjusting for confounders. Only the estimate for smoking was pooled; the effect of the confounders was allowed to be different in the matched and unmatched analysis. Here the odds ratios from the matched and unmatched analyses were very similar. Both the sandwich method and the bootstrap method yielded a pooled odds ratio of 1.33 (95% confidence interval: 1.21, 1.46).



 
Table 3. Estimated Log Odds Ratios ($\hat{\beta}$) and Odds Ratios for the Relation Between Smoking and Thrombosis in Matched, Unmatched, and Pooled Analyses, Using Models With Age, Sex, Body Mass Index, and Pregnancy Included as Confounders

 

    DISCUSSION
 
We have introduced a simple method of combining estimates from several case-control analyses into 1 overall estimate. In ordinary meta-analysis, calculating the pooled mean of 2 studies yields exactly the same estimate as that derived using the individual data (11). Additionally, pooling log odds ratios yields an estimator which is asymptotically equivalent to the maximum likelihood estimate, using individual data. Therefore, we do not expect that much efficiency will be lost with the pooling approach.

We used the sandwich method and the bootstrap method to estimate the covariance matrix between the correlated β’s. Although both estimators are consistent, neither is unbiased. In addition, the sandwich estimator can be quite variable in comparison with parametric estimates (12). In the simulations, the sandwich correlation was slightly more variable, but the pooled estimates were very similar. The sandwich estimate is easy to calculate, and in all simulations it yielded a positive-definite covariance matrix. The bootstrap is more computer-intensive, and it occasionally yielded non-positive-definite covariance matrices. Whether the estimated covariance matrix C is positive-definite should be checked, because the pooled estimate can otherwise be very wrong. Alternatively, the method described above in "Some technical aspects" could be used.

Matching can be done using measured covariates, like age and sex, in which case it is clear which variables to include in the unconditional regression analysis. If matching is implicitly done on more factors than those measured, like the situation with siblings or partner controls, one should be careful in pooling the results. If there are unmeasured matching factors which are also related to the outcome, the conditional odds ratio, which is a stratum-specific odds ratio, is generally larger than the unconditional odds ratio. The test for equivalence of the odds ratios and the estimated difference with its associated 95% confidence interval can be used to study the magnitude of this problem. If the difference is not too large, pooling could still be done. As in meta-analysis, even if studies do not estimate exactly the same parameter, an overall pooled effect measure with a confidence interval gives an impression of the size of the effect.

Our example was of a large case-control study with a pairwise matched control group and an unmatched control group. There are many other applications; for example, it is straightforward to apply these methods to control groups with other forms of matching or stratification. Another application is in genetic studies, where matching within families is done to study rare genetic effects and cases are combined with related and unrelated controls in order to estimate gene-environment interaction more efficiently (6, 7). We have used this method to combine family association data with results from an unmatched case-control analysis (13). The methods can also be useful if different confounders are measured in 2 control groups or confounders are measured in a different way. Instead of using complex imputation methods, the pooling methods can be used as an easy alternative.

A SAS macro (SAS Institute Inc., Cary, North Carolina) for pooling odds ratios using the sandwich method is available from the first author upon request.


    ACKNOWLEDGMENTS
 
Author affiliations: Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands (Saskia le Cessie, Hans C. van Houwelingen); Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, the Netherlands (Saskia le Cessie, Frits R. Rosendaal, Karlijn J. van Stralen, Elisabeth R. Pomp); Department of Community Medicine, United Arab Emirates University, Al Ain, United Arab Emirates (Nico Nagelkerke); and Laboratory for Vaccine-Preventable Diseases, National Institute for Public Health and the Environment, Bilthoven, the Netherlands (Hans C. van Houwelingen).

Conflict of interest: none declared.


    References
 

  1. Breslow NE, Day NE. Statistical Methods in Cancer Research. Vol 1. The Analysis of Case-Control Studies (1980) Lyon, France: International Agency for Research on Cancer. (IARC Scientific Publication no. 32).
  2. Rothman KJ, Greenland S, eds. Modern Epidemiology (1998) 2nd ed. Philadelphia, PA: Lippincott-Raven Publishers.
  3. Pomp ER, Rosendaal FR, Doggen CJM. Smoking increases the risk of venous thrombosis and acts synergistically with oral contraceptive use. Am J Hematol (2008) 83(2):97–102.
  4. Moreno V, Martìn ML, Bosch FX, et al. Combined analysis of matched and unmatched case-control studies: comparison of risk estimates from different studies. Am J Epidemiol (1996) 143(3):293–300.
  5. Huberman M, Langholz B. Application of the missing-indicator method in matched case-control studies with incomplete data. Am J Epidemiol (1999) 150(12):1340–1345.
  6. Andrieu N, Goldstein AM. The case-combined-control design was efficient in detecting gene-environment interactions. J Clin Epidemiol (2004) 57(7):662–671.
  7. Goldstein AM, Dondon MG, Andrieu N. Unconditional analyses can increase efficiency in assessing gene-environment interaction of the case-combined-control design. Int J Epidemiol (2006) 35(4):1067–1073.
  8. Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics (1986) 42(1):101–130.
  9. Freedman DA. On the so-called "Huber sandwich estimator" and "robust standard errors." Am Stat (2006) 60(4):299–302.
  10. Efron B, Tibshirani RJ. An Introduction to the Bootstrap (1993) London, United Kingdom: Chapman and Hall Ltd.
  11. Olkin I, Sampson A. Comparison of meta-analysis versus analysis of variance of individual patient data. Biometrics (1998) 54(1):317–322.
  12. Kauermann G, Carroll RJ. A note on the efficiency of sandwich covariance matrix estimation. J Am Stat Assoc (2001) 96(10):1387–1396.
  13. Janssen R, Bont L, Siezen CLE, et al. Genetic susceptibility to respiratory syncytial virus bronchiolitis is predominantly associated with innate immune genes. J Infect Dis (2007) 196(6):826–834.
