American Journal of Epidemiology Advance Access originally published online on November 17, 2005
American Journal of Epidemiology 2006 163(1):76-83; doi:10.1093/aje/kwj011

American Journal of Epidemiology Copyright © 2005 by the Johns Hopkins Bloomberg School of Public Health All rights reserved; printed in U.S.A.

Original Contribution

Estimating the Proportion of Disease due to Classes of Sufficient Causes

Kurt Hoffmann, Christin Heidemann, Cornelia Weikert, Matthias B. Schulze and Heiner Boeing

From the Department of Epidemiology, German Institute of Human Nutrition, Potsdam-Rehbrücke, Germany

Correspondence to Dr. Kurt Hoffmann, Department of Epidemiology, German Institute of Human Nutrition, Arthur-Scheunert-Allee 114–116, 14558 Nuthetal, Germany (e-mail: khoff{at}mail.dife.de).

Received for publication April 21, 2005. Accepted for publication August 17, 2005.


    ABSTRACT
 
Disease can be caused by different mechanisms. A possible causal model proposed by Rothman is a complete causal mechanism or a so-called "sufficient cause" consisting of a set of component causes that can be illustrated in a pie chart. However, this model does not allow finding out what sufficient causes produce the majority of cases. The authors' objective was to extend Rothman's work by quantifying the proportion of disease that can be attributed to a class of sufficient causes. The underlying idea was to consider all combinations of a given set of known risk factors and to assign each combination to a class of sufficient causes. This assignment makes it possible to evaluate a class of sufficient causes by the population attributable fraction of the corresponding combination of risk factors. The approach presented was applied to sufficient causes of myocardial infarction by use of data on participants recruited between 1994 and 1998 into the European Prospective Investigation into Cancer and Nutrition-Potsdam Study. As a result, 51.8% of cases were attributed to only four different classes of sufficient causes. In conclusion, the statistical method described in the paper may be beneficial for quantifying the importance of different sufficient causes and for improving the efficiency of public health programs.

models, statistical; myocardial infarction; risk factors; statistics


Abbreviations: EPIC, European Prospective Investigation into Cancer and Nutrition; PAF, population attributable fraction; PDC, proportion of disease due to a class of sufficient causes


    INTRODUCTION
 
Rothman's model of sufficient and component causes (1, 2), introduced in 1976, is one of the most discussed causal models in epidemiology. Despite some inherent limitations (3, 4), this model seems to be appropriate to reflect the multiplicity of causal pathways and the biologic interaction among component causes. The pie-chart description of possible classes of sufficient causes is an illustrative method to differentiate among distinct etiologic mechanisms, leaving some unlabeled slices to represent unknown component causes. However, up to now, Rothman's model has been only a theoretical framework for epidemiology, with no direct connection to empirical data. Although epidemiologic studies on a specific disease should give information as to which of the possible sufficient causes probably exist and which do not, no approach has been suggested for obtaining quantitative results. In other words, the application of the sufficient-component cause model in epidemiologic research is hampered by the lack of an appropriate statistical method for estimating the proportion of disease due to different sufficient causes. Thus, an extension of Rothman's work is needed.

An obvious approach to identifying the most relevant sufficient causes is based on calculating the frequencies of component causes and of their combinations within cases (5). However, a high frequency of a specific combination of component causes within cases does not necessarily mean that this combination is a sufficient cause of high relevance, because the corresponding frequency within noncases may also be high. Moreover, this approach does not allow adjustment for other known factors and is not related to the common notions and concepts of epidemiology. In general, the importance of sufficient causes depends on both the relative risks and the frequencies of all possible component causes and their combinations. Because the population attributable fraction (PAF) already comprises the strength of effect and the frequency of exposure, it is promising to look for a relation between sufficient causes and PAF. Such a relation should enable us to evaluate the importance of sufficient causes by using the same epidemiologic data that are useful for estimating PAF.

In the present paper, a system of equations is derived and represented that relates adjusted PAF to the proportion of disease due to a class of sufficient causes (PDC). Solving these equations leads to an estimated frequency distribution for sufficient causes. Moreover, PDC itself can be interpreted as the PAF of a complex event that is characterized by a specific combination of present and absent single risk factors. Thus, estimation formulas and confidence intervals proposed for PAF can be carried over to PDC. We applied this approach to data from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam Study to evaluate classes of sufficient causes for incident myocardial infarction.


    SUFFICIENT CAUSES AND POPULATION ATTRIBUTABLE FRACTION
 
A "sufficient cause" is defined as a set of minimal conditions that inevitably produce disease (2). It consists of a number of component causes. Each component cause is necessary for the completion of the sufficient cause. The completion of a sufficient cause is equivalent to the onset of the earliest stage of disease. In other words, the component cause that acts last determines the time of the onset. If one or more of the component causes are absent, the causal mechanism cannot operate.

Two sufficient causes can be similar because they share some of the same components. Differentiating between such sufficient causes can be difficult, since some of the distinct component causes are often not known or not measured. Leaving some pieces unlabeled in a pie chart is not only a way to represent unknown component causes but also a way to group similar sufficient causes. Thus, any pie chart with an unlabeled piece can be considered a class of sufficient causes. The common characteristic of sufficient causes belonging to the same class is the same set of known component causes. Formally, such classes can also include sufficient causes with unpredictable components to include random events (6).

For the sake of simplicity, we first regard the case of only two known component causes, for example, X1 and X2. Then, four different classes of sufficient causes can be distinguished as shown in figure 1. The class S1 comprises all sufficient causes that have X1, but not X2, as the component cause. The symbol U1 stands for one or more unknown component causes. Analogously, S2 is defined as the class of sufficient causes that contains X2 as the only known component. S12 stands for causal mechanisms that require the presence of both X1 and X2. Finally, the class S0 comprises all sufficient causes that produce disease in the absence of X1 and X2. Obviously, any single sufficient cause belongs to one and only one of the four classes depicted in figure 1.



FIGURE 1. The four classes (S1, S2, S12, and S0) of sufficient causes in the case of two known risk factors (X1 and X2). The U symbols stand for one or more unknown component causes.

 
Now, consider the PAF of the component causes X1 and X2. The PAF of a single condition is defined here as the fraction of cases that would not have occurred if the condition were not present (7). If stratification successfully removes confounding, the PAF can be validly estimated by the formula of Miettinen (8) that requires estimates of the condition prevalence among cases and the risk ratio standardized to the condition (2, p. 295). Analogously, the joint PAF of two conditions defined as the fraction of cases that would not have occurred if both conditions were not present can also be estimated by Miettinen's formula. Denoting the proportion of disease due to a class S of sufficient causes by PDC(S), the following four equations hold.

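$$
\begin{aligned}
\mathrm{PAF}(X_1) &= \mathrm{PDC}(S_1) + \mathrm{PDC}(S_{12}),\\
\mathrm{PAF}(X_2) &= \mathrm{PDC}(S_2) + \mathrm{PDC}(S_{12}),\\
\mathrm{PAF}(X_1, X_2) &= \mathrm{PDC}(S_1) + \mathrm{PDC}(S_2) + \mathrm{PDC}(S_{12}),\\
\mathrm{PAF}(X_1, X_2) &= 1 - \mathrm{PDC}(S_0).
\end{aligned}
\tag{1}
$$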

The equations can be verified by the pie charts of figure 1. First, consider the case that X1 has not occurred. Then, sufficient causes of types S1 and S12 could not have been completed, whereas the completion of all other sufficient causes would not have been affected. Therefore, the fraction of cases that can be attributed to X1 must be the sum of the proportions of disease caused by S1 and S12 as stated by the first equation. The second equation has a quite similar interpretation, but here X2 plays the role of X1. The remaining two equations refer to the joint PAF of both component causes. A generalization of these equations to the situation of more than two known component causes is straightforward. In general, the joint PAF of any m conditions can be calculated by summing up the PDC values of all sufficient causes that contain at least one of the m conditions as component cause.

In this form, equations 1 express the PAF in terms of the PDC and are therefore not convenient for calculating the frequencies of sufficient causes from the single and joint PAF of component causes. For this purpose, equations 1 should be solved for the PDC values, which gives the following form:

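$$
\begin{aligned}
\mathrm{PDC}(S_1) &= \mathrm{PAF}(X_1, X_2) - \mathrm{PAF}(X_2),\\
\mathrm{PDC}(S_2) &= \mathrm{PAF}(X_1, X_2) - \mathrm{PAF}(X_1),\\
\mathrm{PDC}(S_{12}) &= \mathrm{PAF}(X_1) + \mathrm{PAF}(X_2) - \mathrm{PAF}(X_1, X_2),\\
\mathrm{PDC}(S_0) &= 1 - \mathrm{PAF}(X_1, X_2).
\end{aligned}
\tag{2}
$$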

To get an impression of equations 2, we considered a very strong component cause X1 with a PAF of 0.5 and a strong component cause X2 with a PAF of 0.3. For simplicity, we assumed that both components are risk factors and that the presence of one of the components is not protective for disease in any etiologic mechanism. Then, the joint PAF cannot be smaller than the larger of each individual PAF and cannot be larger than their sum. Therefore, in this example, the joint PAF can vary only between 0.5 and 0.8.

For different values of the joint PAF, table 1 gives the frequency distribution for sufficient causes. As the joint PAF increases, the proportions of S1 and S2 also increase, whereas the proportions of S12 and S0 decrease. In the case of the minimal joint PAF, no sufficient cause that contains only the weaker component X2 can exist. Thus, the proportion of disease due to S2 has to be zero. More interesting is the opposite case of the maximal joint PAF. Here, the joint PAF is equal to the sum of single PAFs, which means that no interaction in public health contexts between X1 and X2 exists (9, 10). As we can see in the last row of table 1, the additivity of attributable fractions implies that both causes X1 and X2 cannot belong to a common sufficient cause. In other words, they act in different etiologic mechanisms and have no biologic interaction (2). Thus, the two concepts of public health and biologic interaction have the same reference point corresponding to PAF(X1,X2) = 0.8.
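The two-factor relations can be checked numerically. The following Python sketch solves equations 2 for the example with PAF(X1) = 0.5 and PAF(X2) = 0.3; the function name and the grid of joint PAF values are illustrative assumptions and are not taken from table 1.

```python
def pdc_two_factors(paf_x1, paf_x2, paf_joint):
    """Solve equations 2 for the four classes S1, S2, S12, and S0.

    Assumes both factors are risk factors in every mechanism, so the joint
    PAF lies between the larger single PAF and the sum of the single PAFs.
    """
    lower = max(paf_x1, paf_x2)
    upper = paf_x1 + paf_x2
    tol = 1e-12  # tolerance for floating-point comparison at the bounds
    if not (lower - tol <= paf_joint <= upper + tol):
        raise ValueError("joint PAF must lie between max and sum of single PAFs")
    return {
        "S1": paf_joint - paf_x2,             # X1 is the only known component
        "S2": paf_joint - paf_x1,             # X2 is the only known component
        "S12": paf_x1 + paf_x2 - paf_joint,   # both X1 and X2 required
        "S0": 1.0 - paf_joint,                # neither X1 nor X2 involved
    }


# Hypothetical grid of joint PAF values between the bounds 0.5 and 0.8.
for joint in (0.5, 0.6, 0.7, 0.8):
    dist = pdc_two_factors(0.5, 0.3, joint)
    print(joint, {name: round(value, 2) for name, value in dist.items()})
```

At the lower bound the proportion for S2 vanishes, and at the upper bound the proportion for S12 vanishes, in agreement with the two extreme cases discussed above.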


TABLE 1. Relation between the population attributable fraction of component causes and the frequency distribution of sufficient causes in a fictive example

 
Equations 2 can be generalized to the case of more than two component causes. Let k be the number of known component causes; then 2^k classes of sufficient causes can be distinguished. To calculate the PDC for all 2^k classes from the PAF of the component causes, the same number of equations is necessary. These 2^k equations are derived in appendix 1.


    EVALUATING SUFFICIENT CAUSES
 
Up to now, we have only a vague impression of PDC as the proportion of cases that is due to a specific class of sufficient causes. Indeed, we need a precise definition of this notion by using the terminology of probability theory. For this purpose, consider the class of sufficient causes that contains from the k known factors X1, ..., Xk only the first m ones as component causes. We denote this multifactorial event by E = (X1 = 1, ..., Xm = 1, X_{m+1} = 0, ..., Xk = 0) and the corresponding class of sufficient causes by S_E. Obviously, the PDC of S_E cannot be larger than the proportion of cases that had the event E at the onset of disease. Formally, the PDC is bounded above by P(E|D) that denotes the conditional probability of the event E under the condition that the individual develops the disease D. However, within the stratum of cases that have the event E, a fraction of individuals would also develop the disease if all known component causes of S_E were absent before. The remaining fraction in this stratum is equal to the proportion of cases that is due to S_E. Thus, we get the definition,

$$
\mathrm{PDC}(S_E) = P(E \mid D)\,\frac{P(D \mid E) - P(D \mid E_0)}{P(D \mid E)},
\tag{3}
$$
where E0 stands for the zero event (X1 = 0, ..., Xk = 0), whereas P(D|E) and P(D|E0) are the disease rates in the strata defined by E and E0, respectively. Throughout the paper, we assume that the disease rate among the individuals with event E exceeds the disease rate among the unexposed. Dividing the numerator and the denominator of the right-hand term by P(D|E0) yields the representation,

$$
\mathrm{PDC}(S_E) = P(E \mid D)\,\frac{RR - 1}{RR},
\tag{4}
$$
with RR denoting the relative risk of the disease in individuals with event E versus individuals with E0. Obviously, equation 4 is similar to the formula of Miettinen (8) for attributable fraction. Actually, the PDC of the class S_E of sufficient causes is equal to the PAF of the multifactorial event E. By definition, the class of sufficient causes and the multifactorial event form an inseparable pair. The 2^k different combinations for presence and absence of the k known factors define the same number of strata in the sample and the same number of classes of sufficient causes.

It can be proven that the definition of PDC by equation 3 or equation 4 is consistent with the relations between PDC and PAF represented before. Actually, equation 4 gives a solution of the equation systems 1 and 2 and its generalization to the multicomponent case (appendix 2). Because 2^k equations for 2^k different PDC variables cannot have more than one solution for each PDC, the solution just given is the unique one.

Equation 4 is suitable to get a valid estimate of PDC from any valid estimate of RR. Because equation 4 is the PAF formula of Miettinen for the event E, it holds also if adjustment is needed (2). The estimate of PDC can be adjusted for confounding factors by applying any one of the available statistical methods (11–13). We chose the most flexible and general adjustment method based on RR estimates from a logistic regression model that includes all the confounders assessed in the study. Confidence limits for the estimated PDC can be derived from the theory of PAF by using an implicit delta method (14, 15), the delta method ignoring some variances and covariances (16, 17), the asymptotic variance formula for the maximum likelihood estimator (18), or a variance formula originally proposed for case-control studies (2, 19, 20). In this paper, we utilized the last one.
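As a minimal numerical sketch of the point estimate only (confidence limits are not covered), the following Python functions evaluate PDC from the stratum-specific disease rates and from the relative risk. The function names are hypothetical. The inputs P(E|D) = 0.296 and RR = 5.4 are the values reported later in this paper for the combination of all four risk factors; the two disease rates are invented solely to give the same relative risk.

```python
def pdc_from_rates(p_event_given_case, rate_e, rate_e0):
    """PDC as P(E|D) * (P(D|E) - P(D|E0)) / P(D|E), the definition in terms of rates."""
    if rate_e <= rate_e0:
        # The paper assumes the rate under E exceeds the rate under E0.
        raise ValueError("disease rate under E must exceed the rate under E0")
    return p_event_given_case * (rate_e - rate_e0) / rate_e


def pdc_from_rr(p_event_given_case, rr):
    """PDC as P(E|D) * (RR - 1) / RR, the relative-risk form (equation 4)."""
    if not 0.0 <= p_event_given_case <= 1.0:
        raise ValueError("P(E|D) must be a probability")
    if rr <= 1.0:
        raise ValueError("RR must exceed 1 under the paper's assumption")
    return p_event_given_case * (rr - 1.0) / rr


# Reported values for the event "all four risk factors present".
pdc_all_four = pdc_from_rr(0.296, 5.4)
print(round(pdc_all_four, 3))

# Hypothetical rates chosen only so that rate_e / rate_e0 = 5.4.
print(round(pdc_from_rates(0.296, 0.054, 0.010), 3))
```

Both calls give approximately 0.241, in line with the 24.1 percent reported for this class in the application section.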

To differentiate between relevant and nonrelevant sufficient causes and to simultaneously estimate the PDC for all relevant classes of sufficient causes, we propose the following multistep procedure.

  1. Define indicator variables (dichotomous variables) for all 2^k − 1 multifactorial events different from the zero event E0.
  2. Define all covariates that are planned for the analysis.
  3. Starting with a model with all the covariates defined before, apply stepwise regression on the indicator variables only, to include all significant indicator variables in the final model.
  4. Calculate the relative frequencies of the remaining significant multifactorial events within all cases.
  5. Estimate the relative risks for the significant multifactorial events and insert these estimates into equation 4 to estimate the corresponding PDC.
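
Step 1 of this procedure can be sketched as follows. The Python code enumerates the 2^k − 1 multifactorial events and codes one subject against them; the function names, the factor labels, and the example subject are assumptions made for illustration. Steps 3 to 5 (stepwise logistic regression and insertion into equation 4) are not covered by this sketch.

```python
from itertools import product


def multifactorial_events(factor_names):
    """Return all 2^k - 1 presence/absence patterns except the zero event E0."""
    events = []
    for pattern in product((0, 1), repeat=len(factor_names)):
        if any(pattern):  # skip the pattern with every factor absent
            events.append(dict(zip(factor_names, pattern)))
    return events


def indicator_row(subject, events, factor_names):
    """Code a subject as 0/1 indicators, one per multifactorial event.

    At most one indicator equals 1, because the events are mutually
    exclusive; a subject without any risk factor gets all zeros.
    """
    profile = {name: int(bool(subject[name])) for name in factor_names}
    return [int(profile == event) for event in events]


factors = ["smoking", "hypertension", "obesity", "lack_of_exercise"]
events = multifactorial_events(factors)
print(len(events))

# Hypothetical subject: hypertensive, obese, inactive nonsmoker.
subject = {"smoking": 0, "hypertension": 1, "obesity": 1, "lack_of_exercise": 1}
row = indicator_row(subject, events, factors)
print(events[row.index(1)])
```

For four risk factors this yields 15 indicator variables, and each subject with at least one risk factor falls into exactly one of them.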

This procedure includes statistical tests for the stratum-specific effects (step 3). A significant rejection of the hypothesis that the stratum-specific effect parameter is zero is equivalent to a significant rejection of the corresponding hypothesis RR = 1 and is a necessary condition for the significant rejection of the hypothesis PDC = 0. Thus, exclusion of nonsignificant indicator variables ignores only such classes of sufficient causes for which the confidence interval for PDC does contain zero. The subsequent statistical analysis was performed with SAS software (21).


    APPLICATION TO REAL DATA
 
Study population
The study population was the EPIC-Potsdam cohort, which is one of the two German cohorts contributing to the EPIC Study, a multicenter cohort study into diet and chronic diseases (22). A total of 27,548 subjects aged from 35 to 65 years were recruited from the general population between 1994 and 1998 (23). Baseline examinations included anthropometric measurements, blood sampling, the completion of a food frequency questionnaire, and a personal interview on lifestyle habits and medical history. Follow-up questionnaires are sent to the study participants every 2–3 years.

All potential cases of incident myocardial infarction were identified by self-reports or death certificates. Cases were defined as participants who developed myocardial infarction according to International Classification of Diseases, Tenth Revision, codes I21.0–I21.9 (24). All incident cases of nonfatal or fatal myocardial infarction were verified by patients' medical records or death certificates according to World Health Organization Monitoring of Trends and Determinants in Cardiovascular Disease (MONICA) criteria (25). Participants with missing follow-up status or prevalent myocardial infarction at baseline were excluded, leaving 26,972 participants (10,470 men and 16,502 women) for analyses. Among these, we identified 159 newly diagnosed cases of myocardial infarction (116 nonfatal and 43 fatal) that occurred between baseline and April 30, 2004. Mean follow-up was 4.6 years.

Risk factors
From the potentially modifiable risk factors associated with myocardial infarction (26–30), the following four were considered: smoking, hypertension, obesity, and lack of exercise. We defined individuals as being at risk from smoking if they were current smokers or had quit smoking within the past 5 years. Hypertension was defined as having a systolic blood pressure of 140 mmHg or more, having a diastolic blood pressure of 90 mmHg or more, taking antihypertensive medication, or self-reporting a hypertension diagnosis. According to previous studies (30), obesity was considered abdominal obesity and defined by a waist/hip ratio of 0.9 or more in men and 0.8 or more in women. Individuals were judged to have a lack of exercise if they engaged in sporting activities for less than 2 hours a week.

Estimated risks and frequencies
Indicator variables were defined for all 15 combinations of the four risk factors that include at least one present risk factor. The prevalence of each combination within cases and noncases is given in table 2. As expected, the percentage of individuals with three or four risk factors was markedly higher within cases than within noncases. Indeed, 78.1 percent (48.5 percent plus 29.6 percent) of cases had at least three risk factors, whereas the corresponding percentage within noncases was only 37.6 percent (30.0 percent plus 7.6 percent). However, the inverse relation was observed for individuals with none, one, or two risk factors.


TABLE 2. Prevalence and relative risk for combinations of risk factors for incident myocardial infarction in the European Prospective Investigation into Cancer and Nutrition-Potsdam Study (159 cases, 26,813 noncases), 2004

 
Relative risks for the 15 combinations were estimated using a logistic regression model adjusted for age, sex, prevalence of diabetes, and history of dyslipidemia based on self reports of a diagnosis or of taking cholesterol-lowering medication. The results are given in the last column of table 2. The relative risks for combinations with one or two present risk factors were not significantly different from one. In contrast, lack of exercise together with any two other risk factors was associated with a significantly elevated relative risk for myocardial infarction. The highest relative risk of 5.4 (95 percent confidence interval: 3.5, 8.5) was associated with the simultaneous presence of all four risk factors.

Following the procedure described before, PDC was estimated for all classes of sufficient causes that correspond to significant multifactorial events. The estimates and the 95 percent confidence intervals are given in figure 2. As expected from table 2, the class of sufficient causes containing all four risk factors as component causes was the most important one. The estimated proportion of disease due to this class was 24.1 percent. Somewhat surprisingly, the class of sufficient causes that requires the presence of hypertension, obesity, and lack of exercise ranked second with 14.6 percent of cases, although the corresponding relative risk was only 2.0. This high percentage can be explained by the high prevalence of the corresponding multifactorial event. The PDCs of the other two classes with three known component causes were 9.6 percent and 3.5 percent. The PAF of the single risk factors can be estimated by summing up the PDC values of all sufficient causes that contain the risk factor as component cause. Thus, lack of exercise is the risk factor with the highest PAF of 51.8 percent, followed by obesity (48.3 percent), hypertension (42.2 percent), and smoking (37.2 percent).
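
The summation of PDC values into single-factor PAFs can be reproduced with a few lines of Python. The dictionary below uses the four PDC estimates quoted above. The assignment of 9.6 percent and 3.5 percent to the two remaining three-factor classes is not stated explicitly in the text; it is inferred here from the reported single-factor PAFs and should be read as an assumption.

```python
# PDC estimates in percent, keyed by the known component causes of each class.
pdc_by_class = {
    frozenset({"smoking", "hypertension", "obesity", "lack_of_exercise"}): 24.1,
    frozenset({"hypertension", "obesity", "lack_of_exercise"}): 14.6,
    # Inferred assignment (assumption): consistent with the reported PAFs.
    frozenset({"smoking", "obesity", "lack_of_exercise"}): 9.6,
    frozenset({"smoking", "hypertension", "lack_of_exercise"}): 3.5,
}


def paf_single(factor):
    """Sum the PDC of every class that contains the factor as a component cause."""
    return sum(pdc for cls, pdc in pdc_by_class.items() if factor in cls)


for factor in ("lack_of_exercise", "obesity", "hypertension", "smoking"):
    print(factor, round(paf_single(factor), 1))
```

Under this assignment the sums are 51.8, 48.3, 42.2, and 37.2 percent, which agree with the PAF values given in the text.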



FIGURE 2. Proportion of myocardial infarction (95% confidence intervals) due to classes of sufficient causes in the European Prospective Investigation into Cancer and Nutrition-Potsdam Study, 2004.

 

    DISCUSSION
 
The concept of sufficient cause and component causes introduced by Rothman is a tempting approach to model causality. In contrast to a completely deterministic model, stochastic elements for treating unmeasured or unpredictable component causes can be included in the pie charts by unlabeled slices. Thus, sufficient causes can be modeled without knowing the component causes completely. In this paper, we classified sufficient causes by multifactorial events that reflect specific combinations of known risk factors. Therefore, sufficient causes are not distinguishable if they belong to the same class. However, sufficient causes have different known causal components if they belong to different classes. Based on such a classification scheme, a statistical approach was described to estimate the proportion of disease that is due to a class of sufficient causes.

Although the PDC is defined as attributable fraction of a multifactorial event, it clearly differs from the joint PAF of multiple factors because the known component causes of the sufficient causes are fixed. The estimated PDC of all possible classes allows a partition of cases into strata characterized by combinations of known risk factors, whereas PAF estimates refer to overlapping events in general. Therefore, the practical value of the presented estimation procedure consists in deriving more detailed and precise results on the associations between the presence of risk factors and development of disease. To illustrate this issue, consider abdominal obesity as a risk factor for myocardial infarction in the EPIC-Potsdam Study. A recommendation to lose weight would be addressed to all obese individuals if only the PAF estimates are given, whereas the PDC estimates allow restricting the recommendation to obese individuals lacking exercise who are smokers or have hypertension. For both strategies, the proportion of disease that can be potentially prevented is 48.3 percent. However, the efficiency of the second strategy is much higher, because only 33.7 percent of subjects have the described risk profile compared with 53.1 percent of individuals who are obese (refer to table 2 and summing up respective prevalence within noncases).

Some limitations of the described estimation procedure should be noted. First, the simultaneous consideration of many risk factors and the differentiation among many classes of sufficient causes require a high number of cases. If the sample size and especially the number of cases are too low, stepwise regression will exclude many indicator variables, and the estimated PDC of some important classes of sufficient causes may not be significantly different from zero. This limitation was apparent in the EPIC-Potsdam Study of incident myocardial infarction in which all combinations with one and two risk factors present and one combination with three risk factors present were excluded by stepwise regression. Large studies, such as the INTERHEART study (30) with about 15,000 cases, do not encounter this problem and should show a higher number of important classes of sufficient causes. Second, the estimation procedure for PDC supposes dichotomous risk factors. If originally continuous variables are dichotomized, the cutoffs chosen may be arbitrary, and change of cutoffs may lead to different results. Although it is possible to include different possible cutoffs by different types of sufficient causes, a conclusive strategy to handle this problem is not available.

Different concepts and interpretations of PAF have been proposed and discussed over the last 50 years (2, 31, 32). The concept used in our paper is that of "excess fraction," defined as the fraction of cases that would not have occurred if the exposure (risk factor) had not occurred. The advantage of this concept is that an excess fraction can be validly estimated by the formulas of Miettinen (8) and Bruzzi et al. (33) by inserting an adjusted odds ratio from logistic regression, provided that there are no unobserved confounders and disease is rare (2). Because we have defined the PDC of a specific class S_E as the PAF of the corresponding multifactorial event E, it can be interpreted as the excess fraction of cases. Clearly, from the standpoint of biology, the fraction of cases that are etiologically attributable to S_E seems to be more relevant than the fraction of excess cases. Unfortunately, one cannot estimate the etiologic fraction without resorting to very strong biologic assumptions. The etiologic fraction is larger than the excess fraction (2, 31), because some cases caused by S_E would still have become cases within the considered time period had E never occurred and, therefore, do not belong to the excess fraction. On the other hand, because the etiologic fraction attributable to S_E cannot be larger than the proportion of cases with the event E, the current approach gives lower and upper bounds for the unknown etiologic fraction. In our example of myocardial infarction, for the class of sufficient causes with all four known risk factors as component causes, the etiologic fraction cannot be smaller than 24.1 percent (figure 2) and not larger than 29.6 percent (table 2).

Besides sufficient-component cause models, three other major types of causal models have been applied in health-sciences research: causal diagrams (34, 35), potential-outcome models (36, 37), and structural-equations models (38). Whereas the last two models provide a basis for quantitative analysis of effects, Rothman's model of sufficient and component causes was considered only to illustrate the specific hypotheses about mechanisms of action (39). The extension of Rothman's work presented may be helpful for quantitative analyses and for using the sufficient-component cause model beyond teaching examples. Furthermore, because a class of sufficient causes is characterized by a specific combination of present and absent risk factors, recommendations aimed to reduce the risk of disease can be specified to subgroups of individuals with the same risk profile. An application of the method to a study of myocardial infarction demonstrated the possible public health benefit of quantifying the importance of different sufficient causes.


    APPENDIX 1
 
Equation System for PDC in the Multicomponent Case
This appendix provides the equations for calculating the frequency distribution for sufficient causes from the single and joint PAF of component causes. Let X1, ..., Xk be known risk factors of disease, let I be a subset of {1, ..., k}, and let S_I be the class of sufficient causes that all contain Xi, i ∈ I, as component causes but that do not contain Xi for i ∉ I.

Obviously, the joint PAF of X1, ..., Xk must be greater than or equal to the joint PAF of X1, ..., X_{i−1}, X_{i+1}, ..., Xk. The difference between these two PAFs is equal to the fraction of cases that is due to the class S_i of sufficient causes containing Xi as the only known component cause. Thus, the following equations hold.

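$$
\mathrm{PDC}(S_i) = \mathrm{PAF}(X_1, \ldots, X_k) - \mathrm{PAF}(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_k), \qquad i = 1, \ldots, k.
\tag{A1}
$$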

Now consider any permutation X(1), ..., X(k) of the risk factors and let X(m+1), ..., X(k) be the last k − m factors in this arrangement. Then, the difference between the PAF of X1, ..., Xk and the PAF of X(m+1), ..., X(k) can be attributed to sufficient causes whose known component causes all belong to {X(1), ..., X(m)}. Therefore, we obtain the following relations:

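$$
\sum_{\emptyset \neq I \subseteq \{(1), \ldots, (m)\}} \mathrm{PDC}(S_I) = \mathrm{PAF}(X_1, \ldots, X_k) - \mathrm{PAF}(X_{(m+1)}, \ldots, X_{(k)}),
\tag{A2}
$$
where the sum on the left-hand side runs over all nonempty subsets I of {(1), ..., (m)}. If we isolate the class of sufficient causes with the largest number of known component causes, equation A2 can be rewritten as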

(A3)
where the sum on the right-hand side runs over all proper subsets of {(1), ..., (m)}. In the special case m = k, the second term on the right-hand side of equation A3 must be set equal to zero. The recursive equations A3, together with the initial equations A1, allow the calculation of almost all frequencies of sufficient causes. The remaining frequency, that of S0, can be calculated with a simple additional formula:

(A4)
Equations A1, A3, and A4 together form a system of 2^k equations appropriate to determine the frequencies of the 2^k classes of sufficient causes.
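The recursion described in this appendix can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the displayed formulas A1 to A4 are not reproduced in this copy, so the sketch follows the verbal description only. It assumes that the sums run over nonempty subsets, that the PAF of an empty set of factors is zero, and that the frequency of S0 equals one minus the joint PAF of all factors. The function names, the factor labels "A" and "B", and the PAF values are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every nonempty subset of items as a frozenset, smallest first."""
    items = tuple(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def pdc_from_paf(factors, paf):
    """Return {class index set: PDC} computed from joint PAFs.

    paf maps each nonempty frozenset of factors to its joint PAF.
    The PAF of the empty set is taken to be zero (the m = k case).
    """
    everything = frozenset(factors)
    paf_all = paf[everything]
    pdc = {}
    for subset in nonempty_subsets(sorted(everything)):
        rest = everything - subset
        paf_rest = paf[rest] if rest else 0.0
        # Classes solved earlier: nonempty proper subsets of `subset`.
        smaller = sum(
            pdc[s] for s in nonempty_subsets(sorted(subset)) if s != subset
        )
        # Recursive step as read from the text around A1 and A3.
        pdc[subset] = paf_all - paf_rest - smaller
    # Class without any known component cause (assumed reading of A4).
    pdc[frozenset()] = 1.0 - paf_all
    return pdc


# Made-up joint PAFs for two hypothetical risk factors "A" and "B".
example_paf = {
    frozenset("A"): 0.30,
    frozenset("B"): 0.20,
    frozenset("AB"): 0.40,
}
example_pdc = pdc_from_paf("AB", example_paf)
for cls in sorted(example_pdc, key=lambda s: (len(s), sorted(s))):
    print(sorted(cls), round(example_pdc[cls], 2))
```

Processing the subsets in order of increasing size guarantees that every term needed on the right-hand side has been computed before it is used, and the resulting 2^k values sum to one by construction.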


    APPENDIX 2
Solution of the Equation System in the Multicomponent Case
Again, let X(1), ..., X(k) be any permutation of the risk factors X1, ..., Xk, and let SI, I ⊆ {1, ..., k}, be the class of sufficient causes that contains all Xi, i ∈ I, as component causes but that does not contain Xi for i ∉ I. Then, as a generalization of equation 1, the joint PAF of the first m risk factors X(1), ..., X(m), m ≤ k, can be obtained by summing the PDC values of all sufficient causes that contain at least one of the m risk factors as a component cause. Thus, we have the following equation:

(A5)

On the other hand, applying the formula of Bruzzi et al. (33) that was formally proven by Benichou (12), the PAF of X(1), ..., X(m) is equal to

(A6)
where EI is the multifactorial event with Xi = 1 for i ∈ I and Xi = 0 for i ∉ I, and where RRI is the relative risk of EI versus E0. Thus, the PDC defined by equation 4 is a solution of the equation system A5. Since equation A5 is equivalent to equations A1 and A2, equation 4 also represents the solution for the latter.
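The equivalence stated in the last sentence can be made explicit. The following is a sketch in notation assumed here rather than taken from the article: K = {1, ..., k}, M = {(1), ..., (m)}, PAF(J) is the joint PAF of the factors indexed by J with PAF(∅) = 0, S_∅ corresponds to S0, and the PDC values are assumed to sum to 1 over all 2^k classes.

```latex
\begin{align*}
% Relation described for A5, written for a general index set J
\mathrm{PAF}(J) &= \sum_{I \,:\, I \cap J \neq \emptyset} \mathrm{PDC}(S_I) \\
% J = K: every class except S_\emptyset contributes (compare A4)
\mathrm{PAF}(K) &= \sum_{I \neq \emptyset} \mathrm{PDC}(S_I)
                 = 1 - \mathrm{PDC}(S_\emptyset) \\
% Subtracting the relation for J = K \setminus M from the one for J = K
% leaves the classes whose known components all lie in M (compare A2)
\mathrm{PAF}(K) - \mathrm{PAF}(K \setminus M)
                &= \sum_{\emptyset \neq I \subseteq M} \mathrm{PDC}(S_I) \\
% Special case M = \{i\} (compare A1)
\mathrm{PAF}(K) - \mathrm{PAF}(K \setminus \{i\}) &= \mathrm{PDC}(S_{\{i\}})
\end{align*}
```

The third line holds because a nonempty index set I that shares no element with K \ M must be contained in M.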


    ACKNOWLEDGMENTS
 
Conflict of interest: none declared.


    References

  1. Rothman KJ. Causes. Am J Epidemiol 1976;104:587–92.
  2. Rothman KJ, Greenland S. Modern epidemiology. 2nd ed. Philadelphia, PA: Lippincott-Raven, 1998.
  3. Karhausen LR. Causation in epidemiology: a Socratic dialogue: Plato. Int J Epidemiol 2001;30:704–6.
  4. Parascandola M, Weed DL. Causation in epidemiology. J Epidemiol Community Health 2001;55:905–12.
  5. Reiber GE, Vileikyte L, Boyko EJ, et al. Causal pathways for incident lower-extremity ulcers in patients with diabetes from two settings. Diabetes Care 1999;22:157–62.
  6. Poole C. Commentary: positivized epidemiology and the model of sufficient and component causes. Int J Epidemiol 2001;30:707–9.
  7. Levin ML. The occurrence of lung cancer in man. Acta Unio Int Contra Cancrum 1953;9:531–41.
  8. Miettinen OS. Proportion of disease caused or prevented by a given exposure, trait, or intervention. Am J Epidemiol 1974;99:325–32.
  9. Blot WJ, Day NE. Synergism and interaction: are they equivalent? Am J Epidemiol 1979;110:99–100.
  10. Rothman KJ, Greenland S, Walker AM. Concepts of interaction. Am J Epidemiol 1980;112:467–70.
  11. Gefeller O. Comparison of adjusted attributable risk estimators. Stat Med 1992;11:2083–91.
  12. Benichou J. Methods of adjustment for estimating the attributable risk in case-control studies: a review. Stat Med 1991;10:1753–73.
  13. Benichou J. A review of adjusted estimators of attributable risk. Stat Methods Med Res 2001;10:195–216.
  14. Benichou J, Gail MH. Variance calculations and confidence intervals for estimates of the attributable risk based on logistic models. Biometrics 1990;46:991–1003.
  15. Benichou J, Chow WH, McLaughlin JK, et al. Population attributable risk of renal cell cancer in Minnesota. Am J Epidemiol 1998;148:424–30.
  16. Wilson PD, Loffredo CA, Correa-Villasenor A, et al. Attributable fraction for cardiac malformations. Am J Epidemiol 1998;148:414–23.
  17. Platz EA, Willett WC, Colditz GA, et al. Proportion of colon cancer risk that might be preventable in a cohort of middle-aged US men. Cancer Causes Control 2000;11:579–88.
  18. Greenland S, Drescher K. Maximum likelihood estimation of the attributable fraction from logistic models. Biometrics 1993;49:865–72.
  19. Whittemore AS. Statistical methods for estimating attributable risk from retrospective data. Stat Med 1982;1:229–43.
  20. Whittemore AS. Estimating attributable risk from case-control studies. Am J Epidemiol 1983;117:76–85.
  21. SAS Institute, Inc. SAS/STAT user's guide, version 8. Cary, NC: SAS Institute, Inc, 1999.
  22. Riboli E, Kaaks R. The EPIC Project: rationale and study design. Int J Epidemiol 1997;26(suppl 1):S6–14.
  23. Boeing H, Korfmann A, Bergmann MM. Recruitment procedures of EPIC-Germany. Ann Nutr Metab 1999;43:205–15.
  24. World Health Organization. International statistical classification of diseases and related health problems. Geneva, Switzerland: World Health Organization, 1992.
  25. Tunstall-Pedoe H, Kuulasmaa K, Amouyel P, et al. Myocardial infarction and coronary deaths in the World Health Organization MONICA Project. Registration procedures, event rates, and case-fatality rates in 38 populations from 21 countries in four continents. Circulation 1994;90:583–612.
  26. Grundy SM, Pasternak R, Greenland P, et al. Assessment of cardiovascular risk by use of multiple-risk-factor assessment equations: a statement for healthcare professionals from the American Heart Association and the American College of Cardiology. Circulation 1999;100:1481–92.
  27. Greenland P, Knoll MD, Stamler J, et al. Major risk factors as antecedents of fatal and nonfatal coronary heart disease events. JAMA 2003;290:891–7.
  28. Poirier P, Eckel RH. Obesity and cardiovascular disease. Curr Atheroscler Rep 2002;4:448–53.
  29. Thompson PD, Buchner D, Pina IL, et al. Exercise and physical activity in the prevention and treatment of atherosclerotic cardiovascular disease: a statement from the Council on Clinical Cardiology (Subcommittee on Exercise, Rehabilitation, and Prevention) and the Council on Nutrition, Physical Activity, and Metabolism (Subcommittee on Physical Activity). Circulation 2003;107:3109–16.
  30. Yusuf S, Hawken S, Ounpuu S, et al. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. Lancet 2004;364:937–52.
  31. Greenland S, Robins JM. Conceptual problems in the definition and interpretation of attributable fractions. Am J Epidemiol 1988;128:1185–97.
  32. Rockhill B, Newman B, Weinberg C. Use and misuse of population attributable fractions. Am J Public Health 1998;88:15–19.
  33. Bruzzi P, Green SB, Byar DP, et al. Estimating the population attributable risk for multiple risk factors using case-control data. Am J Epidemiol 1985;122:904–14.
  34. Pearl J. Causal diagrams for empirical research (with discussion). Biometrika 1995;82:669–710.
  35. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37–48.
  36. Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology 1992;3:143–55.
  37. Little RJ, Rubin DB. Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches. Annu Rev Public Health 2000;21:121–45.
  38. Pearl J. Causality. New York, NY: Cambridge University Press, 2000.
  39. Greenland S, Brumback B. An overview of relations among causal modeling methods. Int J Epidemiol 2002;31:1030–7.
