American Journal of Epidemiology Advance Access originally published online on September 17, 2007
American Journal of Epidemiology 2007 166(9):994-1002; doi:10.1093/aje/kwm231

American Journal of Epidemiology © The Author 2007. Published by the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

Invited Commentary

Invited Commentary: Effect Modification by Time-varying Covariates

James M. Robins1,2, Miguel A. Hernán1 and Andrea Rotnitzky2,3

1 Department of Epidemiology, Harvard School of Public Health, Boston, MA
2 Department of Biostatistics, Harvard School of Public Health, Boston, MA
3 Department of Economics, Universidad Di Tella, Buenos Aires, Argentina

Correspondence to Dr. Miguel A. Hernán, Department of Epidemiology, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115 (e-mail: miguel_hernan{at}post.harvard.edu).

Received for publication November 21, 2006. Accepted for publication March 9, 2007.


    ABSTRACT
 
Marginal structural models (MSMs) allow estimation of effect modification by baseline covariates, but they are less useful for estimating effect modification by evolving time-varying covariates. Rather, structural nested models (SNMs) were specifically designed to estimate effect modification by time-varying covariates. In their paper, Petersen et al. (Am J Epidemiol 2007;166:985–993) describe history-adjusted MSMs as a generalized form of MSM and argue that history-adjusted MSMs allow a researcher to easily estimate effect modification by time-varying covariates. However, history-adjusted MSMs can result in logically incompatible parameter estimates and hence in contradictory substantive conclusions. Here the authors propose a more restrictive definition of history-adjusted MSMs than the one provided by Petersen et al. and compare the advantages and disadvantages of using history-adjusted MSMs, as opposed to SNMs, to examine effect modification by time-dependent covariates.

causality; confounding factors (epidemiology); longitudinal studies; nested model; observational data; structural model; time-dependent covariate


Abbreviations: MSM, marginal structural model; SNM, structural nested model


    INTRODUCTION
 
Marginal structural models (MSMs) are being increasingly used to estimate the effects of time-varying treatments or exposures. Unlike conventional statistical methods, MSMs allow consistent estimation of the effect of a time-varying treatment on an outcome of interest even when there is confounding by time-varying covariates affected by earlier treatment. However, MSMs have an important limitation. As was pointed out by Robins (1, 2) and Hernán et al. (3), MSMs naturally allow estimation of effect modification by baseline covariates, but they are less useful for estimating effect modification by evolving time-varying covariates. Rather, structural nested models (SNMs) were specifically designed to estimate effect modification by time-varying covariates.

In this issue of the Journal, Petersen et al. (4) describe history-adjusted MSMs, a generalized form of MSM that was first proposed by Joffe et al. (5) and studied in detail by van der Laan et al. (6). Petersen et al. argue that history-adjusted MSMs allow a researcher to easily estimate effect modification by time-varying covariates, thus overcoming an important shortcoming of standard MSMs.

However, as we explain below, this apparent advantage of history-adjusted MSMs over standard MSMs comes at a price: History-adjusted MSMs can produce logically incompatible parameter estimates and hence result in contradictory substantive conclusions. As a consequence, clinicians or other decision-makers relying on history-adjusted MSMs to decide the best course of action can be left without guidance. In this commentary, we clarify how history-adjusted MSMs differ from standard MSMs and describe the conditions under which incompatible parameter estimates can arise in the former. We also propose a more restrictive definition of history-adjusted MSMs than the one provided by Petersen et al. (4) and compare the advantages and/or disadvantages of using history-adjusted MSMs, as opposed to SNMs, to examine effect modification by time-dependent covariates.


    STANDARD MARGINAL STRUCTURAL MODELS
 
We start by briefly reviewing standard MSMs using Petersen et al.'s notation. To simplify the exposition, we assume a closed cohort with a well-defined time of enrollment for each subject and no loss to follow-up. Time is measured in periods (e.g., months) since time of enrollment, m = 0, until the end of follow-up, m = K + 1. We denote the treatment received in month m as A(m) and covariates measured at the start of month m as L(m). A subject's chronologically ordered data are therefore L(0), A(0), L(1), A(1), ..., L(K), A(K), L(K + 1). In Petersen et al.'s article (4), a subject's A(m) is 1 for the times m that the subject stays on the failing antiretroviral treatment and 0 after switching to another treatment, and CD4 T-cell count Y(m) is a component of the vector L(m). A nondynamic treatment regime that specifies the treatment at each time from time m through time t – 1 is denoted by a(m, t – 1) = {a(m), a(m + 1), ..., a(t – 1)}. For example, in the paper by Petersen et al. (4), a(m, t – 1) = {1, 1, 0, 0, 0, ..., 0} would be the regime "switch from the failing treatment at time m + 2 and continue on the new treatment through t – 1." The counterfactual (or potential) variable Ya(m)(t) represents a subject's CD4 T-cell count measured at time t had the subject followed regime a(m, t – 1). In the paper by Petersen et al. (4), t is m + 8.
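As a small illustration of this notation, the sketch below builds the nondynamic regime a(m, t – 1) for "stay on the failing treatment until a given switch time, then stay off it." The function name `switch_regime` and the list encoding are illustrative assumptions, not part of Petersen et al.'s notation.

```python
def switch_regime(m, t, switch_time):
    """Return a(m, t-1) as the list [a(m), ..., a(t-1)].

    a(j) = 1 while the subject stays on the failing treatment (j < switch_time)
    and a(j) = 0 once the subject has switched (j >= switch_time).
    """
    return [1 if j < switch_time else 0 for j in range(m, t)]


# Baseline m = 1, response time t = m + 8 = 9, switch at time m + 2 = 3.
regime = switch_regime(m=1, t=9, switch_time=3)
print(regime)       # [1, 1, 0, 0, 0, 0, 0, 0]
print(sum(regime))  # periods spent on the failing treatment: 2
```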

A standard MSM can be used to model the mean CD4 T-cell count Ya(m)(t) at time t under all possible nondynamic treatment regimes from baseline time m to t – 1, that is, E[Ya(m)(t)], where E[X] is the expected value or mean of the random variable X. If so desired, the model may be made conditional on baseline variables V(m) to model the conditional mean E[Ya(m)(t)|V(m)] as a function of a(m, t – 1) and V(m). Here V(m) is a vector whose components may include any function of a subject's treatment and covariate history measured before A(m). The model is not defined until the analyst chooses a baseline time m, a response time t, and a functional form for E[Ya(m)(t)|V(m)]. The choice of the times m and t turns out to be a key point in the comparison of standard versus history-adjusted MSMs, so we defer the discussion of this topic to the next section. For now, let us think of m and t as two fixed times after the time of enrollment—for example, m = 1 and t = 9. As to the choice of a functional form for E[Ya(m)(t)|V(m)], the analyst needs to use her subject-matter knowledge to decide what functions of treatment (e.g., duration of treatment, average treatment dose) and baseline variables V(m) are the most appropriate.

An example of a standard MSM is

$$E[Y_{\underline{a}(m)}(t) \mid V(m)] = \theta_0 + \theta_1 V(m) + \beta_0\,\mathrm{dur}[\underline{a}(m, t-1)] + \beta_1\,\mathrm{dur}[\underline{a}(m, t-1)]\,V(m),$$
where

$$\mathrm{dur}[\underline{a}(m, t-1)] = \sum_{j=m}^{t-1} a(j)$$
is the duration of use of the failing treatment from the baseline time m through t – 1, as described in the paper by Petersen et al. (4). The parameter vector $\beta = (\beta_0, \beta_1) = (0, 0)$ is equivalent to the null hypothesis that treatment has no effect, that is, that E[Ya(m)(t)|V(m)] is the same for all regimes a(m, t – 1). The parameter $\beta_1$ captures effect modification by baseline variables on an additive scale: If $\beta_0$ and $\beta_0 + \beta_1 v(m)$ differ in sign for certain values v(m) of V(m), there is qualitative effect modification by V(m). In particular, if V(m) is univariate and binary, there is qualitative effect modification by V(m) if $\beta_0$ and $\beta_0 + \beta_1$ differ in sign. Under the assumption of no unmeasured confounding for the effect of treatment from time m to t – 1 on the mean of Y(t), the parameters of the MSM can be consistently estimated by inverse probability weighting (see Appendix).
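The sign check for qualitative effect modification can be made concrete with a short sketch. It assumes a scalar V(m) and a model that is linear in V(m), in treatment duration, and in their product, with parameters (θ0, θ1) and (β0, β1); the numerical values are hypothetical and chosen only for illustration.

```python
def dur(regime):
    """Duration of treatment: the number of periods with a(j) = 1."""
    return sum(regime)


def msm_mean(theta, beta, regime, v):
    """Model-implied mean outcome under a regime, given scalar baseline covariate v."""
    theta0, theta1 = theta
    beta0, beta1 = beta
    d = dur(regime)
    return theta0 + theta1 * v + beta0 * d + beta1 * d * v


def qualitative_effect_modification(beta, v):
    """True if the per-period effects beta0 (V = 0) and beta0 + beta1*v differ in sign."""
    beta0, beta1 = beta
    return beta0 * (beta0 + beta1 * v) < 0


theta = (500.0, 10.0)   # hypothetical values
beta = (-5.0, 8.0)      # hypothetical values
regime = [1, 1, 0, 0]   # two periods on the failing treatment

print(msm_mean(theta, beta, regime, v=0))          # 490.0
print(msm_mean(theta, beta, regime, v=1))          # 516.0
print(qualitative_effect_modification(beta, v=1))  # True
```

With these values, each extra period of treatment lowers the mean outcome when V(m) = 0 and raises it when V(m) = 1, which is the qualitative pattern described above.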

Before comparing standard and history-adjusted MSMs in the next section, we point out one important warning for the causal interpretation of a standard MSM. Suppose the baseline time m exceeds 0 and V(m) is a vector with two components: "duration of treatment before m" and "CD4 T-cell count at m." Further suppose that both the estimate of the main effect of "treatment duration before m" and the estimate of the interaction between "treatment duration before m" and dur[a(m, t – 1)] ("treatment duration from m onwards") are large and highly significant. One cannot conclude that "treatment duration before m" has a causal effect on the response Y(t), because these results are compatible with 1) unmeasured confounding for treatment before m or 2) selection bias. To understand why those results might be explained by selection bias, consider the following scenario: 1) Treatment before m is a cause of CD4 T-cell count at m but not a cause of CD4 T-cell count at t, Y(t), and 2) an unmeasured genetic trait that is unassociated with treatment history is a cause of CD4 T-cell count at m and also causes Y(t) both directly and by interacting with treatment subsequent to m. When conditions 1 and 2 hold, conditioning the analysis on CD4 T-cell count at m, a common effect of the genetic trait and treatment before m, induces an association between treatment before m and the unmeasured genetic trait, and therefore between treatment before m and Y(t) (7). The causal directed acyclic graph shown in figure 1 depicts this situation with A(m–), C(m), A(m+), and Y(t) representing treatment before baseline m, CD4 T-cell count at m, treatment after baseline m, and outcome at time t, respectively. We refer to this association as "selection bias" because it exists even when both treatment before m has no causal effect on Y(t) and the genetic trait responsible for the selection bias is marginally unassociated with treatment before m and thus is a nonconfounder.
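The selection-bias mechanism can be checked with a small simulation. The sketch below is a deliberately simplified linear version of the scenario: it omits the interaction between the genetic trait and subsequent treatment, and all variable names, coefficients, and the sample size are assumptions made for illustration.

```python
import numpy as np


def ols(columns, y):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 200_000

a_before = rng.binomial(1, 0.5, n)               # treatment before m
u = rng.normal(size=n)                           # unmeasured trait, independent of treatment
cd4_m = a_before + u + rng.normal(size=n)        # CD4 count at m: common effect of both
y_t = u + rng.normal(size=n)                     # Y(t): no causal effect of a_before

marginal_coef = ols([a_before], y_t)[1]          # close to 0: no marginal association
conditional_coef = ols([a_before, cd4_m], y_t)[1]  # clearly negative once cd4_m is adjusted for

print(round(marginal_coef, 3), round(conditional_coef, 3))
```

Under these assumed coefficients the adjusted coefficient is roughly -0.5 even though treatment before m has no effect on Y(t), because conditioning on the common effect links treatment to the unmeasured trait.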


FIGURE 1. Selection bias for the effect of treatment before baseline.

 

    STANDARD VERSUS HISTORY-ADJUSTED MARGINAL STRUCTURAL MODELS
 
Below we discuss the choice of the response time t and the baseline time m. As we will see, these choices are intimately connected with the definitions of standard and history-adjusted MSMs.

Let us first discuss the choice of the response time t. The above standard MSM models the mean outcome at a single fixed time t (e.g., t = 9), and thus we say that it is a univariate MSM. However, a standard MSM need not be univariate. If we are willing to assume that the above model holds for all possible values of t greater than baseline time m, we can simultaneously model the mean of the outcome at all times t > m. The MSM is then multivariate. The procedure for the estimation, via inverse probability weighting, of the parameters of a multivariate MSM requires only a minor generalization of the procedure used for univariate MSMs (see Hernán et al. (8) and the Appendix for details). If one believes this multivariate MSM to be unrealistic because it assumes that the effect of treatment does not depend on the time t, one can make the model more flexible and allow for treatment effects that vary with time by replacing $\theta = (\theta_0, \theta_1)$ and $\beta = (\beta_0, \beta_1)$ with time-specific parameter vectors, $\theta_t = (\theta_{0,t}, \theta_{1,t})$ and $\beta_t = (\beta_{0,t}, \beta_{1,t})$.

Let us now turn our attention to the choice of the baseline time m. In many longitudinal studies, the effect of treatment received at time m from enrollment will be confounded unless one can adjust for high-quality time-varying laboratory, clinical, and treatment data collected over a number of periods prior to m. Any time m at which such high-quality data are available is eligible to be the baseline time of an MSM, although generally the earliest eligible time is chosen. For example, if measurements of treatment in the past month and CD4 T-cell counts in the previous 2 months were needed to control confounding for the effect of current treatment, then the earliest possible baseline time would be m = 1 if CD4 T-cell measurements began at the time of enrollment m = 0. (See Robins et al. (9) for a more detailed discussion.)

However, rather than using precisely one eligible baseline time (e.g., m = 1), one could decide to use all eligible baseline times m = 1, 2, 3, ... before t. Thus, in the above univariate MSM, we could see m as an index for multiple baseline times instead of as a single fixed time m. If one believes the MSM with multiple baseline times to be unrealistic because it assumes that the effect of treatment does not depend on the baseline time m, one can make the model more flexible and allow for treatment and covariate effects that vary with time by replacing $\theta = (\theta_0, \theta_1)$ and $\beta = (\beta_0, \beta_1)$ with time-specific parameter vectors, $\theta_m = (\theta_{0,m}, \theta_{1,m})$ and $\beta_m = (\beta_{0,m}, \beta_{1,m})$, which are indexed by the eligible baseline times m < t.

A univariate MSM with multiple baseline times appears closely analogous to a multivariate MSM, except with multiple baseline times m per subject substituted for multiple response times t. In fact, the procedures for estimating, via inverse probability weighting, the parameters of a univariate MSM with multiple baseline times and those of a multivariate MSM are also analogous (see Appendix).

Petersen et al. (4) refer to MSMs with multiple baseline times as "history-adjusted MSMs" (6, 10). MSMs with multiple baseline times can be divided into two mutually exclusive groups. For a given MSM and outcome time t, let num(t) count the number of different baseline times m for which the MSM models the effect of regimes beginning at m on the outcome Y(t). The first group is composed of MSMs for which num(t) exceeds 1 for one or more outcome times t. This group includes MSMs, similar to those considered in an earlier paper by van der Laan et al. (6), that model the effect on an outcome Y(t) of treatment regimes beginning at all times m prior to t. The second group is composed of MSMs for which num(t) is 1 for all outcome times t. This group includes the MSM discussed by Petersen et al. (4) that restricts the set of outcome times to months 8 and later and only models the effect on each outcome Y(t) of treatment regimes beginning at time m = t – 8. We propose that the use of the term "history-adjusted MSM" be reserved for the first group, for the following reasons.
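The two groups can be told apart mechanically by counting baseline times per outcome time. The sketch below represents an MSM only by the set of (m, t) pairs it models; the function names and the value of K are illustrative assumptions.

```python
from collections import defaultdict


def num_by_outcome_time(pairs):
    """Map each outcome time t to num(t), the number of distinct baseline times m."""
    baselines = defaultdict(set)
    for m, t in pairs:
        baselines[t].add(m)
    return {t: len(ms) for t, ms in baselines.items()}


def msm_group(pairs):
    """Return 1 if num(t) > 1 for at least one outcome time t, otherwise 2."""
    counts = num_by_outcome_time(pairs)
    return 1 if any(n > 1 for n in counts.values()) else 2


K = 11  # hypothetical last treatment time

# Regimes beginning at every time m prior to each outcome time t.
all_prior = [(m, t) for t in range(1, K + 2) for m in range(t)]

# Outcome times 8 and later, each paired with the single baseline m = t - 8.
fixed_lag = [(t - 8, t) for t in range(8, K + 2)]

print(msm_group(all_prior))  # 1
print(msm_group(fixed_lag))  # 2
```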

First, restricting the name "history-adjusted" to MSMs in group 1 is more in keeping with Petersen et al.'s conceptualization of the difference between history-adjusted MSMs and standard MSMs (4, 10). Specifically, in their abstract, the authors state that "unlike standard MSMs, history-adjusted MSMs can be used to estimate modification of treatment effects by time-varying covariates" (4, p. 985). However, this claim is true only for MSMs in group 1: To estimate effect modification by a time-varying covariate on a response Y(t), we must, by definition, model effect modification at two or more times m, since otherwise we could regard the covariate as non-time-varying. In contrast to MSMs in group 1, MSMs in group 2 are like standard MSMs in that, for a given response Y(t), they estimate the magnitude of effect modification by past time-varying covariates only at a single baseline time m (for example, m = t – 8). For this reason, we can regard MSMs in group 2 as simply a collection of ordinary MSMs that, just like multivariate MSMs, allow increased estimation efficiency 1) by assuming that the parameters corresponding to different members of the collection are related and 2) by using more realistic working models than the independence model for within-individual correlations.

Second, it was only MSMs in group 1 that Robins (1, 2) was warning against when he stated that MSMs could not be easily used to estimate effect modification by evolving time-dependent covariates. This is because, as we explain below and in the Appendix, only MSMs in group 1 can be incompatible and thus lead to logical inconsistencies. Henceforth we refer only to models in group 1 as history-adjusted MSMs.


    MODEL INCOMPATIBILITY IN HISTORY-ADJUSTED MARGINAL STRUCTURAL MODELS
 
Below we show that the apparently nearly exact analogy between history-adjusted MSMs and a standard multivariate MSM goes only so far. Specifically, a history-adjusted MSM, unlike a standard multivariate MSM, may be an incompatible model. We say a model is incompatible if there exist any logically inconsistent (incompatible) parameter values. A familiar case of an incompatible model is a linear regression model $\Pr(D = 1 \mid X) = \alpha_0 + \alpha_1 X$ for a binary outcome D. For example, if the covariate X takes values 0, 1, ..., 100 and one fits this model by a method (such as ordinary least squares) that does not impose the constraint that predicted probabilities must lie between 0 and 1, one can easily obtain incompatible parameter estimates, such as $\hat{\alpha}_0 = 0$, $\hat{\alpha}_1 = 0.02$, that result in illogical statements such as "2 = 0 + (0.02)100 is the estimated probability that D = 1 among subjects with X = 100." In contrast, a logistic regression model $\mathrm{logit}\,\Pr(D = 1 \mid X) = \alpha_0 + \alpha_1 X$ is compatible, because $e^{\alpha_0 + \alpha_1 X}/(1 + e^{\alpha_0 + \alpha_1 X})$ is always between 0 and 1.
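The contrast between the two regression models is easy to verify numerically. The sketch below plugs the estimates from the example into both link functions; the function names are illustrative.

```python
import math


def linear_prob(alpha0, alpha1, x):
    """'Probability' implied by the linear model alpha0 + alpha1*x (not constrained to [0, 1])."""
    return alpha0 + alpha1 * x


def logistic_prob(alpha0, alpha1, x):
    """Probability implied by the logistic model; between 0 and 1 for any finite linear predictor."""
    eta = alpha0 + alpha1 * x
    return 1.0 / (1.0 + math.exp(-eta))


alpha0_hat, alpha1_hat = 0.0, 0.02

# The linear model yields an impossible probability at X = 100.
print(linear_prob(alpha0_hat, alpha1_hat, 100))

# The logistic model stays inside (0, 1) over the whole covariate range 0, 1, ..., 100.
logistic_values = [logistic_prob(alpha0_hat, alpha1_hat, x) for x in range(101)]
print(min(logistic_values), max(logistic_values))
```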

We now provide an informal explanation of why history-adjusted MSMs may be incompatible (see the Appendix for a formal treatment). Let us start by considering our original univariate MSM with the only response time t equal to K + 1 but now with multiple baseline times m:

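$$E[Y_{\underline{a}(m)}(K+1) \mid V(m)] = \theta_0 + \theta_1 V(m) + \beta_0\,\mathrm{dur}[\underline{a}(m, K)] + \beta_1\,\mathrm{dur}[\underline{a}(m, K)]\,V(m),$$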
where V(m) is the entire covariate and treatment history measured before A(m). This history-adjusted MSM makes three critical assumptions:

  1. The direct effect of baseline treatment a(m) is the same as the effect of each subsequent component of the treatment.
  2. The effect of treatment from m + 1 to K is the same regardless of (i.e., is not modified by) the value of the baseline treatment a(m).
  3. The effect of (baseline and subsequent) treatment is the same for all baseline times.
In many settings, including most human immunodeficiency virus studies like the one described by Petersen et al. (4), these three assumptions are implausible and a more flexible, realistic, history-adjusted MSM is needed, such as

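$$
\begin{aligned}
E[Y_{\underline{a}(m)}(K+1) \mid V(m)] = {} & \theta_0 + \theta_1 V(m) \\
& + \beta^{(1)}_0 a(m) + \beta^{(1)}_1 a(m) V(m) + \beta^{(1)}_2 a(m)(K - m) \\
& + \beta^{(2)}_0\,\mathrm{dur}[\underline{a}(m+1, K)] + \beta^{(2)}_1\,\mathrm{dur}[\underline{a}(m+1, K)]\,V(m) \\
& + \beta^{(2)}_2\,\mathrm{dur}[\underline{a}(m+1, K)](K - m) + \beta^{(2)}_3\,a(m)\,\mathrm{dur}[\underline{a}(m+1, K)].
\end{aligned}
$$
This model relaxes assumption 1 by including separate parameters for a(m) and dur[a(m + 1, K)], assumption 2 by including an interaction term between a(m) and dur[a(m + 1, K)], and assumption 3 by including interaction terms between time to the end of follow-up K – m and both a(m) and dur[a(m + 1, K)]. In this model, the parameter vector $\beta^{(1)} = (\beta^{(1)}_0, \beta^{(1)}_1, \beta^{(1)}_2)$ encodes the direct effect of baseline treatment when subsequent treatment is withheld, a(m + 1, K) = 0. We will henceforth refer to this simply as the direct effect of baseline treatment. The parameter vector $\beta^{(2)} = (\beta^{(2)}_0, \beta^{(2)}_1, \beta^{(2)}_2, \beta^{(2)}_3)$ encodes the effect of subsequent cumulative treatment. The estimates of the model parameters, $(\hat{\theta}, \hat{\beta}^{(1)}, \hat{\beta}^{(2)})$, can be obtained by inverse probability weighting. The problem with this more flexible, realistic model is that it may lead to parameter estimates that are logically inconsistent, as we discuss below.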

Suppose, as an example, that 1) the components of $\hat{\beta}^{(1)}$ are all negative and lie within the interval (–3/4, –1/4) and a joint 95 percent confidence interval for $\beta^{(1)}$ only includes vectors with all components lying between –1 and –0.01 and 2) the components of $\hat{\beta}^{(2)}$ all exceed 10 and a joint 95 percent confidence interval for $\beta^{(2)}$ includes only vectors with all components exceeding 8. The negative $\hat{\beta}^{(1)}$ implies that, for each m, baseline treatment a(m) has a negative effect on Y(K + 1) when a(m + 1, K) = 0. The positive $\hat{\beta}^{(2)}$ implies that cumulative treatment from m + 1 to K has a large positive effect. Furthermore, suppose the confidence intervals imply that the opposite signs of the estimated effects of baseline versus subsequent treatment cannot be explained by sampling variability.

However, it is logically impossible for cumulative treatment from m + 1 to K to have a large positive effect on Y(K + 1) if, for each time s greater than m, a(s) alone has a negative effect. This implies that the history-adjusted MSM is an incompatible model and the parameter estimates $\hat{\beta}^{(1)}$ and $\hat{\beta}^{(2)}$ are logically inconsistent. This result is made precise in theorem 1, shown in the Appendix, where it is formally proven that pairs $(\beta^{(1)}, \beta^{(2)})$ with $\beta^{(1)}$ negative and $\beta^{(2)}$ positive are logically incompatible.

The incompatible estimates $\hat{\beta}^{(1)}$ and $\hat{\beta}^{(2)}$ also result in logically inconsistent statements about clinical strategies. Specifically, theorem 1 shows that all components of $\hat{\beta}^{(1)}$ being less than 0 implies that the estimated optimal treatment regime starting from any eligible time m is the regime 0(m), "always withhold treatment from m." However, in the Appendix, we also show that all components of $\hat{\beta}^{(2)}$ being positive and larger in absolute value than those of $\hat{\beta}^{(1)}$ implies that the regime 1(m), "always take treatment starting at m," is estimated to be preferable to the regime 0(m). The preceding two statements are logically inconsistent and taken together would leave a health-care provider without any guidance as to a reasonable treatment strategy. Thus, an analyst committed to using history-adjusted MSMs would face two undesirable alternatives: to use a compatible but unrealistic, and therefore probably very badly misspecified, model or to use a more realistic but incompatible model that may lead to logically inconsistent estimates.
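The two conflicting statements can be reproduced numerically. The sketch below uses the functional form specified for the direct and subsequent effects in the Appendix (terms in a(m), V(m), K – m, and dur[a(m + 1, K)]), and assumes a scalar, nonnegative V(m). The parameter values, K, and v are hypothetical, chosen to match the signs and magnitudes in the example.

```python
def direct_effect(b1, a_m, v, K, m):
    """Effect of a(m) on the mean of Y(K + 1) when treatment after m is withheld."""
    b10, b11, b12 = b1
    return a_m * (b10 + b11 * v + b12 * (K - m))


def subsequent_effect(b2, a_m, d, v, K, m):
    """Effect of cumulative treatment d = dur[a(m + 1, K)], including its interaction with a(m)."""
    b20, b21, b22, b23 = b2
    return d * (b20 + b21 * v + b22 * (K - m) + b23 * a_m)


K, v = 8, 1.0
b1_hat = (-0.5, -0.5, -0.5)        # all components in (-3/4, -1/4)
b2_hat = (12.0, 12.0, 12.0, 12.0)  # all components exceed 10

# Statement 1: at every baseline time m, treating at m alone lowers the mean outcome.
direct = [direct_effect(b1_hat, 1, v, K, m) for m in range(K + 1)]
print(all(d < 0 for d in direct))  # True

# Statement 2: "always treat from m = 0" versus "never treat from m = 0".
# Always treating means a(0) = 1 and dur[a(1, K)] = K.
contrast = direct_effect(b1_hat, 1, v, K, 0) + subsequent_effect(b2_hat, 1, K, v, K, 0)
print(contrast)  # 1051.0
```

The fitted model therefore says both that each single treatment is harmful on its own and that taking all of them is strongly beneficial, which is the pattern theorem 1 rules out.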

Of course, the use of incompatible models only poses a difficulty if incompatible estimates are likely to occur. It is clear that an ordinary least-squares fit of our linear Bernoulli regression model will frequently result in incompatible estimates. It may be less clear that an inverse probability weighting fit of our incompatible history-adjusted MSM can also easily result in incompatible estimates. However, in the model used in our example, incompatible estimates may occur if 1) the model is somewhat misspecified in that the effect of subsequent treatment on the mean outcome actually depends on a much more complicated function of a(m + 1, K) than the assumed linear dependence on dur[a(m + 1, K)] and 2) for most times j, A(j) is highly correlated with the part of that complicated function of A(m + 1, K) that is uncorrelated with dur[A(m + 1, K)]. In the Appendix, we argue that it may be prohibitively difficult to develop an empirical test of fit for a history-adjusted MSM that reliably indicates that conditions 1 and 2 have not only occurred but are of sufficient magnitude to produce estimates which suffer from incompatibility to such an extent that the clinically relevant inferences may be compromised.


    STRUCTURAL NESTED MODELS VERSUS HISTORY-ADJUSTED MARGINAL STRUCTURAL MODELS
 
We have seen that the problem in fitting a history-adjusted MSM by inverse probability weighting is that the estimates $\hat{\beta}^{(1)}$ of the direct effect of baseline treatment a(m) may be logically inconsistent with the estimated effect $\hat{\beta}^{(2)}$ of subsequent treatment. A natural way to overcome this difficulty would be to only model the direct effect of a(m) while leaving the effect of subsequent treatment unmodeled. When, as in our example, V(m) is the entire covariate-and-treatment history measured before A(m), a model for the direct effect of a(m) at all eligible baseline times m is precisely an (additive) SNM (11, 12). Because we are only modeling the direct effect of baseline treatment, the parameters $\beta^{(1)}$ can no longer be well estimated by inverse probability weighting but rather should be estimated using g-estimation. Furthermore, after obtaining a g-estimate $\hat{\beta}^{(1)}$, one can estimate the mean of the counterfactual outcome of interest under any treatment regime without having to model the effect of subsequent treatment and thus without having to use incompatible models (see Appendix).
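To give a flavor of g-estimation, the sketch below treats the simplest possible case: a single treatment time, a binary covariate, and an additive model in which treatment shifts the mean outcome by a constant ψ. This is a strong simplification of the time-varying SNMs discussed here, and the data-generating values, sample size, and variable names are all assumptions. The estimate is the value of ψ at which the "treatment-removed" outcome Y − ψA is uncorrelated with treatment given the covariate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

L = rng.binomial(1, 0.5, n)                      # measured confounder
A = rng.binomial(1, np.where(L == 1, 0.7, 0.3))  # treatment depends on L
Y = 2.0 * A + 3.0 * L + rng.normal(size=n)       # true treatment effect psi = 2

# Estimated probability of treatment within each level of L.
p_hat = np.where(L == 1, A[L == 1].mean(), A[L == 0].mean())

# Solve sum_i (A_i - p_hat_i) * (Y_i - psi * A_i) = 0 for psi (closed form in this linear case).
resid = A - p_hat
psi_hat = np.sum(resid * Y) / np.sum(resid * A)

print(round(psi_hat, 2))  # close to the true value 2
```

Only the effect of treatment is modeled; the dependence of the outcome on L is left unspecified, which mirrors the idea of modeling the direct effect of a(m) while leaving other components unmodeled.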

Indeed, the fact that we can estimate $\beta^{(1)}$ by g-estimation rather than inverse probability weighting is a second important benefit (in addition to avoiding model incompatibility) of using an SNM rather than a history-adjusted MSM. As we describe in the Appendix, inverse probability weighting estimation requires a "positivity assumption" (13) and is sensitive to the presence of extreme weights, either true or estimated. In contrast, g-estimation does not require a positivity assumption and is much less affected by extreme weights. The Appendix also contains a brief discussion of approaches other than g-estimation to handling model incompatibility and of how incompatible models might be used for goodness-of-fit testing and model selection.

In the absence of model misspecification or confounding by unmeasured factors, both inverse probability weighting estimation of standard or history-adjusted MSMs and g-estimation of SNMs allow one to estimate the effect of a time-varying treatment even when there is time-dependent confounding by time-varying covariates affected by earlier treatment. However, for the reasons discussed above, we would recommend that SNMs rather than history-adjusted MSMs be the routine model choice for investigation of effect modification by evolving time-varying covariates. Nevertheless, we also encourage comparison of the results obtained with SNMs to those obtained with history-adjusted MSMs to deepen our understanding of and experience with these new models. Only through such comparisons will we learn whether incompatible estimates occur with history-adjusted MSMs frequently enough to be of concern.


    APPENDIX
 
Here we prove a number of results mentioned in the main text as well as briefly touch on certain more advanced issues. Our discussion is restricted to structural mean models, that is, models for the conditional mean of a counterfactual outcome.

Estimation of the parameters of marginal structural models (MSMs)
Throughout we use the following notational conventions. Capital letters such as L(m) refer to random variables, that is, variables that can take on different values for different study subjects. Small letters such as l(m) refer to the possible values of L(m). Overbar variables with a time t in parentheses denote the history of the variable from 0 to t, and overbars without parentheses denote the entire history, that is, $\bar{A}(t) = \{A(0), \ldots, A(t)\}$ and $\bar{A} = \bar{A}(K)$. In addition, we use underbars to denote future values of a variable in the following way: $\underline{A}(m, t) = \{A(m), A(m + 1), \ldots, A(t - 1), A(t)\}$ is the A-history from time m through time t and $\underline{A}(m) = \underline{A}(m, K)$ is a subject's treatment history from m to the end of the study. Similarly, we let $\underline{a}(m, t)$ and $\underline{a}(m) = \underline{a}(m, K)$ denote a possible treatment history from m to t and from m to K, respectively. By convention, a(m) = 0 denotes either no treatment or a standard treatment at time m. Thus, the history $\underline{a}(m, t) = \underline{0}(m, t)$ stands for the history "withhold treatment from m through t" or "receive the standard treatment from m through t."
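The overbar and underbar conventions correspond to simple slices of a subject's record. The sketch below stores A(0), ..., A(K) in a list indexed by time; the helper names `overbar` and `underbar` are illustrative assumptions.

```python
def overbar(x, t):
    """History from time 0 through t: [x(0), ..., x(t)]."""
    return x[: t + 1]


def underbar(x, m, t=None):
    """Values from time m through t; by default through the last recorded time K."""
    end = len(x) if t is None else t + 1
    return x[m:end]


A = [1, 1, 0, 0, 1]  # A(0), ..., A(K) with K = 4

print(overbar(A, 2))      # [1, 1, 0]
print(underbar(A, 2, 3))  # [0, 0]
print(underbar(A, 2))     # [0, 0, 1]
```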

Let $H(m) = \{\bar{L}(m), \bar{A}(m - 1)\}$ be the entire covariate and treatment history prior to receiving treatment A(m), and let V(m) be a subvector of H(m) that is of interest as an effect modifier. We may sometimes choose V(m) to be all of H(m). Let $Y_{\underline{a}(m)}(t)$ be a subject's counterfactual outcome at time t if the subject had received his observed treatment regime $\bar{A}(m - 1)$ up to time m and history $\underline{a}(m)$ from m onwards.

A standard, univariate MSM models the mean of the counterfactual outcome Ya(m)(t) at time t > m as a function of the possible treatment histories a(m, t – 1) from time m to time t – 1 and the baseline covariates V(m). For example,

Formula
and

Formula
where $\theta^* = (\theta^*_0, \theta^*_1)$ and $\beta^* = (\beta^*_0, \beta^*_1)$ are unknown parameter vectors and

Formula
is the cumulative treatment from the baseline time m through t under regime a(m).

Under the assumption of no unmeasured confounders for the effect of the time-varying treatment A(m), the parameters of the univariate MSM can be estimated by weighted least squares with estimated stabilized inverse probability weights depending on the baseline time m and response time t:

Formula
The parameters of the multivariate model are estimated using a weighted generalized estimating equations program with (m, t)-specific weights Formula (m, t) and with a user-supplied subject-specific working covariance matrix. If one chooses an independence working covariance matrix, the estimate of E[Ya(m)(t)|V(m)] based on the univariate MSM will be exactly equal to that based on the multivariate MSM with time-specific parameters. The same holds true for MSMs with multiple baseline times m.
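A minimal sketch of (m, t)-specific stabilized weights follows. It assumes the commonly used product form: for each time j from m through t – 1, the probability of the treatment actually received given past treatment and V(m), divided by the probability given past treatment and the full covariate history. That form, the function name, and the probabilities are assumptions of this sketch, and in practice the probabilities would come from fitted treatment models.

```python
import numpy as np


def stabilized_weight(num_probs, den_probs, m, t):
    """Product over j = m, ..., t - 1 of num_probs[j] / den_probs[j].

    num_probs[j]: probability of the observed A(j) given past treatment and V(m)
    den_probs[j]: probability of the observed A(j) given past treatment and covariate history
    """
    num = np.asarray(num_probs, dtype=float)[m:t]
    den = np.asarray(den_probs, dtype=float)[m:t]
    return float(np.prod(num / den))


num_probs = [0.5, 0.6, 0.6, 0.6]  # hypothetical fitted probabilities, times 0..3
den_probs = [0.5, 0.8, 0.3, 0.6]

w = stabilized_weight(num_probs, den_probs, m=1, t=4)
print(w)  # approximately 1.5

# A denominator probability near 0 (a near-violation of positivity) inflates the weight.
w_extreme = stabilized_weight(num_probs, [0.5, 0.8, 0.01, 0.6], m=1, t=4)
print(w_extreme)  # approximately 45
```

Each subject contributes one such weight per (m, t) pair, which is what makes the weighted fit of a model with multiple baseline times analogous to that of a multivariate MSM.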

Model incompatibility in history-adjusted MSMs
To explain why history-adjusted MSMs may be incompatible, we will consider a univariate MSM with t = K + 1 that allows the effect of the treatment a(m) on Y(K + 1) to differ from the effect of later treatments. It will be helpful to decompose E[Ya(m)(K + 1)|V(m)] into the sum of three functions:

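  1. The conditional mean of Y(K + 1) when treatment is withheld from m onwards:

    $$E[Y_{\underline{0}(m)}(K+1) \mid V(m)]$$

  2. The direct effect of treatment a(m) on Y(K + 1) when treatment from m + 1 onwards is withheld (i.e., a(m + 1, K) = 0):

    $$E[Y_{a(m),\,\underline{0}(m+1)}(K+1) \mid V(m)] - E[Y_{\underline{0}(m)}(K+1) \mid V(m)]$$

  3. The effects of treatment a(m + 1, K) from m + 1 onwards (including effects due to interactions with treatment a(m) at m):

    $$E[Y_{\underline{a}(m)}(K+1) \mid V(m)] - E[Y_{a(m),\,\underline{0}(m+1)}(K+1) \mid V(m)]$$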

If we specify parametric models for these three unknown functions, we obtain a model r(K + 1, m, a(m, K), V(m), θ*, β*) for E[Ya(m)(K + 1)|V(m)]. As a concrete example, suppose we specify
  1. r0(K + 1, m, V(m), θ*) = θ₀* + θ₁* V(m);
  2. r1(K + 1, m, a(m), V(m), β(1)*) = β₁* a(m) + β₂* a(m) × V(m) + β₃* a(m)(K – m); and
  3. r2(K + 1, m, a(m, K), V(m), β(2)*) = β₄* dur[a(m + 1, K)] + β₅* dur[a(m + 1, K)]V(m) + β₆* dur[a(m + 1, K)](K – m) + β₇* a(m)dur[a(m + 1, K)],
where, by definition, dur[a(K + 1, K)] = 0. This model assumes that the main effects of treatment at m and of subsequent cumulative treatment dur[a(m + 1, K)] are modified by the baseline time m only through the linear term (K – m). The vector β(1)* encodes the direct effects of a(m) on Y(K + 1) when treatment from m + 1 onwards is withheld, while the vector β(2)* encodes all effects of a(m + 1, K) on Y(K + 1), including its effect due to interactions with a(m) encoded in the parameter β₇*.
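Adding the three example components gives the working model as a single expression. The following is only a restatement, under the assumption that the coefficients are labeled θ_0, θ_1 and β_1, ..., β_7 in order of appearance (the numeric subscripts are a labeling chosen here for readability) and that Y is subscripted by the treatment history a(m, K):

```latex
\begin{aligned}
E\big[Y_{a(m,K)}(K+1) \mid V(m)\big]
  &= r_0 + r_1 + r_2 \\
  &= \theta_0^* + \theta_1^* V(m) \\
  &\quad + a(m)\,\big[\beta_1^* + \beta_2^* V(m) + \beta_3^* (K-m)\big] \\
  &\quad + \mathrm{dur}[a(m+1,K)]\,\big[\beta_4^* + \beta_5^* V(m)
           + \beta_6^* (K-m) + \beta_7^* a(m)\big].
\end{aligned}
```

Written this way, the first bracket collects everything attributed to the direct effect of a(m), and the second collects everything attributed to later treatment.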

A given treatment regime g(m) = {g_m{h(m)}, g_{m+1}{h(m + 1)}, ..., g_K{h(K)}} has a subject follow her observed treatment history up to m and then, at each time j ≥ m, determines her treatment dose at j by the value of a given function g_j{h(j)} of past treatment and covariate history h(j). If, for each j ≥ m, g_j{h(j)} gives the same value a(j) for all past h(j), we say that the regime g(m) is nondynamic and write the regime as a(m) = {a(m), ..., a(K)}, as in the main text. Otherwise, the regime is dynamic. The following is an immediate consequence of theorem 4 in the paper by Robins (12).

Theorem 1.
Suppose the sequential randomization assumption holds (i.e., there are no unmeasured confounders) for all m and that all treatments a(m) are coded as nonnegative. Suppose that for each m the effect of a(m) on the mean of Y(K + 1) is less than 0 when a(m + 1, K) = 0—that is, for all m, H(m):

Formula
Then

Formula
and

Formula
for any regime g(m).

We now discuss the relevance of theorem 1 for the example given in the text. Suppose all levels h(m) of H(m) are coded as nonnegative and V(m) = H(m). It follows from the theorem that the inverse-probability-weighted estimates β̂(1) and β̂(2) in our example are logically inconsistent (in the sense that no actual distribution exists with these parameter values). With all components of β̂(1) negative and all components of β̂(2) positive, our estimate r1(K + 1, m, a(m), V(m), β̂(1)) of r1(K + 1, m, a(m), H(m)) is negative but our estimate r2(K + 1, m, a(m, K), V(m), β̂(2)) of r2(K + 1, m, a(m, K), V(m)) is positive for all H(m), which contradicts the above theorem.

Furthermore, the last part of theorem 1 implies, as stated in the text, that our negative estimate β̂(1) of β(1)* means that the regime 0(m) is the estimated optimal regime. We next verify our claim in the text that components of β̂(2) that are positive and larger in absolute value than those of β̂(1) imply that the regime 1(m), "always take treatment starting at m," is estimated to be preferable to the regime 0(m). Note that

$$E[Y_{1(m)}(K+1) \mid H(m)] - E[Y_{0(m)}(K+1) \mid H(m)] = r_1(K+1, m, 1, H(m)) + r_2(K+1, m, 1(m,K), H(m)).$$
By the above relation between β̂(2) and β̂(1), our estimate of r2(K + 1, m, 1(m, K), H(m)) is positive and much larger in absolute value than our estimate of r1(K + 1, m, 1, H(m)). It then follows that our estimate of

$$E[Y_{1(m)}(K+1) \mid H(m)] - E[Y_{0(m)}(K+1) \mid H(m)]$$
is positive.
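The sign argument can be checked numerically. The sketch below codes the example forms of r1 and r2 and evaluates them under the regime "always take treatment starting at m"; the coefficient values, the covariate value, and the times K and m are invented purely for illustration and are not estimates from any data.

```python
def r1(a_m, v, k_minus_m, b):
    """Direct effect of a(m); b holds the three coefficients of the example r1."""
    b1, b2, b3 = b
    return b1 * a_m + b2 * a_m * v + b3 * a_m * k_minus_m

def r2(a_m, dur, v, k_minus_m, b):
    """Effect of later treatment; b holds the four coefficients of the example r2."""
    b4, b5, b6, b7 = b
    return dur * (b4 + b5 * v + b6 * k_minus_m + b7 * a_m)

# Hypothetical estimates: every component of the first vector is negative,
# every component of the second is positive and larger in absolute value.
beta1_hat = (-1.0, -0.5, -0.25)
beta2_hat = (2.0, 1.0, 0.5, 1.0)

K, m, v = 5, 2, 1.0
# Under "always treat from m": a(m) = 1 and dur[a(m + 1, K)] = K - m.
direct = r1(1, v, K - m, beta1_hat)
later = r2(1, K - m, v, K - m, beta2_hat)
contrast = direct + later   # estimated mean under 1(m) minus mean under 0(m)
```

Here `direct` is negative while `contrast` is positive, which is the pair of estimates that theorem 1 rules out for any actual distribution.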

Finally, we stress that incompatible estimates often pose no difficulty when they cannot result in logically contradictory estimates of substantively important effects. As an example, Robins and Rotnitzky (14) argued that using incompatible models and estimates to construct generalized doubly robust estimators posed no problem, because the models served simply as statistical tools for reducing bias. In contrast, use of incompatible history-adjusted MSMs can be problematic, because they are substantive tools used to estimate treatment effects.

Structural nested models (SNMs) for handling model incompatibility
Standard inverse probability weighting methods require that the positivity assumption f[a(j)|H(j)] > 0 hold for all possible values of a(j) and (essentially) all histories H(j). Even when the positivity assumption holds, the denominator of the estimated weight for (m, K),

$$\prod_{j=m}^{K} f[A(j) \mid H(j)],$$
can be difficult to model well, can vary greatly between subjects, and can be exceedingly small for certain subjects, particularly when K – m is large, the treatment A(j) has many levels or is continuous, and/or there exist many continuous covariates in L(j). As a consequence, the few subjects whose estimated weights for (m, K) are largest may have a huge effect on the analysis, leading to decreased precision, finite-sample bias, and often severe large-sample bias because misspecification of a model for f[A(j)|H(j)] often disproportionately affects the largest weights (15). (Joffe et al. (5) initially proposed a particular type of history-adjusted MSM as a means of partially surmounting the problem of extreme weights for large K – m.) Although there exist a number of ways to partially surmount these problems, such as doubly robust estimation, use of various diagnostics, and truncation of extreme weights, none is entirely satisfactory.
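To see how quickly the weights can become extreme, consider the following sketch, which inverts a product of fitted treatment probabilities and then applies truncation, one of the partial remedies just mentioned. The probability value 0.2, the numbers of time points, and the truncation bounds are all hypothetical.

```python
def unstabilized_weight(den_probs):
    """Inverse of the product of the fitted probabilities f[A(j) | H(j)]."""
    weight = 1.0
    for p in den_probs:
        if p <= 0:
            raise ValueError("positivity violated: nonpositive probability")
        weight /= p
    return weight

def truncate(weights, lower, upper):
    """Clip each weight to the interval [lower, upper]."""
    return [min(max(w, lower), upper) for w in weights]

# A subject whose received treatment had fitted probability 0.2 at every time.
w_short = unstabilized_weight([0.2] * 2)     # 2 time points
w_long = unstabilized_weight([0.2] * 10)     # 10 time points
clipped = truncate([0.5, 3.0, w_long], lower=1.0, upper=10.0)
```

With 2 time points the weight is about 25; with 10 it is close to ten million. Truncation caps the influence of such a subject but, as noted above, at the cost of introducing bias of its own.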

In contrast, g-estimation of a structural nested mean model r1(K + 1, m, a(m), H(m), β(1)*) for the direct effect r1(K + 1, m, a(m), H(m)) of treatment a(m) does not require the positivity assumption and is much less affected by K – m being large, the treatment A(j) having many levels or being continuous, and there being many continuous covariates in L(j). First, one does not divide by estimates of f[A(j)|H(j)], so the problem of extreme weights does not exist. In fact, those subjects who would have the most extreme weights and thus cause the most trouble for inverse probability weighting make a much smaller contribution to the g-estimation analysis, thereby causing little trouble. Second, for continuous or many-leveled A(j)'s, one need only model the mean of A(j) given H(j), a much easier task than modeling the entire density function f[a(j)|H(j)]. Third, even if K – m is large, one can choose not to model the mean of A(j) given H(j) for large j near K, thereby trading off some loss of precision for better bias control.

Another apparent advantage of estimating r1(K + 1, m, a(m), H(m)) by g-estimation of an SNM rather than inverse probability weighting estimation of an MSM is that no model for r0(K + 1, m, H(m)) is required. However, this advantage is only apparent; Robins (1) describes a modification of an MSM, referred to as a "semiparametric regression MSM," that also does not require a model for r0(K + 1, m, H(m)) and is fitted by inverse probability weighting.

In our example, we took V(m) to be the entire past H(m). When V(m) and H(m) differ, a model for r1(K + 1, m, a(m), V(m)) is referred to as a "marginal structural nested model"; as befits its name, a hybrid of g-estimation and inverse probability weighting estimation is used to estimate the model parameters. (See van der Laan and Robins (16) for details.)

Finally, g-estimation of an SNM, unlike inverse probability weighting estimation of an MSM, has not been possible when the response Y(t) was a dichotomous indicator of disease status, except under the rare disease assumption. Hence, history-adjusted MSMs might be preferred to SNMs for nonrare dichotomous responses. However, recent work by van der Laan et al. (17) and Richardson and Robins (T. Richardson and J. Robins, Harvard School of Public Health, unpublished data) holds the promise that, in the near future, g-estimation of SNMs may be extended to cover nonrare dichotomous responses.

We now describe how, after obtaining a g-estimate β̂(1) of the parameter β(1)* of an SNM r1(K + 1, m, a(m), H(m); β(1)*), we can use Monte Carlo simulation to estimate E[Yg(m)(K + 1)] for any g(m) without having to model r2(K + 1, m, a(m, K), H(m)). First we estimate E[Y(0(m))(K + 1)] by the sample average of

$$Y(K+1) - \sum_{j=m}^{K} r_1(K+1, j, A(j), H(j), \hat{\beta}(1)).$$
Then we estimate E[Yg(m)(K + 1)] as follows.

  1. First, for k = m, ..., K, fit a parametric model for f[l(k)|l̄(k – 1), ā(k – 1)] to the data and let f̂[l(k)|l̄(k – 1), ā(k – 1)] denote the estimate of f[l(k)|l̄(k – 1), ā(k – 1)] under the model.
  2. Do the following for v = 1, ..., V, with V selected to be very large:
    a) Choose h_v(m) = {l̄_v(m), ā_v(m – 1)} to be the value of H(m) for a subject randomly drawn from the n study subjects.
    b) Recursively for k = m + 1, ..., K, draw l_v(k) from f̂[l(k)|l̄_v(k – 1), ā_v(k – 1)] with the treatment history from m to k – 1 determined by the regime g(m).
    c) Let Δ̂_{g(m),v} = Σ_{j=m}^{K} r1(K + 1, j, a_v(j), h_v(j), β̂(1)).

  3. Let Ê[Yg(m)(K + 1)] = Ê[Y(0(m))(K + 1)] + Σ_{v=1}^{V} Δ̂_{g(m),v}/V be the estimate of E[Yg(m)(K + 1)].

The above approach is based on theorem 4 in the paper by Robins (12). Alternative approaches to the estimation of E[Yg(m)(K + 1)] for both dynamic and nondynamic regimes, based on other recent extensions of MSMs and SNMs, have been developed by Orellana et al. (18), van der Laan et al. (10), Murphy et al. (19), and Robins (20).
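The Monte Carlo steps described above can be sketched in a few lines. This is an illustration under strong simplifying assumptions, not the authors' implementation: L and A are binary, the covariate model is a crude working model in which L(k) depends only on A(k – 1), the blip function `r1` has an invented two-coefficient form, and the g-estimate `beta_hat` is simply assumed to be available. All function names and the toy data are hypothetical.

```python
import random

def r1(a_j, l_j, beta):
    """Hypothetical blip: effect of treatment a(j), modified by current l(j)."""
    return beta[0] * a_j + beta[1] * a_j * l_j

def fit_covariate_model(data, m, K):
    """Step 1 with a crude working model (assumption): binary L(k) depends
    only on A(k - 1). Returns estimated P[L(k) = 1 | A(k - 1) = a]."""
    counts = {0: [0, 0], 1: [0, 0]}            # a -> [n, number with L(k) = 1]
    for s in data:
        for k in range(m + 1, K + 1):
            cell = counts[s["a"][k - 1]]
            cell[0] += 1
            cell[1] += s["l"][k]
    return {a: (c[1] / c[0] if c[0] else 0.5) for a, c in counts.items()}

def estimate_regime_mean(data, m, K, beta, regime, n_draws, rng):
    # Estimate the mean under "no treatment from m" by subtracting the
    # blips of the observed treatments from the observed outcome.
    base = sum(
        s["y"] - sum(r1(s["a"][j], s["l"][j], beta) for j in range(m, K + 1))
        for s in data
    ) / len(data)
    p_l = fit_covariate_model(data, m, K)       # step 1
    total = 0.0
    for _ in range(n_draws):                    # step 2
        s = rng.choice(data)                    # 2a: observed history H(m)
        l_hist = list(s["l"][: m + 1])
        a_hist = list(s["a"][:m])
        for k in range(m, K + 1):
            if k > m:                           # 2b: simulate l(k)
                l_hist.append(1 if rng.random() < p_l[a_hist[k - 1]] else 0)
            a_hist.append(regime(k, l_hist, a_hist))
            total += r1(a_hist[k], l_hist[k], beta)   # 2c: accumulate blips
    return base + total / n_draws               # step 3

# Invented toy data: K = 2, baseline time m = 1, binary L and A.
data = [
    {"l": [0, 1, 0], "a": [0, 1, 1], "y": 3.0},
    {"l": [1, 1, 0], "a": [1, 0, 0], "y": 1.0},
    {"l": [0, 0, 1], "a": [0, 0, 1], "y": 2.0},
]
beta_hat = (-0.5, 0.0)                          # assumed g-estimate
rng = random.Random(0)
never = estimate_regime_mean(data, 1, 2, beta_hat, lambda k, l, a: 0, 200, rng)
always = estimate_regime_mean(data, 1, 2, beta_hat, lambda k, l, a: 1, 200, rng)
```

The `regime` argument receives the simulated history, so a dynamic regime can be supplied in the same way, for example `lambda k, l, a: l[k]` to treat only when the current covariate equals 1.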

Alternative approaches to handling model incompatibility
We now discuss alternative approaches to handling model incompatibility and consider their possible application to history-adjusted MSMs.

Saturated models.
If an incompatible model is saturated, one will never obtain incompatible parameter estimates. Thus, in the context of our linear probability example, if we fit the saturated incompatible model

Formula
where Ik is a dummy variable that takes the value 1 if X = k and 0 otherwise, our estimates of Pr(D = 1|X) are guaranteed to lie between 0 and 1 (although the associated confidence intervals may contain incompatible values). Unfortunately, because the data in realistic longitudinal studies are sparse and high-dimensional, fitting saturated history-adjusted MSMs is not possible. Therefore, possibly misspecified nonsaturated models must be used.
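A small sketch with invented data illustrates the contrast. A least-squares fit of a straight line to a dichotomous outcome can produce fitted "probabilities" outside [0, 1], whereas the saturated fit with one dummy variable per level of X reduces to the observed proportion within each level and so cannot. The function names and the data are hypothetical.

```python
def fit_line(x, d):
    """Ordinary least squares for d = b0 + b1 * x with a single covariate."""
    n = len(x)
    mean_x, mean_d = sum(x) / n, sum(d) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxd = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(x, d))
    b1 = sxd / sxx
    return mean_d - b1 * mean_x, b1

def fit_saturated(x, d):
    """Least squares with one dummy per level of x: the fitted value at
    each level is the observed proportion with d = 1 at that level."""
    cells = {}
    for xi, di in zip(x, d):
        cells.setdefault(xi, []).append(di)
    return {level: sum(v) / len(v) for level, v in cells.items()}

# Invented data: exposure level X in {0, 1, 2, 3}, disease indicator D.
x = [0, 0, 1, 1, 2, 2, 3, 3]
d = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_line(x, d)
linear_fit = {level: b0 + b1 * level for level in sorted(set(x))}
saturated_fit = fit_saturated(x, d)
```

For these data the straight-line fit is negative at X = 0 and exceeds 1 at X = 3, while every saturated fitted value lies in [0, 1].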

Replacing models with approximations.
All models are incorrect. Van der Laan et al. (6) argue that it is therefore more honest to redefine β(1)* and β(2)* in the history-adjusted MSM of our example to be the limits of the inverse-probability-weighted estimates β̂(1) and β̂(2) as the sample size goes to infinity. They then view r1(K + 1, m, a(m), V(m); β(1)*) and r2(K + 1, m, a(m, K), V(m); β(2)*) as approximations of, rather than models for, r1(K + 1, m, a(m), V(m)) and r2(K + 1, m, a(m, K), V(m)). From this point of view, since there are no models, there is no possibility of model or parameter incompatibility. Thus, neither β(1)* and β(2)* nor β̂(1) and β̂(2) can be incompatible.

Our difficulty with this approach is that it does nothing to solve our problem; it simply sweeps the problem under the rug. In the context of our example, a health-care provider remains without a clue as to a reasonable treatment strategy, since she can still deduce from theorem 1 that it is logically impossible for both r1(K + 1, m, a(m), V(m); β(1)*) to be a good approximation of r1(K + 1, m, a(m), V(m)) and r2(K + 1, m, a(m, K), V(m); β(2)*) to be a good approximation of r2(K + 1, m, a(m, K), V(m)).

Exploiting incompatible models for goodness-of-fit (GOF) testing and model selection.
We say that a model indexed by a parameter vector η is correctly specified if there is a true (and therefore compatible) value η* of η under which the data were generated. All saturated models are correctly specified. In contrast to a saturated model, if one fits a correctly specified incompatible model that is not saturated, one may obtain incompatible parameter estimates; however, a 1 – α confidence interval for η* must include the true compatible parameter vector η* and, thus, a compatible parameter value with probability at least 1 – α. Therefore, we can perform a valid (albeit conservative) α-level GOF test of the null hypothesis that an incompatible model is correctly specified by rejecting the null hypothesis whenever a 1 – α confidence interval for η* fails to contain a compatible parameter value η. If the GOF test accepts, we accept the null hypothesis of correct specification, and the set of compatible parameter values η in the 1 – α confidence interval for η* forms a 1 – α confidence set for η*.

If, as would be the case in our example with η* = (β(1)*, β(2)*), our GOF test rejects, we enlarge our model by increasing the dimension of η* (for example, by adding quadratic interactions with time, that is, terms with their own coefficients multiplying a(m)(K – m)² and dur[a(m + 1, K)](K – m)²) and then test whether the enlarged model fits. If not, we continue enlarging until we finally have a model that fits, and we report the set of compatible values of the enlarged parameter η contained in the 1 – α confidence interval for the enlarged η* as a 1 – α confidence set for η*. The actual coverage of these intervals would not be 1 – α, but appropriate corrections could be worked out. Furthermore, one needs an algorithm for finding the set of compatible values η in a given 1 – α confidence interval for η*, which is a highly nontrivial problem. In addition, the power properties of this procedure are almost certainly poor.

With much additional work, it is conceivable that this GOF-testing-based model selection strategy might someday become, in certain settings, a viable alternative to the strategy of using SNMs rather than history-adjusted MSMs. A naive reader might think the strategy based on GOF testing even has certain advantages over the use of SNMs, since, if the model r1(K + 1, m, a(m), V(m); β(1)*) is badly misspecified, the GOF approach might detect such misspecification while the most straightforward use of g-estimation will not.

However, if one really wishes to perform a GOF test of the model r1(K + 1, m, a(m), V(m); β(1)*) with V(m) = H(m), GOF tests based on g-estimation of enlargements of the model should be more efficacious and powerful than the above inverse-probability-weighting-based GOF test of compatibility of the model r1(K + 1, m, a(m), V(m); β(1)*) with the model r2(K + 1, m, a(m, K), V(m); β(2)*), since the latter model may itself be badly misspecified. Thus, we are skeptical that the use of incompatible models for GOF testing and model selection will prove beneficial.


    ACKNOWLEDGMENTS
 
This work was supported by National Institutes of Health grants R37-AI032475 and R01-HL080644.

Conflict of interest: none declared.


    References

  1. Robins JM. Marginal structural models. In: 1997 Proceedings of the American Statistical Association, Section on Bayesian Statistical Science (1998) Alexandria, VA: American Statistical Association. 1–10.
  2. Robins JM. Marginal structural models versus structural nested models as tools for causal inference. In: Statistical models in epidemiology: the environment and clinical trials—Halloran E, Berry D, eds. (1999) New York, NY: Springer-Verlag. 95–134.
  3. Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of human immunodeficiency virus-positive men. Epidemiology (2000) 11:561–70.
  4. Petersen M, Deeks S, Martin J, et al. History-adjusted marginal structural models for estimating time-varying effect modification. Am J Epidemiol (2007) 166:985–93.
  5. Joffe M, Santanna J, Feldman H. Partially marginal structural models for causal inference. (Abstract). Am J Epidemiol (2001) 153(suppl):S261.
  6. van der Laan MJ, Petersen ML, Joffe MM. History-adjusted marginal structural models and statically-optimal dynamic treatment regimens. Int J Biostat (2005) 1. article 4. (Electronic article). (http://www.bepress.com/ijb/vol1/iss1/4).
  7. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology (2004) 15:615–25.
  8. Hernán MA, Brumback B, Robins JM. Estimating the causal effect of zidovudine on CD4 count with a marginal structural model for repeated measures. Stat Med (2002) 21:1689–709.
  9. Robins JM, Hernán MA, Siebert U. Effects of multiple interventions. In: Comparative quantification of health risks: global and regional burden of disease attributable to selected major risk factors—Ezzati M, Lopez AD, Rodgers A, et al, eds. (2004) Vol II. Geneva, Switzerland: World Health Organization. 2191–230.
  10. van der Laan MJ. Causal effect models for intention to treat and realistic individualized treatment rules. In: (U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 203) (2006) Berkeley, CA: Division of Biostatistics, School of Public Health, University of California, Berkeley. (http://www.bepress.com/ucbbiostat/paper203).
  11. Robins JM. The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In: Health service research methodology: a focus on AIDS—Sechrest L, Freeman H, Mulley A, eds. (1989) Washington, DC: National Center for Health Services Research, US Public Health Service. 113–59.
  12. Robins JM. Correcting for non-compliance in randomized trials using structural nested mean models. Commun Stat (1994) 23:2379–412.
  13. Hernán MA, Robins JM. Estimating causal effects from epidemiological data. J Epidemiol Community Health (2006) 60:578–86.
  14. Robins JM, Rotnitzky A. Comment on the Bickel and Kwon article, "Inference for semiparametric models: some questions and an answer." Stat Sinica (2001) 11:920–36.
  15. Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J Am Stat Assoc (1995) 90:106–21.
  16. van der Laan M, Robins JM. Unified methods for censored and longitudinal data and causality (2003) New York, NY: Springer Verlag.
  17. van der Laan MJ, Hubbard AE, Jewell NP. Estimation of treatment effects in randomized trials with noncompliance and a dichotomous outcome. In: (U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 157) (2004) Berkeley, CA: Division of Biostatistics, School of Public Health, University of California, Berkeley. (http://www.bepress.com/ucbbiostat/paper157).
  18. Orellana L, Rotnitzky A, Robins JM. Generalized marginal structural models for estimating optimal treatment regimes. In: (Technical report) (2006) Boston, MA: Department of Biostatistics, Harvard School of Public Health.
  19. Murphy SA. Optimal dynamic treatment regimes. J R Stat Soc B (2003) 65:331–66.
  20. Robins JM. Optimal structural nested models for optimal sequential decisions. In: Proceedings of the Second Seattle Symposium on Biostatistics—Lin DY, Heagerty P, eds. (2004) New York, NY: Springer Publishing Company.
