American Journal of Epidemiology Advance Access originally published online on May 17, 2006
American Journal of Epidemiology 2006 164(3):272-281; doi:10.1093/aje/kwj180

American Journal of Epidemiology Copyright © 2006 by the Johns Hopkins Bloomberg School of Public Health All rights reserved; printed in U.S.A.

Practice of Epidemiology

Measuring Screening Intensity in Case-Control Studies of the Efficacy of Mammography

A. Russell Localio1, Lan Zhou1,2 and Sandra A. Norman1

1 Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, PA
2 Department of Statistics, College of Science, Texas A&M University, College Station, TX

Correspondence to Dr. A. Russell Localio, 606 Blockley Hall, 423 Guardian Drive, University of Pennsylvania, Philadelphia, PA 19104-6021 (e-mail: rlocalio{at}cceb.med.upenn.edu).

Received for publication March 19, 2005. Accepted for publication January 31, 2006.


    ABSTRACT
 
Of great interest in studies of screening for breast cancer is the relative efficacy of different screening frequencies (intensities). Prior work has suggested that estimates of the association between screening intensity and outcome in case-control studies would not produce valid results and that only binary indicators (no screens vs. one or more) of exposure can be used. Using case-control studies drawn from simulated cohorts of 30,000–40,000 women, the authors found that biases demonstrated in prior studies can be explained by 1) misclassification of true exposure groups by observed screening history, and 2) differential exposure misclassification of cases and controls. Binary as well as ordered categorical and interval measures can be biased unless they account for misclassification. By combining measurements of screening history from multiple periods of observation of varying lengths and using repeated-measures logistic regression models, the effect of screening intensity can be estimated in the presence of misclassification. Assessing the effect of screening intensity in case-control studies of mammography is possible if principles and methods for misclassification and measurement error guide the analysis.

bias (epidemiology); case-control studies; computer simulation; mammography


Abbreviations: DPP, detectable preclinical phase; REW, retrospective exposure window


    INTRODUCTION
 
Case-control studies have long been viewed as potential designs for assessing the efficacy of screening for breast cancer. The disease is rare, with an incidence of about 4 per 1,000 person-years of follow-up (1), and the outcomes used to assess efficacy (death or a diagnosis at a late stage) are even rarer. Thus, randomized controlled trials or cohort studies of sufficient power must be large and have long follow-up. Challenges in the design of case-control studies are numerous, and potential biases abound (2).

Several authors (3–7) maintain that these studies cannot use screening history to measure the effect of screening intensity on outcome but must consider only a mammogram that might have some benefit. A beneficial screen, they argue, can occur only during a period beginning when cancer is detectable with screening and ending when disease is apparent with a clinical examination. Therefore, screening prior to the start, or after the close, of this "detectable preclinical phase" (DPP), or the "sojourn time," should not be counted in analyzing the efficacy of screening. For the controls, for whom there is no true DPP, the exposure of interest is the presence of a mammogram during a comparable period of observation. Weiss and Etzioni conclude that

comparison of cases and controls for a history of annual, less-frequent, and no screening – will not [emphasis in original] produce a valid result because: (1) such an approach would include consideration of tests done before the presence of the occult tumor (or premalignant detectable condition) tests that could not have been of any benefit: and (2) the number of screening tests done during that period of time that occult tumors typically are present will almost always differ between cases and controls even if no effective treatment for early disease is available. For a test of high sensitivity, the cases will be screened only once, because the test will identify the tumor at that time. Controls, on the other hand, could be screened multiple times (because the large majority of them will not have the cancer in question), producing the spuriously low odds ratio associated with multiple (or "regular") screening. (8, p. 715)

This position, if true, would invalidate case-control studies that seek to assess the potential benefit of more- as contrasted with less-intensive screening.

Our investigation proposes an alternative theoretical framework that rests on estimating screening frequency rather than DPP duration. To achieve this end, we rely on principles of exposure misclassification and measurement error for using the observed numbers of screens to measure the unobserved intensity of screening. To demonstrate an empirical basis for this framework, we simulated case-control studies nested within a dynamic cohort.


    MATERIALS AND METHODS
 
The theoretical framework
Our theoretical framework assumed three principles: 1) In a dynamic cohort, and a case-control study derived from it, the intensity of screening for any person can be estimated by looking backward for a given period from an "index" time and counting the frequency of screening mammograms. 2) Women with higher frequencies of screening will, on average, have a greater probability of undergoing screening mammography during the DPP and earlier in the DPP than women with lower frequencies. 3) Screens can be self-reported with good accuracy and/or confirmed by obtaining actual mammographic history (9). Appendix table 1 summarizes these principles and their consequences.

As in much epidemiologic research, in our study the unobserved "exposure" of interest, the intensity of screening, is measured with error by using a woman's screening history. The outcome of interest is the disease endpoint to be avoided: detection of cancer at a late stage, as in this investigation, or death. Early detection opens the opportunity for treatments that potentially prolong survival (10), and early detection depends on the probability that a screen will occur before the onset of late-stage disease. For the sake of clarity in describing the simulations, we define "late-stage" cancer as palpable tumors that might be detectable by clinical examination, but our methods and results would apply to other definitions. The probability that a screen occurs before late-stage disease in turn depends on the duration of the DPP for an individual woman, the rate of disease progression, and the sensitivity of screening technology. The large box in the middle of figure 1 represents the DPP, starting with the time at which disease is detectable by screening by perfectly reliable radiologists using ideal imaging technology and ending with the time at which disease is detectable by clinical examination. The right-pointing arrows reflect the progression of disease, from the start of the DPP to the end of the arrow. Progression to late stage can occur during or after the end of the DPP. During this DPP, cancers might be missed because the sensitivity of mammography is less than perfect. At the top of the figure, the left-pointing arrows reflect the screening history observed during the "retrospective exposure window" (REW), starting at the time of diagnosis of disease for the cases and terminating at a fixed point in the past. The vertical ticks represent screens.


Figure 1
FIGURE 1. Theoretical relation of mammography screening history (top arrows), the detectable preclinical phase ((DPP), represented by the large box in the middle), and onset of late-stage disease (bottom arrows) for use in simulations. Person A has no screens (lack of vertical ticks), and her cancer is detected by clinical examination at the end of the DPP. She will have late-stage cancer only if onset of late stage is early (right end of arrow D) rather than late (right end of arrow E). Person B has infrequent screening. Her cancer is detected by screening if she has a mammogram during the DPP. Her cancer will be late stage if onset is rapid (arrow D) and will be early stage if onset is slow (arrow E). Person C has frequent screening. The probability that she will have a screen early in the DPP is greater than for person B. Her cancer will be discovered by screening before the onset of late-stage disease (arrows D and E). Mammography history is estimated from the point of diagnosis backward a set time for each woman (backward-pointing arrows).

 
As depicted by arrow C in figure 1, a woman with a high screening intensity, compared with women with lower screening intensities (arrow B in figure 1), will have an increased probability of having a screen within her DPP and early after the onset of disease, regardless of the exact start of her DPP. At one extreme, a woman who never undergoes mammography has a zero probability of early detection during the DPP. Her cancer is identified at the end of the DPP by clinical examination, while her cancer stage at diagnosis depends on the rate of progression during the span of the DPP (arrow A in figure 1). At the other extreme, if the sensitivity of mammography were 100 percent, and if a woman had daily screens, her cancer would be discovered by screening on the first day of her DPP. The shorter the screening intervals, the more likely that a woman will have a cancer detected before it progresses to late stage. This theoretical construct is inherently probabilistic and requires no knowledge about the actual start or end of the DPP.
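To make this probabilistic argument concrete, the following Python sketch computes, for a single DPP of fixed length, the chance that at least one screen falls inside it and the average delay from the start of the DPP to that screen. It is an illustration under simplifying assumptions of our own, not part of the simulations reported here: screening is perfectly regular every `interval_months`, mammography is perfectly sensitive, and the timing of screens relative to the unobserved DPP start is uniformly random. All function names are hypothetical.

```python
import random
from typing import Optional


def prob_screen_in_dpp(dpp_months: float, interval_months: Optional[float]) -> float:
    """Chance that at least one screen falls inside a DPP of length dpp_months.

    Assumes screens every interval_months with a uniformly random phase
    relative to the DPP start; None means the woman is never screened.
    """
    if interval_months is None:
        return 0.0
    return min(1.0, dpp_months / interval_months)


def mean_delay_given_detection(dpp_months: float, interval_months: float) -> float:
    """Mean months from DPP start to the first screen, given it falls in the DPP.

    The delay is U(0, interval); conditioning on delay < dpp leaves
    U(0, min(interval, dpp)), whose mean is half of that bound.
    """
    return min(interval_months, dpp_months) / 2.0


def simulate_prob(dpp_months: float, interval_months: float,
                  n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check of prob_screen_in_dpp under the same assumptions."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.uniform(0.0, interval_months) < dpp_months)
    return hits / n


if __name__ == "__main__":
    for interval in (None, 36, 24, 12):
        print(interval, round(prob_screen_in_dpp(30, interval), 3))
```

With a 30-month DPP, screening every 36 months gives a probability of about 0.83 of a screen inside the DPP, while screening every 24 or 12 months guarantees one; the mean delay to that screen still falls from 12 months (24-month interval) to 6 months (12-month interval), which is the "earlier in the DPP" effect described above, with no need to know where the DPP starts.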

Simulating the cohort
Although Hosek et al. (11) used a deterministic (algebraic) analysis of some issues in screening, we found, as have others (12), that the complex interrelations of factors necessitated probabilistic simulations. As with previous investigations (13), our simulations began with a synthetic cohort of 30,000 women in three true screening groups or 40,000 in four exposure groups. For simplicity, we ignored age and assumed a constant incidence of breast cancer of 4 per 1,000 person-years of follow-up (0.33 per 1,000 person-months), the approximate incidence for US women aged 50 years or older (1).

Each month, a woman developed cancer if a uniform random number (U(0,1)) drawn for that person fell below 0.00033. Again for simplicity, we assumed that preclinical detection, at the start of the DPP, could occur at the onset of cancer. The duration of the DPP was normally distributed with a stipulated mean and standard deviation. For each successive month, each woman had a probability of developing late-stage disease that followed a beta distribution, Beta(5, 2) (14). Because the mean of a Beta(5, 2) distribution is 5/7 of its range, if all patients progressed to late-stage disease within 30 months, for example, then the mean time to progression would be 30 × 5/7 ≈ 21.4 months. Sensitivity of screening mammography was assumed to be fixed in simple simulations. Alternatively, we assumed a sensitivity of 0.65 at the start of the DPP, increasing on the odds scale by 3 percent per month to average 0.80 by the end of the DPP. This range agrees with recent summaries (15). Any cancer not discovered by screening during the DPP was detected at the end of the DPP by clinical examination. The model assumed that all cancers progress to late stage.
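The per-woman ingredients of this disease model can be sketched in Python as follows. This is our reading of the text, not the authors' Stata code: we interpret the Beta(5, 2) assumption as a time to late stage equal to a maximum (30 months in the example above) times a Beta(5, 2) draw, we truncate negative normal DPP draws at zero, and all function names and defaults are hypothetical.

```python
import random

# 4 cases per 1,000 person-years expressed as a monthly risk (about 0.00033).
MONTHLY_RISK = 4.0 / 1000.0 / 12.0


def develops_cancer_this_month(rng: random.Random) -> bool:
    """Onset occurs when a U(0,1) draw falls below the monthly risk."""
    return rng.random() < MONTHLY_RISK


def dpp_length(rng: random.Random, mean: float, sd: float) -> float:
    """Normally distributed DPP duration (assumption: truncated at 0 months)."""
    return max(0.0, rng.gauss(mean, sd))


def months_to_late_stage(rng: random.Random, max_months: float = 30.0) -> float:
    """Time from onset to late stage as max_months * Beta(5, 2) (our interpretation)."""
    return max_months * rng.betavariate(5, 2)


def sensitivity(month_in_dpp: float, start: float = 0.65,
                odds_growth: float = 0.03) -> float:
    """Sensitivity starting at `start` whose odds grow by 3 percent per month."""
    odds = start / (1.0 - start) * (1.0 + odds_growth) ** month_in_dpp
    return odds / (1.0 + odds)
```

Under these assumptions the mean of `months_to_late_stage` is 30 × 5/7 ≈ 21.4 months, and `sensitivity` passes 0.80 after about 26 months (odds of 0.65/0.35 × 1.03^26 ≈ 4.0).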

True screening intervals in months were simulated by using a random draw from a normal distribution with a stipulated group mean and standard deviation. This interval could vary over time for each woman. The first-ever screen was assumed to begin during a month drawn from a uniform distribution bounded by zero, the beginning of the cohort, and the mean number of months in the screening interval for the group.
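A single woman's screening schedule under these rules might be generated as below. This is a hypothetical Python helper, not the authors' code. The 120-month horizon is an assumption matching the 60-month run-in plus 60-month assessment period described in the next paragraph, and flooring each interval at 1 month is our own choice to keep a normal draw from producing a zero or negative gap.

```python
import random
from typing import List, Optional


def screening_months(rng: random.Random, mean_interval: Optional[float],
                     sd_interval: float, horizon: int = 120) -> List[int]:
    """Return the (0-based) months in which one woman is screened."""
    if mean_interval is None:  # never-screened group
        return []
    # First-ever screen: uniform between month 0 and the group's mean interval.
    t = rng.uniform(0.0, mean_interval)
    months = []
    while t < horizon:
        months.append(int(t))
        # Each interval is a fresh normal draw, so it can vary over time;
        # floor it at 1 month (our assumption) so screens stay in order.
        t += max(1.0, rng.gauss(mean_interval, sd_interval))
    return months
```

With a standard deviation of zero the schedule is perfectly regular; for example, a 12-month group yields exactly ten screens over 120 months.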

As noted in previous work, bias arises when patients are newly assigned to a cohort and are followed prospectively (10), because the start of screening can pick up cancers that have accrued undetected. To avoid this bias, we first allowed the cohort to run for 60 months before beginning to assess the efficacy of screening for an additional 60 months. The simulation therefore mimicked a sample drawn from a dynamic cohort as in an observational study rather than from a randomized controlled trial, in which all subjects share a common starting point.

Drawing the case-control sample
At each iteration of the simulation, cases (women with late-stage disease) and matched controls (women at risk of becoming a case at the time of selection) were drawn by using incidence density sampling from the cohort beginning at month 61 (16–18). The "index" time became the month of diagnosis of late-stage disease for the cases and the month of selection for the matched controls.
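Incidence density (risk-set) sampling can be sketched in a few lines of Python. In this hypothetical helper, each woman is represented only by her month of late-stage diagnosis (None if she never becomes a case), everyone is assumed to be followed to the end of the cohort, and the quadratic scan of the risk set is chosen for clarity rather than speed. A woman who later becomes a case remains eligible as a control before her own diagnosis, as incidence density sampling requires.

```python
import random
from typing import Dict, List, Optional, Tuple


def incidence_density_sample(
    diagnosis_month: Dict[str, Optional[int]],
    rng: random.Random,
    start_month: int = 61,
    n_controls: int = 1,
) -> List[Tuple[str, List[str], int]]:
    """Return (case_id, control_ids, index_month) for every eligible case."""
    matched_sets = []
    for case_id, t in diagnosis_month.items():
        if t is None or t < start_month:
            continue  # never a case, or diagnosed before sampling begins
        # Risk set at month t: every other woman not yet diagnosed by month t.
        risk_set = [w for w, d in diagnosis_month.items()
                    if w != case_id and (d is None or d > t)]
        controls = rng.sample(risk_set, n_controls)
        # The case's diagnosis month is the index time for the whole matched set.
        matched_sets.append((case_id, controls, t))
    return matched_sets
```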

Estimating "exposure"
Each simulation stipulated a fixed REW. The number of screens during this REW became the observed exposure measure. By running the cohort for 60 months before beginning the selection of women, we ensured that every woman had a complete look-back period, one not truncated by the start of the cohort.

As Sasco et al. (19) described, the appropriate screening history for cases includes the time up to the actual diagnosis, but not after. Although some authors (20) count the screen that led to the diagnosis in arriving at a total number of screens for a woman, Hosek et al. (11) criticized this convention because the exposure measurement (number of screens) is related to case status. For cases, counting the screening mammogram that produced the cancer diagnosis would understate screening efficacy (odds ratio biased toward the null), while failing to count this last screening mammogram would bias results away from the null. We tested this theory and adjusted for this source of bias, using the rationale outlined in appendix 2, by counting the screening mammogram that detected the cancer as one half of a screen.
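The counting rule can be written as a small Python function. This is a hypothetical sketch: the window is assumed to cover the REW months ending at and including the index month, and a case's detecting screen is assumed to fall in her index month. Setting `detecting_weight` to 1.0 or 0.0 gives the two conventions discussed by Hosek et al.; 0.5 gives the half-count adjustment.

```python
from typing import Iterable


def count_screens_in_rew(screen_months: Iterable[int], index_month: int,
                         rew_months: int, detected_by_screen: bool = False,
                         detecting_weight: float = 0.5) -> float:
    """Count screens in the REW (index_month - rew_months, index_month].

    Screens after the index month are ignored. For a case whose cancer was
    found by screening, the screen in the index month gets detecting_weight.
    """
    total = 0.0
    for m in screen_months:
        if index_month - rew_months < m <= index_month:
            if detected_by_screen and m == index_month:
                total += detecting_weight
            else:
                total += 1.0
    return total
```

For a screen-detected case with screens in months 88 and 100 of a 24-month REW ending at month 100, the three conventions give 2.0, 1.0, and 1.5 screens.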

Statistical analysis
True incidence rate ratios in the cohort for comparing the screening groups with the reference group (no screening) were estimated by using Poisson regression with follow-up time as an offset (16). For case-control studies with incidence density sampling, the odds ratio estimates this rate ratio, a relation our simulations confirmed (18). We used logistic regression to estimate the odds ratio between groups defined by different combinations of number of screens during the stipulated REW, with zero screens serving as the reference group. These exposure groups included both binary classifications (any screen during the REW) and more complex ordered categories of counts of screens. Conditional logistic regression for matched cases and controls produced the same results. We judged the presence of bias by comparing the odds ratio in the case-control study with the incidence rate ratio of the true exposure and outcome in the cohort.
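For the simplest comparison (one binary exposure), both quantities reduce to closed forms, sketched below in Python. These are standard special cases, not the authors' Stata models: with a single binary covariate, Poisson regression with log follow-up time as the offset gives an incidence rate ratio equal to the ratio of crude rates, and conditional logistic regression on 1:1 matched pairs gives an odds ratio equal to the ratio of the two kinds of discordant pairs. The counts in the check are made up for illustration.

```python
from typing import Iterable, Tuple


def incidence_rate_ratio(events_exp: int, person_time_exp: float,
                         events_ref: int, person_time_ref: float) -> float:
    """Ratio of crude incidence rates (exposed vs. reference)."""
    return (events_exp / person_time_exp) / (events_ref / person_time_ref)


def matched_pair_odds_ratio(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Conditional maximum-likelihood odds ratio for 1:1 matched pairs.

    Each pair is (case_exposed, control_exposed). Concordant pairs carry no
    information, so the estimate is the ratio of discordant-pair counts.
    """
    case_only = control_only = 0
    for case_exp, control_exp in pairs:
        if case_exp and not control_exp:
            case_only += 1
        elif control_exp and not case_exp:
            control_only += 1
    return case_only / control_only
```

Comparing the two numbers from the same simulated cohort is the bias check described above: when exposure is measured without error and controls are drawn by incidence density sampling, the matched odds ratio should fluctuate around the cohort rate ratio.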

Finally, we created for each woman 10 REWs incremented by 3-month intervals from 21 through 48 months; for each, we computed screening intensity as the rate of screens per 36 months. For example, one screen observed in an REW of 24 months equaled 1.5 screens per 36 months. Programs for simulation, sampling, and analysis were written in Stata version 8.2 software (Stata Corporation, College Station, Texas).
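The ten windows and the rescaling to screens per 36 months can be computed as follows. This hypothetical Python helper assumes that each window covers the REW months ending at and including the index month; the rescaling factor 36/REW follows directly from the example in the text (one screen in 24 months equals 1.5 screens per 36 months).

```python
from typing import Dict, Iterable, List

# Ten retrospective exposure windows: 21, 24, ..., 48 months.
REWS: List[int] = list(range(21, 49, 3))


def screens_per_36(screen_months: Iterable[int], index_month: int) -> Dict[int, float]:
    """Observed screening rate per 36 months for each REW ending at index_month."""
    months = list(screen_months)
    rates = {}
    for rew in REWS:
        n = sum(1 for m in months if index_month - rew < m <= index_month)
        rates[rew] = n * 36.0 / rew
    return rates
```

Each woman therefore contributes ten correlated rates rather than one, which is why these windows call for repeated-measures logistic regression rather than treatment as independent observations.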

We proceeded from simple simulations with no random variation within or among individuals to more complex studies incorporating random variation. Each time, we assessed the extent and direction of bias of estimates of the incidence rate ratio relating screening exposure and diagnosis at late stage.


    RESULTS
 
The first simulation determined whether, under ideal circumstances, the counts of screening mammograms in the woman's REW could yield unbiased results. When the DPP, the months between screens, and the sensitivity of mammography were assumed to be fixed in case-control studies generated from cohorts of 30,000 women, a logistic regression model produced unbiased estimates of the association of screening and outcome (table 1). In this ideal case, the observed measure of screening intensity in the REW defined perfectly the true categories of mammography screening in the cohort. As a result, the odds ratios from the regression model using categorical variables for none, one, or two or more screens were unbiased estimates of the true incidence rate ratios. The alternative model 2, with a single binary predictor (one or more screens observed during the REW vs. none), produced estimates between those for one and two screens. Moreover, when the true DPP was not equal to the REW (24 months for all cases), the odds ratios from the case-control study were unaffected. Thus, under ideal circumstances, measures of screening intensity are not only possible but also unbiased, and an accurate guess of the unobserved DPP is therefore not essential.


TABLE 1. Impact of duration of the DPP,* TLS,* and REW* in months on the true IRR* and the observed OR* of mammography screening and the development of late-stage breast cancer: results of 100 simulations of a case-control study nested within a cohort of 30,000 women

 
Even under an assumption of perfect screening sensitivity, more than one screen can be observed for a case during an REW when she could have only a single mammogram during her actual DPP (table 2). Thus, contrary to prior characterizations of the challenges of measuring screening intensity, both cases and controls can have multiple screens counted during an REW that is chosen to correspond to the actual average DPP.

In a more realistic simulation designed to parallel the analysis of Etzioni and Weiss (7), we examined the role of exposure misclassification as a source of bias when using a simple binary measure of screening exposure. We estimated the odds ratios for the outcome in two observed groups of women: those with one screen or more during the REW versus those with none (table 3). The simulated cohort consisted of four groups of women with screen frequencies ranging from frequent (on average once per 12-month period) to never. With three simulated DPPs of 18, 30, and 42 months, the observed odds ratios for the binary predictor (one or more observed screens vs. none) were initially high, dropped to a minimum at or near the stipulated DPP, and then either rose to an apparent plateau or remained at the minimum. For comparison, the last column of table 3 reports the odds ratios for the outcome and screening when the screening groups were defined not by the presence or absence of any screens during the REW but by the true, but unobserved groupings: 1) women never screened, and 2) those screened at an average frequency of one or more times per 36 months.


TABLE 2. Number of mammography screens among cases during a 24-month REW* and during a comparable DPP*: results from a single simulated case-control study†

 

TABLE 3. Impact of length of the REW* on estimated efficacy of screening with varying DPP* and binary exposure variable (OR*,† for the association of diagnosis of late-stage disease (case) with the presence of one or more screening mammograms during the REW vs. none): single simulated cohort and case-control study matched 1 to 1 by using incidence density sampling‡

 
Although the analysis re-created the same pattern of odds ratios observed by Etzioni and Weiss (7), it points to a different conclusion. To better understand these U-shaped or J-shaped patterns of observed odds ratios in table 3, we used the same simulated data set and plotted, for each DPP, the proportions of cases and of controls whose exposure status (screened or not) was misclassified by the number of screens in the selected REWs (figure 2A–C). As the REW lengthened, misclassification became less frequent for both cases and controls because the longer observed screening history better distinguished between women who were never screened and those who had at least some screening. If misclassification rates were equal for cases and controls, bias would be toward the null in the absence of covariates.


Figure 2
 
FIGURE 2. Variation in the frequency of misclassification of cases and controls according to length of the detectable preclinical phase of cancer (18 months (A), 30 months (B), 42 months (C)) in a simulated case-control study sampled from a cohort of 40,000 patients.

 
Beyond this overall misclassification, however, cases and controls were misclassified at different rates. As the plots in figure 2 suggest, the rates of exposure misclassification for controls were unchanged across DPPs, because these nondiseased women had no DPP. By contrast, the length of the DPP did influence the rates of misclassification for cases because, with a longer DPP, the fraction of cancers detected by screening increased. Only those cases who had been screened and whose cancers were discovered by clinical examination rather than by screening could be misclassified. With a short DPP, most cancers were discovered at the end of the DPP by clinical examination, increasing the chance of misclassification of cases. Conversely, cases detected by screening would always be correctly classified as having been screened because the screen that found the cancer would be counted. Finally, women who were never screened would always be observed with no screens.

Because cases and controls were subject to different rates of misclassification, the estimated odds ratios were lower than the true incidence rate ratio in the cohort for some combinations of DPP and REW and higher for others. For example, as the DPP increased and when the REW was short, controls were more likely than cases to be misclassified as not having been screened. For longer DPPs, this greater rate of misclassification of controls compared with cases produced so much upward bias that the odds ratios exceeded 1.0, making screening appear harmful. Bias could also be downward, overstating the efficacy of screening. The greatest downward bias in the odds ratio occurred when differential misclassification was greatest. This point on the plots in figure 2 corresponds to the nadir of the odds ratios in table 3. Thus, the rates of exposure misclassification, and of differential misclassification by case/control status, revealed by the simulation explained the observed patterns of estimated odds ratios.
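The direction of these biases can be reproduced with expected proportions alone. The Python sketch below is a simplified model of our own, not the authors' simulation: it mirrors the structure described above, in which never-screened women are always observed as unscreened while truly screened women are observed as screened only with some probability (`sens_cases`, `sens_controls`). The inputs (a true odds ratio of 0.5 and half of controls truly screened) are illustrative choices, not simulation results.

```python
def observed_odds_ratio(p_screened_controls: float, true_or: float,
                        sens_cases: float, sens_controls: float) -> float:
    """Expected observed OR when screened women are classified as screened with
    probability sens_* and never-screened women are never misclassified."""
    odds_controls = p_screened_controls / (1.0 - p_screened_controls)
    odds_cases = true_or * odds_controls
    p_screened_cases = odds_cases / (1.0 + odds_cases)
    # Observed exposure prevalence after screened women are missed (false negatives).
    obs_cases = p_screened_cases * sens_cases
    obs_controls = p_screened_controls * sens_controls
    return (obs_cases / (1.0 - obs_cases)) / (obs_controls / (1.0 - obs_controls))


if __name__ == "__main__":
    print(observed_odds_ratio(0.5, 0.5, 0.8, 0.8))  # equal rates: pulled toward 1
    print(observed_odds_ratio(0.5, 0.5, 1.0, 0.4))  # controls missed more: above 1
    print(observed_odds_ratio(0.5, 0.5, 0.5, 1.0))  # cases missed more: below 0.5
```

With equal detection rates the observed odds ratio is 6/11 ≈ 0.55, pulled toward the null from 0.5. When controls' screens are missed more often it rises to 2.0, so screening looks harmful; when cases' screens are missed more often it falls to 0.2, overstating efficacy.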

Measuring intensity of exposure with error
Moving from a simple binary classification of exposure to ordinal classifications of screening intensity, table 4 summarizes the odds ratios for exposure and outcome for different observed numbers of screens according to the length of the REW in a single simulation of a nested case-control study. To reflect the true incidence rate ratios in the cohort, the odds ratios should fall (away from the null) with each increase in the number of observed screens. However, when exposure was measured by using categorical variables, the odds ratios did not always decrease monotonically. For some REW lengths, the true dose response appeared more consistently. Likewise, with three exposure categories (none, one or two, and three or more observed screens in the REW), the dose response became clear. The binary indicator of screening ("≥1 vs. none") produced nearly unbiased odds ratios for REWs of 24 months or more but did not permit inferences about screening intensity. Finally, the estimate from a logistic regression with the number of screens as a linear term (on the logit scale) remained slightly above 0.7 when the REW was 30 months or longer.


TABLE 4. Impact of length of the REW* on measurement of screening intensity (odds ratios† for the association of number of screens with diagnosis of late-stage disease): nested case-control sample from a single simulated data set‡

 
The fluctuations in estimates with increasing observed numbers of screens occurred in this simulation when the observed categories failed to correspond to actual screening intensity groups. More specifically, an observed count of one mammogram in an REW of 24 months represented a mixture of two groups, for example, one screen per 36 months and one screen per 24 months.
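The mixing can be seen exactly in an idealized case. Assuming perfectly regular screening with a uniformly random phase (our simplification; the simulations let intervals vary), the number of screens in a window of W months for a woman screened every I months is either floor(W/I) or floor(W/I) + 1, the latter with probability equal to the fractional part of W/I. The hypothetical Python function below returns that distribution as exact fractions.

```python
import math
from fractions import Fraction
from typing import Dict


def count_distribution(window_months: int, interval_months: int) -> Dict[int, Fraction]:
    """Distribution of the screen count in a window, for regular screening
    every interval_months with a uniformly random phase."""
    ratio = Fraction(window_months, interval_months)
    low = math.floor(ratio)
    frac = ratio - low
    dist = {low: 1 - frac}
    if frac:
        dist[low + 1] = frac
    return dist
```

For a 24-month REW, `count_distribution(24, 36)` gives one screen with probability 2/3 (and none with probability 1/3), while `count_distribution(24, 24)` gives one screen with certainty, so an observed count of one cannot tell these two groups apart.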

To produce a more accurate measure of screening frequency, the numbers of screens over multiple values of REW were translated into a common metric—the number of screens per 36 months of observation—and the overall estimates of association represented an average over these REWs. Table 5 summarizes the simulations of 100 case-control studies on the effect of screening intensity as modeled by using repeated-measures logistic regression. The five-category, four-category, and three-category models all demonstrated an overall downward trend in the odds ratio, or a dose response with screening intensity.


TABLE 5. Association (odds ratios) of screening intensity (categorized no. of screens per 36 months of observation) with diagnosis of late-stage disease, sensitivity of results to choice of observed screening categories: results of 100 simulations*

 
This monotonic decrease in odds ratios was not always found in the simulations we investigated, however, and further adjustment was therefore warranted. We applied a correction of –0.5 to the number of screens per 36 months for the cases whose cancer was discovered by screening (refer to the Materials and Methods section and appendix 2). This correction reduced the fluctuations in the dose-response pattern in the same simulated data sets (table 5), and estimated odds ratios were less biased. Results were comparable in simulations in which DPP equaled 24 months (data not shown).

Finally, when screening intensity was measured as a linear term (on the logit scale), the two options outlined by Hosek et al. (11) of counting or not counting a screening mammogram that led to a diagnosis of cancer produced results that were biased toward or away from the null, respectively, as predicted (table 6). Using a correction of –0.5 produced estimates of minimal bias, especially on the odds ratio scale as theory suggested (refer to the Materials and Methods section and appendix 2).


TABLE 6. Association (odds ratios) of screening intensity (screens per 36 months of observation measured as an interval variable) with diagnosis of late-stage disease: results of 100 simulations*

 

    DISCUSSION
 
As Morrison (10) noted, a challenge of observational cohort and case-control studies is assessing the exposure category, such as the intensity of screening, from observed data. Our alternative paradigm frees the investigator from hypothesizing about the length of an unobserved DPP or the identity of screens within this DPP (appendix table 1). Rather, we assume that a higher screening frequency increases the probability of screens falling within the DPP and of detection during the early part of that DPP. The key lies in correctly measuring the true screening frequency when intervals can be long and irregular.

The time sequence of screening and the onset of late-stage disease (our outcome of interest) within the DPP is also irrelevant if disease has not yet been diagnosed and screening frequency has not changed. A screen that follows the start of undiagnosed, late-stage disease may, in theory, be counted as long as it contributes to an accurate estimate of screening intensity. The timing of any unobserved event in the development of cancer is irrelevant to observed screening behavior. Only when a diagnosis has occurred, and screening mammography ceases, should the counting cease. Using this paradigm, careful measurement of exposure can lead to valid estimates of association between screening intensity and outcome. Bias can occur if the REW does not capture the frequency of screening accurately. With short windows, true screening intensity cannot be measured. Exposure misclassification, both differential and nondifferential, can occur. In some forms of differential misclassification, cases are more affected than controls. With a sufficiently long REW, and regardless of the DPP, the potential for bias diminishes because the longer REWs are better able to discriminate among actual screening groups.

Our investigation is not without limitations. First, it focused on a particular case-control design for a dynamic cohort of women with an endpoint of late-stage disease. Other, far more complex biologic models would be required to analyze the impact of screening on population mortality (16, 21). Second, our simulations were deliberately simplified to clarify the sources of and possible solutions for misclassification bias. For example, we assumed that the time to late-stage disease and the DPP were uncorrelated. Third, our simulations assumed that screening intensity remains constant over the retrospective window used to estimate exposure. If screening intensity in the population increases over time, the probability of a mammogram occurring during the DPP, and early in the DPP, increases, whereas estimates of screening intensity using long REWs might understate the current intensity (6). Fourth, our simulations assumed potentially effective screening by reason of a moderately long DPP. Measuring the effect of screening intensity will be futile if the true DPP is so short that screening will not be effective.

As we have shown, taking several measurements per person reduces the potential for chance measurement error from a single REW. Furthermore, subtracting a fraction of a screen in counting exposure intensity for women whose cancers were detected by screening can offset a special problem resulting from the link between the definition of a case and the estimation of intensity of exposure. If dose response is not confirmed, then the effects of any screens versus none should be estimated by combining the results from several, longer exposure windows to reduce the degree of misclassification bias. Likewise, for comparing groups of patients rather than estimating the effects of intensity of screening within a group, a binary indicator of the presence or absence of screening might be more stable over a range of REWs.

In summary, misspecification of the actual length of the unobserved DPP does not cause bias in estimates of screening efficacy. Rather, misspecification of the actual screening intensity leads to bias. A change in the paradigm of measuring screening allows for well-designed studies of the association of screening intensity and outcome.


APPENDIX TABLE 1. Summary of the correspondence between alternative theories of measuring the effectiveness of screening mammography


| Issue | DPP*-based theory | Alternative theory |
|---|---|---|
| 1. Estimation of the length of the DPP | The DPP must be estimated accurately because a screen conducted before the start of the period is not effective and therefore should not be counted as an exposure. | The DPP need not be estimated at all. Screens that occur before the start of the DPP, which is not observed, can be counted to estimate the frequency of screening. |
| 2. Screening "exposure": the unit of measurement of exposure | A binary exposure variable: whether the cases or controls had at least one screen within a period, ending at the "index date,"† that approximates the length of the DPP. | A continuous or ordered categorical variable: the rate of screens per month counted during a retrospective period that is long enough to measure the rate accurately but not so long that it reflects a screening frequency that no longer applies to the period just before the "index date." |
| 3. Number of screens allowable to measure exposure | One: if mammography is perfectly sensitive, then cases will have only a single screen during the DPP. The first screen will detect disease, and after diagnosis, screening will stop. | Any number of screens during an REW* that ends on the index date. The REW need not correspond to the length of the DPP. |
| 4. Screening intensity | Not estimable, by reason of issue 3. | Estimable by using an ordered categorical variable. |
| 5. Principal source of bias | Bias in estimating the length of the DPP. | Measurement error: the observed rate of screening measures the true frequency with error. |
| 6. Observed bias from simulations | Understating the length of the DPP leads to estimates of the odds ratio (for the association between having at least one screen during the assumed DPP and the outcome) that are too high. | Underestimating the length of the DPP is equivalent to choosing a retrospective period that is too short to distinguish between two exposure groups: those who are never screened and those who are rarely screened. This measurement error, or misclassification, leads to the observed bias. |
| 7. Use of regression | Regression models would always have a binary factor for the presence or absence of a screen during the "assumed DPP." | Regression models can use the number of screens as a categorical or linear term. |
| 8. Among-group comparisons, e.g., pre- vs. postmenopausal | Same as issue 3. The exposure variable is binary and should be selected from an "assumed DPP" that gives the lowest estimate. | If a binary indicator is used to estimate the effect of screening, then REWs long enough to correspond to the length of a typical screening interval (≥24 months) should be used to distinguish between true categories of typical vs. irregular or no screening. |

* DPP, detectable preclinical phase; REW, retrospective exposure window.

† The "index date" for cases is the date of diagnosis; for the controls selected by incidence density sampling, it is the date of diagnosis of the matched case.


    APPENDIX 2
To avoid bias in the estimates of screening intensity, we have recommended, for the purposes of our simulations, subtracting 0.5 from the number of observed mammograms for those cases whose cancer was discovered by screening. The basis for this adjustment lies in the difference between the expected value of the number of screens during a random period of observation and the expected value during a period starting (or stopping) with a given screen.

In general, let $r = 0, \ldots, R$ index the screening periods for a given person, counting backward from the index date, with $r = 0$ representing the most recent period. Let $S_r$ be the random variable for the time between screens in the $r$th period for women who are screened with some degree of regularity. Without loss of generality, we assumed for the purposes of this study that $S_r \sim N(\mu, \sigma^2)$, that is, normally distributed with mean time between screens $\mu$ and standard deviation $\sigma$. Other distributions of the number of months between screens are possible. Then, for an index date fixed by the discovery of cancer in a case, and with retrospective ascertainment of screening frequency, the expected number of screens, $E(n_1)$, observed during a finite retrospective period of observation of length $l$ is given by

$$E(n_1) = \sum_{r=0}^{R} \Pr\left(\sum_{j=0}^{r} S_j \le l\right) \qquad \text{(A1)}$$
The first term, $\Pr(S_0 \le l)$, equals 1 because the screen at the time of discovery is observed with certainty. The other terms in the sum depend on the distribution of $S$ and the length $l$. As the length $l$ of the retrospective period of observation increases, so does the number of additional screens expected to be observed.

For the women whose cancer was identified upon clinical examination, and for the controls whose index date was selected to match that of the cases, the index date is largely independent of the screening cycle. Therefore, the most recent screen before the index date occurs at a random point in the screening cycle: if the index date is random, the time of the most recent screen follows a uniform distribution. Let $m_0$ represent the number of months from the index date back to the most recent screen. Then, for a woman with the same screening frequency as in equation A1, the expected number of screens, $E(n_2)$, is again given by equation A1 but with a different value for the initial term, $\Pr(S_0 \le l)$. This initial probability, no longer equal to 1, depends on the probability of the initial screen occurring in any given month and on the expected value of $m_0$. The probability of the screen occurring in a given month is given by the probability density function of a uniform distribution $U(a, b)$, where $a$ and $b$ are the endpoints of a time span equal to the length of the screening cycle. By definition, this probability is $1/(b - a)$, the reciprocal of the length of the screening cycle; for a cycle of 24 months, the probability of finding the most recent screen in any given month is 1/24. In addition, the expected value of a uniform distribution over the interval $(a, b)$ is $(a + b)/2$. Taking $a = 0$ (the index date) and $b = \mu$ (the length of the screening cycle), the expected number of months from the index date back to the most recent screen is

$$E(m_0) = \frac{a + b}{2} = \frac{\mu}{2}.$$

The expected value of the first term, $\Pr(S_0 \le l)$, therefore becomes the probability of a screen in each month times the expected number of months to the most recent screen (counting from the index date):

$$\frac{1}{b - a} \cdot E(m_0) = \frac{1}{\mu} \cdot \frac{\mu}{2} = \frac{1}{2}.$$

The difference in the expected numbers of screens observed for two women with identical screening histories, one whose index date is fixed by the discovery of cancer and another whose index date is independent of the screening cycle, depends only on the difference between the first terms of equation A1. Therefore,

$$E(n_1) - E(n_2) = 1 - \frac{1}{2} = \frac{1}{2}.$$

In other words, the observed number of screens for a woman whose cancer was discovered by screening can be expected to exceed that of an otherwise identical control by 0.5. Consequently, to obtain the same estimate of screening intensity for these two women, 0.5 should be subtracted from the total number of screens for women whose cancer was discovered by screening.
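As a worked check of this arithmetic for the two screening cycles used in the simulations below, assume that the most recent screen before a random index date is uniformly distributed over one screening cycle of length $\mu$, with endpoints $a = 0$ (the index date) and $b = \mu$:

$$\mu = 12:\quad E(m_0) = \frac{0 + 12}{2} = 6,\qquad \frac{1}{12}\times 6 = \frac{1}{2},\qquad E(n_1) - E(n_2) = 1 - \frac{1}{2} = \frac{1}{2}$$

$$\mu = 24:\quad E(m_0) = \frac{0 + 24}{2} = 12,\qquad \frac{1}{24}\times 12 = \frac{1}{2},\qquad E(n_1) - E(n_2) = 1 - \frac{1}{2} = \frac{1}{2}$$

Because the cycle length cancels, this argument gives the same 0.5-screen adjustment for annual and biennial screening.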

Simulations of the differences in numbers of screens counted during a finite period l were run as follows:

  1. Generate a set of persons for whom the number of months between screenings follows a normal distribution, as in the main simulation.
  2. From a random start, count the number of screens within different periods of retrospective review.
  3. For the same persons, count the number of screens within these periods, but use a fixed index date and start with a count of the number of screens = 1.
  4. Calculate the mean number of screens for the group under the different index dates.
  5. Calculate the difference in these means.
For the 12-month screening interval simulation, the mean difference in the number of observed screens for the same hypothetical women, between a look-back window (REW) whose start was fixed by a screening date and one whose start was random, was 0.48. For the 24-month interval, the mean difference was 0.46. Thus, when estimating screening intensity from the number of screens in a given retrospective period of observation, we adopted an adjustment of 0.5, reducing the number of screens for persons whose cancer was discovered by screening.
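The five simulation steps above can be sketched in Python with NumPy. This is a hypothetical re-creation, not the authors' simulation code: the standard deviations of the screening interval (2 and 4 months), the number of screens generated per person, the choice of the 31st screen as the fixed index date, the range of random index dates, and the clipping of intervals at 1 month are all assumptions.

```python
import numpy as np


def simulate_index_difference(mean_interval, sd_interval, windows,
                              n_persons=20_000, n_screens=60, seed=1):
    """Mean screens counted with a screen-fixed index date minus the mean counted
    with a random index date, for each look-back window (months).

    Hypothetical re-creation of steps 1-5; the horizon, the ordinal of the
    index screen, and the clipping of gaps are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Step 1: months between screens ~ Normal(mean, sd), clipped at 1 month.
    gaps = np.clip(rng.normal(mean_interval, sd_interval, (n_persons, n_screens - 1)), 1.0, None)
    first = rng.uniform(0.0, mean_interval, (n_persons, 1))  # random phase of the first screen
    times = np.hstack([first, first + np.cumsum(gaps, axis=1)])  # screen dates in months

    # Step 2: random index dates, well inside each simulated screening history.
    random_index = rng.uniform(10 * mean_interval, 20 * mean_interval, (n_persons, 1))
    # Step 3: index date fixed at a screening date (the 31st screen), so the
    # count always includes that screen and therefore starts at 1.
    fixed_index = times[:, [30]]

    def count(index, window):
        # Screens in the look-back window (index - window, index].
        return ((times > index - window) & (times <= index)).sum(axis=1)

    # Steps 4-5: group means under each type of index date, and their difference.
    return {w: float(count(fixed_index, w).mean() - count(random_index, w).mean())
            for w in windows}


if __name__ == "__main__":
    for interval, sd in ((12, 2), (24, 4)):
        diffs = simulate_index_difference(interval, sd, windows=(12, 24, 36, 48))
        print(interval, {w: round(d, 3) for w, d in diffs.items()})
```

Under these assumptions the differences come out close to 0.5 for every window. The sketch is not expected to reproduce the reported 0.48 and 0.46 exactly, because its settings may differ from those of the original simulation.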


    ACKNOWLEDGMENTS
 
This simulation study was funded by the Centers for Disease Control and Prevention, Division of Cancer Prevention and Control, contract 200-2002-00370 with the University of Pennsylvania (Sandra A. Norman, Principal Investigator).

Conflict of interest: none declared.


    References

  1. Figure IV-4. In: Ries LAG, Eisner MP, Kosary CL, et al, eds. SEER cancer statistics review, 1975–2001. Bethesda, MD: National Cancer Institute, 2004. (http://seer.cancer.gov/csr/1975_2001/results_merged/sect_04_breast.pdf, page 21).
  2. Weiss NS. Application of the case-control method in the evaluation of screening. Epidemiol Rev 1994;16:102–8.
  3. Weiss NS, McKnight B, Stevens NG. Approaches to the analysis of case-control studies of the efficacy of screening for cancer. Am J Epidemiol 1992;135:817–23.
  4. Weiss NS. Case-control studies of the efficacy of screening tests designed to prevent the incidence of cancer. Am J Epidemiol 1999;149:1–4.
  5. Weiss NS. Adjusting for screening history in epidemiologic studies of cancer: why, when, and how to do it. Am J Epidemiol 2003;157:957–61.
  6. Weiss NS, Dhillon PK, Etzioni R. Case-control studies of the efficacy of cancer screening. Overcoming bias from nonrandom patterns of screening. Epidemiology 2004;15:409–10.
  7. Etzioni RD, Weiss NS. Analysis of case-control studies of screening: impact of misspecifying the duration of detectable preclinical pathologic changes. Am J Epidemiol 1998;148:292–7.
  8. Weiss NS, Etzioni R. Estimating the influence of rescreening interval on the benefits associated with cancer screening: approaches and limitations. Epidemiology 2002;13:713–17.
  9. Norman SA, Localio AR, Zhou L, et al. Validation of self-reported screening mammography histories among women with and without breast cancer. Am J Epidemiol 2003;158:264–71.
  10. Morrison A. Screening in chronic disease. New York, NY: Oxford University Press, 1985.
  11. Hosek RS, Flanders WD, Sasco AJ. Bias in case-control studies of screening effectiveness. Am J Epidemiol 1996;143:193–201.
  12. Boer R, Plevritis S, Clarke L. Diversity of model approaches for breast cancer screening: a review of model assumptions by The Cancer Intervention and Surveillance Network (CISNET) Breast Cancer Groups. Stat Methods Med Res 2004;13:525–38.
  13. Connor RJ, Boer R, Prorok PC, et al. Investigation of design and bias issues in case-control studies of cancer screening using microsimulation. Am J Epidemiol 2000;151:991–8.
  14. Evans M, Hastings N, Peacock B. Statistical distributions. 2nd ed. New York, NY: John Wiley & Sons, 1993:29–37.
  15. Elmore JG, Armstrong K, Lehman CD, et al. Screening for breast cancer. JAMA 2005;293:1245–56.
  16. Clayton DG, Hills M. Statistical models in epidemiology. Oxford, United Kingdom: Oxford University Press, 1993.
  17. Beaumont JJ, Steenland K, Minton A, et al. A computer program for incidence density sampling of controls in case-control studies nested within occupational cohort studies. Am J Epidemiol 1989;129:212–19.
  18. Rothman KJ, Greenland S, eds. Modern epidemiology. 2nd ed. Philadelphia, PA: Lippincott-Raven, 1998:95–6.
  19. Sasco AJ, Day NE, Walter SD. Case-control studies for the evaluation of screening. J Chronic Dis 1986;39:399–405.
  20. Moss SM. Case-control studies of screening. Int J Epidemiol 1991;20:1–6.
  21. Feuer EJ, Etzioni R, Cronin KA, et al. The use of modeling to understand the impact of screening on US mortality: examples from mammography and PSA testing. Stat Methods Med Res 2004;13:421–42.
