American Journal of Epidemiology Advance Access originally published online on October 20, 2006
American Journal of Epidemiology 2007 165(1):94-100; doi:10.1093/aje/kwj344
ORIGINAL CONTRIBUTIONS
Confidence Intervals for Biomarker-based Human Immunodeficiency Virus Incidence Estimates and Differences using Prevalent Data
1 Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
2 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
Correspondence to Dr. Stephen Cole, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, Room E7640, Baltimore, MD 21205 (e-mail: scole{at}jhsph.edu).
Received for publication November 21, 2005. Accepted for publication May 17, 2006.
| ABSTRACT |
|---|
Prevalent biologic specimens can be used to estimate human immunodeficiency virus (HIV) incidence using a two-stage immunologic testing algorithm that hinges on the average time, T, between testing HIV-positive on highly sensitive enzyme immunoassays and testing HIV-positive on less sensitive enzyme immunoassays. Common approaches to confidence interval (CI) estimation for this incidence measure have included 1) ignoring the random error in T or 2) employing a Bonferroni adjustment of the box method. The authors present alternative Monte Carlo-based CIs for this incidence measure, as well as CIs for the biomarker-based incidence difference; standard approaches to CIs are typically appropriate for the incidence ratio. Using American Red Cross blood donor data as an example, the authors found that ignoring the random error in T provides a 95% CI for incidence as much as 0.26 times the width of the Monte Carlo CI, while the Bonferroni-box method provides a 95% CI as much as 1.57 times the width of the Monte Carlo CI. Further research is needed to understand under what circumstances the proposed Monte Carlo methods fail to provide valid CIs. The Monte Carlo-based CI may be preferable to competing methods because of the ease of extension to the incidence difference or to exploration of departures from assumptions.
bias (epidemiology); computer simulation; confidence intervals; HIV; incidence; Monte Carlo method; statistics
Abbreviations: AIDS, acquired immunodeficiency syndrome; CI, confidence interval; HIV, human immunodeficiency virus; STARHS, Serologic Testing Algorithm for Recent HIV Seroconversions
| INTRODUCTION |
|---|
The average incidence rate or incidence density of a disease is the number of new cases divided by the person-time at risk, or A/PT, where PT is the observed person-time at risk during which A new cases arose (1). Recently, a two-stage immunologic testing algorithm was developed for human immunodeficiency virus (HIV) that uses prevalent biologic specimens to estimate incidence in the following way (2). First, a highly sensitive enzyme immunoassay for HIV type 1 is applied to each specimen. Persons testing positive are considered HIV-infected, while those testing negative are considered HIV-uninfected. Second, a less sensitive or "detuned" enzyme immunoassay is applied to persons found to be positive on the highly sensitive assay. Persons who are positive on the less sensitive assay are considered to have an established HIV infection, while those negative on the less sensitive assay are considered to have a recent HIV infection. This approach is often referred to as the Serologic Testing Algorithm for Recent HIV Seroconversions (STARHS).
Measures of the uncertainty due to random error are typically presented in epidemiologic research in the form of a confidence interval (CI). Two methods have been widely used to estimate the CI for the incidence determined by the two-stage immunologic testing algorithm described above. The "standard" method ignores a component of the variability of the estimated incidence (3, 4) and therefore tends to underestimate the width of the CI to greater degrees as the hidden variability increases, resulting in overestimated precision. The second method, a Bonferroni adjustment of the box method (2, 5, 6), tends to provide an overly wide CI and will therefore underestimate the actual precision available. Here, we present an alternative method for calculation of the CI and introduce methods that allow the comparison of incidence rates across groups.
| MATERIALS AND METHODS |
|---|
Study population
Janssen et al. (2) tested for HIV type 1 infection in samples taken from 2,717,910 first-time American Red Cross blood donors in 32 collection regions in the United States between May 1993 and December 1996. They used an independent data set pooling 690 specimens from 104 HIV seroconverters (plasma donors (n = 38), patients at a Trinidad sexually transmitted disease clinic (n = 18), and participants in the San Francisco Men's Health Study (n = 48)) to estimate the number of days between HIV seroconversion as measured by the two immunoassays. The average number of days between the two assays was assumed to be normally distributed, with an estimated mean of 129 days and a 95 percent CI ranging from 109 to 149 (2); from this, we calculate a standard error of 10.2 days ((149 − 109)/3.92) and a coefficient of variation of approximately 8 percent (10.2/129).
Incidence via a two-stage immunologic testing algorithm
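The formula for biomarker-based incidence using prevalent data (2) is

$$I_t = \frac{A}{N - B} \times \frac{365.25}{T}, \qquad (1)$$

where $A$ is the number of persons with recent HIV infection, $B$ is the number of persons with established HIV infection, $N$ is the number of persons tested, and $T$ is the average number of days between testing HIV-positive on the highly sensitive assay and testing HIV-positive on the less sensitive assay.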
To gain further intuition, one can also express equation 1 as $I_t = A/PPT_t$, where $PPT_t = (N - B)(T/365.25)$ can be viewed as the pseudo-person-time during which $A$ recent cases arose. The average number of days it takes for the less sensitive assay to concur with the highly sensitive assay provides the means by which to impute the unknown person-time. A justification for such pseudo-person-time can be made by borrowing ideas of backwards survival times (8). In the reexpression of equation 1, the $B$ people with established HIV infection should be removed from the denominator of this pseudo-rate because they are not at risk for recent HIV seroconversion. The ratio $T/365.25$ is the proportion of a person-year that each subject contributes to the pseudo-person-time. One can see that if $T = 0$, there is no information on incidence because there is no "follow-up," as there is no systematic time lag between the two tests.
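The arithmetic behind equation 1 can be checked with a few lines of Python. This is only an illustrative sketch, not the SAS code used in the analysis; the function names are hypothetical, and the inputs are the 1993 values quoted in appendix 1 (15 recent infections among 460,385 persons without established infection, with T = 129 days).

```python
DAYS_PER_YEAR = 365.25


def pseudo_person_years(n_at_risk, t_days):
    """Pseudo-person-time (N - B)(T / 365.25), expressed in person-years."""
    return n_at_risk * (t_days / DAYS_PER_YEAR)


def biomarker_incidence(a_recent, n_at_risk, t_days):
    """Equation 1: recent infections divided by pseudo-person-time."""
    return a_recent / pseudo_person_years(n_at_risk, t_days)


# 1993 stratum as described in appendix 1: A = 15, N - B = 460,385, T = 129 days.
incidence_1993 = biomarker_incidence(15, 460_385, 129.0)
print(f"{incidence_1993 * 1e5:.1f} per 100,000 person-years")
```

The result is roughly 9.2 per 10^5 person-years, in line with the 9.22 reported for 1993 in appendix 1.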
Say we have two estimates of $I_t$, one for an exposed group, $I_t^{1}$, and one for an unexposed reference group, $I_t^{0}$. The difference and ratio of these incidence rates are simply $ID_t = I_t^{1} - I_t^{0}$ and $IR_t = I_t^{1}/I_t^{0}$, respectively. To our knowledge, CIs for the difference in such STARHS incidence estimates have not been previously described.
Confidence intervals
To correctly estimate a CI for $I_t$, one must account for the sources of random error in $I_t$. Assuming that $N$ is large and $A$ (i.e., the number of persons with recent HIV) is small, there are two sources of random error in $I_t$. First, the number of recent HIV infections $A$ has associated random error that can be assumed to be Poisson-distributed, since $A$ is small relative to $N - B$. Second, the average number of days to concordance of the two immunoassays $T$ has associated random error that may typically be assumed to be normally distributed, perhaps after an appropriate normalizing transformation.
A standard method for estimating the CI ignores the random error in T, treats T as a fixed value, and assumes that A is Poisson-distributed (3). Following the methods of Brookmeyer and Quinn (9) and Janssen et al. (2), we use a chi-squared quantile as an exact estimate of the Poisson distribution (10).
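As a rough illustration of the standard interval, the Python sketch below computes exact Poisson limits for A and divides them by the pseudo-person-time with T held fixed. It is an assumption-laden sketch rather than the code used here: instead of chi-squared quantiles it finds the limits by bisection on the Poisson tail probabilities, which gives the same exact limits numerically, and all function names are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); returns 0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _bisect(f, lo, hi, iters=100):
    """Root of a monotone function f on [lo, hi] that changes sign there."""
    f_lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_poisson_limits(a, alpha=0.05):
    """Exact two-sided limits for a Poisson mean given an observed count a."""
    tail = alpha / 2
    # Lower limit: the mean at which P(X >= a) equals alpha/2.
    lower = 0.0 if a == 0 else _bisect(
        lambda mu: 1.0 - poisson_cdf(a - 1, mu) - tail, 0.0, float(a))
    # Upper limit: the mean at which P(X <= a) equals alpha/2.
    upper = _bisect(
        lambda mu: poisson_cdf(a, mu) - tail,
        float(a), a + 10.0 * math.sqrt(a + 1.0) + 10.0)
    return lower, upper


def standard_ci(a, n_at_risk, t_days, alpha=0.05):
    """Standard CI for the incidence: Poisson error in A only, T treated as fixed."""
    ppt = n_at_risk * (t_days / 365.25)
    lower, upper = exact_poisson_limits(a, alpha)
    return lower / ppt, upper / ppt


# 1993 stratum from appendix 1: A = 15, N - B = 460,385, T = 129 days.
std_low, std_high = standard_ci(15, 460_385, 129.0)
print(f"{std_low * 1e5:.2f} to {std_high * 1e5:.2f} per 100,000 person-years")
```

Because T enters only through the fixed pseudo-person-time, this interval does not change when the standard error of T changes.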
The Bonferroni adjustment (2) of the box method (11) replaces the lower and upper quantiles of $\alpha/2$ and $1 - \alpha/2$ with $\alpha/(2k)$ and $1 - \alpha/(2k)$, respectively, where $k$ is the number of random variables in the estimator. In our example, $k = 2$ for $A$ and $T$. This Bonferroni-box method estimates a CI for $I_t$ accounting for the random error in the estimated $T$ by plugging in the lower and upper $100(1 - \alpha/k)$ percent confidence limits (LCL and UCL) for $T$ in place of $T$, such as

$$\mathrm{LCL} = \frac{q_{\alpha/(2k),\,2A}/2}{(N - B)(t^{+}/365.25)}$$

$$\mathrm{UCL} = \frac{q_{1 - \alpha/(2k),\,2(A + 1)}/2}{(N - B)(t^{-}/365.25)}$$

where $1 - \alpha$ is the desired confidence, $q_{a,b}$ is the $a$th quantile from a chi-squared distribution with $b$ degrees of freedom (10), and $t^{-}$ [$t^{+}$] is the $\alpha/(2k)$ [$1 - \alpha/(2k)$] percentile for $T$ (11). The box method can be seen as an intuitive way to account for the uncertainty in $T$ through the use of a range of values in place of $T$. Viewed this way, the idea is to first calculate a standard CI for $T$ and then use the extremes of this CI as the inputs for a pair of standard CIs for $I_t$; the resulting lowest and highest confidence limits are taken as the final CI for $I_t$. The box approach without the Bonferroni adjustment has been shown to be conservative across a broad range of scenarios, yielding an average type 1 error of 0.007 when $\alpha$ is set to 0.05 (11); extending the box method by means of the Bonferroni adjustment will widen the interval and further reduce the type 1 error at the expense of precision.
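The Bonferroni-box construction can be sketched in Python as follows. This is a hedged illustration only: the exact Poisson limits are again obtained by bisection rather than from chi-squared quantiles, the percentiles of T come from the normal assumption via the standard library's `NormalDist`, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); returns 0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _bisect(f, lo, hi, iters=100):
    """Root of a monotone function f on [lo, hi] that changes sign there."""
    f_lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_limits(a, tail):
    """Exact Poisson limits for count a with probability `tail` in each tail."""
    lower = 0.0 if a == 0 else _bisect(
        lambda mu: 1.0 - poisson_cdf(a - 1, mu) - tail, 0.0, float(a))
    upper = _bisect(
        lambda mu: poisson_cdf(a, mu) - tail,
        float(a), a + 10.0 * math.sqrt(a + 1.0) + 10.0)
    return lower, upper


def bonferroni_box_ci(a, n_at_risk, t_mean, t_se, alpha=0.05, k=2):
    """Bonferroni-box CI: tails of alpha/(2k) for both A and T (requires t_se > 0)."""
    tail = alpha / (2 * k)
    t_dist = NormalDist(t_mean, t_se)
    t_minus = t_dist.inv_cdf(tail)        # lower percentile of T
    t_plus = t_dist.inv_cdf(1.0 - tail)   # upper percentile of T
    a_lower, a_upper = poisson_limits(a, tail)
    # Incidence is inversely related to T, so the upper T limit
    # pairs with the lower incidence limit and vice versa.
    lcl = a_lower / (n_at_risk * t_plus / 365.25)
    ucl = a_upper / (n_at_risk * t_minus / 365.25)
    return lcl, ucl


# 1993 stratum from appendix 1, with T = 129 days and standard error 10.2.
box_low, box_high = bonferroni_box_ci(15, 460_385, 129.0, 10.2)
print(f"{box_low * 1e5:.2f} to {box_high * 1e5:.2f} per 100,000 person-years")
```

Setting `k=1` in this sketch recovers the unadjusted box interval, which makes it easy to see how much of the extra width is due to the Bonferroni adjustment.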
Finally, we implement a Monte Carlo method (12) in three steps. First, we draw $T_j$ from a normal distribution with mean $T = 129$ and standard error 10.2 and calculate the pseudo-person-time $PPT_j$ using the reexpression of equation 1, but with $T_j$ replacing $T$. We also explored standard errors of 0, 30, and 50 for $T$. Second, we draw $A_j$ from a Poisson distribution with rate $A$, where $A$ = 15, 24, 12, 18, and 69 across the five strata of our example, respectively. Third, we calculate $I_{tj}$ using equation 1, but with $A_j$ replacing $A$ and the pseudo-person-time calculated for simulation draw $j$. We repeat these three steps $J = 10^5$ times. We take the 2.5th and 97.5th percentiles of the resulting distribution of $I_{tj}$ as the lower and upper limits of a 95 percent CI. In appendix 1, we present a limited Monte Carlo simulation which supports our use of this Monte Carlo CI. In summary, for the four scenarios explored, the Monte Carlo method provided on average 94.3 percent CI coverage with a target of 95 percent, and the majority of the intervals that failed to cover had an upper limit below the true value of $I_t$. Our approach is essentially a parametric bootstrap (see Efron and Tibshirani (13), pages 53–56), with the distinction that external data are used for $T$.
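The three Monte Carlo steps can be sketched in Python as follows. This is not the SAS macro of appendix 2; it is a minimal stand-in using only the standard library, with hypothetical function names, a simple multiplication-based Poisson sampler, a nearest-rank percentile, and an arbitrary seed. Non-positive draws of T, which become possible only when the standard error is large relative to the mean, are not handled specially here.

```python
import math
import random


def poisson_draw(rng, lam):
    """One Poisson(lam) draw by multiplying uniforms (fine for modest lam)."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already sorted list, 0 < q < 1."""
    idx = math.ceil(q * len(sorted_vals)) - 1
    return sorted_vals[min(len(sorted_vals) - 1, max(0, idx))]


def monte_carlo_ci(a, n_at_risk, t_mean, t_se,
                   draws=100_000, alpha=0.05, seed=12345):
    """Percentile CI for the incidence with random error in both A and T."""
    rng = random.Random(seed)
    sims = []
    for _ in range(draws):
        t_j = rng.gauss(t_mean, t_se)          # step 1: draw T_j
        ppt_j = n_at_risk * (t_j / 365.25)     #         pseudo-person-time
        a_j = poisson_draw(rng, a)             # step 2: draw A_j
        sims.append(a_j / ppt_j)               # step 3: incidence for draw j
    sims.sort()
    return percentile(sims, alpha / 2), percentile(sims, 1 - alpha / 2)


# 1993 stratum from appendix 1: A = 15, N - B = 460,385, T = 129 (SE 10.2).
mc_low, mc_high = monte_carlo_ci(15, 460_385, 129.0, 10.2)
print(f"{mc_low * 1e5:.2f} to {mc_high * 1e5:.2f} per 100,000 person-years")
```

Calling `monte_carlo_ci` with `t_se=0.0` ignores the random error in T, which gives a simulation analogue of the standard interval for comparison.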
We also present the standard estimates of the CI for the incidence difference and incidence ratio, which both assume that T is fixed and follow from a normal approximation for the incidence difference and a log-normal approximation for the incidence ratio (see Rothman and Greenland (1), pages 238–239) rather than employ an exact Poisson distribution. Finally, we also extend the Monte Carlo method to estimate the CI for the difference and ratio, which allows T to be uncertain. Standard methods are appropriate for estimating the incidence ratio when T is assumed to be the same across exposure groups, due to cancellation of errors. In the current analysis, we used SAS, version 9, throughout (SAS Institute, Inc., Cary, North Carolina); SAS program code for implementing the Monte Carlo approach is provided in appendix 2.
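A Python sketch of the Monte Carlo extension to the incidence difference and ratio is given below. Several points are assumptions made for illustration rather than details taken from the SAS macro: a single draw of T is shared by both groups in each replicate (reflecting the assumption that T is the same across exposure groups), the ratio is set to infinity when the reference draw has zero cases, and the at-risk count of 750,000 for the comparison group is hypothetical rather than a value from table 1.

```python
import math
import random


def poisson_draw(rng, lam):
    """One Poisson(lam) draw by multiplying uniforms (fine for modest lam)."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already sorted list, 0 < q < 1."""
    idx = math.ceil(q * len(sorted_vals)) - 1
    return sorted_vals[min(len(sorted_vals) - 1, max(0, idx))]


def monte_carlo_contrasts(a1, n1, a0, n0, t_mean, t_se,
                          draws=100_000, alpha=0.05, seed=2007):
    """Percentile CIs for the incidence difference and ratio (group 1 vs. 0)."""
    rng = random.Random(seed)
    diffs, ratios = [], []
    for _ in range(draws):
        years = rng.gauss(t_mean, t_se) / 365.25   # shared T_j for both groups
        a1_j = poisson_draw(rng, a1)
        a0_j = poisson_draw(rng, a0)
        i1 = a1_j / (n1 * years)
        i0 = a0_j / (n0 * years)
        diffs.append(i1 - i0)
        # The ratio is undefined when the reference draw has no cases.
        ratios.append(math.inf if a0_j == 0 else i1 / i0)
    diffs.sort()
    ratios.sort()
    lo_q, hi_q = alpha / 2, 1 - alpha / 2
    return ((percentile(diffs, lo_q), percentile(diffs, hi_q)),
            (percentile(ratios, lo_q), percentile(ratios, hi_q)))


# Reference group: 1993 stratum (A = 15, N - B = 460,385).
# Comparison group: A = 12 with a HYPOTHETICAL at-risk count of 750,000.
diff_ci, ratio_ci = monte_carlo_contrasts(12, 750_000, 15, 460_385, 129.0, 10.2)
print(f"difference: {diff_ci[0] * 1e5:.2f} to {diff_ci[1] * 1e5:.2f} per 100,000")
print(f"ratio: {ratio_ci[0]:.2f} to {ratio_ci[1]:.2f}")
```

With a shared draw of T, the ratio in each replicate does not depend on T at all, which is the cancellation referred to above; the difference, by contrast, scales with 1/T and so inherits its uncertainty.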
| RESULTS |
|---|
Table 1 reproduces the data presented by Janssen et al. (2). Our incidence estimates and Bonferroni-box results duplicate those of Janssen et al.'s table 4. Additionally, we present 95 percent CIs based on the standard and Monte Carlo approaches described above. The Bonferroni-box intervals are notably wider than the Monte Carlo intervals calculated with the standard error of T set to the observed value of 10.2. For instance, the Bonferroni-box CIs were 1.55–1.57 times as wide as the Monte Carlo 95 percent CIs (e.g., 1.57 = (13.75 − 3.16)/(10.61 − 3.85)). The Bonferroni-box CIs presented apply only to the observed scenario with a standard error of 10.2 days for T; Bonferroni-box CIs associated with larger standard errors would be wider still.
The standard CIs, which assume no random error in T, provide estimates similar to those from the Monte Carlo approach for all five strata of our example. However, if one were to have the same point estimate for T of 129 days but a larger 95 percent CI for T, the shortcomings of the standard approach would become apparent. Table 1 also provides the Monte Carlo 95 percent CIs for $I_t$ assuming that the standard error of T is 30 days or 50 days, yielding 95 percent CIs for T of 70, 188 days and 31, 227 days (rather than the observed 109, 149 days). Specifically, the standard CIs for $I_t$ do not change under any such perturbation, but the Monte Carlo CI estimates are widened appropriately to account for this increased random error. For instance, the standard 95 percent CIs were only 0.26–0.34 times as wide as the Monte Carlo 95 percent CIs with a standard error of 50 for T (e.g., 0.34 = (7.87 − 2.33)/(17.9 − 1.67)).
Incidence difference
Calculating from table 1, the incidence differences for 1994, 1995, and 1996 were −0.32, −4.72, and −2.33 per $10^5$ person-years in comparison with 1993. With the standard error of T set to the observed value of 10.2, the standard 95 percent CIs for the incidence difference again appeared similar to the Monte Carlo 95 percent CIs. Increasing the standard error of T again clearly demonstrates the problem with the standard CI for the incidence difference. For instance, when the standard error of T is set to 50, the standard 95 percent CIs for the incidence difference remain the same but the Monte Carlo 95 percent CIs for the incidence difference are widened appropriately, yielding standard 95 percent CIs that are only 0.49–0.67 times as wide as the Monte Carlo intervals (e.g., 0.67 = (5.55 − (−6.19))/(7.88 − (−9.6))).
Incidence ratio
Calculating from table 1, the incidence ratios for 1994, 1995, and 1996 were 0.97, 0.49, and 0.75 in comparison with 1993. Standard approximate 95 percent CIs for the incidence ratio were 0.51, 1.84; 0.23, 1.04; and 0.38, 1.48, respectively. Monte Carlo 95 percent CIs (irrespective of the variance of T, since T cancels) were 0.51, 1.98; 0.21, 1.07; and 0.37, 1.58, respectively.
| DISCUSSION |
|---|
We have presented an alternative method for estimating a CI for incidence as estimated by prevalent data using the STARHS approach, as well as a method for estimating a CI for the STARHS incidence difference. Standard methods are appropriate for estimating the incidence ratio when T is assumed to be the same across exposure groups. The proposed Monte Carlo methods improve upon the standard approach of assuming that T, the average number of days to immunoassay concurrence, is fixed, as well as improve upon the Bonferroni-box method, which appears to overcompensate for the random error in T.
Alternatively, one could implement an analytic approach instead of the Monte Carlo approach proposed here. Brookmeyer (14), for example, placed a gamma prior on T in a Poisson distribution, which yields a compound distribution that is closely related to the negative binomial. In Brookmeyer's method, the resulting distribution is used to perform hypothesis tests, which are then inverted to obtain a CI. The method produces an "exact" CI for the biomarker-based incidence under the specific assumption of a gamma prior to describe the uncertainty in the average time T. The approach can be extended to other priors. For example, if one assumed that both the number of recent infections and the average time T were normally distributed, the ratio would have a closed form (15) and an exact CI would be calculable.
Further research is needed to understand under what circumstances the proposed methods fail to provide valid CIs. For example, the robustness of the proposed Monte Carlo approach to non-normally-distributed times between immunoassay concurrence requires further study; however, if data are available, one can employ standard normalizing transformations before obtaining T and its standard error. Approximate analytic solutions based on Fieller's theorem (11, 16) or the delta method (17) are also available. All of these analytic methods agreed well with our Monte Carlo approach for these example data (data not shown). When assessing incidence ratios, the random error in T cancels and standard approaches agree with Monte Carlo methods, as expected. If T differed for the exposed and the unexposed, the cancellation would probably not occur, and the Monte Carlo method would not necessarily even approximately equal the standard method.
There are advantages of the proposed Monte Carlo approach over analytic approaches (11, 14, 16, 17). First, the Monte Carlo approach can construct a CI for the difference in incidence rates nearly automatically, while it is much more difficult to do so with analytic approaches. Second, the Monte Carlo approach can be extended to explore the sensitivity of the results to assumptions. For example, one underlying assumption of the biomarker-based incidence method is that the average time T is the same in subpopulations, which may not be the case. Further, we assume that A is much smaller than N, so that A can be assumed to be Poisson. However, when A is large relative to N, one may want to assume that A, B, and $N - A - B$ derive from a multinomial distribution. We also assume that A and T are independent, which is supported by the fact that the data for A and T are obtained from separate sources. Finally, we assume that A and B are measured without error. The Monte Carlo approach may be a useful tool that is relatively easy to implement for examining sensitivity to model assumptions and incorporating sources of uncertainty in biomarker-based incidence estimates.
In conclusion, when estimating incidence using prevalent data and the discordance in biomarker assays as an indicator of recent infection, 1) the Bonferroni-box method for estimating CIs is overly conservative and should be avoided and 2) the standard method of estimating CIs by ignoring the random error in the estimated lag in seroconversion should be avoided in principle and in practice whenever the coefficient of variation for the lag is large. Exactly how large the coefficient of variation must be to affect the resulting CI was not determined here. In our example, a coefficient of variation less than 10 percent had little impact on the resultant CI, but a coefficient of variation over 35 percent had a notable impact on the resultant CI. Given a correct parametric distributional assumption for T, the Monte Carlo approach presented here will typically provide an exact (within chosen simulation error) $1 - \alpha$ CI.
| APPENDIX 1 |
|---|
Monte Carlo Simulation
To support our use of this Monte Carlo confidence interval (CI), we conducted a limited Monte Carlo simulation study with 5,000 draws from four scenarios. For each of the 5,000 Monte Carlo draws, the proposed method drew J = 10,000 nested Monte Carlo draws for estimation of the CI.
The first scenario mimics the 1993 data presented in table 1, with an incidence of 9.22 per $10^5$ person-years and 15 recent seroconversions among 460,385 people without established human immunodeficiency virus infection. The mean time between immunoassay detections is T = 129 days, with a 95 percent CI of 109, 149, based on a true normal distribution with a standard error (SE) of 10.2 days. The second scenario widens the 95 percent CI to 70, 188, based on a true normal distribution with an SE of 30.
The third and fourth scenarios explore the robustness of the proposed method to non-normally-distributed T's. Specifically, we maintain the first two moments of the distribution of T (i.e., scenario 3: mean = 129, SE = 10.2, and scenario 4: mean = 129, SE = 30) but draw from a right triangular distribution with the mode equal to the minimum and skewness of approximately 0.5 in both scenario 3 and scenario 4 (i.e., the minimum and maximum values for the triangle in scenario 3 were 114.58 and 157.86, respectively, and the minimum and maximum values for the triangle in scenario 4 were 86.64 and 213.92). We also explored the use of log-normal, gamma, and chi-squared distributions but found the triangular distribution to produce the most skewness while maintaining the same first two moments.
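The endpoints of such a right triangular distribution follow from its first two moments: with the mode at the minimum, the mean is the minimum plus one third of the width, and the variance is the squared width divided by 18. The Python sketch below (hypothetical function names, standard library only) recovers endpoints that agree with those quoted above to within about 0.1 day.

```python
import math
import random


def right_triangle_limits(mean, sd):
    """Endpoints of a triangular distribution with mode = minimum
    that has the given mean and standard deviation."""
    width = sd * math.sqrt(18.0)      # variance = width**2 / 18
    low = mean - width / 3.0          # mean = low + width / 3
    return low, low + width


def draw_t(rng, low, high):
    """One draw of T from the right triangle (mode at the minimum)."""
    return rng.triangular(low, high, low)


low3, high3 = right_triangle_limits(129.0, 10.2)   # scenario 3
low4, high4 = right_triangle_limits(129.0, 30.0)   # scenario 4
print(f"scenario 3: {low3:.2f} to {high3:.2f} days")
print(f"scenario 4: {low4:.2f} to {high4:.2f} days")

rng = random.Random(1)
sample = [draw_t(rng, low3, high3) for _ in range(50_000)]
print(f"sample mean: {sum(sample) / len(sample):.1f} days")
```

The skewness of this distribution is $2\sqrt{2}/5 \approx 0.57$ regardless of the chosen mean and standard error, consistent with the approximate value of 0.5 cited for both scenarios.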
For the first scenario (normal, SE = 10.2), the Monte Carlo approach provided a mean incidence of 9.23 ± 0.03 (SE) × $10^{-5}$. The 95 percent CI coverage was 93.8 ± 0.3 percent. The lower miss rate was 1.6 percent, and the upper miss rate was 4.6 percent. For the second scenario (normal, SE = 30), the Monte Carlo approach provided a mean incidence of 9.22 ± 0.03 × $10^{-5}$. The 95 percent CI coverage was 94.2 ± 0.3 percent. The lower miss rate was 1.2 percent, and the upper miss rate was 4.6 percent. For the third scenario (nonnormal, SE = 10.2), the Monte Carlo approach provided a mean incidence of 9.26 ± 0.03 × $10^{-5}$. The 95 percent CI coverage was 94.3 ± 0.3 percent. The lower miss rate was 1.4 percent, and the upper miss rate was 4.3 percent. For the fourth scenario (nonnormal, SE = 30), the Monte Carlo approach provided a mean incidence of 9.26 ± 0.03 × $10^{-5}$. The 95 percent CI coverage was 94.7 ± 0.3 percent. The lower miss rate was 1.2 percent, and the upper miss rate was 4 percent.
In summary, the proposed method worked well for the few scenarios we explored, where the first two moments of the distribution of T are constrained. This constraint is reasonable because at least the first two moments will be provided when calculating a STARHS (Serologic Testing Algorithm for Recent HIV Seroconversions) incidence estimate. The proposed method does appear to provide slightly subnominal coverage on the order of 94.3 percent rather than 95 percent, and there is an imbalance in the lower and upper confidence limit miss rates, which is common to ratio estimators (12).
| APPENDIX 2 |
|---|
SAS Macro Code for Implementing Monte Carlo Confidence Intervals for the Incidence, Incidence Difference, and Incidence Ratio
/* Macro to implement Monte Carlo confidence interval for biomarker-based incidence */
[SAS macro listing: image not reproduced in this copy]
Example invocation:
[Example invocation: image not reproduced in this copy]
| ACKNOWLEDGMENTS |
|---|
Drs. Stephen Cole and Haitao Chu were supported in part by the National Institutes of Health through the data coordinating centers of the Multicenter AIDS Cohort Study (UO1-AI-35043) and the Women's Interagency HIV Study (UO1-AI-42590).
The authors are grateful to Dr. Glen Satten for helpful comments on an earlier version of this article and to Dr. Swati Gupta for bringing the issue of biomarker-based variance estimation to their attention.
Conflict of interest: none declared.
| References |
|---|
1. Rothman KJ and Greenland S. (1998) Modern epidemiology. 2nd ed. (Lippincott-Raven, New York, NY).
2. Janssen RS, Satten GA, Stramer SL, et al. (1998) New testing strategy to detect early HIV-1 infection for use in incidence estimates and for clinical and prevention purposes. JAMA 280:42–8.
3. Pilcher CD, Fiscus SA, Nguyen TQ, et al. (2005) Detection of acute infections during HIV testing in North Carolina. N Engl J Med 352:1873–83.
4. Des Jarlais DC, Perlis T, Arasteh K, et al. (2005) HIV incidence among injection drug users in New York City, 1990 to 2002: use of serologic test algorithm to assess expansion of HIV prevention services. Am J Public Health 95:1439–44.
5. Schwarcz S, Kellogg T, McFarland W, et al. (2001) Differences in the temporal trends of HIV seroincidence and seroprevalence among sexually transmitted disease clinic patients, 1989–1998: application of the serologic testing algorithm for recent HIV seroconversion. Am J Epidemiol 153:925–34.
6. Young CL, Hu DJ, Byers R, et al. (2003) Evaluation of a sensitive/less sensitive testing algorithm using the bioMerieux Vironostika-LS assay for detecting recent HIV-1 subtype B' or E infection in Thailand. AIDS Res Hum Retroviruses 19:481–6.
7. Brookmeyer R, Quinn T, Shepherd M, et al. (1995) The AIDS epidemic in India: a new method for estimating current human immunodeficiency virus (HIV) incidence rates. Am J Epidemiol 142:709–13.
8. Allison PD. (1985) Survival analysis of backwards recurrence times. J Am Stat Assoc 80:315–22.
9. Brookmeyer R and Quinn TC. (1995) Estimation of current human immunodeficiency virus incidence rates from a cross-sectional survey using early diagnostic tests. Am J Epidemiol 141:166–72.
10. Ulm K. (1990) A simple method to calculate the confidence interval of a standardized mortality ratio (SMR). Am J Epidemiol 131:373–5.
11. Briggs AH, Mooney CZ, Wonderling DE. (1999) Constructing confidence intervals for cost-effectiveness ratios: an evaluation of parametric and non-parametric techniques using Monte Carlo simulation. Stat Med 18:3245–62.
12. Greenland S. (2004) Interval estimation by simulation as an alternative to and extension of confidence intervals. Int J Epidemiol 33:1389–97.
13. Efron B and Tibshirani R. (1993) An introduction to the bootstrap (Chapman and Hall Ltd, London, United Kingdom).
14. Brookmeyer R. (1997) Accounting for follow-up bias in estimation of human immunodeficiency virus incidence rates. J R Stat Soc A 160:127–40.
15. Hinkley DV. (1969) On the ratio of two correlated normal random variables. Biometrika 56:635–9.
16. Fieller EC. (1954) Some problems in interval estimation. J R Stat Soc B 16:175–85.
17. Cox C. (1998) Delta method. In Armitage P and Colton T (Eds.). Encyclopedia of biostatistics (John Wiley & Sons, Inc, New York, NY) pp. 1125–7.