American Journal of Epidemiology Advance Access originally published online on November 16, 2006
American Journal of Epidemiology 2007 165(3):325-333; doi:10.1093/aje/kwk011
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ORIGINAL CONTRIBUTIONS |
Receiver Operating Characteristic Curve Inference from a Sample with a Limit of Detection
1 Division of Epidemiology, Statistics, and Prevention Research, National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD
2 Department of Mathematics and Statistics, American University, Washington, DC
Correspondence to Dr. Enrique F. Schisterman, Division of Epidemiology, Statistics, and Prevention Research, National Institute of Child Health and Human Development, 6100 Executive Boulevard, Bethesda, MD 20852 (e-mail: schistee{at}mail.nih.gov).
Received for publication February 2, 2006. Accepted for publication June 20, 2006.
| ABSTRACT |
|---|
|
|
|---|
The receiver operating characteristic curve is a commonly used tool for evaluating biomarker usefulness in clinical diagnosis of disease. Frequently, biomarkers being assessed have immeasurable or unreportable samples below some limit of detection. Ignoring observations below the limit of detection leads to negatively biased estimates of the area under the curve. Several correction methods are suggested in the areas of mean estimation and testing but nothing regarding the receiver operating characteristic curve or its summary measures. In this paper, the authors show that replacement values below the limit of detection, including those suggested, result in the same biased area under the curve when properly accounted for, but they also provide guidance on the usefulness of these values in limited situations. The authors demonstrate maximum likelihood techniques leading to asymptotically unbiased estimators of the area under the curve for both normally and gamma distributed biomarker levels. Confidence intervals are proposed, the coverage probability of which is scrutinized by simulation study. An example using polychlorinated biphenyl levels to classify women with and without endometriosis illustrates the potential benefits of these methods.
censoring; curve estimation; detection; maximum likelihood; receiver operating characteristic
Abbreviations: A, area under the curve; LOD, limit of detection; MLE, maximum likelihood estimate; p(c), specificity; q(c), sensitivity; RMSE, root mean square error; ROC, receiver operating characteristic
| INTRODUCTION |
|---|
|
|
|---|
Biomarkers are of ever increasing importance to the diagnosis and treatment of existing and emerging diseases. The magnitude of this importance is directly linked to the ability of a researcher to identify and demonstrate an association between biomarker levels and disease status. For example, diseased individuals may have higher levels of a biomarker than do nondiseased individuals. Therefore, the biomarker could be used as an indicator for disease. Levels above some cutpoint result in individuals being labeled as diseased and individuals with levels below the cutpoint being labeled as nondiseased.
For a cutpoint c, sensitivity (q(c)) is the probability of individuals having a positive test result given that he or she is truly diseased, and specificity (p(c)) is the probability of a negative test result given that he or she is truly healthy/nondiseased. Unless the biomarker perfectly identifies disease status, moving c up or down results in a tradeoff of increased sensitivity for decreased specificity or visa versa.
The receiver operating characteristic (ROC) curve, a mapping of sensitivity as a function of 1 specificity, is a frequently used device that facilitates easy comparison of the overall effectiveness, effectiveness at specified levels of false positives rates, and optimal diagnostic ability of different biomarkers (13). These comparisons take place through the summary measures, the most common of which is the area under the curve (A) (16). The curve itself originates at the point (0, 0), terminates at (1, 1), and is formed by plotting pairs of 1 specificity and sensitivity across all possible cutpoints. A curve that hugs the diagonal is indicative of a marker with little ability to classify disease status, as the diagonal indicates pure chance at any cutpoint.
The summary measure A reflects how well the biomarker performs across all cutpoints. A ranges from 0.5 (no better than chance) to 1 (perfectly determining disease status). An extensive body of literature exists on ROC curve methods, allowing researchers to better evaluate all types of biomarkers.
Estimates of a ROC curve can be either parametric or nonparametric (figure 1) (1, 3). Nonparametric curves are calculated by the ordering of observed diseased and nondiseased biomarker levels. Parametric curves are estimated on the basis of parameter estimates that govern assumed distributions of diseased and nondiseased biomarker levels. Estimates of summary measures are then constructed from these curves, with those based on a parametric curve also being functions of parameter estimates.
|
One obstacle to biomarker evaluation that researchers often face is where "nondetects" or missing data exist below some limit of detection (LOD), quantified as d. These could be due to a non-"gold standard," equipment limitations, or measurement error among other reasons. Regardless of the origin, they must be addressed when estimating a ROC curve and/or any subsequent summary measures for biomarker assessment.
The simplest way to address them is to omit the observations and assess the biomarker naively by use of only the truncated data set of quantified observations. Surely, this will yield a bias assessment of biomarker effectiveness, as we would be throwing away information.
Substituting a replacement value for unobservable data is a proposed alternative to provide a less biased assessment. This would create an artificial mass, and the question, "What value?", still exists. We could use the lower or upper bound of the interval of missing values, usually 0 and d, respectively. Approximations to the expected value of the interval, d/2 and
have also been proposed for distributions that are highly skewed and symmetric or slightly skewed, respectively (e.g., lognormal and normal) (7). It has been shown that all of these methods lead to biased estimation of the mean and standard deviation parameters, regression coefficients, and odds ratio, but their usage has yet to be explored in the context of the ROC curve (810).
It is clear that conventional parametric ROC methods (i.e., binormal curve) do not account for the mass and cannot be used here. Schisterman et al. (11) proposed a method for estimating the ROC curve and A for a biomarker with a natural mass or repeated observation at the same value. A LOD creates a mass that is not natural to the biomarker. We will demonstrate that any replacement value below the LOD is arbitrary in the sense that the estimated ROC curve and A are unaffected when properly accounting for the mass. It is important to note that, although the current discussion focuses on A, the same concerns apply to other ROC curve summary statistics.
Instead, in Materials and Methods we propose using maximum likelihood to estimate A. We demonstrate this technique for the binormal curve and also for gamma assumptions. In the section Variance and Confidence Intervals, confidence intervals are developed for each distributional assumption by applying the delta method, the simulated coverage probabilities of which are displayed in the section Simulation Study. An example is given in the section entitled Example using polychlorinated biphenyl levels to classify women with endometriosis to illustrate replacement and maximum likelihood techniques. The Discussion section provides a brief discussion of issues surrounding estimation based on data affected by a LOD.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Let Xi and Yj denote the biomarker levels of the ith diseased and jth nondiseased individuals for i = 1,...,nX and j = 1,...,nY. Assume that these levels are independent observations from continuous distributions
![]() |
![]() | (1) |
Now suppose that, for all i and j, biomarker values are subject to a LOD and are only quantifiable above d such that
|
| (2) |
Replacement values
One remedy for analysis based on data with a LOD is to replace the missing values with a number a such that
|
|
The A based on data with such a mass, denoted by AZ, can be written (11)
![]() |
, and the expected value below d are, then
![]() |
![]() |
|
|
Maximum likelihood
Another approach for handling LOD data is to utilize the maximum likelihood method. Consider a random sample {yj; j = 1, ..., n} from a random variable (Y) with distribution G(y;
Y) and unknown parameter
Y. Let {zj; j = 1, ..., n} denote observed values subject to the transformation in equation 2 for a fixed d, where k values are quantified and n k observations are censored below d. The likelihood function for parameter
Y given the sample of all zj can be viewed as a Bernoulli, where an observation's value is missing or not. If missing, we can obtain no additional information beyond its value being below d. If present, we can assess the likelihood of
Y given the value zj above d. Putting the zj in ascending order with N/A values leading, we can write the likelihood function as
![]() |
Y) is the probability density function of Y, and I is an indicator function taking 1 if true, 0 otherwise. To calculate the maximum likelihood estimate (MLE)
of
Y, simply maximize L with respect to the parameter. In the case where
Y is a vector parameter, maximize L with respect to each parameter while fixing all others.
By use of this approach, MLEs
and
are obtained for the unknown parameters
X and
Y that govern distributions F(x;
X) and G(y;
Y) for diseased and nondiseased biomarker levels, respectively. Substituting
and
for their respective parameters yields the estimate
= A(
,
), which is the MLE of A through the invariance property of MLEs.
Normal case.
Cohen (12) in 1950 and Gupta (13) in 1952 independently considered the case where a population was normally distributed and values were censored above and below, respectively, some value. Both developed MLEs for µ and
, and Harter and Moore (14) provided similar developments for samples censored above and below simultaneously.
With adaptation of these techniques to a biomarker that is censored solely below by a fixed d, the log-likelihood function for the controls, zY, using Gupta's notation, is
![]() |
Y =(d µY)/
Y, and
is the standard normal distribution function. Differentiating the log-likelihood function with respect to µY and
Y yields equations that can be combined such that |
|
![]() |
|
|
numerically and, subsequently,
by substitution.
A, the ROC curve formed by diseased and nondiseased biomarker levels that follow normal distributions, is commonly called a binormal curve with
|
|
|
|
Gamma case.
Similarly, consider the scenario where biomarker levels follow gamma distributions. The log-likelihood equation for these controls, zY, is
![]() |
![]() |
and
MLEs for
and ß.
Now consider the area under a ROC curve formed from the gamma distributions of diseased and nondiseased biomarker levels with shape and scale parameters
X, ßX and
Y, ßY, respectively. As in the "normal case" section, here we can simplify the area such that
|
|
|
|
can be calculated by substituting the MLEs
,
,
, and
for their respective parameters. | VARIANCE AND CONFIDENCE INTERVALS |
|---|
|
|
|---|
The estimator
is asymptotically normally distributed such that
Normal(0, 
), where N = nX + nY, and 
is the variance parameter. This result is left to the Appendix as well as the development of 
. This relation holds for not only biomarker values following normal or gamma distributions but also any
found via maximum likelihood estimation.
By use of this distributional development,
-level two-tailed confidence intervals for A are constructed from
![]() |

is unknown, using the estimator
in lieu of 
leads to approximate
-level confidence intervals for A. | SIMULATION STUDY |
|---|
|
|
|---|
Our results rely on asymptotics, especially the confidence intervals. For this reason, we performed a simulation study consisting of B = 2,000 independent samples of nx diseased and ny nondiseased (nx = ny = 50, 100, 200, 1,000) individuals to assess the maximum likelihood techniques over various sample sizes and shape scenarios.
The normal case was evaluated first with nondiseased values generated from a normal distribution with µY = 0 and 
= 1. Diseased values were obtained from a normal distribution with variance 
= 1,3 and mean µX corresponding to A = 0.6, 0.7, 0.8, 0.9.
Next, nondiseased biomarker levels were generated from a gamma distribution with
Y = 1.5 and ßY = 1. Samples of diseased levels were from a gamma with
X = 1.5,2,3 and ßX set to achieve A = 0.6, 0.7, 0.8, 0.9.
For each simulated group of samples, maximum likelihood techniques were used to find parameter MLEs, which were subsequently used to calculate
and 95 percent confidence intervals for A. These estimators were found before a LOD effect was introduced, no d, and then again for various d, resulting in increasing levels of unquantifiable data. The d selected corresponded to the 20, 40, 60, 80th quantiles of the nondiseased distribution. For clarity, the same d value determined the unquantifiable portions of both diseased and nondiseased values.
Summary measures bias and root mean square error (RMSE) were calculated over the 2,000 groups. The coverage probability of the confidence interval was also found by assessing the proportion of confidence intervals that contained the true A. Tables 1 and 2 display excerpts of these simulation results for the normal and gamma cases, respectively. The summaries of the other simulated scenarios are consistent with tables 1 and 2 but have been omitted for spatial considerations. Generally, as the sample sizes increase, the bias and RMSE decrease for normal and gamma distributions. The highest percentages of bias over both table 1 and table 2 are 1.3, 0.6, 0.4, and 0.1 percent, respectively, for nx = ny = 50, 100, 200, 1,000. The coverage probabilities of the 95 percent confidence intervals are nominal or within random departure from nominal with samples of 50 observations and improve as the number of observations increases. The only exceptions are slightly less than nominal coverage for A = 0.9 for the gamma, nx = ny = 50, 100, and possibly for the normal with nx = ny = 50.
|
|
As an aside to our simulation study of MLEs that account for a LOD, we considered using replacement values and conventional area estimation techniques that ignore a LOD for the same 2,000 iterations of every case of both the normal and gamma. We replaced with the values d/2 and
and recorded bias and RMSE. As expected,
performed slightly better than d/2 for normal, but d/2 was better for the gamma case because of the skewed shape of the gamma. Naïve replacement yields the percentages of bias that are roughly 0.3, 0.7, 1.4, and 4.3 percent for 20, 40, 60, and 80 percent missing for both the normal and gamma cases used in tables 1 and 2 and across sample sizes. For all cases, this bias was substantially greater, often from five to 10 times as large, than that yielded by using MLEs. The RMSE was generally comparable to that from MLEs with the exception that, for nx = ny = 1,000 when 60 and 80 percent are missing, the RMSE from replacement can be from 30 percent to three times larger than that of the MLEs. Also, for the gamma case, bias and RMSE increased as
X increased, whereas they were relatively constant for MLEs. | EXAMPLE |
|---|
|
|
|---|
Endometriosis is a gynecologic disorder that occurs primarily in women of reproductive age. It is a disease exclusive to species that menstruate, such as humans and primates (17). The weight of evidence underlying an association between dioxin and polychlorinated biphenyls and endometriosis includes data from experimental animal, primate, and human studies.
Using an incident case-control study of 28 cases and 51 controls, we evaluated how various polychlorinated biphenyls classified cases and controls with endometriosis. Investigators were interested specifically in polychlorinated biphenyl 114. The LOD experimentally set at d = 0.005 resulted in a censoring of 76 percent of the controls and 36 percent of the cases. Empirical analysis leads to the nonsmooth ROC curve in figure 1. The straight line in the upper portion of the empirical curve corresponds to the mass in the data created by censoring. This ROC curve led to an area point estimate of 0.701 (95 percent confidence interval: 0.576, 0.827). We also applied the maximum likelihood techniques developed here. Quantile plots showed that normal distributions would be very poor choices and that gamma distributions with parameters equal to MLEs fit well. Moreover, polychlorinated biphenyl levels are naturally restricted above zero, which is intrinsic to the gamma and not to the normal distribution. The smooth ROC curve in figure 1 is based on cases and controls following gamma distributions with MLEs substituted for parameters. The area under this curve is
= 0.777 (95 percent confidence interval: 0.655, 0.899).
The fact that
is greater than the empirical estimate reflects a greater perceived discriminating capability of polychlorinated biphenyl 114. The difference is a result of the empirical estimate disregarding possible differentiation below d, while
includes the possibility. Including the possibility that levels below d behave in a manner consistent with the levels above d yields a substantial increase in the overall ability of polychlorinated biphenyl 114 to classify women with and without endometriosis. This increase may warrant additional resources for a larger sample size for more precise estimation or improved laboratory methods for better measurement of samples.
| DISCUSSION |
|---|
|
|
|---|
In this paper, we considered the task of estimating the area under the ROC curve of a biomarker whose measurement is unattainable below a limit of detection for some reason. A review of the literature shows that estimation (distributional parameters, odds ratio, regression) based on such data focuses primarily on replacement values substituted for unquantified observations and maximum likelihood techniques. Replacement values across topics were shown to yield biased estimation. Maximum likelihood techniques, though more computationally intensive, result in asymptotically unbiased estimation. We evaluated both techniques in the context of biomarker evaluation via the area under the ROC curve, specifically A, but other summary measures could be handled similarly. We proved analytically that any replacement value chosen causes bias in a proper estimate of A and that
is asymptotically unbiased and normally distributed. Our assessment of the asymptotics through simulation study of varying sample sizes confirmed the unbiasedness of
and nominal coverage probability of 95 percent confidence intervals for A.
To use this method and calculate
, one can write the likelihood functions described in the "normal case" and "gamma case" sections herein and maximize (or minimize log likelihood) fairly easily in SAS (SAS Institute, Inc., Cary, North Carolina), S-PLUS (e.g., nlminb) (Insightful Corporation, Seattle, Washington), R (e.g., optim) (Comprehensive R Archive Network, Department of Statistics and Mathematics, Vienna University of Economics and Business Administration, Vienna, Austria), or other software. The calculation of the asymptotic variance and a confidence interval will require additional programming; this also is straightforward using this same software and the information in our Appendix.
We also assessed the use of replacement values naively in standard methods. This type of estimation is theoretically incorrect, as distributional assumptions do not account for the mass. Simply, the area calculated is under the wrong curve. However, replacement values used in this manner yielded estimates of A comparable to
in both the normal and gamma cases that we evaluated here for small degrees of censoring, 20 and 40 percent missing. In these cases, this technique could provide a practical solution to computational limitations of equipment or available software.
The calculation of MLEs and, subsequently,
requires distributional assumptions. Considering both normal and gamma distributions gives flexibility in this assumption. In addition, while not constructed here, other continuous distributions (i.e., lognormal, Weibull) could be handled similarly if necessary.
We assessed the robustness of our methods to departure from normality and gamma by misspecifying Student's t distributed data (5, 10, and 25 df) as if they were normally distributed and lognormally distributed data as if they were gamma. The means and variances of the alternative distributions were matched to those of the normals and gammas in our original simulations. As expected, misspecifying the true distribution of the data does introduce bias; however, the percentage of bias remained below 3 percent. The bias of
ranged from about 0.02 to 0.02, with the vast majority in both normal and gamma cases having bias from 0.01 to 0.01. The RMSE was comparable, larger by approximately 5 percent, to that based on data from actual normal and gamma distributions. Coverage probability ranged from nominal to nearly nominal when nx = ny = 50 but decreased as nx and ny increased. Coverage probability is less sensitive to misspecification of distributional assumptions when for data subject to LOD than for data unaffected by a LOD when MLEs are used. The gamma case was less affected than the normal case in this manner. Overall,
proved not to be overly sensitive to departures from reasonable distributional assumptions.
It is important to note that, when measurements are subject to a LOD due to instrument sensitivity, observations above the LOD might also be subject to the measurement error. Although we do not address it here, in practice this measurement error should also be considered and, if present, corrected for.
Accounting for a biomarker's potential discriminating ability is important when comparing biomarkers based on overall discriminating ability. Consider biomarkers affected and unaffected by a LOD, where underestimation of the affected marker may lead to choosing the less discriminatory biomarker with no LOD. Improved measurement techniques may have been warranted if the biomarker's potential had been considered through MLEs.
With biomarkers becoming more and more integral to disease diagnosis and treatment, their assessment is linked to effective, accurate statistical techniques. Limits of detection arise for a variety of reasons, resulting in unquantifiable observations. The maximum likelihood estimators developed here properly account for their "missingness" and provide investigators with consistent estimates of biomarkers' true discriminating capabilities.
| APPENDIX |
|---|
|
|
|---|
Let us consider biomarker levels of diseased, X, and nondiseased, Y, individuals as independent random variables that have continuous distributions F(x;
X) and G(y;
Y), respectively, where
X and
Y are vector parameters. Hence, the ROC curve formed has an area under the curve A(
X,
Y) = P(X > Y;
X,
Y).
Suppose random samples {Xi, i = 1, ..., nX} and {Yj, j = 1, ..., nY} yield kX and kY measured, quantified values and nX kX and nY kY biomarker levels that are unquantifiable (N/A) below some fixed limit of detection, d. Thus, the observations are transformed such that
|
|
Let
and
be MLEs based on these censored data, {ZX1,...,ZXnX} and {ZY1,...,ZYnY}, respectively. In order to investigate the asymptotic distribution of
, we apply the usual Taylor quadratic expansion of A around the parameters
![]() | (A1) |
0 as T
. Therefore, since
and
are MLEs, one can show 
is obtained by
|
| (A2) |
|
|
and nY
. The covariance matrices of MLEs of unknown vectors of parameters are evaluated by the inverse of the Fisher information matrices, for example, IX, IY, i.e.,
Consider the case where F and G are normal distributions with vector parameters of mean and standard deviation
X = {µX,
X} and
Y = {µY,·
Y}, respectively. Let
and
be the standard normal distribution and density functions, respectively. The MLEs
and
are calculated from a sample with censoring below a fixed d and lead to the estimator of the area under a binormal curve
![]() | (A3) |
is consistent and normally distributed. The variance of
is found by substituting the partial derivatives
![]() | (A4) |
![]() | (A5) |
(
) (13). By applying equations A4 and A5 with
Now, consider the case where F and G are gamma distributions with vector parameters
X = {
X, ßX} and
Y = {
Y, ßY}, respectively, where
denotes the shape parameter and ß denotes scale. MLEs
and
based on a sample with left censoring are substituted into the area under the bigamma curve
|
| (A6) |
(n) = (n 1)! to form the estimator
(16). This
is consistent and asymptotically normally distributed. The variance of
in equation A6 is formed by substituting the partial derivatives
![]() | (A7) |
|
| (A8) |
![]() | (A9) |
| ACKNOWLEDGMENTS |
|---|
This research was supported by the Intramural Research Program of the Epidemiology Branch, Division of Epidemiology, Statistics, and Prevention Research, National Institute of Child Health and Human Development, National Institutes of Health.
Conflict of interest: none declared.
| References |
|---|
|
|
|---|
- Kramer CK. (1992) Evaluating medical tests. (Sage Publications, Inc, Newbury Park, CA).
- Zweig MH and Campbell G. (1993) Reciever operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 39:56177.
[Abstract/Free Full Text] - Zhou XH, Obuchowski NA, McClish DK. (2002) Statistical methods in diagnostic medicine. (Wiley & Sons Interscience, New York, NY).
- Faraggi D. (2003) Adjusting ROC curves and realted indices for covariates. J R Stat Soc (D) 52:17992.
- Schisterman EF, Faraggi D, Reiser B. (2004) Adjusting the generalized ROC curve for covariates. Stat Med 23:331931.[CrossRef][ISI][Medline]
- Youden WJ. (1950) Index for rating diagnostic tests. Cancer 3:325.[CrossRef][ISI][Medline]
- Hornung RW and Reed LD. (1990) Estimation of average concentration in the presence of nondetectable values. Appl Occup Environ Hyg 5:4651.
- Haas CH and Scheff PA. (1990) Estimation of averages in truncated samples. Environ Sci Technol 24:91219.[CrossRef]
- Singh A and Nocerino J. (2002) Robust estimation of mean and variance using environmental data sets with below detection limit observations. Chemometrics Intell Lab Syst 60:6986.[CrossRef]
- Lynn HS. (2001) Maximum likelihood inference for left-censored HIV RNA data. Stat Med 20:3345.[CrossRef][ISI][Medline]
- Schisterman EF, Reiser B, Faraggi D. (2006) ROC analysis for markers with mass at zero. Stat Med 25:62338.[CrossRef][ISI][Medline]
- Cohen AC. (1950) Estimating the mean and variance of normal populations from singly truncated and doubly truncated samples. Ann Math Stat 21:55769.
- Gupta AK. (1952) Estimation of the mean and standard deviation of a normal population from a censored sample. Biometrika 39:26073.
[Free Full Text] - Harter HL and Moore AH. (1966) Iterative maximum-likelihood estimation of the parameters of normal populations from singly and doubly censored samples. Biometrika 53:20513.
[Abstract/Free Full Text] - Harter HL and Moore AH. (1967) Asymptotic variances and covariances of maximum-likelihood estimators, from censored samples, of the parameters of Weibull and gamma populations. Ann Math Stat 38:55770.
- Faraggi D, Reiser B, Schisterman E. (2003) ROC curve analysis for biomarkers based on pooled assessments. Stat Med 22:251527.[CrossRef][ISI][Medline]
- Louis GM, Weiner JM, Whitcomb BW, et al. (2005) Environmental PCB exposure and risk of endometriosis. Hum Reprod 20:27985.
[Abstract/Free Full Text]
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

















