American Journal of Epidemiology Advance Access originally published online on August 16, 2007
American Journal of Epidemiology 2007 166(10):1220-1229; doi:10.1093/aje/kwm188
PRACTICE OF EPIDEMIOLOGY
An Evaluation of Classification Rules Based on Date of Symptom Onset to Identify Health-Care–associated Infections
1 Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
2 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
3 Department of Hospital Epidemiology and Infection Control and Division of Infectious Diseases, Johns Hopkins Medical Institutions, Baltimore, MD
Correspondence to Justin Lessler, 615 North Wolfe Street, Box 352, Baltimore, MD 21224 (e-mail: jlessler{at}jhsph.edu).
Received for publication December 4, 2006. Accepted for publication May 25, 2007.
| ABSTRACT |
|---|
The date of symptom onset is often used to distinguish health-care–associated from community-acquired infections. Those patients developing symptoms early in an inpatient stay are considered to have community-acquired infection, while those developing symptoms later are considered nosocomially infected. The authors evaluate the performance of this approach, showing how misclassification rates depend on the disease incubation period and the incidence rate ratio of infection among inpatients versus community members. The authors provide quantitative results for selecting classification rules that designate infections as health care associated or community acquired. These techniques allow the selection of disease-specific cutoffs to distinguish community- from nosocomially acquired infections that perform well for important illnesses. For example, a rule classifying those who develop flu symptoms in the first 1.5 days of their hospital stay as having community-acquired influenza and those developing symptoms later as having nosocomial infection has a positive predictive value and a negative predictive value of at least 87%. A cutoff of 6 days will identify community-acquired Legionnaires' disease with a positive predictive value and a negative predictive value of at least 77%. These results increase the utility of classifying infections by use of the date of onset by providing theoretically sound measures of performance, and they are applicable beyond the hospital setting.
communicable diseases; community-acquired infections; cross infection; infection control
Abbreviations: MRSA, methicillin-resistant Staphylococcus aureus; NPV, negative predictive value; PPV, positive predictive value
| INTRODUCTION |
|---|
The Institute of Medicine report, To Err Is Human: Building a Safer Health System, highlighted the health risks associated with medical care (1). Important among these is the risk of acquiring an infection as a result of using health-care services. The burden of health-care–associated, or nosocomial, infections in the United States has been well documented, with over 2 million infections resulting in 250,000 deaths annually (2). Although studies of the impact of health-care–acquired infections have centered on specific sites of infection, the role of health-care institutions in facilitating transmission of disease is becoming increasingly evident. The global outbreak of severe acute respiratory syndrome known as "SARS" in 2003 demonstrated how hospitals could function as "hot zones," where intra-hospital transmission between patients and health-care workers drives disease spread (3–6).
A key task in the study and control of nosocomial infections is identifying cases of nosocomially acquired disease. Classification of cases as nosocomial is based on an understanding of the transmission patterns, incubation period, and communicability period of disease. Although molecular techniques and detailed epidemiologic investigation can be valuable tools in this task, the decision to classify an infection as nosocomially acquired will largely, if not solely, be based on date of symptom onset.
For diseases with short incubation periods, it is clear that those who develop symptoms after a long stay in an inpatient facility must have acquired the illness while in that facility. On the other hand, patients who develop symptoms early in their stay were likely infected before entering the facility. Hospitals use this fact to classify patients as having hospital-acquired or community-acquired infection (7). Patients who develop symptoms before having been in the hospital for a disease-specific period are considered to have community-acquired infection, and those infections producing symptoms after this period are considered to be nosocomially acquired (figure 1). We refer to this period early in the hospital stay as the "classifying window."
Misclassification of the source of infection can lead to poor assessments of disease transmission within the health-care setting, thus damaging infection control initiatives and clouding the results of research. We have developed a probabilistic approach to determining the optimal length of the classifying window and the error rates associated with its use. The performance of the classifying window as a method for identifying nosocomial infection depends on the incubation period of a disease and the incidence rate ratio of infection in the hospital versus the community. By use of this information, the length of the classifying window that achieves a desired performance can be found.
| MATERIALS AND METHODS |
|---|
The positive and negative predictive values of the classifying window
Classification based on date of symptom onset can be considered a test for community-acquired infection. This test is positive when symptoms begin during the classifying window and negative when they begin outside the window. The performance of this test can be characterized by use of the familiar metrics of positive predictive value (PPV), in this case the percentage of cases developing symptoms in the classifying window that acquired infection in the community, and negative predictive value (NPV), the percentage of cases developing symptoms after the classifying window that have nosocomially acquired infection.
A mathematical expression for the PPV and the NPV can be derived that involves only the cumulative distribution function for the incubation period of the disease of interest and the incidence rate ratio of infection between inpatients and the community (Appendix 1). The resulting equation for the PPV is as follows:
|
|
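As a hedged sketch of how an expression of this kind can arise (not necessarily the exact form derived in Appendix 1), suppose that infections occur at a constant rate $\lambda_C$ in the community and at rate $R\lambda_C$ among inpatients, and let $F$ be the cumulative distribution function of the incubation period. The symbols $\lambda_C$ and $t$ are introduced here for illustration only. Under these assumptions, a case with symptom onset $t$ days after admission was infected before admission with density proportional to $1 - F(t)$ and after admission with density proportional to $R\,F(t)$. Integrating over a classifying window of length $w$ gives:

```latex
\mathrm{PPV}(w) \;=\;
\frac{\int_0^{w} \bigl[1 - F(t)\bigr]\,dt}
     {\int_0^{w} \bigl[1 - F(t)\bigr]\,dt \;+\; R \int_0^{w} F(t)\,dt}
```

In this form the rate $\lambda_C$ cancels, so only $F$, $R$, and $w$ remain, which agrees with the statement that the PPV involves only the incubation period distribution and the incidence rate ratio.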
To determine the exact equation for the NPV, we need to know how long patients stay in the hospital, but the relation between length of hospital stay and risk of infection may be confounded by factors such as underlying medical conditions. Fortunately, it is possible to derive a lower limit for the NPV that depends only on the length of the classifying window and the cumulative distribution function of the incubation period for the disease of interest (Appendix 1):
|
|
The derived formulas were implemented in the R statistical computing language (8), which was used to calculate values for all tables and figures.
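The authors' R code is not reproduced here. As a minimal Python sketch of the kind of calculation involved, the block below evaluates a PPV under two stated assumptions: a lognormal incubation period, and constant infection rates before and after admission, so that PPV equals the integral of 1 − F(t) over the window divided by that integral plus R times the integral of F(t). The function names and the input values (median 5 days, dispersion factor 1.8, R = 1, window 6 days) are illustrative choices, and the results are not claimed to reproduce the article's tables.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def incubation_cdf(t, median, dispersion):
    """Lognormal CDF F(t) for a given median and dispersion factor (> 1)."""
    if t <= 0.0:
        return 0.0
    return normal_cdf(math.log(t / median) / math.log(dispersion))

def integral_of_cdf(w, median, dispersion, steps=20000):
    """Midpoint-rule approximation of the integral of F(t) over [0, w]."""
    h = w / steps
    return h * sum(incubation_cdf((i + 0.5) * h, median, dispersion)
                   for i in range(steps))

def ppv_sketch(w, median, dispersion, rate_ratio):
    """PPV of a window of length w under the constant-rate assumption."""
    late = integral_of_cdf(w, median, dispersion)  # weight of in-hospital infections
    community = w - late                           # integral of 1 - F(t) over [0, w]
    nosocomial = rate_ratio * late
    return community / (community + nosocomial)

# Illustrative inputs only (assumed, not taken from the article's tables).
example_ppv = ppv_sketch(6.0, 5.0, 1.8, 1.0)
print(round(example_ppv, 3))
```

Raising `rate_ratio` while holding the window fixed lowers the value returned by `ppv_sketch`, because more of the cases with onset inside the window are then attributed to in-hospital infection.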
Software application
As an aid to those who wish to incorporate the methodology presented here into their protocols, we have implemented these techniques in a software application, freely available from http://www.biostat.jhsph.edu/research/software.shtml.
| RESULTS |
|---|
The optimal length of the classifying window
The optimal length of the classifying window depends on the relative importance of the PPV and NPV in the context in which it is being used. Infection prevention and control practitioners interested in identifying cases of probable nosocomial transmission for more detailed investigation may want to favor PPV over NPV in order to have a low probability of missing nosocomial cases (since the probability of missing a nosocomial infection equals 1 – PPV). In contrast, those interested in evaluating the performance of infection control procedures may want to balance PPV and NPV. For any desired PPV and NPV, there is a (possibly empty) range of window lengths that will obtain the desired level of performance, designated by the desired performance on the two scales (PPV-NPV). So the range of window lengths with a PPV of at least 70 percent and an NPV of at least 70 percent is the 70-70 range, and the range of window lengths with a PPV of at least 60 percent and an NPV of at least 70 percent would be the 60-70 range (figure 2). Not all levels of performance are obtainable.
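The idea of a PPV-NPV range can be sketched in code. The block below scans a grid of candidate window lengths and reports the shortest and longest window meeting both targets, or `None` when the targets cannot be met together. The two performance curves are hypothetical linear stand-ins (PPV falling and NPV rising with window length), not the formulas from Appendix 1, and the function name `window_range` is an assumption made for illustration.

```python
def window_range(ppv_fn, npv_fn, ppv_min, npv_min, candidates):
    """Return (shortest, longest) candidate window meeting both targets, or None.

    Assumes PPV is non-increasing and NPV is non-decreasing in the window
    length, so that the acceptable windows form one contiguous range.
    """
    ok = [w for w in candidates
          if ppv_fn(w) >= ppv_min and npv_fn(w) >= npv_min]
    return (min(ok), max(ok)) if ok else None

# Hypothetical performance curves, for illustration only.
def toy_ppv(w):
    return 0.974 - 0.08 * w

def toy_npv(w):
    return 0.434 + 0.1 * w

candidates = [i / 10 for i in range(1, 61)]  # 0.1 to 6.0 days

range_70_70 = window_range(toy_ppv, toy_npv, 0.70, 0.70, candidates)
range_90_90 = window_range(toy_ppv, toy_npv, 0.90, 0.90, candidates)
print(range_70_70, range_90_90)
```

With these toy curves the 90-90 range is empty, which illustrates the point that not every level of performance is obtainable.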
In his classic paper on the incubation period of infectious disease, Philip Sartwell (9) showed that, for most diseases, the incubation period is well characterized by a lognormal distribution. This distribution is defined by the disease's median incubation period, m, and its dispersion factor. The dispersion factor used by Sartwell is the antilogarithm of the standard deviation of the log incubation periods and has the property that 68 percent of cases fall in the range from the median divided by the dispersion factor to the median multiplied by the dispersion factor.
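Sartwell's parameterization can be checked numerically. In the sketch below, the lognormal distribution is written in terms of its median and dispersion factor, so that the logarithm of the incubation period is normal with mean log(median) and standard deviation log(dispersion factor). The function names and the example values (median 5 days, dispersion factor 1.8) are assumptions made for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def incubation_cdf(t, median, dispersion):
    """Lognormal CDF with log-scale mean log(median) and SD log(dispersion)."""
    if t <= 0.0:
        return 0.0
    return normal_cdf(math.log(t / median) / math.log(dispersion))

median, dispersion = 5.0, 1.8  # illustrative values only

# Share of cases with incubation between median/dispersion and median*dispersion.
share = (incubation_cdf(median * dispersion, median, dispersion)
         - incubation_cdf(median / dispersion, median, dispersion))
print(round(share, 4))
```

The share is the probability that a standard normal variable lies within one standard deviation of its mean, so it does not depend on the particular median or dispersion factor chosen.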
As an example, consider influenza in adults. To find the incubation period of influenza, we refer to a natural experiment reported in 1979 by Moser et al. (10) in which 72 percent of the passengers on an airliner became infected within several hours. The median incubation period in this outbreak was 1.4 days, and the dispersion factor was 1.4. Using these data and referring to table 1, we see that to achieve 80 percent PPV and 80 percent NPV we should use a classifying window of between 1.5 and 1.8 days in length.
In some cases, it is possible to find a classifying window length that achieves the desired level of performance for any dispersion factor where that performance is obtainable. To identify this window length, it is necessary to know only the median incubation period of the disease and that the dispersion factor is low enough to obtain the desired level of performance. This may be useful, because median incubation periods are commonly reported, but the variance of the incubation period is not. Table 2 shows window lengths and maximum dispersion factors for some situations that may be of common interest.
An alternate method for selecting the optimal window length is to specify some general performance criteria, such as having equal performance in identifying community and nosocomial infections (PPV = NPV) or classifying community-acquired infections with an accuracy of 90 percent (PPV = 90). The classifying window length with the best overall performance that meets the desired performance criteria can then be found, and its overall performance determined. Table 3 shows the maximum performance that can be obtained meeting some reasonable criteria (equal PPV and NPV, PPV of 90 percent, NPV of 90 percent) and the window lengths that will achieve this performance. This table is simplified somewhat from previous tables by the fact that the ratio of the optimal window length to the median incubation period, w/m, is relatively constant (within a tenth of a day). That is, as the median incubation period changes, the window length can be proportionally rescaled, maintaining the performance characteristics of the original window length. This fact allows us to offer guidelines for window length independent of the median incubation period. To find the classifying window length, w, achieving this performance, the tabulated ratio is multiplied by the median incubation period, m.
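The rescaling property can be illustrated with a short sketch. It assumes a lognormal incubation period and a PPV of the form (integral of 1 − F) / (integral of 1 − F plus R times the integral of F) over the window, which follows from constant infection rates before and after admission and may differ in detail from the authors' expression. The integral of F is evaluated with the standard closed form for the lognormal partial expectation. The ratio of 1.2 and the two medians are hypothetical values, not entries from table 3.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def integral_of_cdf(w, median, dispersion):
    """Closed-form integral of the lognormal CDF F(t) over [0, w]."""
    sigma = math.log(dispersion)
    z = math.log(w / median) / sigma
    partial_mean = median * math.exp(sigma ** 2 / 2.0) * normal_cdf(z - sigma)
    return w * normal_cdf(z) - partial_mean

def ppv_sketch(w, median, dispersion, rate_ratio):
    """PPV of a window of length w under the constant-rate assumption."""
    late = integral_of_cdf(w, median, dispersion)
    community = w - late
    return community / (community + rate_ratio * late)

ratio = 1.2  # hypothetical ratio of window length to median incubation period

# Same ratio and dispersion factor, two different medians (in days).
ppv_short = ppv_sketch(ratio * 1.4, 1.4, 1.8, 1.0)
ppv_long = ppv_sketch(ratio * 5.0, 5.0, 1.8, 1.0)
print(round(ppv_short, 4), round(ppv_long, 4))
```

The two values agree because, for a fixed dispersion factor, the lognormal distribution function depends on time only through the ratio of time to the median, so multiplying both the window and the median by the same factor leaves the PPV unchanged.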
The effect of changes in the dispersion of the incubation period and the incidence rate ratio of nosocomial versus community infection on performance and optimal window length is predictable (figure 3). As the incidence rate ratio increases, the best possible performance is obtained by a shorter window length. As the incidence rate ratio decreases, the best possible performance is obtained by a longer window length. In general, more extreme values of the incidence rate ratio will lead to moderate improvements in performance due to a higher prior likelihood that disease is nosocomially or community acquired. This is analogous to the different PPVs of a screening test with fixed specificity for different prevalences of a disease in the population. As the dispersion factor increases, the best possible performance decreases, and the optimal window length increases. The decrease in performance is expected because of decreasing certainty about the date of acquisition. The increase in window length is because, as the dispersion factor increases, the number of individuals with a longer incubation period increases.
Accounting for epidemiologic evidence
The incidence rate ratio of infection among inpatients versus the community, R, has so far been presented as a constant measure. This value can be more generally viewed as a measure of situation-specific prior beliefs about the relative risk of nosocomial compared with community infection. More precisely, R is a ratio of incidence rates representing our belief, at the moment of admission into the hospital, about the incidence rate of infection with a particular disease among inpatients compared with the incidence rate of infection among community members similar to the patient being admitted. This prior belief can be adjusted on the basis of evidence, such as patient activities in the community or the presence of nonspecific symptoms at entry.
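The general technique for adjusting R is to use our knowledge or beliefs about how the available evidence increases or decreases the chances that a patient acquired infection before admission or will acquire it during his/her stay. In general, given an incidence rate ratio R, the adjusted rate ratio based on new evidence, R*, as shown in Appendix 2, is as follows:

R* = R × RN / RC

Here, RC is the factor by which the evidence changes the rate of acquiring infection in the community, and RN is the corresponding factor for acquiring infection in the hospital.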
This method of adjusting our prior beliefs is based upon beliefs or knowledge about incidence rate ratios, but it can be shown that, in this setting, relative risks can be considered as equivalent to incidence rate ratios (Appendix 2). Hence, we can use relative risks in the equation above, providing that they are not based on a time-dependent exposure. For example, if a particular class of patients is known to be at elevated risk of infection due to long hospital stays, then adjusting R would not be appropriate.
The following example illustrates the process of adjusting our performance estimates on the basis of epidemiologic evidence.
A patient enters the hospital with a fever. Nine days after entering the hospital, the patient develops respiratory symptoms that prove to be due to Legionnaires' disease. Interview of the patient reveals that he lives in an apartment building with a heating and ventilation system that has not been upgraded since 1976.
Upon admission, this patient was more likely to have Legionnaires' disease than was the average patient. Living in a building with an old ventilation system is a known risk factor for exposure to Legionella (relative risk of 3.3), and fever is a common early symptom of Legionnaires' disease (11, 12). On the basis of this combined evidence, we believe that the patient is 10 times more likely to have acquired Legionnaires' disease in the community than one admitted without these risk factors (RC = 10). If our general belief is that the rates of infection with Legionnaires' disease in the hospital and the community are equal (R = 1) and if we do not believe that there is current excess hospital risk (RN = 1), our new estimate is R* = (1 × 1)/10 = 0.1. Using data from Fraser et al. (13) to determine the median incubation period and the dispersion factor for Legionnaires' disease, we obtain values of 5 days and 1.8, respectively (11). Referring to table 3, we see that a classifying window length of three times the median incubation period, or 15 days, will give us an equal PPV and NPV of 86 percent. Because symptoms began on day 9, within this window, we classify this patient as having community-acquired infection. With no evidence, we would have classified this infection as nosocomial on the basis of a classifying window length of 6 days.
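The arithmetic of this example can be sketched in a few lines. The block assumes that the adjustment is multiplicative, with the adjusted ratio equal to R times RN divided by RC; that form is an assumption consistent with the direction of the example (evidence of community exposure lowers the hospital-to-community ratio and lengthens the window), and the function name is hypothetical. The window multiplier of 3 and the median of 5 days are the values quoted in the text.

```python
def adjusted_rate_ratio(prior_ratio, community_factor, hospital_factor):
    """Assumed multiplicative adjustment: R* = R * RN / RC."""
    return prior_ratio * hospital_factor / community_factor

# Values from the worked example: R = 1, RC = 10, RN = 1.
r_star = adjusted_rate_ratio(1.0, 10.0, 1.0)

# Window length quoted in the text: three times the 5-day median.
window_days = 3 * 5
onset_day = 9  # respiratory symptoms began 9 days after admission

label = "community acquired" if onset_day <= window_days else "nosocomial"

# Without the evidence, the text uses a 6-day window for the same patient.
label_no_evidence = "community acquired" if onset_day <= 6 else "nosocomial"
print(r_star, label, label_no_evidence)
```

The same onset date leads to opposite classifications under the two windows, which is why the choice of R matters in practice.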
It is important that we use a properly adjusted estimate of the incidence rate ratio in our analysis, as using a classifying window based on an incorrect value of the incidence rate ratio can lead to substantial decreases in PPV and NPV (figure 4). The extent of this decrease varies with the ratio of the true incidence rate ratio (RTrue) and the assumed incidence rate ratio used to choose the classifying window length (RWin). When this ratio is less than one, the NPV is decreased from the predicted performance. When this ratio is greater than one, the PPV is decreased. The extent that the performance decreases depends on the dispersion factor and the true incidence rate ratio. At higher dispersion factors, the performance decreases faster. Although there is some variation based on the true incidence rate ratio, the extent of this variation is small (figure 4). In general, if RTrue/RWin is between one half and two, the drop in PPV or NPV will be less than 10 percent. For larger differences, the drop in performance may be substantial.
| DISCUSSION |
|---|
This analysis allows for a more principled and systematic approach to symptomatic surveillance for nosocomial infection than is currently common. Consider the recommendations of the Centers for Disease Control and Prevention for identifying nosocomial Legionnaires' disease:
... laboratory-confirmed legionellosis that occurs in a patient who has been hospitalized continuously for greater than or equal to 10 days before the onset of illness is considered a definite case of nosocomial Legionnaires' disease, and laboratory-confirmed infection that occurs 2–9 days after hospital admission is a possible case of the disease (14, p. 30).
These recommendations are based on the fact that Legionnaires' disease rarely has an incubation period of greater than 10 days. Although somewhat accurate, this approach lacks any indication of expected error rates. An alternative approach is to use the tables presented here to identify appropriate window lengths with known error rates. Referring to the original report on the 1977 outbreak of Legionnaires' disease, we find that the median incubation period is approximately 5 days with a dispersion factor between 1.7 and 1.8 (13). Lacking information to the contrary, we assume an equal risk of infection in the community and in the hospital. Referring to table 3, we see that, if we want our classifying window to have a PPV of 90 percent, we should use a classifying window length of 4 days, and if we want an NPV of 90 percent, we should use a window of 8.5 days. This allows a more detailed recommendation with some quantitative indication of accuracy:
... laboratory-confirmed legionellosis that occurs in a patient who has been hospitalized continuously for greater than or equal to 9 days before the onset of illness is nosocomially acquired with high probability (90 percent). Laboratory-confirmed Legionnaires' disease that occurs in the first 4 days of a patient's hospital stay can be considered community acquired (90 percent chance), and infections after this period are likely nosocomial (63 percent).
More detailed recommendations such as these will give infection control practitioners a better sense of the accuracy of their assessments and the caution with which they should be viewed. Having accurate recommendations with known performance becomes especially important as more states adopt legislation requiring the reporting of health-care–associated infections (15).
Multidrug-resistant organisms such as methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant enterococci are of particular concern in health-care settings (16). These organisms can colonize individuals for long periods of time without causing symptoms, but colonized patients are at elevated risk for pathogenic infections of the colonizing organism after undergoing medical procedures (17–19). For these reasons, the date of onset may not provide useful information on where these infections were originally acquired, but the techniques described in this paper may provide useful information as to whether pathogenic infection is associated with hospitalization or a particular procedure. For example, a patient develops an MRSA bloodstream infection, and we wish to determine if it is associated with a recent catheter port. By considering the insertion of the catheter port as the time of "entry," the techniques presented can be adapted to develop a classification rule for distinguishing between infections associated with the catheter port and those that are not, but we lack the information on the incubation period of MRSA necessary to perform this analysis. In order for the date of onset to be useful in these situations, research into the site-specific incubation periods of bacterial infections is needed.
Although this paper has focused on health-care–associated infections, particularly those associated with hospitalization, the analysis has more general application. There are numerous situations where a clinician or researcher may need to distinguish between infection acquired before or after entry into some facility. A physician on a cruise may want to know whether a patient with hepatitis A represents an isolated case who came on board ill, or if she was infected on board and indicates a contaminated food supply. While the specific types of evidence and risks involved vary across situations, our analysis can aid anyone needing to make such an evaluation.
This analysis suggests two important avenues for future work. First, by use of modern molecular techniques, it should be possible to evaluate the predictions of this model by comparing the genotype of infections that are presumptively community acquired with strains known to be circulating in the hospital. Second, the analysis here is focused mainly on patients who develop symptoms while still in the hospital. A substantial number of nosocomial infections may not result in symptoms until the patient leaves the hospital. The mathematical analysis can be extended to evaluate the probable source of these patients' infections, but more work is necessary to make the extended analysis practical.
In both health-care practice and public health research, it is essential to accurately characterize populations in order to form effective policies and design valid studies. The probabilistic analysis presented here provides a method for using empirical evidence about the incubation period of disease to identify nosocomial infection by date of symptom onset. Vague classification systems based on incubation periods and date of symptom onset have not provided a system of classification with the quantitative performance measures we typically demand from our assays. The analysis in this paper should help to remedy this situation, providing theoretically sound measures of the performance of classification criteria.
Technologic improvement provides new tools for assessing the source of infection, but basic epidemiologic information will always play a key role in determining that source. For the foreseeable future, the date of onset will be the first and fastest indicator of whether infection is community or nosocomially acquired. The results above should improve the utility of this assessment to researchers and infection control officers alike.
| APPENDIX 1 |
|---|
Classifying Window Performance
The classifying window can be considered a test for community-acquired infection, where the test is positive if symptoms develop in the classifying window. We can now consider the PPV and the NPV of this test. The PPV of the test is as follows:
|
|
Note:
![]() |
Hence:
![]() |
![]() |
![]() |
|
|
) requires knowledge of the time that patients stay in the hospital, but we can place an upper limit on this value that does not require this information: |
|
![]() |
![]() |
![]() |
![]() |
| APPENDIX 2 |
|---|
Incorporating Evidence
This analysis can be extended to incorporate situation-specific knowledge by replacing R with R*, which represents our belief about the incidence rate ratio of inpatient versus community-acquired infection given the patient's situation at the time of admission:
|
|
In this situation, it is not necessary that we distinguish between incidence rate ratios and relative risks. Suppose we believe that those who present with some symptom Y at admission are R+ times more likely to have infection than those who do not:
|
|
|
|
| ACKNOWLEDGMENTS |
|---|
The authors thank Stephen Cole for his valuable advice in writing this paper.
Conflict of interest: none declared.
| References |
|---|
1. Institute of Medicine. To err is human: building a safer health system (2000) Washington, DC: The National Academies Press.
2. Public health focus: surveillance, prevention, and control of nosocomial infections. MMWR Morb Mortal Wkly Rep (1992) 41:783–7.
3. Loeb M, McGeer A, Henry B, et al. SARS among critical care nurses, Toronto. Emerg Infect Dis (2004) 10:251–5.
4. Loutfy MR, Wallington T, Rutledge T, et al. Hospital preparedness and SARS. Emerg Infect Dis (2004) 10:771–6.
5. McDonald LC, Simor AE, Su IJ, et al. SARS in healthcare facilities, Toronto and Taiwan. Emerg Infect Dis (2004) 10:777–81.
6. Poutanen SM, McGeer AJ. Transmission and control of SARS. Curr Infect Dis Rep (2004) 6:220–7.
7. Karanfil LV, Conlon M, Lykens K, et al. Reducing the rate of nosocomially transmitted respiratory syncytial virus. Am J Infect Control (1999) 27:91–6.
8. R Development Core Team. R: a language and environment for statistical computing (2007) Vienna, Austria: R Foundation for Statistical Computing. (http://www.R-project.org).
9. Sartwell PE. The distribution of incubation periods of infectious disease. 1949. Am J Epidemiol (1995) 141:386–94.
10. Moser MR, Bender TR, Margolis HS, et al. An outbreak of influenza aboard a commercial airliner. Am J Epidemiol (1979) 110:1–6.
11. Tsai TF, Finn DR, Plikaytis BD, et al. Legionnaires' disease: clinical features of the epidemic in Philadelphia. Ann Intern Med (1979) 90:509–17.
12. Borella P, Montagna MT, Spica VR, et al. Legionella infection risk from domestic hot water. Emerg Infect Dis (2004) 10:457–64.
13. Fraser DW, Tsai TR, Orenstein W, et al. Legionnaires' disease: description of an epidemic of pneumonia. N Engl J Med (1977) 297:1189–97.
14. Guidelines for prevention of nosocomial pneumonia. Centers for Disease Control and Prevention. MMWR Recomm Rep (1997) 46:1–79.
15. McKibben L, Horan TC, Tokars JI, et al. Guidance on public reporting of healthcare-associated infections: recommendations of the Healthcare Infection Control Practices Advisory Committee. Infect Control Hosp Epidemiol (2005) 26:580–7.
16. Weinstein RA. Nosocomial infection update. Emerg Infect Dis (1998) 4:416–20.
17. Eggimann P, Pittet D. Infection control in the ICU. Chest (2001) 120:2059–93.
18. Wertheim HF, Melles DC, Vos MC, et al. The role of nasal carriage in Staphylococcus aureus infections. Lancet Infect Dis (2005) 5:751–62.
19. Kluytmans JA, van Belkum A, Verbrugh H. Nasal carriage of Staphylococcus aureus: epidemiology, underlying mechanisms and associated risks. Clin Microbiol Rev (1997) 10:505–20.