American Journal of Epidemiology Advance Access originally published online on September 18, 2008
American Journal of Epidemiology 2008 168(9):1082-1090; doi:10.1093/aje/kwn220
PRACTICE OF EPIDEMIOLOGY
Using Time-Use Data to Parameterize Models for the Spread of Close-Contact Infectious Diseases
Correspondence to Emilio Zagheni, Department of Demography, University of California, Berkeley, 2232 Piedmont Avenue, Berkeley, CA 94720-2120 (e-mail: emilioz{at}demog.berkeley.edu).
Received for publication January 2, 2008. Accepted for publication June 11, 2008.
ABSTRACT
Social contact patterns are a critical explanatory factor of the spread of close-contact infectious agents. Both indirect (via observed epidemiologic data) and direct (via diaries that record at-risk events) approaches to the measurement of contacts by age have been proposed in the literature. In this paper, the authors discuss the possibilities offered by time-use surveys to measure contact patterns and to explain observed seroprevalence profiles. The authors first develop a methodology to estimate time-of-exposure matrices, and then they apply it to time-use data for the United States (1987–2003). Finally, the authors estimate age-specific transmission parameters for varicella, commonly known as "chickenpox," from age-specific time-of-exposure and seroprevalence data (United States, 1988–1994). The estimated time-of-exposure matrix reveals a strong element of assortativeness by age. In addition, there are peaks of exposure between people who were born one generation apart (i.e., parents and their children). Models based on the estimated age-specific transmission parameters fit the observed patterns of infection of endemically circulating varicella in a satisfactory way. The availability of time-use data for a large number of countries and their potential to supplement contact surveys make the methods developed extremely valuable and suitable for implementation in several different contexts.
chickenpox; communicable diseases; data interpretation, statistical; endemic diseases
Abbreviations: ATUS, American Time Use Survey; PTM, proportionate time mixing
INTRODUCTION
Several diseases, such as measles, mumps, rubella, and influenza, are transmitted by the respiratory or close-contact route. Social mixing patterns are therefore critical factors in explaining the transmission dynamics of a large number of infectious agents (1, 2). Knowledge of contact patterns is thus essential for devising containment strategies against a new, potentially devastating infectious agent, such as pandemic influenza, and for designing effective control measures for endemic diseases. In particular, it is important to identify specific groups in a population that should be targeted by vaccination (3–5). Despite the relevance of the subject, knowledge about the contact mechanisms underlying the diffusion of close-contact infectious diseases is still limited (6).
The contact structure in mathematical models of close-contact infections has usually been estimated indirectly. In populations stratified by specific individual attributes, such as age, the transmission rates between age groups, which form the "who-acquires-infection-from-whom" matrix, are traditionally estimated by calibrating models to epidemiologic data, under suitable simplifying assumptions that reduce the number of unknown parameters enough to make them estimable (7). Indirect approaches provide estimates of adequate contacts (8, 9), that is, the product of contacts and the corresponding risk of infection per single contact.
Recently, "direct" approaches, that is, approaches aimed at directly filling in the elements of the contact matrix, have been increasingly used in order to overcome some limitations of "indirect" methods (10–13). The basic idea is to define an "at-risk" contact, for instance, a two-way conversation, and to collect data from sample surveys. Usually, survey respondents are asked to complete diaries about the conversations they have throughout a randomly assigned day. Alternatively, contact matrices, together with average durations of contacts, have been estimated from secondary data sources, such as transportation surveys (14).
In this paper, we systematically explore the possibility of using a different source of data, that is, time-use data, to obtain information on contact patterns. First, we propose and discuss a general method to estimate "time-of-exposure" matrices from time-use sample surveys. Then, we apply the method to data for the United States, and we test the ability of the model based on time-use surveys to fit serologic data for varicella.
MATERIALS AND METHODS
Data
Our analysis is based on 2 different kinds of data sources, both for the United States: time-use data and seroprevalence data. Information on time use comes from the 2003 American Time Use Survey (ATUS), the 1989–1990 Activity Pattern Survey of California Children, and the 1987–1988 Activity Pattern Survey of Californians. In these surveys, respondents retrospectively record the chronologic sequence and duration of their daily activities in the form of diaries. For most activities, respondents are also asked to provide information on where the activity took place.
Seroprevalence data for varicella in the United States are obtained from the National Health and Nutrition Examination Surveys (1988–1994) and a module of the Behavioral Risk Factor Surveillance System (1991), which is a random-digit–dialed telephone survey sponsored by the Centers for Disease Control and Prevention. Details about the data sets analyzed and their consistency are provided as supplementary material to this paper. (This supplementary material is described in the text, two Web figures, and bibliography posted on the Journal's website (http://aje.oxfordjournals.org/).)
Estimation of age-specific time of exposure
In this section, we set up a methodology to estimate the daily amount of time that people spend together, on average, according to their age.
Several time-use surveys ask the respondent to record some information about the presence of other people during each of the activities the respondent did throughout the day. This is useful information, because it gives, for each activity, the average time spent by the respondents alone or in the presence of somebody else. In particular, the ATUS provides information about the members of the same household as the respondent who were present during the activity (e.g., their age and their relationship to the respondent). This means that we can estimate, directly from the survey, the time of exposure between members of the same households, by age. We represent the exposure by means of a matrix, F, whose entry ij is the total time that people in the age group i spend with their household members in the age group j throughout an average day.
In addition to household members, people spend time with nonhousehold members during any activity considered, but the data available do not allow us to identify them. Consider, for instance, such activities as being on public transportation, at work, or in a restaurant: In these cases, people are likely to have contacts—in the sense of sharing the same room or having conversations—with a number of other people. In order to estimate the amount of time that people might spend with nonhousehold members, according to their age and during any such activity, we need to develop an indirect approach.
Our method is based on an assumption that we call "proportionate time mixing" (PTM) at the level of single activity/location and time slot. In other words, we assume that, for a single activity/location and a small time interval, people allocate their time to the other participants in the activity in proportion to those participants' relative participation in the activity. A matrix E that represents the overall time of exposure to people by age can then be obtained by combining two matrices: T, which aggregates the matrices computed under the hypothesis of PTM at the level of single activity/location and time slot, and F, the matrix of time of exposure to household members.
We divide the whole day into 1,440 time slots, each of which consists of 1 minute. The number of people belonging to the age group i that are in the location h during the time slot z is then equal to the number of minutes spent by the population in the age group i, in the location h, and during that particular time slot z of the day considered. We refer to this quantity as $k_{ihz}$, and we interpret it as a measure of person-minutes. If we let n be the number of age groups in the population, then

$$\sum_{i=1}^{n} k_{ihz}$$

is the total amount of time spent by the population in the location h during the time slot z. Conversely,

$$\sum_{z=1}^{1440} k_{ihz}$$

represents the total amount of time spent by people in the age group i in the location h throughout the entire day. When data come from a sample survey, the quantity $k_{ihz}$ is obtained by multiplying the number of respondents in the age group i that are in the location h during the time slot z by their respective sample weights. The assumption of PTM at the level of single activity/location and time slot implies that the time of exposure of people in the age group i to people in the age group j, in the location h and for the time slot z, $t_{ijhz}$, can be computed as follows:

$$t_{ijhz} = k_{ihz}\,\frac{k_{jhz}}{\sum_{l=1}^{n} k_{lhz}} \qquad (1)$$
The idea is that, for each location h and during the particular time slot z, people in the age group i divide their time with people in the age group j according to the relative participation of people in the age group j in the activity that takes place in the location h and during the time slot z. Consider, for example, the location "pub" and a population of 3 age groups: Assume that there are 100 people at pubs between 8:00 PM and 8:01 PM and that 20 of them belong to age group 1 (thus accounting for 20% of the population of "pub-goers" in the specific time interval), 50 belong to age group 2, and 30 belong to age group 3. By applying equation 1 to this location and time slot, we get a symmetric time-of-exposure matrix, such that people in age group 1 spend 20 × 0.2 = 4 minutes with people of the same age, 20 × 0.5 = 10 minutes with those in age group 2, and 20 × 0.3 = 6 minutes with those in age group 3. The entries of the matrix for the other age groups are computed analogously.
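The pub example can be checked numerically. The following is a minimal Python sketch of equation 1 for one location and one time slot; the function name `ptm_exposure` and the use of NumPy are illustrative choices and are not part of the original analysis.

```python
import numpy as np


def ptm_exposure(k):
    """Proportionate time mixing for one location and one time slot.

    k[i] is the person-minutes contributed by age group i. Entry (i, j) of
    the result is the time that group i spends with group j, that is,
    k[i] * k[j] / sum(k).
    """
    k = np.asarray(k, dtype=float)
    total = k.sum()
    if total == 0:
        # Nobody is present, so there is no exposure to allocate.
        return np.zeros((k.size, k.size))
    return np.outer(k, k) / total


# Pub example from the text: 20, 50, and 30 people present for 1 minute.
pub = ptm_exposure([20, 50, 30])
print(pub)
```

Each row of the resulting matrix sums to the person-minutes of the corresponding age group, so the time of every group is fully allocated among the people present.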
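The total time of exposure of people in the age group i to people in the age group j, for all locations and throughout an entire day, is obtained by aggregating the location- and slot-specific exposure times of equation 1, written here as $t_{ijhz}$, over activities/locations and time slots as follows:

$$t_{ij} = \sum_{h} \sum_{z=1}^{1440} t_{ijhz} \qquad (2)$$

The quantities $t_{ij}$ are the entries of the matrix T.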
This method is based on the assumption of random allocation of time to people in different age groups within each time slot and activity/location: Assortativeness by age emerges from the fact that people of similar ages tend to do the same activities and tend to schedule those activities during the same time slots.
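The aggregation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record layout `(age_group, location, start_slot, end_slot, weight)` is a hypothetical simplification and not the format of any of the surveys used here, and the function names are invented for the example.

```python
import numpy as np


def person_minutes(records, n_groups, n_locations, n_slots=1440):
    """Build k[i, h, z] by summing the sample weights of respondents present.

    Each record is (age_group, location, start_slot, end_slot, weight), with
    end_slot exclusive. This layout is a hypothetical simplification.
    """
    k = np.zeros((n_groups, n_locations, n_slots))
    for group, location, start, end, weight in records:
        k[group, location, start:end] += weight
    return k


def total_exposure(k):
    """Apply equation 1 in every (location, slot) cell and sum (equation 2)."""
    totals = k.sum(axis=0)  # person-minutes per (location, slot)
    # Share of each age group among those present; 0 where a cell is empty.
    shares = np.divide(k, totals, out=np.zeros_like(k), where=totals > 0)
    # t[i, j] = sum over h and z of k[i, h, z] * shares[j, h, z]
    return np.einsum("ihz,jhz->ij", k, shares)


# Toy data: 2 age groups, 2 locations, and a "day" of only 3 slots.
records = [
    (0, 0, 0, 2, 10.0),  # group 0 in location 0 during slots 0 and 1
    (1, 0, 1, 3, 30.0),  # group 1 in location 0 during slots 1 and 2
    (1, 1, 0, 1, 5.0),   # group 1 in location 1 during slot 0
]
k = person_minutes(records, n_groups=2, n_locations=2, n_slots=3)
T = total_exposure(k)
print(T)
```

In the toy data, the off-diagonal exposure arises only in the single slot in which both groups share a location, which is how scheduling differences translate into assortativeness under PTM.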
For the location "school," we have extra information related to the educational system of the country. We know, for instance, that students may attend classes in physically separated buildings according to the grade they are enrolled in and whether it is elementary, middle, or high school. We take this extra information into account by segregating the population into subpopulations of people who are "eligible to have contacts at school." In addition, the fact that respondents are supposed to record the physical location where they engage in activities (e.g., "school") and not only the activity they perform (e.g., "teaching") ensures that we take into account mixing between students and workers at school.
By summing up the matrices T and F, we get the matrix E, that is, an estimate of the overall daily time of exposure between people of different age classes. By construction, at the population level, the total time of exposure of people in age group i to those in j ($e_{ij}$) must be equal to the total time of exposure of people in the age group j to those in i ($e_{ji}$). If there are $w_i$ individuals in age group i and $w_j$ individuals in age group j, then $e_{ij} = a_{ij} w_j = a_{ji} w_i$, where $a_{ij}$ is the average daily time of exposure of an individual in the age group j to people in the age group i. The quantity $a_{ij}$ is a fundamental variable for our purposes and represents the time-use version of the average number of contacts by age group that has been proposed in the literature to measure mixing patterns (6, 10, 11, 13).
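The relation between the population-level matrix E and the per-capita quantities can be illustrated with a short Python sketch. The matrices and population sizes below are made-up numbers, and the function name `per_capita_exposure` is hypothetical.

```python
import numpy as np


def per_capita_exposure(T, F, w):
    """Return E = T + F and A, where a[i, j] = e[i, j] / w[j]."""
    E = np.asarray(T, dtype=float) + np.asarray(F, dtype=float)
    w = np.asarray(w, dtype=float)
    A = E / w[np.newaxis, :]  # divide column j by the size of age group j
    return E, A


# Made-up symmetric population-level matrices (person-minutes per day).
T = np.array([[12.0, 6.0], [6.0, 40.0]])
F = np.array([[2.0, 10.0], [10.0, 4.0]])
w = np.array([4.0, 8.0])  # made-up number of individuals per age group

E, A = per_capita_exposure(T, F, w)
# Reciprocity: a_ij * w_j equals a_ji * w_i because E is symmetric.
assert np.allclose(A * w, (A * w).T)
print(A)
```

Note that A itself is generally not symmetric: an individual in a small age group can accumulate more exposure to a large group than the reverse.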
Estimation of age-specific transmission parameters
The time-use approach relies on the assumption that the number of potentially infectious contacts between people of different age classes is proportional to their time of exposure. The proportionality factor, indicated by q, is disease specific and measures the level of infectivity of the disease (11). Within the time-use framework, q represents the probability of transmission per time unit (e.g., minute) of exposure. Given an estimate of the average time of exposure of an individual in the age group j to people in the age group i, $a_{ij}$, and an estimate of the disease-specific infectivity parameter q, we can write a time-use version of the "next generation matrix" (11, 15) as $N = (n_{ij}) = (q\,a_{ij})$. Within our framework, N gives the age-specific amount of potential transmission time per person.
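As an illustration, the construction of N and a summary of its transmission potential can be sketched in Python. The text does not spell out how a reproduction number is obtained from N; taking the dominant eigenvalue (spectral radius) is the standard definition in next-generation-matrix theory (15) and is an assumption of this sketch. The matrix A and the value of q below are made-up numbers.

```python
import numpy as np


def next_generation_matrix(A, q):
    """Scale the per-capita time-of-exposure matrix by the infectivity q."""
    return q * np.asarray(A, dtype=float)


def basic_reproduction_number(N):
    """Spectral radius (largest eigenvalue modulus) of N.

    Assumption of this sketch: R0 is taken as the dominant eigenvalue of the
    next generation matrix, as in standard theory.
    """
    return float(np.max(np.abs(np.linalg.eigvals(N))))


A = np.array([[200.0, 50.0], [100.0, 150.0]])  # made-up minutes per day
q = 0.01                                       # made-up infectivity
N = next_generation_matrix(A, q)
R0 = basic_reproduction_number(N)
print(R0)
```

Because N is linear in q, the dominant eigenvalue scales in direct proportion to q for a fixed time-of-exposure matrix.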
In order to estimate the infectivity parameter q, we keep the elements $a_{ij}$ fixed at their estimates based on time-use data, and we choose the value of q that maximizes the likelihood of observing the age-specific proportion of people immune to varicella in the United States prior to the introduction of the vaccine (for details on the methodology, refer to reference 11). Confidence intervals for q, $R_0$, and the fit-to-prevalence data are derived using a bootstrap technique (16). There are 3 main sources of uncertainty associated with the estimates in the model. The first source is related to the variability of the estimates of the underlying time-of-exposure matrix. A second source is associated with the variability of the estimates of the transmission parameter, given a time-of-exposure matrix. Finally, there is uncertainty in the estimates of the seroprevalence profile by age. Our estimates of uncertainty based on a resampling technique account for the 3 sources of uncertainty.
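The resampling idea can be illustrated with a generic percentile bootstrap in Python. This is only a sketch of the general technique (16): the statistic used here is a simple mean on made-up observations, standing in for the likelihood-based estimator of reference 11, and the function name and default settings are hypothetical.

```python
import numpy as np


def percentile_bootstrap(data, estimate, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for estimate(data).

    Observations are resampled with replacement and the estimator is
    recomputed on each resample; the interval is read from the empirical
    quantiles of the resampled estimates.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        sample = data[rng.integers(0, n, size=n)]
        stats[b] = estimate(sample)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [alpha, 1.0 - alpha])
    return float(lo), float(hi)


# Made-up observations; np.mean is a placeholder for the real estimator.
observations = np.array([0.9, 1.1, 1.3, 1.0, 1.2, 0.8, 1.4, 1.1])
low, high = percentile_bootstrap(observations, np.mean)
print(low, high)
```

To propagate all 3 sources of uncertainty, each resample would need to redraw both the time-use respondents and the serologic observations before re-estimating q, rather than resampling a single vector as in this toy example.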
RESULTS
Age-specific time of exposure
The estimated time of exposure reveals a strong element of assortativeness by age: People tend to spend more time with individuals of the same age. Figures 1 and 2 show contour plots of the average time-of-exposure matrices for the locations "school" and "workplace," respectively. These two matrices contribute substantially to the overall time-of-exposure matrix and account for most of the exposure among young people and among adults, respectively. The other locations considered are "public transportation," "restaurant," "grocery store/mall," and "other public building." When the respondent is sleeping at night, no data are collected about his/her location.
Our results show a strong assortative pattern by age for children and teenagers while at school. People in the age group 5–9 years spend on average 98 minutes per day with people in the same age group at school, people in the age group 10–14 years spend on average 113 minutes per day with people in the same age group at school, and people in the age group 15–19 years spend on average 142 minutes with people in the same age group at school. Children divide their time at school with other children, teachers, childcare workers, and so on. We obtain estimates that represent the relative distribution of exposure to these people, and we observe a higher level of exposure to people of the same age. A slightly different pattern characterizes the location workplace (Figure 2), for which we observe that the time of exposure between adults still shows an assortative element by age but is less pronounced. In this case, the distribution of exposure is less concentrated around the diagonal elements of the time-of-exposure matrix.
Estimates of time of exposure between household members (i.e., the matrix F) introduce a second important element to the average time-of-exposure matrix—the presence of peaks on those elements that give the exposure between people who were born 1 generation apart (Figure 3). These peaks are related to the fact that people within a household tend to spend time with either those members who belong to the same generation (e.g., siblings, wife, husband) or those who are in a generation ahead or behind (e.g., parents and their children). As a result, the time-of-exposure matrix for household members, F, shows peaks on both the main diagonal and on 1 of the subdiagonals, according to the time lapse between generations.
A graphical representation of the overall average time of exposure of an individual in the age group j to people in the age group i, $a_{ij}$, is given in Figure 4. We observe the highest level of assortativeness among people in the age range 10–19 years. According to our estimates, they are exposed to each other for a little less than 4 hours per day on average. We obtain similar numbers for the average time of exposure between children who are less than 10 years of age and people who are in the age range 25–39 years. We also observe a high level of exposure between people aged 65 years or more who belong to the same age group. Table 1 reports the estimated values (in minutes) of exposure between age groups.
Our time-of-exposure matrices are qualitatively similar to contact matrices obtained from contact surveys (11, 13). The two approaches reveal both assortativeness and peaks in exposure between people who are 1 generation apart (i.e., household contacts). Quantitatively, contact matrices tend to be more assortative than time-of-exposure matrices and, in that sense, they may be associated with larger values of $R_0$. On the other hand, they give little importance to household contacts, which are counted just as any other contact, whereas time-of-exposure matrices emphasize them.
Fitting time-of-exposure matrices to the US serologic profile of varicella-zoster virus
The infectivity parameter q for varicella, which maximizes the likelihood of observing the proportions immune to the infection, given the average time-of-exposure matrix, is 0.00133 (95% confidence interval: 0.00106, 0.00160). The parameter q is disease specific and indicates how contagious the infection is. If we multiply the parameter q by the average time-of-exposure matrix given in Table 1, we obtain a time-use version of a next generation matrix for varicella in the United States. We estimate a value for $R_0$ equal to 7.32 (95% confidence interval: 5.31, 9.32). This value is consistent with other estimates of $R_0$ for varicella obtained from different models (17).
Figure 5 shows the fit of the model based on time-use data to age-specific immunity to varicella in the United States before the introduction of vaccination. The fit of the model to seroprevalence data is good overall. However, we observe a delayed rise in predicted seroprevalence from the model for the older-child age groups. This could be explained by the fact that the time-of-exposure matrix underestimates the level of assortativeness in contacts for older children and teenagers. For instance, when we compare the estimated time-of-exposure matrix with contact matrices estimated from contact surveys in European countries (13), we observe similar levels of assortativeness for the adult population. However, contact matrices show a stronger element of assortativeness for children and teenagers. An alternative explanation is that the choice of a parsimonious model based on only 1 age-independent transmission parameter q hides the fact that the transmissibility of the infection between age groups may be age dependent.
DISCUSSION
The measurement of human contact patterns has been pursued, in the literature, by applying both indirect (2) and direct (6, 13) methods. In both cases, the relevant epidemiologic variable is the number of contacts by age. Our paper considers the age-specific time of exposure as a relevant proxy for modeling close contacts by age and provides methods to estimate the time-of-exposure and mixing matrices from data collected through time-use sample surveys.
The assumption that time of exposure matters for transmission is crucial to our methodology. Because we do not know with accuracy what the routes of transmission are for varicella and other diseases, we are basically assuming that the time of exposure correlates with potentially infectious contacts or "at-risk" events. In other words, we assume that being in the same setting at the same time is an appropriate social proxy for the routes of transmission. In some cases, the assumption is certainly appropriate. For low-transmissibility infections, it is likely that the transmission of the pathogen is related to the repetition of close contacts that occur throughout an interval of exposure. This would explain the effectiveness of intrahousehold transmission, for instance. When the disease is highly transmissible, the accuracy of the time-use approach may be questioned. Nonetheless, work in progress for Italy, for which we have serologic data and both time-use and contact data, revealed that the model based on time-use data performed relatively well and very similarly to the model based on contact surveys.
In this paper, we first develop a methodology to estimate time-of-exposure matrices, and then we apply it to time-use data for the United States. We document a strong heterogeneity in exposure patterns by age, in addition to a substantial element of assortativeness by age and considerable exposure between people who were born 1 generation apart (mainly explained by intrahousehold exposure). From a theoretical viewpoint, our approach has analogies with the structured mixing scheme (18). We observe that age-specific exposure is not random but is structured by the set of socioeconomic and institutional factors that determine the allocation of an individual's time to life's activities.
In the second part of this paper, we fit a model based on age-specific time of exposure to US seroprevalence data for varicella. The fit is quite good, even though we chose a highly parsimonious model with only 1 transmission parameter (11). More complex models could be used that account for the fact that some individuals may be more infectious than others (e.g., age-dependent transmission rates) or that different activities may lead to different chances of infection (e.g., activity-specific transmission rates). However, it is noteworthy that a simple approach, based on 1 transmission parameter, given a time-of-exposure matrix, performs so well on real data. This is encouraging because a large amount of information on time use is immediately available for a large group of countries. In addition, time-use data sets are in most cases free or inexpensive and have been harmonized to allow for comparative studies (e.g., the Multinational Time-Use Study).
Our method can be applied also in those situations for which data on time use are not as detailed as they are for the United States. For several time-use surveys, for instance, data on the exact time of beginning and ending of activities are not available. We know only the total time spent by the respondent in each location throughout a day. In those cases, we can still compute the overall time of exposure between people of different age groups but at a lower level of disaggregation. We expect the time-of-exposure matrix obtained from these data to be less assortative because we cannot account for the fact that people of the same age groups tend to schedule their activities during the same time slots, but we can capture only the fact that people of the same age groups tend to do the same activities throughout an average day. We can use data from ATUS to compute a mixing matrix based on aggregated values by location. The deviation of this matrix from the homogeneous mixing one accounts for 98% of the deviation, in terms of the Frobenius norm, of the respective mixing matrix computed under PTM at the level of both single activity/location and time slot from the homogeneous mixing matrix. One important consequence of this result is that reliable matrices, from a statistical point of view, can be computed from data sets that are less detailed than ATUS. A second relevant fact to notice is that enlarging the time slot from the shortest (1 minute) to the longest (1 day) does not change the mixing pattern substantially. This evidence supports the reliability of the estimates based on disaggregation to single-minute time slots, despite the fact that, for some locations, during some time slots, these estimates could be highly stochastic.
A third observation is that the time-of-exposure matrix that we estimate based on the PTM at the level of both single activity/location and time slot may still underestimate the level of assortativeness in the population. As a matter of fact, it ultimately depends on the assumption of PTM, and it may not account for several kinds of assortative mixing. For example, if one pub appealed to teenagers, whereas another one appealed to older people, there would be assortative mixing that is not reflected in the time-of-exposure matrix obtained with our method.
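The comparison of mixing matrices in terms of the Frobenius norm, mentioned earlier in this section, can be sketched in Python as a ratio of deviations from homogeneous mixing. The text does not specify how the mixing matrices are normalized or how the homogeneous mixing matrix is built, so the three matrices below are made-up inputs and the function name `deviation_ratio` is hypothetical.

```python
import numpy as np


def deviation_ratio(coarse, fine, homogeneous):
    """Share of the fine matrix's deviation captured by the coarse matrix.

    Both deviations are measured from the same homogeneous mixing matrix
    using the Frobenius norm.
    """
    homogeneous = np.asarray(homogeneous, dtype=float)
    num = np.linalg.norm(np.asarray(coarse, dtype=float) - homogeneous, "fro")
    den = np.linalg.norm(np.asarray(fine, dtype=float) - homogeneous, "fro")
    return float(num / den)


# Made-up 2 x 2 mixing matrices whose rows sum to 1.
homogeneous = np.array([[0.5, 0.5], [0.5, 0.5]])
fine = np.array([[0.7, 0.3], [0.3, 0.7]])    # e.g., location and time slot
coarse = np.array([[0.6, 0.4], [0.4, 0.6]])  # e.g., location only
ratio = deviation_ratio(coarse, fine, homogeneous)
print(ratio)
```

A ratio close to 1 would indicate that the coarser data recover almost all of the departure from homogeneous mixing that the finer data reveal.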
Time-use data are particularly interesting because they potentially supplement contact surveys. As a matter of fact, a method based on time-use data allows for measuring those contacts that standard definitions of "at-risk" events, such as a 2-way conversation, do not capture (e.g., being on public transportation, in a mall, and so on). However, the method that we propose does not account for differences in densities in various locations. By scaling the exposure between age groups in a location by the overall amount of person-minutes spent in the location, our method implements frequency-dependent mixing (19), consistent with a behavioral model in which individuals have a constant rate of contacting others, regardless of their density.
Our methodology does not take into account the different nature of contacts in different settings. For instance, contacts in a household may be more intimate and involve physical proximity, whereas contacts in a public building may be more distant. This limitation could be overcome by looking at time-of-exposure matrices disaggregated by activity/location. Further research should investigate time-of-exposure patterns in specific settings. This may improve our understanding of where transmission occurs and what locations should be targeted by specific interventions (e.g., school closure). We also think that more research is needed to supplement time-use data with contact surveys and epidemiologic data that can shed light on transmission characteristics in different settings and for different types of individuals.
ACKNOWLEDGMENTS
Author affiliations: Department of Demography, University of California, Berkeley, California (Emilio Zagheni); Università Bocconi, Milano, Italia (Francesco C. Billari); Università di Pisa, Pisa, Italia (Piero Manfredi); Health Protection Agency, London, United Kingdom (Alessia Melegaro); Laboratoire National de Santé, Luxembourg, Luxembourg (Joel Mossong); and Health Protection Agency, London, United Kingdom (W. John Edmunds).
This study is part of POLYMOD, a European Commission project funded within the Sixth Framework Programme (contract SSP22-CT-2004502084).
This work has benefited from comments by those associated with the POLYMOD project and those attending the brown-bag seminar at the Department of Demography, University of California, Berkeley.
Conflict of interest: none declared.
REFERENCES
1. Fine PEM, Clarkson JA. Measles in England and Wales. I. An analysis of factors underlying seasonal patterns. Int J Epidemiol. (1982) 11(1):5–14.
2. Anderson RM, May RM. Age-related changes in the rate of disease transmission: implications for the design of vaccination programmes. J Hyg (Lond). (1985) 94(3):365–436.
3. Longini IM Jr, Halloran E, Nizam A, et al. Containing pandemic influenza with antiviral agents. Am J Epidemiol. (2004) 159(7):623–633.
4. Halloran ME, Longini IM Jr, Nizam A, et al. Containing bioterrorist smallpox. Science (2002) 298(5597):1428–1432.
5. Ferguson NM, Keeling MJ, Edmunds WJ, et al. Planning for smallpox outbreaks. Nature (2003) 425(6959):681–685.
6. Wallinga J, Edmunds WJ, Kretzschmar M. Perspective: human contact patterns and the spread of airborne infectious diseases. Trends Microbiol. (1999) 7(9):372–377.
7. Anderson RM, May RM. Infectious Diseases of Humans: Dynamics and Control (1991) Oxford, United Kingdom: Oxford University Press.
8. Hethcote HW. Modeling heterogeneous mixing in infectious disease dynamics. In: Models for Infectious Human Diseases: Their Structure and Relation to Data. Isham V, Medley G, eds. (1996) Cambridge, United Kingdom: Cambridge University Press. 215–238.
9. Jacquez JA, Koopman J, Simon CP, et al. Modeling and the analysis of HIV transmission: the effect of contact patterns. Math Biosci. (1988) 92:119–199.
10. Edmunds WJ, O'Callaghan CJ, Nokes DJ. Who mixes with whom? A method to determine the contact patterns of adults that may lead to the spread of airborne infections. Proc Biol Sci. (1997) 264(1384):949–957.
11. Wallinga J, Teunis P, Kretzschmar M. Using data on social contacts to estimate age-specific transmission parameters for respiratory-spread infectious agents. Am J Epidemiol. (2006) 164(10):936–944.
12. Beutels P, Shkedy Z, Aerts M, et al. Social mixing patterns for transmission models of close contact infections: exploring self-evaluation and diary-based data collection through a Web-based interface. Epidemiol Infect. (2006) 134(6):1158–1166.
13. Mossong J, Hens N, Jit M, et al. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Med. (2008) 5(3):e74.
14. Del Valle SY, Hyman JM, Hethcote HW, et al. Mixing patterns between age groups in social networks. Soc Networks (2007) 29(4):539–554.
15. Diekmann O, Heesterbeek JAP. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation (2000) New York, NY: John Wiley and Sons, Inc.
16. Efron B, Tibshirani RJ. An introduction to the bootstrap. In: Monographs on Statistics and Applied Probability (1993) Vol 57. New York, NY: Chapman & Hall.
17. Whitaker HJ, Farrington CP. Infections with varying contact rates: application to varicella. Biometrics (2004) 60(3):615–623.
18. Jacquez JA, Simon CP, Koopman J. Structured mixing: heterogeneous mixing by the definition of activity groups. In: Springer Lecture Notes in Biomathematics. Mathematical and Statistical Approaches to AIDS Epidemiology (1990) New York, NY: Springer-Verlag New York, Inc. 301–315.
19. Cauchemez S, Carrat F, Viboud C, et al. A Bayesian MCMC approach to study transmission of influenza: application to household longitudinal data. Stat Med. (2004) 23(22):3469–3487.