American Journal of Epidemiology Advance Access originally published online on June 24, 2007
American Journal of Epidemiology 2007 166(6):717-723; doi:10.1093/aje/kwm131
ORIGINAL CONTRIBUTIONS
Birth Order and Sibship Size: Evaluation of the Role of Selection Bias in a Case-Control Study of Non-Hodgkin's Lymphoma
1 Epidemiology and Genetics Unit, Department of Health Sciences, University of York, York, United Kingdom
2 Department of Social Policy and Social Work, University of York, York, United Kingdom
Correspondence to Dr. Eve Roman, Epidemiology and Genetics Unit, Department of Health Sciences, Seebohm Rowntree Building, University of York, York YO10 5DD, United Kingdom (e-mail: Eve.Roman{at}egu.york.ac.uk).
Received for publication May 23, 2006. Accepted for publication March 19, 2007.
ABSTRACT
Substantial heterogeneity has been observed among case-control studies investigating associations between non-Hodgkin's lymphoma and familial characteristics, such as birth order and sibship size. The potential role of selection bias in explaining such heterogeneity is considered within this study. Selection bias according to familial characteristics and socioeconomic status is investigated within a United Kingdom-based case-control study of non-Hodgkin's lymphoma diagnosed during 1998–2001. Reported distributions of birth order and maternal age are each compared with expected reference distributions derived using national birth statistics from the United Kingdom. A method is detailed in which yearly data are used to derive expected distributions, taking account of variability in birth statistics over time. Census data are used to reweight both the case and control study populations such that they are comparable with the general population with regard to socioeconomic status. The authors found little support for an association between non-Hodgkin's lymphoma and birth order or family size and little evidence for an influence of selection bias. However, the findings suggest that between-study heterogeneity could be explained by selection biases that influence the demographic characteristics of participants.
birth order; case-control studies; lymphoma, non-Hodgkin; selection bias; siblings
Abbreviations: ONS, Office for National Statistics
INTRODUCTION
Many factors may influence the validity and precision of epidemiologic studies. It is well recognized that studies should recruit sufficient participants to limit the influence of random sampling error, use measures that provide valid classification of both outcomes and exposures, and take account of confounding factors and systematic biases such as selection bias (1). The association of non-Hodgkin's lymphoma with sibship size and birth order has been the subject of many investigations (2–9), most following a case-control design, yet their findings are inconsistent. Of the potential explanations for the heterogeneity observed among these studies, selection bias has received little attention thus far and is the focus of the present report.
Selection bias occurs in epidemiologic studies when there are systematic differences between the participants selected for the study and the populations that they represent. It is often introduced through differential study participation according to the experience of the outcome and exposures under investigation (10). For example, the association between socioeconomic status and participation in epidemiologic studies is well documented, with people living in more affluent areas being more likely to participate, particularly as unaffected controls (11). Although many case-control studies have carefully considered the influence of selection bias on their estimates (e.g., 11–18), this bias frequently remains unaddressed.
In detailing methods for controlling for selection bias in case-control studies, Greenland (10) and Hernán et al. (19) have given the following indications. If selection factors are antecedents of both the outcome and the exposures under study and are measured on all study subjects, selection bias may be controlled as if it were a confounding factor within the study analysis. More usually, as well as selection factors being measured in all study subjects, knowledge is required of the joint distribution of selection factors, exposure, and disease in the entire source population or, equivalently, the selection probabilities according to each level of the factors affecting selection. As this information is external to usual study data collection, its lack of availability often provides an immediate obstacle to controlling for selection bias. However, a diversity of practical strategies has been used in seeking to obtain information to consider or analytically control for selection bias. Such strategies have included investigation of individuals identified within the study sample who did not participate (15–18), derivation of the represented population distribution of characteristics related to selection using external data such as population censuses (11–14), and conjecture of the magnitude of selectivity in sensitivity analyses (10).
This study presents the association between non-Hodgkin's lymphoma and birth order, sibship size, and parental ages using data from a United Kingdom-based case-control study and examines the potential for selection bias due to differential participation according to familial characteristics (13, 14) and socioeconomic status (11). The recruited control population is compared with reference distributions for birth order and maternal age derived from United Kingdom national birth statistics to assess whether it is representative with respect to familial characteristics. Potential bias in participation according to socioeconomic status is investigated using United Kingdom census data to reweight the study data to be comparable with the general population (19).
MATERIALS AND METHODS
Study data
A total of 699 Caucasian cases of non-Hodgkin's lymphoma and 742 Caucasian controls from a United Kingdom-based, case-control study were included in the analysis. Full details of the study are provided by Willett et al. (20). The study recruited cases of lymphoma diagnosed at ages 18–64 years, during the years 1998–2001, while normally resident in regions of the North and Southwest of England. For each case, a person matched on age and sex with no history of lymphoma or leukemia was selected as a control from local primary care registers. The study was conducted with the ethical approval of the United Kingdom multiregional ethical committee, and informed consent was given by all participants. The overall study participation rates were 75 percent for identified lymphoma cases and 71 percent for successfully contacted controls.
All participants took part in a face-to-face interview and provided information about their family in a questionnaire completed prior to the interview. The questionnaire included the date of birth of each first-degree relative, from which parental ages at the time of birth, total sibship size, and birth order were derived for each study participant.
By use of data from the 1991 United Kingdom census, a Townsend score including proportions of unemployment, car and home ownership, and overcrowding was computed for each small area enumeration district (21). The Townsend scores were categorized into quintiles to create a deprivation indicator, with high scores corresponding to increased levels of deprivation, indicating lower socioeconomic status. A score from this indicator was assigned to each participant using the address at which he or she was resident at the date of diagnosis for cases or the date corresponding to the matched case diagnosis date for controls (20). Weights to apply to each of the case and control populations such that their deprivation distributions would be representative of the national population of corresponding age structure were derived using the 1991 United Kingdom census data (19).
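The reweighting step can be illustrated with a minimal sketch. It assumes a simple post-stratification scheme in which each deprivation category receives the weight (population share) / (sample share); the paper does not state the exact formula used, so this is an illustration rather than the authors' procedure. The expected shares are the percentages reported in the Results; the participant counts and the function name `deprivation_weights` are hypothetical.

```python
# Expected national share of each deprivation category (1 = most affluent,
# 5 = most deprived), as reported in the Results of this paper. The rounded
# percentages sum to 101, so they are renormalized inside the function.
EXPECTED_SHARE = {1: 0.24, 2: 0.21, 3: 0.20, 4: 0.19, 5: 0.17}


def deprivation_weights(sample_counts, population_share):
    """Return one weight per category: population share / sample share."""
    n = sum(sample_counts.values())
    share_total = sum(population_share.values())
    weights = {}
    for category, count in sample_counts.items():
        if count <= 0:
            raise ValueError(f"category {category} has no participants")
        target = population_share[category] / share_total
        weights[category] = target / (count / n)
    return weights


# Hypothetical participant counts by category (NOT the study's table 1).
hypothetical_counts = {1: 300, 2: 250, 3: 200, 4: 150, 5: 100}
weights = deprivation_weights(hypothetical_counts, EXPECTED_SHARE)

# After weighting, each category's share equals the renormalized target share
# and the weighted total equals the original sample size.
weighted_counts = {c: hypothetical_counts[c] * weights[c] for c in weights}
```

In this hypothetical sample the most deprived category is underrepresented, so its weight exceeds 1, while the most affluent category is down-weighted.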
Expected distributions of birth order and maternal age
It may be demonstrated theoretically that, as long as sampling is independent of an individual's birth order, number of siblings, or inclusion of other siblings and the sampling fraction is very small, the distribution of birth order within a random sample will be representative of the population from which the sample is taken (22). Thus, in deriving expected distributions of birth order, the case and control populations may each be treated as samples from the population of the United Kingdom, drawn with known frequencies by year of birth.
Yearly estimates of the true distribution of birth order, using the methodology of Smallwood (23), were obtained for births in England and Wales from 1938 onward by contacting the Office for National Statistics (ONS) (http://www.statistics.gov.uk). This distribution has varied significantly over time, which has important implications for its appropriate use (24, 25). Because of lack of comparable population data, 129 cases and 131 controls born prior to 1938 were excluded from this analysis. Prior to calculation of expected distributions, the study data were reweighted by use of the census-derived weights, such that the deprivation distributions of each of the case and control populations were representative of the national population. The expected distributions of birth order, for the national population with age structure corresponding to each of the case and control study populations, were calculated as averages of the yearly national population birth order distribution estimates, weighted respectively by the frequency of case and control participants from each birth year.
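The weighted-average calculation described above can be sketched as follows. The yearly national distributions and birth years are made-up placeholders, not ONS figures, and the function name `expected_distribution` is an assumption for illustration. Passing summed census-derived weights per birth year instead of raw counts would mirror the reweighting step.

```python
from collections import Counter


def expected_distribution(year_weights, yearly_distribution):
    """Average yearly national distributions, weighting each birth year by
    the number (or summed weight) of participants born in that year."""
    total = sum(year_weights.values())
    categories = list(next(iter(yearly_distribution.values())))
    expected = dict.fromkeys(categories, 0.0)
    for year, weight in year_weights.items():
        national = yearly_distribution[year]  # KeyError if year has no data
        for category in categories:
            expected[category] += (weight / total) * national[category]
    return expected


# Hypothetical national birth order distributions for two birth years.
hypothetical_national = {
    1940: {"1": 0.40, "2": 0.30, "3+": 0.30},
    1960: {"1": 0.38, "2": 0.32, "3+": 0.30},
}

# Hypothetical participants: three born in 1940, one born in 1960.
birth_years = [1940, 1940, 1940, 1960]
expected = expected_distribution(Counter(birth_years), hypothetical_national)
```

Here the expected share of firstborns is 0.75 × 0.40 + 0.25 × 0.38 = 0.395, reflecting the preponderance of 1940 births in the sample.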
Expected distributions of maternal age were derived using ONS data recording the yearly distribution of maternal age for births from 1938 onward, following the same principles as detailed for birth order. The reported and expected distributions of birth order and maternal age were compared for each of the case and control populations by use of the chi-squared test and the chi-squared test for trend (26).
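A comparison of reported with expected distributions can be sketched as a chi-squared goodness-of-fit calculation. The sketch assumes the expected proportions are treated as fixed (not estimated from the sample), giving k − 1 degrees of freedom; it does not cover the test for trend, and the counts shown are hypothetical. The function name `chi_squared_gof` is an assumption.

```python
def chi_squared_gof(observed_counts, expected_proportions):
    """Return (statistic, degrees of freedom) for a goodness-of-fit test."""
    if len(observed_counts) != len(expected_proportions):
        raise ValueError("observed and expected must have the same length")
    n = sum(observed_counts)
    statistic = 0.0
    for observed, proportion in zip(observed_counts, expected_proportions):
        expected = n * proportion
        statistic += (observed - expected) ** 2 / expected
    return statistic, len(observed_counts) - 1


# Hypothetical reported counts by birth order (1, 2, 3+) and hypothetical
# expected proportions for the same categories.
statistic, df = chi_squared_gof([50, 30, 20], [0.4, 0.3, 0.3])
# A p-value could then be read from the chi-squared distribution, for
# example with scipy.stats.chi2.sf(statistic, df) if SciPy is available.
```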
Analysis of association
Association between case-control status and each of the familial characteristics was estimated using unconditional logistic regression, considering each characteristic as a continuous and a categorical variable. Analyses were adjusted for age, sex, and study region, and statistical significance was estimated using the likelihood ratio test. Analyses in which socioeconomic status was not considered were compared with analyses in which potential selection bias was accounted for by reweighting each of the case and control populations using the census-derived weights, such that they were representative of the national population.
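The effect of reweighting on an estimate of association can be illustrated with a deliberately simplified stand-in for the adjusted logistic regression: a weighted cross-product odds ratio for one binary exposure. With a single binary exposure and no covariates, the logistic regression odds ratio coincides with this cross-product ratio, but the sketch omits the adjustment for age, sex, and study region used in the study. All counts, weights, and names below are hypothetical.

```python
def weighted_odds_ratio(records):
    """records: iterable of (is_case, is_exposed, weight) tuples."""
    a = b = c = d = 0.0
    for is_case, is_exposed, weight in records:
        if is_case and is_exposed:
            a += weight      # exposed cases
        elif is_case:
            b += weight      # unexposed cases
        elif is_exposed:
            c += weight      # exposed controls
        else:
            d += weight      # unexposed controls
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: empty off-diagonal cell")
    return (a * d) / (b * c)


def expand(is_case, is_exposed, weight, count):
    """Repeat one record `count` times."""
    return [(is_case, is_exposed, weight)] * count


# Hypothetical weights for participants from deprived and affluent areas.
DEPRIVED_W, AFFLUENT_W = 1.5, 0.8
records = (
    expand(True, True, DEPRIVED_W, 10) + expand(True, True, AFFLUENT_W, 20)
    + expand(True, False, DEPRIVED_W, 20) + expand(True, False, AFFLUENT_W, 50)
    + expand(False, True, DEPRIVED_W, 10) + expand(False, True, AFFLUENT_W, 10)
    + expand(False, False, DEPRIVED_W, 20) + expand(False, False, AFFLUENT_W, 60)
)

# Unweighted comparison: every record counts once.
crude = weighted_odds_ratio((case, exp, 1.0) for case, exp, _ in records)
# Reweighted comparison: records count according to their weights.
reweighted = weighted_odds_ratio(records)
```

In this made-up example the odds ratio moves from about 1.71 to about 1.50 after reweighting, which is the kind of shift the comparison of unweighted and reweighted analyses is designed to reveal.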
Correlation between deprivation and familial characteristics was evaluated within the control population using the Spearman rank correlation coefficient. All analyses were conducted using STATA statistical software (27).
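For readers unfamiliar with the Spearman coefficient, a minimal sketch is given here: values are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is computed. The data are hypothetical, and the study itself used STATA rather than this code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Undefined (division by zero) if either variable is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical deprivation categories and sibship sizes for five controls.
deprivation = [1, 2, 3, 4, 5]
sibship_size = [1, 3, 2, 5, 4]
rho = spearman_rho(deprivation, sibship_size)
```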
RESULTS
Characteristics of the participating case and control populations are presented in table 1. Little difference in the deprivation distributions between the case and control populations was observed (p = 0.60). The expected distribution of deprivation categories for a national population following the age distribution of the case and matched control populations, derived from the 1991 United Kingdom census, was as follows: category 1 (most affluent), 24 percent; category 2, 21 percent; category 3, 20 percent; category 4, 19 percent; and category 5 (most deprived), 17 percent. Both the participating case and control populations were less deprived than the comparable general population.
Comparisons of the reported with the expected distributions of birth order and maternal age for the case and control populations are presented in table 2. Although some variations in birth order were observed for each of the case and control populations compared with their respective expectations, these differences were not statistically significant. Little evidence was provided for a differential distribution of maternal age between the control population and its corresponding expected distribution. However, some evidence was provided of a tendency for cases to report older maternal ages than would be expected (p = 0.002).
The comparison of the case and control groups, presented in table 3, provided little evidence to suggest any association between non-Hodgkin's lymphoma and familial characteristics. Some variability was observed between the analysis in which socioeconomic status was not considered and the analysis in which the case and control populations were each reweighted according to deprivation, yet neither provided estimates indicating significant association. The variability between the analyses indicated a slight influence of selection bias according to socioeconomic status, which was not sufficient to substantially change the interpretation of the results.
Association between deprivation and familial characteristics was apparent within the control population. Spearman's rank correlation coefficients of –0.13 (p < 0.001) and –0.10 (p = 0.01) were estimated for maternal age and paternal age, respectively. This indicated that control participants of lower socioeconomic status tended to have been born to younger parents. Size of sibship and birth order were each positively correlated with the deprivation indicator among controls, with Spearman's rank correlation coefficients of 0.19 (p < 0.001) and 0.09 (p = 0.02), respectively, indicating that controls of lower socioeconomic status tended to be from larger sibships and more likely to be of later birth order.
DISCUSSION
While there was some indication in our study that the maternal ages reported by cases tended to be increased compared with the expected distribution of maternal ages, no significant association between non-Hodgkin's lymphoma and maternal age was apparent in our case-control comparison. Similarly, there was no evidence for association between non-Hodgkin's lymphoma and any of the other familial characteristics studied.
In evaluating whether we had recruited a control population that was representative of the general population, we compared birth order and maternal age with ONS birth statistics, demonstrating that the control population did not differ substantially in either of these characteristics. We derived the expected distributions using ONS data on a yearly basis, incorporating variability over time, and took account of differences in socioeconomic status between the study and reference populations using the United Kingdom census. We note that this methodology was not applicable to paternal age, which is not routinely recorded for all fathers, or to size of sibship, which depends on later sibling births.
We confirmed that our case-control comparisons had not been materially affected by selection bias according to socioeconomic status by reweighting each of the case and control data to correspond to United Kingdom census data. The difference in socioeconomic status between the cases and controls, however, was small. Study participation was high, with rates of 75 percent for identified lymphoma cases and 71 percent for successfully contacted controls. Cases did not participate mainly because they had died, were too ill to take part, had insufficient command of English, or could not be traced, whereas controls who did not participate had refused. Despite incomplete participation in both populations, the method of control selection, in which controls were matched to cases by local primary care provider as well as by age and sex, may have helped to ensure comparability of socioeconomic status between the cases and the controls.
Familial characteristics were classified by use of self-reported data usually prepared by the study participants prior to the interview. The completeness of the data was high, and thorough checks were implemented to ensure consistency in the reported data. However, errors in reporting this information cannot be ruled out. Within the analysis a deprivation score based upon the small area census enumeration district in which the participant was resident was used to classify socioeconomic status. Although for some individuals this deprivation score may be a less accurate indication of socioeconomic status than an occupationally based classification, it is more readily assigned to people who are not currently employed, thus providing improved coverage in classifying the socioeconomic status of the study population (28).
Hemminki and Kyyrönen (29), studying the association between parental age and risk of cancer in offspring using the Swedish Family Cancer Database, found no association for all lymphomas; however, we identified no prior research specifically investigating non-Hodgkin's lymphoma. Studies of association between non-Hodgkin's lymphoma and sibship size or birth order have been more frequent, and thus pooled analysis of case-control studies has been proposed within the InterLymph Consortium (9, 30). In a previous study of adult non-Hodgkin's lymphoma and familial characteristics, Grulich et al. (7) described an association between birth order and non-Hodgkin's lymphoma in which children of early birth order were at decreased risk of developing adult non-Hodgkin's lymphoma compared with children of later birth order. A similar trend was described according to the number of other children who lived in the household during the participant's childhood. Consistent with this finding, Bracci et al. (9) described an association whereby earlier birth order or fewer siblings each corresponded to a decreased risk of non-Hodgkin's lymphoma. Conversely, an increased risk of non-Hodgkin's lymphoma for people from single-child families was demonstrated by Cartwright et al. (3). Studies by Altieri et al. (8), Becker et al. (6), Vineis et al. (5), and Paffenbarger et al. (2) found no statistically significant association between adult non-Hodgkin's lymphoma and siblings or other children in the household.
In interpreting the observed heterogeneity, we consider differences between the study designs. Altieri et al. (8) used data from the Swedish Family Cancer Database, identifying non-Hodgkin's lymphoma cases and classifying familial characteristics using linked population data sources. In considering a full population, selection biases were excluded, and the study benefited from a large sample of 7,007 non-Hodgkin's lymphoma cases. Paffenbarger et al. (2) utilized college records and long-term tracking for US university alumni. The study, however, was restricted to a population of male university alumni, considered only deaths from non-Hodgkin's lymphoma as opposed to non-Hodgkin's lymphoma incidence, and was limited in power by observing only 89 non-Hodgkin's lymphoma decedents. Neither of these studies indicated association between non-Hodgkin's lymphoma and number of siblings or birth order.
Each of the other studies followed a case-control design in which cases were typically ascertained from cancer registration or treatment centers. The study sizes varied, influencing the precision with which the estimates of association were made. The smaller studies included 437 (3), 585 (6), and 704 (7) non-Hodgkin's lymphoma cases, compared with 1,304 (9) and 1,388 (5) cases in the larger studies. The studies varied in both the sources of the controls and the approaches taken to try to ensure that the control population was representative of the population from which the cases were identified. In the study described by Grulich et al. (7), controls were randomly selected from state electoral rolls and were frequency matched to cases by age, sex, and state of residence. In the study described by Bracci et al. (9), controls were identified by random digit dialing, supplemented by random sampling of Health Care Financing Administration files for participants aged more than 65 years, and were frequency matched to cases by age, sex, and county of residence. Cartwright et al. (3) utilized hospital-based inpatient controls with a wide variety of nonmalignant conditions, matched to cases by residential health district, age, and sex. Becker et al. (6) selected age-, sex-, and study area-matched controls from population registers. Vineis et al. (5) selected controls from demographic files in areas where these were available and National Health Service records in other areas, taking samples stratified according to sex and age. In addition to an individual's decision on whether to take part, selection bias may operate at the point at which the target sample is determined. Thus, the population coverage of the register used, or the coverage attainable through random digit dialing, may be strongly influential, as may factors related to study recruitment practices.
Few data have been presented that quantify selection according to familial characteristics. Glaser et al. (14) demonstrated a strong association between birth order and participation among controls, where selection was by random digit dialing. The association was such that there was a significant excess of early birth order individuals participating, compared with those who did not participate. In a situation where such selectivity was weaker for cases than for controls, early birth order would appear to be protective against the outcome, a trend which is consistent with the study results described by Grulich et al. (7) and Bracci et al. (9). Without such information for the case population, selection bias factors cannot be formally conjectured (10); however, these results may go some way toward suggesting an explanation for heterogeneity in published results to date.
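The mechanism described above can be made concrete with the standard selection bias factor for an odds ratio: the observed odds ratio equals the true odds ratio multiplied by (S_case,exposed × S_control,unexposed) / (S_case,unexposed × S_control,exposed), where each S is a participation probability. The probabilities below are purely hypothetical and are not estimates from any of the cited studies; "exposed" here stands for early birth order.

```python
def selection_bias_factor(s_case_exp, s_case_unexp, s_ctrl_exp, s_ctrl_unexp):
    """Multiplicative bias in the odds ratio caused by participation
    probabilities that differ by case status and exposure."""
    return (s_case_exp * s_ctrl_unexp) / (s_case_unexp * s_ctrl_exp)


# Hypothetical scenario: no true association, cases participate equally
# regardless of birth order, but early-birth-order controls participate
# more often than later-birth-order controls.
true_or = 1.0
factor = selection_bias_factor(
    s_case_exp=0.75, s_case_unexp=0.75,  # no selectivity among cases
    s_ctrl_exp=0.80, s_ctrl_unexp=0.60,  # early birth order overrepresented
)
observed_or = true_or * factor
```

With these assumed values the observed odds ratio is 0.75 although the true odds ratio is 1.0, so early birth order would appear protective purely through differential control participation.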
The reasons for selection bias according to birth order are neither immediately apparent nor well documented. We suggest the association between familial characteristics and socioeconomic status as one potential explanation. This association has been established by historical trends in family building in the United Kingdom, where married couples of lower socioeconomic status tended to begin childbearing at an earlier age and had, on average, larger families than did married couples of higher socioeconomic status (31–34). Within our control population, associations between deprivation and familial characteristics were apparent, whereby controls of lower socioeconomic status tended to have been born to younger parents, to be from larger sibships, and to be of later birth order.
Thorough consideration of each of the many factors that may influence validity and precision is crucial in evaluating the quality of epidemiologic studies. Ensuring a sufficient sample size to limit the influence of random sampling error, carefully calibrating the measures used to classify outcomes and exposures, adjusting for controllable confounding, and taking account of systematic biases are each essential (1). It is difficult to be prescriptive in recommending methods through which selection bias may be assessed in future studies or retrospectively considered for established studies, as any recommendation would depend upon the availability of study data and external data sources. We have illustrated two potential strategies, using data available in the United Kingdom, that could be implemented in other settings where such data are available and of high quality. These methods complement those described by other authors; for example, Glaser et al. (14) illustrated methods of reweighting according to age, race, education, and parity in a case-control study with reference to US census data.
ACKNOWLEDGMENTS
Financial support for this work was provided by the Leukaemia Research Fund.
The authors thank the collaborating clinical staff of the Leukaemia Research Fund Lymphoma Study and the study research team.
Conflict of interest: none declared.
REFERENCES
1. Rothman KJ, Greenland S, eds. Modern epidemiology (1998) 2nd ed. Philadelphia, PA: Lippincott-Raven Publishers.
2. Paffenbarger RS Jr, Wing AL, Hyde RT. Characteristics in youth predictive of adult-onset malignant lymphomas, melanomas, and leukemias: brief communication. J Natl Cancer Inst (1978) 60:89–92.
3. Cartwright RA, McKinney PA, O'Brien CO, et al. Non-Hodgkin's lymphoma: case control epidemiological study in Yorkshire. Leuk Res (1988) 12:81–8.
4. Holly EA, Lele C, Bracci PM, et al. Case-control study of non-Hodgkin's lymphoma among women and heterosexual men in the San Francisco Bay Area, California. Am J Epidemiol (1999) 150:375–89.
5. Vineis P, Miligi L, Crosignani P, et al. Delayed infection, family size and malignant lymphomas. J Epidemiol Community Health (2000) 54:907–11.
6. Becker N, Deeg E, Nieters A. Population-based study of lymphoma in Germany: rationale, study design and first results. Leuk Res (2004) 28:713–24.
7. Grulich AE, Vajdic CM, Kaldor JM, et al. Birth order, atopy, and risk of non-Hodgkin lymphoma. J Natl Cancer Inst (2005) 97:587–94.
8. Altieri A, Castro F, Bermejo JL, et al. Number of siblings and the risk of lymphoma, leukemia, and myeloma by histopathology. Cancer Epidemiol Biomarkers Prev (2006) 15:1281–6.
9. Bracci PM, Dalvi TB, Holly EA. Residential history, family characteristics and non-Hodgkin lymphoma, a population-based case-control study in the San Francisco Bay Area. Cancer Epidemiol Biomarkers Prev (2006) 15:1287–94.
10. Greenland S. Basic methods for sensitivity analysis of biases. Int J Epidemiol (1996) 25:1107–16.
11. Law GR, Smith AG, Roman E. The importance of full participation: lessons from a national case-control study. Br J Cancer (2002) 86:350–5.
12. Richiardi L, Boffetta P, Merletti F. Analysis of nonresponse bias in a population-based case-control study on lung cancer. J Clin Epidemiol (2002) 55:1033–40.
13. Schuz J. Non-response bias as a likely cause of the association between young maternal age at the time of delivery and the risk of cancer in the offspring. Paediatr Perinat Epidemiol (2003) 17:106–12.
14. Glaser SL, Clarke CA, Keegan THM, et al. Attenuation of social class and reproductive risk factor associations for Hodgkin lymphoma due to selection bias in controls. Cancer Causes Control (2004) 15:731–9.
15. Hatch EE, Kleinerman RA, Linet MS, et al. Do confounding or selection factors of residential wiring codes and magnetic fields distort findings of electromagnetic fields studies? Epidemiology (2000) 11:189–98.
16. Madigan MP, Troisi R, Potischman N, et al. Characteristics of respondents and non-respondents from a case-control study of breast cancer in younger women. Int J Epidemiol (2000) 29:793–8.
17. Lahkola A, Salminen T, Auvinen A. Selection bias due to differential participation in a case-control study of mobile phone use and brain tumors. Ann Epidemiol (2005) 15:321–5.
18. Mezei G, Kheifets L. Selection bias and its implications for case-control studies: a case study of magnetic field exposure and childhood leukaemia. Int J Epidemiol (2006) 35:397–406.
19. Hernán MA, Hernández-Diaz S, Robins JM. A structural approach to selection bias. Epidemiology (2004) 15:615–25.
20. Willett EV, Smith AG, Dovey GJ, et al. Tobacco and alcohol consumption and the risk of non-Hodgkin lymphoma. Cancer Causes Control (2004) 15:771–80.
21. Townsend P, Phillimore P, Beattie A. Health and deprivation: inequality and the North (1988) London, United Kingdom: Croom Helm.
22. Greenwood M, Yule GU. On the determination of size of family and of the distribution of characters in order of birth from samples taken through members of the sibships. J R Stat Soc (1914) 77:179–99.
23. Smallwood S. New estimates of trends in births by birth order in England and Wales. Popul Trends (2002) 108:32–48.
24. Hare EH, Price JS. Birth order and family size: bias caused by changes in birth rate. Br J Psychiatry (1969) 115:647–57.
25. Price JS, Hare EH. Birth order studies: some sources of bias. Br J Psychiatry (1969) 115:633–46.
26. Armitage P, Berry G. Statistical methods in medical research (1994) 3rd ed. Oxford, United Kingdom: Blackwell Scientific Publications.
27. STATA statistical software: release 8.2. (2003) College Station, TX: StataCorp LP.
28. Galobardes B, Shaw M, Lawlor DA, et al. Indicators of socioeconomic position (part 1). J Epidemiol Community Health (2006) 60:7–12.
29. Hemminki K, Kyyrönen P. Parental age and risk of sporadic and familial cancer in offspring: implications for germ cell mutagenesis. Epidemiology (1999) 10:747–51.
30. Boffetta P, Linet M, Armstrong B. The Interlymph Collaboration: a consortium of molecular epidemiological studies of non-Hodgkin's lymphoma. (Abstract). Proc Am Assoc Cancer Res (2003) 44:1579.
31. Pearce D, Britton M. The decline in births: some socio-economic aspects. Popul Trends (1977) 7:9–14.
32. Kiernan KE, Diamond I. The age at which childbearing starts—a longitudinal study. Popul Stud (1983) 37:363–80.
33. Werner B. Fertility and family background: some illustrations from the OPCS Longitudinal Study. Popul Trends (1984) 35:5–10.
34. Werner B. Fertility trends in different social classes: 1970 to 1983. Popul Trends (1985) 41:5–13.