

American Journal of Epidemiology Advance Access originally published online on September 12, 2008
American Journal of Epidemiology 2008 168(9):1035-1046; doi:10.1093/aje/kwn224

American Journal of Epidemiology Published by the Johns Hopkins Bloomberg School of Public Health 2008.

ORIGINAL CONTRIBUTIONS

Comparison of Statistical Methods for Estimating Genetic Admixture in a Lung Cancer Study of African Americans and Latinos

Melinda C. Aldrich, Steve Selvin, Helen M. Hansen, Lisa F. Barcellos, Margaret R. Wrensch, Jennette D. Sison, Charles P. Quesenberry, Rick A. Kittles, Gabriel Silva, Patricia A. Buffler, Michael F. Seldin and John K. Wiencke

Correspondence to Dr. Melinda C. Aldrich, University of California, San Francisco, Box 2911 Rock Hall, Mission Bay 582, 1550 4th Street, San Francisco, CA 94143-2911 (e-mail: melinda.aldrich{at}ucsf.edu).

Received for publication January 31, 2008. Accepted for publication June 27, 2008.


    ABSTRACT
 
A variety of methods are available for estimating genetic admixture proportions in populations; however, few investigators have conducted detailed comparisons using empirical data. The authors characterized admixture proportions among self-identified African Americans (n = 535) and Latinos (n = 412) living in the San Francisco Bay Area who participated in a lung cancer case-control study (1998–2003). Individual estimates of genetic ancestry based on 184 informative markers were obtained from a Bayesian approach and 2 maximum likelihood approaches and were compared using descriptive statistics, Pearson correlation coefficients, and Bland-Altman plots. Case-control differences in individual admixture proportions were assessed using 2-sample t tests and logistic regression analysis. Results indicated that Bayesian and frequentist approaches to estimating admixture provide similar estimates and inferences. No difference was observed in admixture proportions between African-American cases and controls, but Latino cases and controls significantly differed according to Amerindian and European genetic ancestry. Differences in admixture proportions between Latino cases and controls were not unexpected, since cases were more likely to have been born in the United States. Genetic admixture proportions provide a quantitative measure of ancestry differences among Latinos that can be used in analyses of genetic risk factors.

African Americans; case-control studies; epidemiologic methods; genetics, population; Hispanic Americans; linkage disequilibrium; lung neoplasms; statistics


    INTRODUCTION
 
An abundance of statistical methods and genetic markers are available with which to identify population substructure and estimate genetic ancestry in non-randomly mating populations recently formed from previously isolated populations, hence considered admixed populations (1). Genomic control (2) and structured association are 2 classes of statistical methods developed to control for population heterogeneity in admixed populations. Genomic control is a nonparametric method used to correct the inflated chi-square statistic caused by the presence of population heterogeneity. Numerous structured association methods have been developed for estimating genetic admixture. General approaches include the weighted least squares (3), maximum likelihood (4, 5), and Bayesian (6, 7) methods, although many variations exist (8–13). Structured association methods are often regarded as preferable to genomic control, since applying an inflation factor to the entire genome may either over- or undercorrect in certain scenarios (14). In addition, structured association methods provide a useful estimate of genetic ancestry for multivariable models.
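To make the genomic control correction concrete, the following is a minimal sketch, assuming the commonly used median-based inflation factor (the median of the observed 1-degree-of-freedom chi-square statistics divided by the theoretical median, approximately 0.4549). The function name and the example statistics are hypothetical and are not taken from this study.

```python
from statistics import median

# Theoretical median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def genomic_control(chi2_stats):
    """Return (inflation factor, corrected statistics) for 1-df chi-square statistics."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # common convention: statistics are never inflated
    return lam, [x / lam for x in chi2_stats]


# Hypothetical statistics from presumed-null markers, for illustration only.
stats = [0.2, 0.5, 0.9, 1.4, 3.0]
lam, corrected = genomic_control(stats)
print(round(lam, 3), [round(x, 3) for x in corrected])
```

Because a single factor rescales every marker, the sketch also shows why genomic control can over- or undercorrect individual loci, which is the limitation noted above.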

Although self-reported ethnicity is associated with genetic ancestry (15, 16) and debate continues as to the implications of measuring genetic ancestry (17–22), self-reported ethnicity is unlikely to provide an accurate measure of genetic ancestry, as it represents a combination of known and unknown factors which are genetic, social, economic, and behavioral. With developments in statistical methods and identification of genetic markers, genetic ancestry has become a new variable in statistical models used to study disease associations. However, the majority of published studies providing admixture estimates have been based on convenience samples, rather than epidemiologic investigations. Furthermore, limited comparative data exist for admixture estimation using Bayesian versus maximum likelihood approaches (9, 23–26), and frequently fewer than 50 markers have been used (4, 5, 25, 27–47).

African Americans have an approximately 2-fold higher incidence of lung cancer than Latinos in the United States (48). However, these observations are based upon self-reported ethnicity, and reasons for the disparate incidence rates remain unclear. Admixed populations, such as African Americans and Latinos, offer investigators a valuable opportunity to explore disparities in complex disease. In the present analysis, we compared Bayesian and frequentist approaches to estimating genetic admixture in African Americans and Latinos participating in a lung cancer case-control study.


    MATERIALS AND METHODS
 
Subjects
Newly diagnosed lung cancer patients residing in the San Francisco Bay Area of California were identified through the Northern California Cancer Center and Summit Hospital from September 1998 through March 2003. Incident patients were eligible for participation if they 1) self-identified as African-American or Latino, 2) were 21 years of age or older, 3) resided within one of 5 adjacent counties (Alameda, Contra Costa, Santa Clara, San Francisco, or San Mateo), and 4) had a diagnosis of primary lung cancer. Cases meeting eligibility criteria were requested to participate in an in-person interview and to donate a biologic sample (blood or buccal smear). Cases were not excluded if they had been previously diagnosed with cancer. A total of 368 cases (255 African Americans, 113 Latinos) were included in this analysis.

Potential controls were recruited through 1) random digit dialing, 2) Health Care Financing Administration records for persons aged 65 years or older, and 3) community-based sources. For each case, approximately 2 controls of the same age (±10 years), sex, and self-identified ethnicity were recruited. Eligible controls were requested to participate in an in-person interview and to donate a biologic sample. Extensive details of case and control recruitment are summarized elsewhere (49). A total of 579 controls (280 African Americans, 299 Latinos) were included in the analysis.

The study, designated the San Francisco Bay Area Lung Cancer Study, was approved by the University of California Committee for the Protection of Human Subjects. Written informed consent was obtained from all participating subjects.

Interview data collection and specimen processing
Demographic and epidemiologic data and biologic specimens were collected during the in-person interviews. Specimens were transported to a laboratory within 48 hours of collection and processed for long-term storage until they were ready for genotyping. When samples had been collected from all study participants, biospecimens were thawed and DNA was isolated by means of automated phenol chloroform extraction using the Autogen 3000 (Autogen, Inc., Holliston, Massachusetts). DNA concentration was measured by fluorescence (PicoGreen, Invitrogen Corporation, Carlsbad, California) and normalized to 30–100 ng/µL, for a total amount of 150–500 ng.

Whole genome amplification was performed on samples yielding insufficient DNA (2 blood samples and 6 buccal samples) in accordance with the Omniplex protocol (Sigma-Aldrich Corporation, St. Louis, Missouri). The amplified product was cleaned with Millipore's Montage PCR96 filter plate (Millipore Corporation, Billerica, Massachusetts) (50).

Marker selection
A panel of 184 autosomal single nucleotide polymorphisms distinguishing the continental ancestor populations comprising Latinos and African Americans (see Web Table 1, which is posted on the Journal’s website (http://aje.oxfordjournals.org/)) was genotyped using DNA from Europeans (San Francisco Bay Area, California; n = 47) (51), West Africans (Bantu and Nilo-Saharan speakers, Nigeria; n = 46), and Amerindians (Mayans, Guatemala; n = 46) (52, 53). Mean fixation indices (FST), estimated using FSTAT following the method of Weir and Cockerham (54), were 0.52 for West Africans versus Europeans, 0.52 for West Africans versus Amerindians, and 0.48 for Europeans versus Amerindians.
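As a rough illustration of what a fixation index measures, the sketch below computes Wright's FST for a single biallelic marker from allele frequencies in 2 equally weighted populations. This is a simplified textbook form and not the Weir and Cockerham estimator implemented in FSTAT, which additionally corrects for sample size; the function name and the frequencies are hypothetical.

```python
def fst_two_pops(p1, p2):
    """Wright's FST for one biallelic marker in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # marker is monomorphic in both populations
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies for a highly ancestry-informative marker.
print(fst_two_pops(0.9, 0.1))
# A marker with identical frequencies carries no ancestry information.
print(fst_two_pops(0.5, 0.5))
```

Markers with large frequency differences between ancestral populations yield values near 1, which is why panels with mean FST around 0.5 are considered informative for ancestry.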

Genotyping
DNA collected from participants was genotyped at the University of California, Davis, Genome Center using the Illumina Bead Station 500G Golden Gate genotyping platform (Illumina, Inc., San Diego, California) and a custom Illumina panel. A participant was selected for genotyping if he or she was a lung cancer case (Latino or African-American) or a Latino control. A random sample of African-American controls was selected to complete the study. Participants were removed from statistical analyses if, during the interview, they reported belonging to more than 1 ethnic group (n = 44) or the quality of their DNA sample was poor (n = 5); this resulted in a final sample of 947 admixed participants (African-American or Latino).

Statistical analysis
Statistical analyses and admixture estimation procedures were conducted separately for African Americans (n = 535) and Latinos (n = 412). Exact tests for Hardy-Weinberg equilibrium and the linkage disequilibrium measure r² were calculated using SAS/Genetics software (SAS Institute Inc., Cary, North Carolina). Correction for multiple testing among markers was conducted using the false discovery rate (55). A 1-sample Kolmogorov-Smirnov test was used to assess deviations from Hardy-Weinberg equilibrium by comparing significance probabilities obtained from the chi-square test with a uniform probability distribution. Maximum likelihood estimates of composite linkage disequilibrium, which makes no assumption about random mating or Hardy-Weinberg equilibrium, were computed (56). Chi-square tests yielded significance probabilities for assessing composite linkage disequilibrium which were compared with a nonparametric distribution from 30 iterations of sampling with replacement from all 15,976 pairwise comparisons of markers on different chromosomes.
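The false discovery rate correction can be sketched as follows, assuming the Benjamini-Hochberg step-up procedure (the text does not state which false discovery rate procedure was used, so this choice is an assumption). The function name and the P values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that the k-th smallest P value <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical per-marker Hardy-Weinberg P values, for illustration only.
p_values = [0.001, 0.04, 0.03, 0.2]
print(benjamini_hochberg(p_values))
```

In this toy input, 3 markers have nominal P < 0.05 but only 1 remains after correction, which mirrors the general pattern of nominally significant markers dropping out once multiple testing is taken into account.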

Using the multilocus genotyping data, individual admixture was estimated in African-American and Latino participants using a maximum likelihood method (designated MLK) developed for this project by one of the authors (S. S.) and written in R. Although an array of approaches has been developed for inferring population structure (3, 4, 6–11, 24, 57–63), STRUCTURE 2.1 (6–8) was selected for comparison, since it is a Bayesian method frequently used in published genetic association studies, it accepts single nucleotide polymorphism data as input, and the software is freely available. IAE3CI (23, 39, 64) was also selected for comparison because it provided an alternate maximum likelihood method, it accepted single nucleotide polymorphism data as input, it was easily implemented, and it was available to the authors.

To improve ancestry assignment, ancestral population genotyping data were input in all models along with the genotypes of the admixed participants. Parameters for STRUCTURE were set according to author recommendations. An admixture model with independent allele frequencies was selected for inference of ancestry. A Markov chain Monte Carlo scheme (50,000 burn-in length and 50,000 iterations after burn-in) based on Gibbs sampling was implemented to generate the posterior distribution of admixture proportions given the observed genotype at each locus. To estimate the number of K subpopulations, Markov chain Monte Carlo analysis was conducted for K = 2 through K = 5 for African Americans and Latinos independently. The model with the largest log-likelihood was used to select the final K. Ancestral subjects with greater than 5% admixture (3 Amerindians, 6 Africans, and 6 Europeans) were identified using STRUCTURE and were subsequently excluded from Bayesian and maximum likelihood methods estimating admixture proportions for African-American and Latino participants.

The maximum likelihood estimation program IAE3CI is based on methods from Hanis et al. (4) and Chakraborty et al. (5) and has been implemented in other studies (23, 38, 64). Bounds for admixture proportions are set to 0 ≤ m ≤ 1, where m represents the contribution of the parental populations to the hybrid population. The MLK program, which also follows estimation methods described by Chakraborty (65, 66) and Hanis et al. (4), initially allowed m to be unconstrained, imposing no minimum or maximum value. Maximum likelihood estimates of m1, m2, and m3 were then adjusted to sum to 1.0 such that a negative value was set to 0 and a value greater than 1 was set to 1.0.
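The general idea of maximum likelihood estimation of individual admixture can be sketched with a simple expectation-maximization (EM) routine. This is not the MLK or IAE3CI code; it is a minimal illustration that assumes known ancestral allele frequencies, Hardy-Weinberg equilibrium, independent markers, and complete genotypes, and it keeps each m within 0 ≤ m ≤ 1 by construction. All names, starting values, and frequencies are hypothetical.

```python
def estimate_admixture(genotypes, freqs, n_iter=200):
    """EM estimate of one individual's admixture proportions.

    genotypes: reference-allele counts (0, 1, or 2), one per marker.
    freqs: freqs[k][l] is the reference-allele frequency of marker l
           in ancestral population k (treated as known without error).
    """
    k_pops = len(freqs)
    m = [1.0 / k_pops] * k_pops  # hypothetical starting values: equal proportions
    for _ in range(n_iter):
        counts = [0.0] * k_pops
        n_alleles = 0
        for l, g in enumerate(genotypes):
            # Treat each allele copy as a draw from one ancestral population.
            for copies, allele_prob in ((g, lambda p: p), (2 - g, lambda p: 1 - p)):
                if copies == 0:
                    continue
                weights = [m[k] * allele_prob(freqs[k][l]) for k in range(k_pops)]
                total = sum(weights)
                if total == 0:
                    continue  # allele impossible under every population; skip it
                for k in range(k_pops):
                    counts[k] += copies * weights[k] / total
                n_alleles += copies
        if n_alleles == 0:
            return m
        m = [c / n_alleles for c in counts]
    return m


# Hypothetical frequencies for 6 markers in 3 ancestral populations.
freqs = [
    [0.95, 0.95, 0.05, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.95, 0.95, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.05, 0.95, 0.95],
]
genotypes = [2, 2, 0, 0, 0, 0]  # resembles the first population
m_hat = estimate_admixture(genotypes, freqs)
print([round(x, 3) for x in m_hat])
```

An unconstrained optimizer, as described for the initial MLK fit, would instead maximize the same likelihood without restricting m to the unit interval, which is why out-of-range values can arise there but not in this sketch.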

Comparisons were made between the estimates obtained from the Bayesian and maximum likelihood methods. Descriptive statistics were estimated and case-control differences were compared using t tests. Scatterplots, correlation coefficients, and Bland-Altman plots compared the admixture estimation approaches from STRUCTURE, IAE3CI, and MLK (constrained 0 ≤ m ≤ 1). European ancestry, which contributed a relatively large amount of ancestry and had the least variability in both Latinos and African Americans, was dichotomized using the median value in each population, and logistic regression analyses were used to compare maximum likelihood admixture estimates of median-or-greater European ancestry between cases and controls. All statistical analyses were conducted using SAS, version 9.1, or R, version 2.5. All P values are 2-sided.
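The method comparison statistics can be sketched as below: a Pearson correlation coefficient and the Bland-Altman 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences). The function names and the paired estimates are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bland_altman(x, y):
    """Return (mean difference, lower limit, upper limit) of agreement."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


# Hypothetical ancestry estimates for 5 individuals from two methods.
method_a = [0.10, 0.25, 0.40, 0.55, 0.70]
method_b = [0.12, 0.24, 0.41, 0.53, 0.72]
r = pearson_r(method_a, method_b)
bias, lower, upper = bland_altman(method_a, method_b)
print(round(r, 4), round(bias, 4), round(lower, 4), round(upper, 4))
```

A Bland-Altman plot places each difference against the pairwise mean of the 2 estimates; a high correlation alone does not rule out a systematic offset, which is why both summaries are reported.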


    RESULTS
 
Latino controls were more likely than Latino cases to be foreign-born (Table 1). Mexican ancestry, determined using the reported birthplaces of the respondent and the 2 previous generations, was similar between Latino cases and controls, but controls had greater Central American ancestry than cases (30% and 10%, respectively). Additionally, 23% of cases but only 9% of controls were third-generation US-born.


Table 1. Demographic Characteristics of African Americans and Latinos in the San Francisco Bay Area Lung Cancer Study, California, 1998–2003

 
Marker locations and allele frequencies for the ancestral populations genotyped for this study are displayed in Web Table 1 (http://aje.oxfordjournals.org/). Among the Latino controls, 9 loci did not conform to the expected Hardy-Weinberg equilibrium values (P < 0.05), and among African-American controls, 18 markers were out of Hardy-Weinberg equilibrium (P < 0.05) (data not shown). Correction for multiple testing resulted in only 1 marker being out of equilibrium for both African-American and Latino controls. Among African-American and Latino controls, 58% and 65% of loci had reduced heterozygosity, respectively, as compared with expected Hardy-Weinberg proportions. The Kolmogorov-Smirnov test indicated no departure from Hardy-Weinberg equilibrium for Latino or African-American controls (data not shown). The largest values for the linkage disequilibrium measure r² were 0.63 and 0.79 for African-American and Latino controls, respectively. Eighteen (0.05%) and 4 (0.01%) marker pairs among all 32,765 pairwise comparisons for African-American and Latino controls, respectively, had r² values greater than 0.20. Assessment of composite linkage disequilibrium showed that the distribution of P values differed from the null distribution derived from 30 iterations of sampling with replacement, providing evidence of allelic association between loci in both African Americans and Latinos (data not shown).

STRUCTURE identified a 3-ancestral-population model (K = 3) that best fit the genotyping data for both African Americans and Latinos. For Latinos, STRUCTURE and MLK yielded the highest correlations for all ancestral populations (r = 0.99; see Figure 1), whereas correlations between STRUCTURE and IAE3CI (Figure 2) and IAE3CI and MLK (Figure 3) were slightly lower. Bland-Altman plots show the difference between Bayesian and maximum likelihood estimates as a function of the mean ancestry estimates for assessment of systematic differences between methods. The 95% limits of agreement were narrow, indicating little difference between STRUCTURE and MLK estimates (see Web Figure 1 (http://aje.oxfordjournals.org/)). Bland-Altman plots for STRUCTURE versus IAE3CI (Web Figure 2) and IAE3CI versus MLK (Web Figure 3) show a wider scatter about the reference line (perfect correspondence) than the STRUCTURE versus MLK comparison, suggesting larger differences between estimates, although differences appeared to be random. For African Americans, STRUCTURE and MLK showed the highest correlations for all ancestral populations (Figure 4). Correlations between all estimation approaches were lowest for estimates of Amerindian ancestry among African Americans (Figure 4, Figure 5, and Figure 6). As with Latinos, the 95% limits of agreement were narrowest for STRUCTURE versus MLK, indicating that estimates obtained from these 2 approaches differed little (Web Figures 4, 5, and 6).


Figure 1. Scatterplots of ancestral subpopulation estimates from STRUCTURE versus MLK among Latino participants in the San Francisco Bay Area Lung Cancer Study, California, 1998–2003. STRUCTURE, Bayesian estimation method; MLK, maximum likelihood estimation method (0 ≤ m ≤ 1).

 

Figure 2. Scatterplots of ancestral subpopulation estimates from STRUCTURE versus IAE3CI among Latino participants in the San Francisco Bay Area Lung Cancer Study, California, 1998–2003. STRUCTURE, Bayesian estimation method; IAE3CI, maximum likelihood estimation method.

 

Figure 3. Scatterplots of ancestral subpopulation estimates from IAE3CI versus MLK among Latino participants in the San Francisco Bay Area Lung Cancer Study, California, 1998–2003. IAE3CI, maximum likelihood estimation method; MLK, maximum likelihood estimation method (0 ≤ m ≤ 1).

 

Figure 4. Scatterplots of ancestral subpopulation estimates from STRUCTURE versus MLK among African-American participants in the San Francisco Bay Area Lung Cancer Study, California, 1998–2003. STRUCTURE, Bayesian estimation method; MLK, maximum likelihood estimation method (0 ≤ m ≤ 1).

 

Figure 5. Scatterplots of ancestral subpopulation estimates from STRUCTURE versus IAE3CI among African-American participants in the San Francisco Bay Area Lung Cancer Study, California, 1998–2003. STRUCTURE, Bayesian estimation method; IAE3CI, maximum likelihood estimation method.

 

Figure 6. Scatterplots of ancestral subpopulation estimates from IAE3CI versus MLK among African-American participants in the San Francisco Bay Area Lung Cancer Study, California, 1998–2003. IAE3CI, maximum likelihood estimation method; MLK, maximum likelihood estimation method (0 ≤ m ≤ 1).

 
Estimated mean admixture proportions from all ancestral populations were similar regardless of whether a Bayesian or maximum likelihood method was applied (Tables 2 and 3). Among Latinos, all 3 estimation methods showed that Amerindian and European genetic ancestry differed significantly between cases and controls (P < 0.01, Table 2). Among African Americans, none of the admixture proportions differed between cases and controls—a result confirmed by all 3 estimation programs (Table 3).


Table 2. Summary Statistics Comparing Bayesian and Maximum Likelihood Estimation Approaches to Estimating Genetic Admixture Among Latinos in the San Francisco Bay Area Lung Cancer Study, California, 1998–2003

 

Table 3. Summary Statistics Comparing Bayesian and Maximum Likelihood Estimation Approaches to Estimating Genetic Admixture Among African Americans in the San Francisco Bay Area Lung Cancer Study, California, 1998–2003

 
For the following reasons, the maximum likelihood method MLK (assuming 0 ≤ m ≤ 1) was selected as the preferred program for estimating admixture: 1) admixture estimates were similar between all programs; 2) inferences about differences between case and control admixture were the same for all programs; 3) MLK almost perfectly correlated with STRUCTURE, an approach often used for estimating admixture; 4) epidemiology is solidly grounded in frequentist statistics; 5) fewer input parameters (assumptions) were required for MLK than for STRUCTURE; and 6) MLK required a fraction of the computing time necessitated by STRUCTURE.

Figure 7 and Figure 8 display maximum likelihood estimates of ancestry proportions for each Latino and African-American case and control, respectively. Considerable interindividual variability is evident in both populations (Figures 7 and 8 and Tables 2 and 3). Logistic regression models characterizing the relation between genetic admixture and lung cancer found that median-or-greater European ancestry among Latinos (54%) was significantly higher in lung cancer cases than in controls (odds ratio = 1.92, 95% confidence interval: 1.22, 3.03), after adjustment for the frequency-matched variables age and sex. For African Americans, similar analyses indicated that median-or-greater European ancestry (17%) did not vary meaningfully between lung cancer cases and controls (odds ratio = 0.94, 95% confidence interval: 0.67, 1.32), also after adjustment for the frequency-matched variables.
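As a reading aid for the reported odds ratios, the sketch below back-calculates the log-odds coefficient, its approximate standard error, and a two-sided Wald P value from an odds ratio and its 95% confidence interval. It assumes the intervals are Wald intervals on the log scale (estimate ± 1.96 standard errors), which the text does not state explicitly; the function name is hypothetical, and the derived quantities are approximations rather than values reported by the authors.

```python
import math


def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover (beta, standard error, z, two-sided P) from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    # On the log scale the interval is symmetric: width = 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p_two_sided


# Odds ratios and confidence intervals as reported in the text above.
latino = wald_summary(1.92, 1.22, 3.03)
african_american = wald_summary(0.94, 0.67, 1.32)
print([round(v, 3) for v in latino])
print([round(v, 3) for v in african_american])
```

An interval that excludes 1 corresponds to a two-sided P value below 0.05 under this approximation, which matches the qualitative conclusions drawn for Latinos and African Americans.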


Figure 7. Maximum likelihood estimates from MLK of percentage of genetic ancestry for each individual Latino control and case participating in the San Francisco Bay Area Lung Cancer Study, California, 1998–2003. MLK, maximum likelihood estimation method (0 ≤ m ≤ 1).

 

Figure 8. Maximum likelihood estimates from MLK of percentage of genetic ancestry for each individual African-American control and case participating in the San Francisco Bay Area Lung Cancer Study, California, 1998–2003. MLK, maximum likelihood estimation method (0 ≤ m ≤ 1).

 

    DISCUSSION
 
In this study, population substructure was estimated in self-identified African Americans and Latinos from the San Francisco Bay Area. The chi-square test and the Kolmogorov-Smirnov test displayed little if any evidence of deviations from Hardy-Weinberg equilibrium. Together, the presence of reduced heterozygosity and excess allelic associations between the physically unlinked ancestry informative markers suggest that modest population substructure was present among these African-American and Latino controls. The proportions of European, African, and Amerindian ancestry in the Latinos and African Americans participating in this study were consistent with estimates from prior studies (32, 38, 39, 41, 67–69).

Estimates derived from STRUCTURE and MLK consistently showed the greatest similarities. McKeigue et al. stated that Bayesian and frequentist approaches to estimating admixture will give similar estimates provided that sufficiently large samples are used, since the mean of the posterior distribution from Bayesian analyses will be "asymptotically equivalent to the maximum likelihood estimate" (10, p. 173). The results from this analysis are consistent with this statement.
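The asymptotic agreement between a posterior mean and a maximum likelihood estimate can be illustrated with a deliberately simple analog: a single binomial proportion with a Beta prior. This is not the STRUCTURE model; the uniform Beta(1, 1) prior, the function name, and the counts are assumptions chosen only to show the gap shrinking as the sample grows.

```python
def mle_and_posterior_mean(successes, n, prior_a=1.0, prior_b=1.0):
    """Return (MLE, posterior mean) for a binomial proportion with a Beta prior."""
    mle = successes / n
    post_mean = (prior_a + successes) / (prior_a + prior_b + n)
    return mle, post_mean


# Hypothetical data with a true proportion near 0.3 at increasing sample sizes.
gaps = []
for n in (10, 100, 10000):
    x = round(0.3 * n)
    mle, post_mean = mle_and_posterior_mean(x, n)
    gaps.append(abs(mle - post_mean))
    print(n, round(mle, 4), round(post_mean, 4))
```

The influence of the prior is of order 1/n in this setting, so with many informative observations the 2 estimates become practically indistinguishable.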

Similar admixture proportions between African-American cases and controls and no association between European ancestry and lung cancer suggest that 1) the cases and controls in this study were well-matched according to genetic admixture proportions, 2) the markers used in this study failed to detect differences in this population, 3) any possible existing population structure did not influence sampling, or 4) environmental factors may play a larger role in the incidence of lung cancer for African Americans. One cannot rule out the possibility that susceptibility to lung cancer in African Americans is mediated by genetic factors. For Latinos, case-control differences in Amerindian and European genetic ancestry suggest either that these populations were not sufficiently matched on ethnicity or that markers associated with European or Amerindian ancestry may alter lung cancer risk among Latinos. The observed increase in European ancestry among Latino cases as compared with controls is compatible with our previously reported differences in cases and controls (49, 70). Cases ascertained through the population-based cancer registry were more likely to have been born in the United States than controls, many of whom were recruited through community-based sources. Use of community-based control recruitment resulted in our having a greater proportion of more recent immigrants; thus, it is not unexpected that controls had lower levels of European genetic ancestry than cases. Latino controls may not have been selected from the same study base as the cases, possibly resulting in an insufficiently matched study population. Controlling for admixture proportions in genetic association studies is an appropriate strategy for addressing this imbalance, as long as genetic admixture is not in the causal pathway between exposure and disease (11).

Although admixture estimates and statistical inferences were similar, both STRUCTURE and MLK have advantages and disadvantages. Advantages of STRUCTURE are that genetically similar clusters can be identified and the ancestry of admixed individuals of unknown origin can be estimated with or without data from ancestral populations of known origin. There are several weaknesses of this Bayesian approach: 1) identification of subpopulations is subjective (11); 2) identified clusters may not correspond to actual populations (6, 24); 3) extensive computing time is required; 4) the user must specify a number of priors and other parameters; and 5) a complicated prior distribution is specified, making it difficult to easily interpret the statistical methods being implemented or the consequences of deviations from the assumed model structure.

The maximum likelihood program MLK has the advantage that no bounds are imposed on the admixture proportions and therefore proportions greater than 1 or less than 0 are identified. The unconstrained MLK model provides more accurate admixture estimates than a model imposing boundaries on m, since truncating admixture estimates induces a directional bias (Dr. Neil Risch, University of California, San Francisco, personal communication, 2006). When the research aim is to compare mean admixture estimates between cases and controls, an unconstrained MLK model is appropriate. When the aim of estimating admixture is to obtain individual admixture estimates and examine associations with disease, the assumption of 0 ≤ m ≤ 1 can be imposed. Weaknesses of the MLK program include: 1) starting values of m are required, although these are informed by prior studies; 2) the number of subpopulations in the admixed population cannot be estimated; 3) having missing genotyping data eliminates the observation; and 4) the ancestry of admixed individuals of unknown origin cannot be estimated without data from ancestral populations of known origin.
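The directional bias introduced by truncation can be illustrated with a small simulation. It assumes, purely for illustration, that unconstrained estimates scatter symmetrically (normally) around a small true admixture proportion; the true value, the noise level, the seed, and the function name are all hypothetical.

```python
import random


def truncation_bias(true_m=0.02, noise_sd=0.05, n=10_000, seed=1):
    """Return (mean of unconstrained estimates, mean after truncating to [0, 1])."""
    rng = random.Random(seed)
    # Unconstrained estimates: unbiased but noisy, so some fall below 0.
    raw = [rng.gauss(true_m, noise_sd) for _ in range(n)]
    # Constrained estimates: negative values set to 0, values above 1 set to 1.
    truncated = [min(1.0, max(0.0, m)) for m in raw]
    return sum(raw) / n, sum(truncated) / n


raw_mean, truncated_mean = truncation_bias()
print(round(raw_mean, 4), round(truncated_mean, 4))
```

Near the lower boundary, truncation can only move estimates upward, so the mean of the constrained estimates exceeds the mean of the unconstrained ones. This is consistent with preferring the unconstrained model when group means are the quantity of interest.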

Both the STRUCTURE method and the MLK method make several assumptions in these analyses. Both of these methods assume that 1) ancestral populations are in Hardy-Weinberg equilibrium within populations; 2) loci are in linkage equilibrium within subpopulations (6); 3) admixture occurred at the same point in time and randomly with respect to genotypes within populations; 4) there is no uncertainty in the ancestral population composition or the allele frequencies (4, 71); 5) no systematic change in allele frequencies has occurred in the parental or hybrid populations (71); and 6) there are only 3 contributing parental populations as determined by the sampling scheme (6). It is difficult to know whether these assumptions hold, since the ancestral populations from which African Americans and Latinos arose are unavailable. It is unlikely that the ancestral population composition and allele frequencies are known without error, and erroneous parental population selection can bias admixture estimates (4). Measurement error of admixture proportions can either bias effect estimates or lead to residual confounding when controlling for confounding by admixture.

The ancestral populations genotyped for this study are unlikely to be fully representative of the actual parental populations, since modern descendants of these populations may have undergone genetic events resulting in differing allele frequencies from their ancestors. This limitation, which applies to both the Bayesian and maximum likelihood approaches, is reflected in the unconstrained estimates of m from MLK. While the maximum likelihood model is parsimonious, it can give illegal values (m > 1 or m < 0) when the model fails (72). Model failure indicates that the number of ancestral populations may be incorrectly specified for the individual cases or controls, the ancestral genotype frequencies have error, or both. Poor correlations between programs for Amerindian ancestry among the African Americans participating in this study suggest that the Mayan population may not have been a representative Amerindian population for the African Americans, although admixture estimates were similar to those of published reports (39).

Strengths of this analysis include the large number of markers (73, 74) and the use of empirical data. Most genetic association studies have estimated admixture in Latinos and African Americans using a small number of markers for few diseases (see Web Tables 2 and 3, respectively (http://aje.oxfordjournals.org/)). Removal of ancestral individuals having less than 95% homogeneous ancestry allowed homogeneous ancestral allele frequencies to inform estimates of admixture, further strengthening this analysis.

In summary, genetic association studies conducted in admixed populations should include a panel of markers to identify genetic differences in ancestry. If additional genotyping is too costly, investigators should consider the possibility of false associations due to allele frequency differences between cases and controls when interpreting results. A maximum likelihood method provided admixture estimates similar to those of the more computationally intensive Bayesian approach. Although admixture estimation programs are readily available, it is important for genetic epidemiologists to understand the fundamental issues and assumptions underlying the estimation process. These issues will grow in importance with the ongoing development of statistical tools, the identification of more informative genetic markers, and their increasing use in epidemiologic studies.
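
The way in which ancestry differences can produce a false association may be seen in a small worked example. The Python sketch below uses hypothetical counts, not data from this study, for 2 ancestry strata in which the allele has no effect on disease within either stratum, and it compares the crude odds ratio with a Mantel-Haenszel odds ratio stratified by ancestry. Ancestry is treated as 2 discrete strata purely for simplicity; in practice, individual admixture is a continuous proportion.

```python
# Each stratum: (carrier cases, carrier noncases, noncarrier cases, noncarrier noncases).
# Hypothetical counts: in the first stratum 80% carry the allele and disease risk is 20%;
# in the second stratum 20% carry the allele and disease risk is 5%.
strata = {
    "high ancestry": (160, 640, 40, 160),
    "low ancestry": (10, 190, 40, 760),
}

def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2 x 2 table."""
    return (a * d) / (b * c)

def crude_odds_ratio(tables):
    """Odds ratio after collapsing all strata into one table."""
    a, b, c, d = (sum(column) for column in zip(*tables))
    return odds_ratio(a, b, c, d)

def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel summary odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

tables = list(strata.values())
stratum_ors = {name: odds_ratio(*table) for name, table in strata.items()}
crude = crude_odds_ratio(tables)
adjusted = mantel_haenszel_odds_ratio(tables)
print(stratum_ors, crude, adjusted)
```

In this constructed example, the odds ratio is 1 within each stratum and after stratification, yet the crude odds ratio is greater than 2, because both allele frequency and disease risk differ by ancestry.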


    ACKNOWLEDGMENTS
 
Author affiliations: Department of Medicine, School of Medicine, University of California, San Francisco, San Francisco, California (Melinda C. Aldrich); Division of Biostatistics, School of Public Health, University of California, Berkeley, Berkeley, California (Steve Selvin); Department of Neurological Surgery, School of Medicine, University of California, San Francisco, San Francisco, California (Helen M. Hansen, Margaret R. Wrensch, Jennette D. Sison, John K. Wiencke); Division of Epidemiology, School of Public Health, University of California, Berkeley, Berkeley, California (Lisa F. Barcellos, Patricia A. Buffler); Division of Research, Kaiser Permanente, Oakland, California (Charles P. Quesenberry); Department of Medicine, Division of Biological Sciences, University of Chicago, Chicago, Illinois (Rick A. Kittles); Obras Sociales del Hermano Pedro, Antigua, Guatemala (Gabriel Silva); and Rowe Program in Human Genetics, Departments of Biochemistry, Molecular Medicine, and Internal Medicine, School of Medicine, University of California, Davis, Davis, California (Michael F. Seldin).

This work was supported by grants from the National Institute of Environmental Health Sciences (R01ES06717 to J. K. W. and 2R01ES09137-06 to P. A. B.), the National Institute of Arthritis and Musculoskeletal and Skin Diseases (R01AR050267 to M. F. S.), and the National Institute of Diabetes and Digestive and Kidney Diseases (R01K071185 to M. F. S.).

The authors thank Dr. John Belmont for his support with collection of the Mayan population samples. They also thank the Northern California Cancer Center and Summit Hospital for their assistance with case ascertainment.

Conflict of interest: none declared.


    References

  1. Ziv E, Burchard EG. Human population structure and genetic association studies. Pharmacogenomics (2003) 4(4):431–441.
  2. Devlin B, Roeder K. Genomic control for association studies. Biometrics (1999) 55(4):997–1004.
  3. Long JC. The genetic structure of admixed populations. Genetics (1991) 127(2):417–428.
  4. Hanis CL, Chakraborty R, Ferrell RE, et al. Individual admixture estimates: disease associations and individual risk of diabetes and gallbladder disease among Mexican-Americans in Starr County, Texas. Am J Phys Anthropol (1986) 70(4):433–441.
  5. Chakraborty R, Ferrell RE, Stern MP, et al. Relationship of prevalence of non-insulin-dependent diabetes mellitus to Amerindian admixture in the Mexican Americans of San Antonio, Texas. Genet Epidemiol (1986) 3(6):435–454.
  6. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics (2000) 155(2):945–959.
  7. Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics (2003) 164(4):1567–1587.
  8. Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: dominant markers and null alleles. Mol Ecol Notes (2007) 7(4):574–578.
  9. Tang H, Peng J, Wang P, et al. Estimation of individual admixture: analytical and study design considerations. Genet Epidemiol (2005) 28(4):289–301.
  10. McKeigue PM, Carpenter JR, Parra EJ, et al. Estimation of admixture and detection of linkage in admixed populations by a Bayesian approach: application to African-American populations. Ann Hum Genet. (2000) 64(pt 2):171–186.
  11. Hoggart CJ, Parra EJ, Shriver MD, et al. Control of confounding of genetic associations in stratified populations. Am J Hum Genet. (2003) 72(6):1492–1504.
  12. Hoggart CJ, Shriver MD, Kittles RA, et al. Design and analysis of admixture mapping studies. Am J Hum Genet. (2004) 74(5):965–978.
  13. Parra EJ, Kittles RA, Argyropoulos G, et al. Ancestral proportions and admixture dynamics in geographically defined African Americans living in South Carolina. Am J Phys Anthropol (2001) 114(1):18–29.
  14. Marchini J, Cardon LR, Phillips MS, et al. The effects of human population structure on large genetic association studies. Nat Genet. (2004) 36(5):512–517.
  15. Tang H, Quertermous T, Rodriguez B, et al. Genetic structure, self-identified race/ethnicity, and confounding in case-control association studies. Am J Hum Genet. (2005) 76(2):268–275.
  16. Mountain JL, Risch N. Assessing genetic contributions to phenotypic differences among ‘racial’ and ‘ethnic’ groups. Nat Genet. (2004) 36(11 suppl):S48–S53.
  17. Tishkoff SA, Kidd KK. Implications of biogeography of human populations for ‘race’ and medicine. Nat Genet. (2004) 36(11 suppl):S21–S27.
  18. Rotimi CN. Are medical and nonmedical uses of large-scale genomic markers conflating genetics and ‘race’? Nat Genet. (2004) 36(11 suppl):S43–S47.
  19. Burchard EG, Ziv E, Coyle N, et al. The importance of race and ethnic background in biomedical research and clinical practice. N Engl J Med (2003) 348(12):1170–1175.
  20. Race Ethnicity and Genetics Working Group. The use of racial, ethnic, and ancestral categories in human genetics research. Am J Hum Genet. (2005) 77(4):519–532.
  21. Fine MJ, Ibrahim SA, Thomas SB. The role of race and genetics in health disparities research. Am J Public Health (2005) 95(12):2125–2128.
  22. Krieger N. Stormy weather: race, gene expression, and the science of health disparities. Am J Public Health (2005) 95(12):2155–2160.
  23. Tsai HJ, Choudhry S, Naqvi M, et al. Comparison of three methods to estimate genetic ancestry and control for stratification in genetic association studies among admixed populations. Hum Genet. (2005) 118(3–4):424–433.
  24. Wu B, Liu N, Zhao H. PSMIX: an R package for population structure inference via maximum likelihood method. BMC Bioinformatics (2006) 7:317.
  25. Fernandez JR, Shriver MD, Beasley TM, et al. Association of African genetic admixture with resting metabolic rate and obesity among women. Obes Res. (2003) 11(7):904–911.
  26. Barnholtz-Sloan JS, Chakraborty R, Sellers TA, et al. Examining population stratification via individual ancestry estimates versus self-reported race. Cancer Epidemiol Biomarkers Prev (2005) 14(6):1545–1551.
  27. Alarcon GS, Bastian HM, Beasley TM, et al. Systemic lupus erythematosus in a multi-ethnic cohort (LUMINA) XXXII: [corrected] contributions of admixture and socioeconomic status to renal involvement. Lupus (2006) 15(1):26–31.
  28. Choudhry S, Ung N, Avila PC, et al. Pharmacogenetic differences in response to albuterol between Puerto Ricans and Mexicans with asthma. Am J Respir Crit Care Med (2005) 171(6):563–570.
  29. Choudhry S, Burchard EG, Borrell LN, et al. Ancestry-environment interactions and asthma risk among Puerto Ricans. Am J Respir Crit Care Med (2006) 174(10):1088–1093.
  30. Salari K, Choudhry S, Tang H, et al. Genetic admixture and asthma-related phenotypes in Mexican American and Puerto Rican asthmatics. Genet Epidemiol (2005) 29(1):76–86.
  31. Sweeney C, Wolff RK, Byers T, et al. Genetic admixture among Hispanics and candidate gene polymorphisms: potential for confounding in a breast cancer study? Cancer Epidemiol Biomarkers Prev (2007) 16(1):142–150.
  32. Ziv E, John EM, Choudhry S, et al. Genetic ancestry and risk factors for breast cancer among Latinas in the San Francisco Bay Area. Cancer Epidemiol Biomarkers Prev (2006) 15(10):1878–1885.
  33. Chen H, Hernandez W, Shriver MD, et al. ICAM gene cluster SNPs and prostate cancer risk in African Americans. Hum Genet. (2006) 120(1):69–76.
  34. Wassel Fyr CL, Kanaya AM, Cummings SR, et al. Genetic admixture, adipocytokines, and adiposity in Black Americans: The Health, Aging, and Body Composition Study. Hum Genet. (2007) 121(5):615–624.
  35. Gallagher CJ, Keene KL, Mychaleckyj JC, et al. Investigation of the estrogen receptor-alpha gene with type 2 diabetes and/or nephropathy in African-American and European-American populations. Diabetes (2007) 56(3):675–684.
  36. Gower BA, Fernandez JR, Beasley TM, et al. Using genetic admixture to explain racial differences in insulin-related phenotypes. Diabetes (2003) 52(4):1047–1051.
  37. Higgins PB, Fernandez JR, Goran MI, et al. Early ethnic difference in insulin-like growth factor-1 is associated with African genetic admixture. Pediatr Res. (2005) 58(5):850–854.
  38. Peralta CA, Ziv E, Katz R, et al. African ancestry, socioeconomic status, and kidney function in elderly African Americans: a genetic admixture analysis. J Am Soc Nephrol (2006) 17(12):3491–3496.
  39. Reiner AP, Ziv E, Lind DL, et al. Population structure, admixture, and aging-related phenotypes in African American adults: The Cardiovascular Health Study. Am J Hum Genet. (2005) 76(3):463–477.
  40. Shaffer JR, Kammerer CM, Reich D, et al. Genetic markers for ancestry are correlated with body composition traits in older African Americans. Osteoporos Int (2007) 18(6):733–741.
  41. Tsai HJ, Shaikh N, Kho JY, et al. Beta 2-adrenergic receptor polymorphisms: pharmacogenetic response to bronchodilator among African American asthmatics. Hum Genet. (2006) 119(5):547–557.
  42. Duggan D, Zheng SL, Knowlton M, et al. Two genome-wide association studies of aggressive prostate cancer implicate putative prostate tumor suppressor gene DAB2IP. J Natl Cancer Inst (2007) 99(24):1836–1844.
  43. Hernandez W, Grenade C, Santos ER, et al. IGF-1 and IGFBP-3 gene variants influence on serum levels and prostate cancer risk in African-Americans. Carcinogenesis (2007) 28(10):2154–2159.
  44. Hooker S, Bonilla C, Akereyeni F, et al. NAT2 and NER genetic variants and sporadic prostate cancer susceptibility in African Americans. Prostate Cancer Prostatic Dis (2007) Nov 20 [Epub ahead of print]. (doi: 10.1038/sj.pcan.4501027).
  45. Leak TS, Keene KL, Langefeld CD, et al. Association of the proprotein convertase subtilisin/kexin-type 2 (PCSK2) gene with type 2 diabetes in an African American population. Mol Genet Metab (2007) 92(1–2):145–150.
  46. Yende S, Angus DC, Ding J, et al. 4G/5G plasminogen activator inhibitor-1 polymorphisms and haplotypes are associated with pneumonia. Am J Respir Crit Care Med (2007) 176(11):1129–1137.
  47. Zuo L, Kranzler HR, Luo X, et al. CNR1 variation modulates risk for drug and alcohol dependence. Biol Psychiatry (2007) 62(6):616–626.
  48. Ries LAG, Melbert D, Krapcho M, et al. SEER Cancer Statistics Review, 1975–2004 (2007) Bethesda, MD: National Cancer Institute. (http://seer.cancer.gov/csr/1975_2004/).
  49. Wrensch MR, Miike R, Sison JD, et al. CYP1A1 variants and smoking-related lung cancer in San Francisco Bay Area Latinos and African Americans. Int J Cancer (2005) 113(1):141–147.
  50. Hansen HM, Wiemels JL, Wrensch M, et al. DNA quantification of whole genome amplified samples for genotyping on a multiplexed bead array platform. Cancer Epidemiol Biomarkers Prev (2007) 16(8):1686–1690.
  51. Wiemels JL, Wiencke JK, Kelsey KT, et al. Allergy-related polymorphisms influence glioma status and serum IgE levels. Cancer Epidemiol Biomarkers Prev (2007) 16(6):1229–1235.
  52. Tian C, Hinds DA, Shigeta R, et al. A genomewide single-nucleotide-polymorphism panel with high ancestry information for African American admixture mapping. Am J Hum Genet. (2006) 79(4):640–649.
  53. Tian C, Hinds DA, Shigeta R, et al. A genomewide single-nucleotide-polymorphism panel for Mexican American admixture mapping. Am J Hum Genet. (2007) 80(6):1014–1023.
  54. Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution (1984) 38:1358–1370.
  55. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (1995) 57:289–300.
  56. Weir BS. Genetic Data Analysis II: Methods for Discrete Population Genetic Data (1996) Sunderland, MA: Sinauer Associates.
  57. Pritchard JK, Rosenberg NA. Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet. (1999) 65(1):220–228.
  58. Guillot G, Mortier F, Estoup A. Geneland: a computer package for landscape genetics. Mol Ecol Notes (2005) 5(3):712–715.
  59. Corander J, Waldmann P, Marttinen P, et al. BAPS 2: enhanced possibilities for the analysis of genetic population structure. Bioinformatics (2004) 20(15):2363–2369.
  60. Huelsenbeck JP, Andolfatto P. Inference of population structure under a Dirichlet process model. Genetics (2007) 175(4):1787–1802.
  61. Purcell S, Sham P. Properties of structured association approaches to detecting population stratification. Hum Hered (2004) 58(2):93–107.
  62. Dawson KJ, Belkhir K. A Bayesian approach to the identification of panmictic populations and the assignment of individuals. Genet Res. (2001) 78(1):59–77.
  63. Wang J. Maximum-likelihood estimation of admixture proportions from genetic data. Genetics (2003) 164(2):747–765.
  64. Bonilla C, Parra EJ, Pfaff CL, et al. Admixture in the Hispanics of the San Luis Valley, Colorado, and its implications for complex trait gene mapping. Ann Hum Genet. (2004) 68(pt 2):139–153.
  65. Chakraborty R, Weiss KM. Frequencies of complex diseases in hybrid populations. Am J Phys Anthropol (1986) 70(4):489–503.
  66. Chakraborty R. Gene admixture in human populations: models and predictions. Yearb Phys Anthropol (1986) 29:1–43.
  67. Collins-Schramm HE, Phillips CM, Operario DJ, et al. Ethnic-difference markers for use in mapping by admixture linkage disequilibrium. Am J Hum Genet. (2002) 70(3):737–750.
  68. Reiner AP, Carlson CS, Ziv E, et al. Genetic ancestry, population sub-structure, and cardiovascular disease-related traits among African-American participants in the CARDIA Study. Hum Genet. (2007) 121(5):565–575.
  69. Collins-Schramm HE, Chima B, Morii T, et al. Mexican American ancestry-informative markers: examination of population structure and marker characteristics in European Americans, Mexican Americans, Amerindians and Asians. Hum Genet. (2004) 114(3):263–271.
  70. Cabral DN, Napoles-Springer AM, Miike R, et al. Population- and community-based recruitment of African Americans and Latinos: The San Francisco Bay Area Lung Cancer Study. Am J Epidemiol (2003) 158(3):272–279.
  71. Reed TE. Caucasian genes in American Negroes. Science (1969) 165(895):762–768.
  72. Chakraborty R, Kamboh MI, Ferrell RE. ‘Unique’ alleles in admixed populations: a strategy for determining ‘hereditary’ population differences of disease frequencies. Ethn Dis (1991) 1(3):245–256.
  73. Risch N, Burchard E, Ziv E, et al. Categorization of humans in biomedical research: genes, race and disease. Genome Biol. (2002) 3(7):comment2007.1–comment2007.12.
  74. Pritchard JK, Donnelly P. Case-control studies of association in structured or admixed populations. Theor Popul Biol. (2001) 60(3):227–237.
