Genetic Ancestry Contributes to Somatic Mutations in Lung Cancers from Admixed Latin American Populations
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Published OnlineFirst December 2, 2020; DOI: 10.1158/2159-8290.CD-20-1165 Research Brief Genetic Ancestry Contributes to Somatic Mutations in Lung Cancers from Admixed Latin American Populations Jian Carrot-Zhang1,2,3, Giovanny Soca-Chafre4, Nick Patterson2,3, Aaron R. Thorner1, Anwesha Nag1, Jacqueline Watson1,2, Giulio Genovese2,3, July Rodriguez5, Maya K. Gelbard1, Luis Corrales-Rodriguez6,7, Yoichiro Mitsuishi8, Gavin Ha9, Joshua D. Campbell10, Geoffrey R. Oxnard1, Oscar Arrieta4,11, Andres F. Cardona5,12, Alexander Gusev1,2,13, and Matthew Meyerson1,2,3 abstract Inherited lung cancer risk, particularly in nonsmokers, is poorly understood. Genomic and ancestry analysis of 1,153 lung cancers from Latin America revealed striking associations between Native American ancestry and their somatic landscape, including tumor mutational burden, and specific driver mutations in EGFR, KRAS, and STK11. A local Native American ancestry risk score was more strongly correlated with EGFR mutation frequency compared with global ancestry correlation, suggesting that germline genetics (rather than environmental exposure) underlie these disparities. Significance: The frequency of somatic EGFR and KRAS mutations in lung cancer varies by ethnic- ity, but we do not understand why. Our study suggests that the variation in EGFR and KRAS mutation frequency is associated with genetic ancestry and suggests further studies to identify germline alleles that underpin this association. Introduction mutations is higher in lung adenocarcinomas from patients in East Asia (∼45%) compared with lung adenocarcinomas Lung cancer causes more than 1.7 million deaths per year from patients in Europe or patients of European (EUR) and/ worldwide (1) and kills more people than any other malig- or African (AFR) descent in North America (∼10%; refs. 4–8). nancy in Latin America (2). Lung adenocarcinoma is the most In Latin American countries, the frequency of somatic EGFR common subtype of lung cancer that is typically driven by mutations in lung adenocarcinoma ranges from roughly genomic alterations of genes in the receptor tyrosine kinase 14% in Argentina, to 25% to 34% in Colombia, Brazil, and (RTK)/RAS/RAF pathway (3), often allowing effective thera- Mexico, to 51% in Peru (refs. 9–11; Fig. 1A). Moreover, recent peutic targeting by RTK and other pathway inhibitors. It is well genomic studies from East Asian (EAS) and AFR populations known, but mysterious, that the frequency of somatic EGFR have suggested different distributions of tumor mutation 1 Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Note: Supplementary data for this article are available at Cancer Discovery Massachusetts. 2Broad Institute of MIT and Harvard, Cambridge, Massa- Online (http://cancerdiscovery.aacrjournals.org/). chusetts. 3Departments of Genetics and Medicine, Harvard Medical School, Corresponding Authors: Matthew Meyerson, Dana-Farber Cancer Institute, Boston, Massachusetts. 4Personalized Medicine Laboratory, Instituto 450 Brookline Avenue, Boston, MA 02215. Phone: 617-632-4768; Fax: Nacional de Cancerologia, México City, México. 5Foundation for Clinical and 617-582-7880; E-mail: matthew_meyerson@dfci.harvard.edu; Alexander Applied Cancer Research – FICMAC, Bogotá, Colombia. 6Medical Oncology, Gusev, Phone: 203-980-8760; E-mail: alexander_gusev@dfci.harvard.edu; Hospital San Juan de Dios, San José, Costa Rica. 7Centro de Investigación y Andres F. Cardona, Cra. 16 #82-95, Bogotá, Cundinamarca, Colombia. Phone: Manejo del Cáncer – CIMCA, San José, Costa Rica. 8Division of Respiratory 573-0163-48173; E-mail: a_cardonaz@yahoo.com; and Oscar Arrieta, Medicine, Graduate School of Medicine, Juntendo University, Bunkyo-ku, San Fernando #22, Col. Sección XVI, Tlalpan, Mexico D.F., Mexico. Phone: Tokyo, Japan. 9Division of Public Health Sciences, Fred Hutchinson Can- 5255-5628-0400; E-mail: scararrietaincan@gmail.com cer Research Center, Seattle, Washington. 10Division of Computational Biomedicine, Department of Medicine, Boston University School of Medi- Cancer Discov 2021;11:1–8 cine, Boston, Massachusetts. 11Thoracic Oncology Unit, Instituto Nacional doi: 10.1158/2159-8290.CD-20-1165 de Cancerología, México City, México. 12Clinical and Translational Oncol- ©2020 American Association for Cancer Research. ogy Group, Clínica del Country, Bogotá, Colombia. 13Division of Genetics, Brigham and Women’s Hospital, Boston, Massachusetts. March 2021 CANCER DISCOVERY | OF1 Downloaded from cancerdiscovery.aacrjournals.org on March 12, 2021. © 2020 American Association for Cancer Research.
Published OnlineFirst December 2, 2020; DOI: 10.1158/2159-8290.CD-20-1165 RESEARCH BRIEF Carrot-Zhang et al. A B Inherited germline risk Europe USA EGFR 10%–15% East Asia EGFR African American 12% KRAS ~30% EGFR European American 10% EGFR 40%–50% KRAS ~10% TMB/SCNA European American high TMB/SCNA low Mexico EGFR 34% Colombia EGFR 25% Brazil EGFR 27% Peru EGFR 51% Environmental exposure Argentina EGFR 14% Figure 1. Genomic differences in lung adenocarcinoma across patient populations. A, TMB, SCNA burden, and the frequency of KRAS mutations are lower, whereas the frequency of EGFR mutations is higher, in lung cancers from East Asian patients, compared with lung cancers from patients of Euro- pean and/or African origin. The somatic EGFR mutation rate in lung cancer varies among Latin American countries. Mexican and Colombian populations have varying degrees of NAT ancestry, as indicated in blue. B, Both germline variations and environmental exposures such as smoking can predispose to somatic alterations driving the development of lung cancers, that may cause the genomic differences across populations. burden (TMB) and levels of somatic copy-number alteration and MDM2 (Fig. 2; Supplementary Tables S2 and S3). The (SCNA; refs. 12, 13), compared with lung adenocarcinoma detected mutation frequencies of EGFR and KRAS were 30% patients of EUR ancestry. and 10%, respectively, in the tested lung cancer samples from Despite the differences in patterns of somatic mutation in Mexican patients, and 23% and 13%, respectively, in the tested lung adenocarcinoma from patients of different ethnicities, lung cancer samples from Colombian patients. SCNA analysis the landscape of ancestry effects on the lung cancer genomes (see Methods) identified 9% and 2% cases with high-level ampli- for the Latin American populations has not been compre- fications in MYC and MDM2, respectively. We did not observe hensively described, and it remains unknown whether the novel amplification or deletion peaks in this Latin American differences are due to ancestry-specific germline variation or lung cancer cohort as assessed by GISTIC analysis. rather to population-specific environmental exposures (Fig. Ancestry effects on somatic cancer genomes are under- 1B). This is of particular importance as Native American studied (17, 18), and few genomic data sets have been devel- (NAT) ancestry, which includes components of EAS ancestry oped from patients with lung cancer with mixed ancestry. derived through waves of migration (14), is present to varying One potential source of samples for analysis of ancestry degrees in modern populations in Latin America, along with effects is discarded tissue or nucleic acids, left over after the EUR and AFR ancestry (15). clinical analysis of cancer samples. Here, we developed an analytic pipeline (https://github.com/jcarrotzhang/ancestry- from-panel) that offers the advantage of simultaneous meas- Results urement of global and local ancestry from sequencing tumor To explore the landscape of somatic cancer mutation in DNA only, without requiring matched germline samples lung cancers from Latin America and to assess the influence of (Supplementary Fig. S1A–S1C). Briefly, we called the gen- germline ancestry of genetically admixed patient populations otype of germline single-nucleotide polymorphisms (SNP) on these somatic alterations, we performed genomic analysis using on-target and off-target reads from the sequencing of 601 lung cancer cases from Mexico and 552 from Colom- panel and measured global ancestry based on principal com- bia, including 499 self-reported nonsmokers (Supplementary ponent analysis (PCA; ref. 19) of the germline SNPs, in which Table S1). Next-generation sequencing targeting a panel of 547 principal components (PC) 1, 2, and 3 captured prominently cancer genes plus intronic regions of 60 cancer genes (16) was the axis of AFR, EUR, and NAT ancestry, respectively (Sup- used to identify single-nucleotide variants (SNV), insertions/ plementary Fig. S2A). We then imputed missing SNPs using deletions (indels), SCNA, and gene fusions. This gene panel an external haplotype reference panel (15) and assigned local covers all currently known lung cancer drivers (3), which are ancestry to each genomic region (20), based on the imputed the focus of this work. Because we do not have matched ger- variants. We validated our approach to ancestry analysis by mline samples, we applied a custom script to identify known, performing whole-genome sequencing on a subset of 44 hotspot lung cancer driver mutations for the full 1,153 samples tumor samples, and SNP genotyping on 12 paired tumor to ensure the sensitivity for low-coverage samples, as well as to and normal samples (Supplementary Fig. S2B and S2C). We avoid potential germline contamination (Methods). We found found a high accuracy of our off-target reads approach based that 552 (48%) of all samples harbored oncogenic mutations in on panel sequencing of tumor DNA, by comparing tumor EGFR, KRAS, BRAF, ERBB2, or MET, or fusions in ALK, ROS1, and normal ancestry estimations (Pearson r > 0.99), and or RET; 785 of 1,153 samples harbored at least one detectable by comparing panel sequencing to whole-genome sequenc- alteration in a broader set of known lung cancer driver genes ing (Pearson r > 0.96). As panel testing of cancer genes has also including TP53, STK11, KEAP1, SMARCA4, SETD2, MYC, emerged as a practical diagnostic tool in clinical care, our OF2 | CANCER DISCOVERY March 2021 AACRJournals.org Downloaded from cancerdiscovery.aacrjournals.org on March 12, 2021. © 2020 American Association for Cancer Research.
Published OnlineFirst December 2, 2020; DOI: 10.1158/2159-8290.CD-20-1165 Genetic Ancestry Affects Somatic Alterations in Lung Cancers RESEARCH BRIEF Latin American LUAD LATAM EUR EAS Mutation type LUAD LUAD LUAD Gain-of-function SNVs/indels/fusions EGFR* Missense_mutation Amplification 27% 13% 56% KRAS* Inframe_indel Fusion 12% 33% 13% BRAF Splice_mutation Deletion 2% ERBB2 Truncating_mutation 1% MET 1% ALK 4% ROS1
Published OnlineFirst December 2, 2020; DOI: 10.1158/2159-8290.CD-20-1165 RESEARCH BRIEF Carrot-Zhang et al. A EGFR mutation % 160 KRAS mutation % 40 40 Sample size 120 80 20 20 40 0 0 0 20 40 60 80 100 20 40 60 80 100 NAT ancestry % NAT ancestry % B All samples Nonsmokers NAT ancestry % NAT ancestry % BRAF-mutant BRAF-mutant P = 9e-06 P = 2e-06 EGFR-mutant EGFR-mutant P = 0.004 KRAS-mutant KRAS-mutant ALK-mutant ALK-mutant –0.02 –0.01 0.00 0.01 0.02 0.03 –0.06 –0.04 –0.02 0.00 0.02 0.04 Smoking signature % Smoking signature % BRAF-mutant BRAF-mutant P = 0.005 EGFR-mutant P = 0.02 EGFR-mutant KRAS-mutant KRAS-mutant ALK-mutant ALK-mutant –0.075 –0.050 –0.025 0.000 0.025 0.050 –0.125 –0.100 –0.075 –0.050 –0.025 0.000 0.025 Figure 3. Targetable LUAD driver genes associated with genetic ancestry. A, The percentage of NAT germline ancestry is positively correlated with the percentage of somatic EGFR mutations, and negatively correlated with the percentage of somatic KRAS mutations. Color bar represents the number of samples in the NAT ancestry percentage range. B, Association of targetable LUAD driver genes with NAT ancestry, mutational signature, and gender (n = 705; left), and the association in never-smokers only (n = 387; right). Multivariable logistic regression P values are shown, with NAT ancestry percent- age, gender, smoking, and APOBEC signature as covariates. Red dots represent P < 0.05. Lines represent 95% confidence intervals. differences in Latin American patients with lung adenocarci- 5,425 genomic regions with an assignment of AFR, EUR, or noma that are independent of smoking activity. NAT ancestral population for each parental chromosome To assess whether the observed association with EGFR (Supplementary Table S6–S7). We performed a multivariable and KRAS mutations is due to NAT ancestry itself or to logistic regression of NAT ancestry for each genomic region an environmental exposure/socioeconomic status related to correlating with the EGFR-mutant or KRAS-mutant samples, NAT ancestry, we next investigated the influence of local controlling for the global ancestry (Methods). We did not ancestry. Previous work has shown that associations between identify any region where correlation reached genome-wide local ancestry and phenotype (while accounting for global significance of P < 5 × 10−5 (Fig. 4A; Supplementary Fig. S6). ancestry) provide evidence of a genetically driven phenotype, We next evaluated whether local ancestries across multiple as local ancestry is not expected to be causally associated sub–genome-wide significance threshold regions (5 × 10−5 < P with environmental exposure or socioeconomic status (22, < 0.05) were associated with the somatic mutation phenotype 23). We used RFMix (20) to map local ancestry, producing by constructing a polygenic ancestry score. This approach is A Genomic loci associated with EGFR Genomic loci associated with KRAS B Gene ~ local ancestry risk score + global ancestry 4 4 low NAT ancestry high low NAT ancestry high P val. Coef. 95% CI 2 2 Local ancestry risk score (cross-validated) 9e–08 0.55 0.35–0.75 −Log10 (P) −Log10 (P ) EGFR Global ancestry 0 0 (NAT ancestry %) 0.31 0.11 −0.11–0.34 Local ancestry risk score (cross-validated) 0.002 1.00 0.37–1.63 –2 –2 KRAS Global ancestry 0.14 0.38 −0.12–0.89 (NAT ancestry %) –4 –4 1 2 3 4 - -181920-X 5 6 7 8 9 10 11 1213 15 1 2 3 4 5 6 7 8 9 10 11 1213 15 - -181920-X Chromosome Chromosome Figure 4. Germline local ancestry in association with somatic EGFR and KRAS mutations. A, Genome-wide association of local NAT ancestry with EGFR (left) and KRAS (right). “NAT ancestry high” indicates positive association, whereas “NAT ancestry low” indicates negative association. Red line indicates P = 0.05. Orange line indicates genome-wide significance threshold (P < 5 × 10−5). B, Association of local ancestry risk score with somatic EGFR or KRAS mutations, controlling for global ancestry (proportion of overall NAT ancestry). OF4 | CANCER DISCOVERY March 2021 AACRJournals.org Downloaded from cancerdiscovery.aacrjournals.org on March 12, 2021. © 2020 American Association for Cancer Research.
Published OnlineFirst December 2, 2020; DOI: 10.1158/2159-8290.CD-20-1165 Genetic Ancestry Affects Somatic Alterations in Lung Cancers RESEARCH BRIEF conceptually similar to previous work leveraging local ances- Disparities in access to genetic testing have been observed try to quantifying phenotypic heritability (22), but we use among Hispanic patients with lung cancer in the United States a risk score rather than variance partitioning as the former (29). Our study shows that while controlling for global ancestry, is more stable at low sample sizes. The local ancestry risk local ancestry is associated with mutations in EGFR and KRAS, score was defined as the sum of NAT ancestry across each providing the first example, to our knowledge, of a germline associated region weighted by the Z-score for the association influence, which may or may not act together with ancestry- of that region with the given mutation (Methods). To guard specific environmental exposure, on targetable somatic events against overfitting, Z-scores and local ancestry risk scores in lung cancer. These findings highlight the importance of pro- were computed by cross-validation: splitting the data set into viding somatic genetic testing for Latin American patients with 10 subsets, obtaining Z-scores for the mutation–ancestry lung cancer with admixed ancestries. Given the limited sample associations using nine subsets, and then calculating the size, we could not determine the precise risk loci for EGFR and local ancestry risk score on the held-out subset (Supplemen- KRAS mutation by local ancestry mapping. As low-dose CT tary Fig. S7). We then performed another logistic regression scans have enabled lung cancer screening that can significantly including both the cross-validated local ancestry risk score reduce lung cancer mortality (30, 31), we believe that the iden- and global ancestry as covariates and found that the local tification of germline allele(s) predisposing to the development ancestry risk score was significantly associated with EGFR of EGFR-mutant or KRAS-mutant lung adenocarcinoma from and KRAS mutations, respectively, whereas global ancestry large, Latin American lung cancer sample sets may improve was no longer significant in the joint model (Fig. 4B). In our understanding of the biological causes of EGFR and KRAS contrast, the local ancestry risk score was not associated mutations and the evolutionary processes in lung cancers. with TMB and STK11 mutations in a joint model with global Such findings could therefore shed light on prevention and ancestry. Finally, although previous work suggested associa- early-detection strategies for lung cancer in Latin America and tions between cis alleles and EGFR mutations (24), we did not beyond, particularly for nonsmokers. observe an association between the local ancestry of the EGFR locus and somatic mutations in EGFR (P = 0.8). Moreover, when including the local ancestry of the EGFR locus as a Methods covariate, the association of EGFR mutations and the local Sample Collection ancestry risk score remained significant (P = 4 × 10−6). Our The protocol of this work was approved by the ethical and sci- finding suggests that one or more genetic loci specific to NAT entific committees of the Instituto Nacional de Cancerologia in ancestry may modulate the evolution of lung cancer tumors Mexico City, Mexico, the Foundation for Clinical and Applied Cancer to harbor EGFR or KRAS mutations in the Latin American Research in Bogotá, Colombia, and the Dana-Farber Cancer Institute populations. in Boston, Massachusetts, for detecting EGFR mutations and further genomic analysis. Biopsies were collected for histologic diagnosis by the pathology departments of Instituto Nacional de Cancerologia Discussion and Foundation for Clinical and Applied Cancer Research. In summary, the genomic landscape of lung adenocarci- nomas is strikingly varied in Latin American patients with Library Preparation and Sequencing mixed ancestries. In our study of 1,153 lung cancers, we dem- Genomic DNA was extracted from fresh-frozen, blood and paraffin- onstrated that NAT ancestry was correlated with somatic embedded samples by a standard procedure using the Wizard Genom- driver alterations, including EGFR and KRAS mutations ics DNA Kit (Promega) according to the manufacturer’s instructions. that can be effectively targeted by small-molecule inhibitors DNA was fragmented to 250 bp, and size-selected DNA was ligated to sample-specific barcodes. A custom targeted hybrid capture sequenc- to prolong survival (4, 25), and TMB and STK11 that are ing platform (OncoPanel) was used to assay genomic alterations in potential prognostic biomarkers in patients with lung cancer tumor DNA (16). Each library was quantified by sequencing on an Illu- (26, 27). The ancestry and TMB association was independ- mina MiSeq nano flow cell (Illumina). Libraries were pooled in equal ent of smoking-related mutational processes, and there- mass to a total of 500 ng for enrichment using the Agilent SureSelect fore further investigation on the impact of ancestry-related hybrid capture kit (Agilent Technologies; cat. no. G9611A). Libraries TMB differences on the response to checkpoint inhibitors is were sequenced on an Illumina HiSeq2500 or HiSeq3000. Pooled sam- needed (28). Of note, our TMB estimates may be susceptible ples were demultiplexed using the Picard tools (https://broadinstitute. to germline contaminations due to the lack of matched github.io/picard/). Paired reads were aligned to the hg19 reference normal samples. If germline variants specific to the Mexican genome using BWA (32) with the following parameters “-q 5 -l 32 -k 2 or Colombian population could not be sufficiently filtered -o 1.” Aligned reads were sorted and duplicate-marked using Picard. In each batch, we sequenced a control DNA sample as a “plate normal.” (due to smaller germline reference panels), individuals with For a subset of 44 cases, the same libraries were sequenced on Illumina higher NAT ancestry would have more germline contamina- NovaSeq for low-coverage whole-genome sequencing at 1× coverage. tion, and the anticorrelation between TMB and NAT ances- try may thus be even more significant than we have observed. Mutation Analysis Furthermore, due to the lack of matched germline samples Mutation detection for SNVs was performed using MuTect v1.1.4 and the use of panel sequencing data, we tested only known, (33) in paired mode by pairing each sample to a control DNA sample hotspot mutations and protein-truncating mutations in profiled with the same OncoPanel. SomaticIndelDetector (34) was lung cancer driver genes. Future studies will be needed to used for indel calling. Mutations were annotated by variant effect comprehensively characterize lung cancer genomes from predictor (VEP; ref. 35) and Oncotator (36). Called variants with Latin American patients. a frequency greater than or equal to 0.01% that were found in the March 2021 CANCER DISCOVERY | OF5 Downloaded from cancerdiscovery.aacrjournals.org on March 12, 2021. © 2020 American Association for Cancer Research.
Published OnlineFirst December 2, 2020; DOI: 10.1158/2159-8290.CD-20-1165 RESEARCH BRIEF Carrot-Zhang et al. Genome Aggregation Database (gnomAD; ref. 37), containing 25,748 of NAT ancestry than the Colombian population (15), we accounted exomes, were excluded. TMB was calculated by dividing the total for the country of sample collection throughout our analyses. TMB number of coding, nonsilent mutations in an individual by the target was included as a covariate when associating PC3 with mutations. size (3 MB). Mutational signatures were called using SignatureAna- Total SCNA burden was included as a covariate when associating PC3 lyzer (38) with SNVs classified by 96 trinucleotide contexts. Prior with SCNA of lung cancer genes. Gender, proportion of smoking, studies have shown that mutational signature analysis can be inferred and APOBEC signatures were included as covariates when associating based on on-target reads from clinical panel sequencing (8, 39). the percentage of NAT ancestry with oncogenic mutations. To test Smoking (COSMIC signature 4) and APOBEC signature (COSMIC whether smoking signature influences the relationship between ances- signature 2) activities were inferred by the estimated number of try and the KRAS mutations, the following model was performed: mutations in a trinucleotide context associated with each signature. KRAS ~ NAT ancestry + smoking signature + A custom script (https://github.com/jcarrotzhang/ancestry-from- NAT*smoking + other covariates panel) was applied to inspect the sites of hotspot driver mutations in EGFR, KRAS, BRAF, ERBB2, MET, and TP53. For each mutation, we Where gender and country of sample collection were considered as counted reads supporting the reference base and the altered base, after covariates, and NAT*smoking was included as an interaction effect. filtering out reads with base quality or mapping quality less than 20 (40). If the interaction term is not significant, that means that smoking A mutation was called if the total read count was greater than 5, the signature activity does not modify the effect of ancestry. altered read count was greater than 2, and the mutant allele frequency To identify specific genomic region(s) associated with lung adeno- was greater than 5%. Identified mutations with total coverage lower than carcinoma cases harboring certain somatic alterations, a logistic 30× were manually inspected using Integrative Genomics Viewer (41). regression model was applied controlling for the percentage of NAT ancestry, TMB, and country of sample collection, followed by Copy Number and Rearrangements 2 genomic control . Ten-fold cross-validation was performed in Read coverage was calculated at 1 MB bins across the genome and the following steps (Supplementary Fig. S7): the whole data set was was corrected for guanine-cytosine (GC) content and mappability biases split into 10 subsets. Z-scores for the mutation–ancestry associations using ichorCNA version 0.1.0 (42) using the plate normal as the matched for each genomic region were calculated using nine subsets, and a control for each sample. GISTIC version 2.0.22 (43) was applied to iden- cross-validated local ancestry risk score (sum of the NAT ancestry tify focal and arm-level SCNAs on ichorCNA-generated copy-number across each associated region weighted by the Z-score of that region) segments, with the high-level amplification defined as log2-transformed was calculated for each sample on the held-out subset. These steps copy-number ratio greater than 0.7. Rearrangement events were called were repeated 10 times until a local ancestry risk score was generated by Breakmer (44) and filtered on discordant read counts and split read for each sample: counts greater than 0. Total SCNA burden and the degree of aneuploidy local ancestry risk score i0 AiZi , n was defined by the number of genes or chromosomal arms affected by SCNAs, respectively (copy-number ratio > 0.1 or > −0.1). where n is the number of regions associated with the somatic feature Ancestry Analysis from Genotyping Array (P < 0.05), A is the ancestry of each associated region (NAT ancestry The Multi-Ethnic Genotyping Array (MEGA) was used for geno- was coded as 1, and EUR or AFR ancestry was coded as 0), and Z is typing of paired fresh-frozen tumor tissue and blood samples. We the Z-score of that associated region. used PLINK version 1.9 (45) to filter out variants with missing rate greater than 2%, or failed Hardy–Weinberg equilibrium test (P < Data and Code Availability 1 × 10−6). Markers with allele frequency less than 1% in the 1000 Raw sequencing data from cancer gene panel sequencing are depos- Genomes data set were also excluded. The Mexican and Colombian ited at European Genome-phenome Archive (EGA) under the acces- samples were merged with samples from the 1000 Genomes phase III sion code EGAS00001004752. Ancestry identification method from (15), and PCA was performed on the merged data set using genome- panel sequencing and custom code used in the analyses are avail- wide complex trait analysis (GCTA) version 1.91.6 (46). able at https://github.com/jcarrotzhang/ancestry-from-panel. This code and data are also available at https://codeocean.com/capsule/ Ancestry Analysis from Sequencing 6075183. All other data supporting the findings of this study are avail- SAMtools (47) was used to genotype germline variants after filtering able upon reasonable request from the corresponding authors. out reads with base quality or mapping quality less than 20. LASER version 2.04 (48) was used to estimate overall ancestry based on Authors’ Disclosures 637,037 germline variants from all populations in the Human Genome G. Ha reports a patent for methods for genome characterization Diversity Project (HGDP; ref. 49). We obtained principal components pending. G.R. Oxnard reports other from Foundation Medicine from LASER results that place each sample into a reference PCA space and Roche outside the submitted work. O. Arrieta reports personal using 939 HGDP samples as reference samples. For local ancestry fees from Pfizer, Lilly, Merck, and Bristol Myers Squibb and grants identification, phasing and imputation were performed using Beagle and personal fees from AstraZeneca, Boehringer Ingelheim, and version 4.1 (50) based on SAMtools genotyped variants. We imputed Roche outside the submitted work. A.F. Cardona reports grants, per- missing variants using phased haplotypes from 1000 Genomes (15). sonal fees, and non-financial support from Merck Sharp & Dohme, Ancestry was assigned to each SNP using RFMix v2 (20). For each Boehringer Ingelheim, Roche, Bristol-Myers Squibb, Foundation parental population (NAT, AFR, and EUR), 500 samples from 1000 Medicine, and Astra Zeneca; non-financial support from BioNTech, Genomes were used as reference samples. Local ancestry regions span- Rochem Biocare, and INQBox; grants and personal fees from Amgen ning centromeres were filtered. RFMix outputted global ancestry esti- and Bayer; personal fees and non-financial support from Foundation mates were used as the percentage of NAT ancestry. for Clinical and Applied Cancer Research; personal fees from EISAI, Merck Serono, Jannsen Pharmaceuticals, Pfizer, Celldex, and Eli Lilly, Association Analysis and other from Illumina, Thermo Fisher, and Roche Diagnostics Multivariable logistic regression or linear regression was performed outside the submitted work. M. Meyerson reports personal fees from using a Python module (https://www.statsmodels.org). Because the OrigiMed and grants from Bayer, Janssen, Novo, and grants Ono Mexican population of patients with lung cancer has a higher level outside the submitted work; in addition, M. Meyerson has a patent OF6 | CANCER DISCOVERY March 2021 AACRJournals.org Downloaded from cancerdiscovery.aacrjournals.org on March 12, 2021. © 2020 American Association for Cancer Research.
Published OnlineFirst December 2, 2020; DOI: 10.1158/2159-8290.CD-20-1165 Genetic Ancestry Affects Somatic Alterations in Lung Cancers RESEARCH BRIEF for EGFR mutations for lung cancer diagnosis issued, licensed, and 8. Campbell JD, Lathan C, Sholl L, Ducar M, Vega M, Sunkavalli A, et al. with royalties paid from LabCorp and a patent for EGFR inhibitors Comparison of prevalence and types of mutations in lung cancers pending to Bayer; and was a founding advisor of, consultant to, and among black and white populations. JAMA Oncol 2017;3:801–9. equity holder in Foundation Medicine, shares of which were sold to 9. Arrieta O, Cardona AF, Martín C, Más-López L, Corrales-Rodríguez L, Roche. No disclosures were reported by the other authors. Bramuglia G, et al. Updated frequency of EGFR and KRAS mutations in nonsmall-cell lung cancer in Latin America: the Latin-American Authors’ Contributions consortium for the investigation of lung cancer (CLICaP). J Thorac Oncol 2015;10:838–43. J. Carrot-Zhang: Formal analysis, methodology, writing–original 10. Gimbrone NT, Sarcar B, Gordian ER, Rivera JI, Lopez C, Yoder SJ, draft, writing–review and editing. G. Soca-Chafre: Resources, writing– et al. Somatic mutations and ancestry markers in hispanic lung can- review and editing. N. Patterson: Methodology. A.R. Thorner: cer patients. J Thorac Oncol 2017;12:1851–6. Resources, writing–review and editing. A. Nag: Resources, writing– 11. Leal LF, de Paula FE, De Marchi P, de Souza Viana L, Pinto GDJ, review and editing. J. Watson: Resources. G. Genovese: Methodology. Carlos CD, et al. Mutational profile of Brazilian lung adenocarci- J. Rodriguez: Resources. M.K. Gelbard: Resources. L. Corrales-Rodriguez: noma unveils association of EGFR mutations with high Asian ances- Resources. Y. Mitsuishi: Resources, writing–review and editing. try and independent prognostic role of KRAS mutations. Sci Rep G. Ha: Methodology, writing–review and editing. J.D. Campbell: 2019;9:3209. Methodology, writing–review and editing. G.R. Oxnard: Resources, 12. Chen J, Yang H, Teo ASM, Amer LB, Sherbaf FG, Tan CQ, et al. writing–review and editing. O. Arrieta: Conceptualization, resources, Genomic landscape of lung adenocarcinoma in East Asians. Nat writing–review and editing. A.F. Cardona: Conceptualization, Genet 2020;52:177–86. resources, writing–review and editing. A. Gusev: Supervision, writing– 13. Sinha S, Mitchell KA, Zingone A, Bowman E, Sinha N, Schäffer AA, review and editing. M. Meyerson: Conceptualization, supervision, et al. Higher prevalence of homologous recombination deficiency in writing–review and editing. tumors from African Americans versus European Americans. Nature Cancer 2020;1:112–21. Acknowledgments 14. Reich D, Patterson N, Campbell D, Tandon A, Mazieres S, Ray N, et al. Reconstructing native american population history. Nature We would like to acknowledge the patients from Mexico and 2012;488:370–4. Colombia for their participation in this study. This study is sup- 15. Consortium T 1000 GP, The 1000 Genomes Project Consortium. A ported by the V Foundation Translational Research Award (the global reference for human genetic variation. Nature 2015;526:68–74. Stuart Scott Memorial Fund) and by NCI grants R35 CA197568, R01 16. Hanna GJ, Lizotte P, Cavanaugh M, Kuo FC, Shivdasani P, Frieden A, CA116020, and R01 CA227237, as well as an anonymous gift to the et al. Frameshift events predict anti-PD-1/L1 response in head and Dana-Farber Cancer Institute. M. Meyerson is an American Cancer neck cancer. JCI Insight 2018;3:e98811. Society Research Professor. J. Carrot-Zhang holds the CIHR Banting 17. Yuan J, Hu Z, Mahal BA, Zhao SD, Kensler KH, Pi J, et al. Integrated fellowship. J.D. Campbell is funded by the LUNGevity Career Devel- analysis of genetic ancestry and genomic alterations across cancers. opment award. We thank the Center for Cancer Genomics at Dana- Cancer Cell 2018;34:549–60. Farber Cancer Institute and the Genomic Platform at Broad Institute 18. Carrot-Zhang J, Chambwe N, Damrauer JS, Knijnenburg TA, Robert- for their sequencing and genotyping efforts, and we thank Aruna son AG, Yau C, et al. Comprehensive analysis of genetic ancestry and Ramachandran and Kar-Tong Tan for helpful discussions. its molecular correlates in cancer. Cancer Cell 2020;37:639–54. 19. Wang C, Zhan X, Liang L, Abecasis GR, Lin X. Improved ancestry estimation for both genotyping and sequencing data using projec- Received August 11, 2020; revised October 26, 2020; accepted tion procrustes analysis and genotype imputation. Am J Hum Genet November 19, 2020; published first December 2, 2020. 2015;96:926–37. 20. Maples BK, Gravel S, Kenny EE, Bustamante CD. RFMix: a discrimi- native modeling approach for rapid and robust local-ancestry infer- References ence. Am J Hum Genet 2013;93:278–88. 1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global 21. Koivunen JP, Kim J, Lee J, Rogers AM, Park JO, Zhao X, et al. Muta- cancer statistics 2018: GLOBOCAN estimates of incidence and mor- tions in the LKB1 tumour suppressor are frequently detected in tality worldwide for 36 cancers in 185 countries. CA Cancer J Clin tumours from Caucasian but not Asian lung cancer patients. Br J 2018;68:394–424. Cancer 2008;99:245–52. 2. Raez LE, Cardona AF, Santos ES, Catoe H, Rolfo C, Lopes G, et al. The 22. Zaitlen N, Pasaniuc B, Sankararaman S, Bhatia G, Zhang J, Gusev A, burden of lung cancer in Latin-America and challenges in the access et al. Leveraging population admixture to characterize the heritability to genomic profiling, immunotherapy and targeted treatments. Lung of complex traits. Nat Genet 2014;46:1356–62. Cancer 2018;119:7–13. 23. Florez JC, Price AL, Campbell D, Riba L, Parra MV, Yu F, et al. Strong 3. Cancer Genome Atlas Research Network. Comprehensive molecular association of socioeconomic status with genetic ancestry in Latinos: profiling of lung adenocarcinoma. Nature 2014;511:543–50. implications for admixture studies of type 2 diabetes. Diabetologia 4. Paez JG, Jänne PA, Lee JC, Tracy S, Greulich H, Gabriel S, et al. EGFR 2009;52:1528–36. mutations in lung cancer: correlation with clinical response to gefi- 24. Liu W, He L, Ramírez J, Krishnaswamy S, Kanteti R, Wang Y-C, et al. tinib therapy. Science 2004;304:1497–500. Functional EGFR germline polymorphisms may confer risk for EGFR 5. Shi Y, Au JS-K, Thongprasert S, Srinivasan S, Tsai C-M, Khoa MT, et al. somatic mutations in non-small cell lung cancer, with a predominant A prospective, molecular epidemiology study of EGFR mutations in effect on exon 19 microdeletions. Cancer Res 2011;71:2423–7. Asian patients with advanced non-small-cell lung cancer of adenocar- 25. Canon J, Rex K, Saiki AY, Mohr C, Cooke K, Bagal D, et al. The clini- cinoma histology (PIONEER). J Thorac Oncol 2014;9:154–62. cal KRAS(G12C) inhibitor AMG 510 drives anti-tumour immunity. 6. Midha A, Dearden S, McCormack R. EGFR mutation incidence in Nature 2019;575:217–23. non-small-cell lung cancer of adenocarcinoma histology: a systematic 26. Johnson DB, Frampton GM, Rioth MJ, Yusko E, Xu Y, Guo X, et al. review and global map by ethnicity (mutMapII). Am J Cancer Res Targeted next generation sequencing identifies markers of response 2015;5:2892–911. to PD-1 blockade. Cancer Immunol Res 2016;4:959–67. 7. Steuer CE, Behera M, Berry L, Kim S, Rossi M, Sica G, et al. Role of 27. Arbour KC, Jordan E, Kim HR, Dienstag J, Yu HA, Sanchez-Vega F, race in oncogenic driver prevalence and outcomes in lung adenocar- et al. Effects of co-occurring genomic alterations on outcomes in cinoma: results from the Lung Cancer Mutation Consortium. Cancer patients with KRAS-mutant non-small cell lung cancer. Clin Cancer 2016;122:766–72. Res 2018;24:334–40. March 2021 CANCER DISCOVERY | OF7 Downloaded from cancerdiscovery.aacrjournals.org on March 12, 2021. © 2020 American Association for Cancer Research.
Published OnlineFirst December 2, 2020; DOI: 10.1158/2159-8290.CD-20-1165 RESEARCH BRIEF Carrot-Zhang et al. 28. Berland L, Heeke S, Humbert O, Macocco A, Long-Mira E, Lassalle S, 40. Carrot-Zhang J, Majewski J. LoLoPicker: detecting low allelic-fraction et al. Current views on tumor mutational burden in patients with variants from low-quality cancer samples. Oncotarget 2017;8:37032–40. non-small cell lung cancer treated by immune checkpoint inhibitors. 41. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, J Thorac Dis 2019;11:S71–80. Getz G, et al. Integrative genomics viewer. Nat Biotechnol 2011;29: 29. Lynch JA, Berse B, Rabb M, Mosquin P, Chew R, West SL, et al. Under- 24–6. utilization and disparities in access to EGFR testing among Medicare 42. Adalsteinsson VA, Ha G, Freeman SS, Choudhury AD, Stover DG, patients with lung cancer from 2010–2013. BMC Cancer 2018;18:306. Parsons HA, et al. Scalable whole-exome sequencing of cell-free DNA 30. Kovalchik SA, Tammemagi M, Berg CD, Caporaso NE, Riley TL, reveals high concordance with metastatic tumors. Nat Commun Korch M, et al. Targeting of low-dose CT screening according to the 2017;8:1324. risk of lung-cancer death. N Engl J Med 2013;369:245–54. 43. Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, 31. Duffy SW, Field JK. Mortality reduction with low-dose CT screening Getz G. GISTIC2.0 facilitates sensitive and confident localization of for lung cancer. N Engl J Med 2020;382:572–3. the targets of focal somatic copy-number alteration in human can- 32. Li H, Durbin R. Fast and accurate short read alignment with Bur- cers. Genome Biol 2011;12:R41. rows–Wheeler transform. Bioinformatics 2009;25:1754–60. 44. Abo RP, Ducar M, Garcia EP, Thorner AR, Rojas-Rudilla V, Lin L, et al. 33. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez BreaKmer: detection of structural variation in targeted massively par- C, et al. Sensitive detection of somatic point mutations in impure and allel sequencing data using kmers. Nucleic Acids Res 2015;43:e19. heterogeneous cancer samples. Nat Biotechnol 2013;31:213–9. 45. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender 34. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kerny- D, et al. PLINK: a tool set for whole-genome association and popula- tsky A, et al. The genome analysis toolkit: a MapReduce framework tion-based linkage analyses. Am J Hum Genet 2007;81:559–75. for analyzing next-generation DNA sequencing data. Genome Res 46. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome- 2010;20:1297–303. wide complex trait analysis. Am J Hum Genet 2011;88:76–82. 35. McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. 47. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. Deriving the consequences of genomic variants with the ensembl API The sequence alignment/map format and SAMtools. Bioinformatics and SNP effect predictor. Bioinformatics 2010;26:2069–70. 2009;25:2078–9. 36. Ramos AH, Lichtenstein L, Gupta M, Lawrence MS, Pugh TJ, Sak- 48. Wang C, Zhan X, Bragg-Gresham J, Kang HM, Stambolian D, Chew sena G, et al. Oncotator: cancer variant annotation tool. Hum Mutat EY, et al. Ancestry estimation and control of population stratification 2015;36:E2423–9. for sequence-based association studies. Nat Genet 2014;46:409–15. 37. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang 49. Bergström A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P, Q, et al. The mutational constraint spectrum quantified from varia- et al. Insights into human genetic variation and population history tion in 141,456 humans. Nature 2020;581:434–43. from 929 diverse genomes. Science 2020;367:eaay5012. 38. Kim J, Mouw KW, Polak P, Braunstein LZ, Kamburov A, Kwiatkowski 50. Browning SR, Browning BL. Rapid and accurate haplotype phasing and DJ, et al. Somatic ERCC2 mutations are associated with a distinct missing-data inference for whole-genome association studies by use of genomic signature in urothelial tumors. Nat Genet 2016;48:600–6. localized haplotype clustering. Am J Hum Genet 2007;81:1084–97. 39. Touat M, Li YY, Boynton AN, Spurr LF, Iorgulescu JB, Bohrson CL, 51. Campbell JD, Alexandrov A, Kim J, Wala J, Berger AH, Pedamallu CS, et al. Mechanisms and therapeutic implications of hypermutation in et al. Distinct patterns of somatic genome alterations in lung adenocar- gliomas. Nature 2020;580:517–23. cinomas and squamous cell carcinomas. Nat Genet 2016;48:607–16. OF8 | CANCER DISCOVERY March 2021 AACRJournals.org Downloaded from cancerdiscovery.aacrjournals.org on March 12, 2021. © 2020 American Association for Cancer Research.
Published OnlineFirst December 2, 2020; DOI: 10.1158/2159-8290.CD-20-1165 Genetic Ancestry Contributes to Somatic Mutations in Lung Cancers from Admixed Latin American Populations Jian Carrot-Zhang, Giovanny Soca-Chafre, Nick Patterson, et al. Cancer Discov Published OnlineFirst December 2, 2020. Updated version Access the most recent version of this article at: doi:10.1158/2159-8290.CD-20-1165 Supplementary Access the most recent supplemental material at: Material http://cancerdiscovery.aacrjournals.org/content/suppl/2020/11/24/2159-8290.CD-20-1165.DC1 E-mail alerts Sign up to receive free email-alerts related to this article or journal. Reprints and To order reprints of this article or to subscribe to the journal, contact the AACR Publications Subscriptions Department at pubs@aacr.org. Permissions To request permission to re-use all or part of this article, use this link http://cancerdiscovery.aacrjournals.org/content/early/2021/02/13/2159-8290.CD-20-1165. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC) Rightslink site. Downloaded from cancerdiscovery.aacrjournals.org on March 12, 2021. © 2020 American Association for Cancer Research.
You can also read