Genetic discovery in a million people - where do we go from here? - UK Biobank

Page created by Jorge Cruz
 
Genetic discovery in a million people –
    where do we go from here?

                   Cristen Willer, PhD
                   Associate Professor
  Frank N. Wilson Professor of Cardiovascular Medicine

           Department of Internal Medicine
            Department of Human Genetics
Department of Computational Medicine and Bioinformatics
“The team, the team, the team” - Bo Schembechler

Sarah Graham, Jonas Nielsen, Brooke Wolford, Wei Zhou, Ida Surakka, Whitney Hornsby
Why do we study the genetics of
       human diseases and traits?
So people can live healthy, active, long lives, avoiding premature death due to heart disease.

Hopefully, genes involved in the trait, identified through naturally occurring
variation in humans, can become leads for prevention and treatment.
Lastly, perhaps we can predict who would most benefit from preventive lifestyle
changes, medical screening, or treatment.
Genetics for improved treatment of disease
[Diagram labels: Biobank; Enrollment; QC; Statistical rigor; APOE; PNPLA5; Experimental model systems; New therapeutics; Clinical trials]
Biobanks amortize the cost of GWAS across traits

Genotypes x phenotypes
HUNT GWAS of 1,400 traits (N~70k)
• Low-pass genomes of 2,202, combined with HRC reference
• Imputed into ~70k HUNT study
• 1,400 diseases and quantitative traits
• ~3 days in the cloud
• Built a results viewer: http://pheweb.sph.umich.edu:5003/pheno/594.1
Colorectal Cancer (N=4,562 cases)
[Figure panels: BOLT-LMM; SAIGE]

SAIGE: Scalable and Accurate Implementation of GEneralized mixed model
• Logistic mixed model: sample relatedness
• Saddlepoint approximation: unbalanced case-control ratio
• Optimization strategies: large scale data

• SAIGE is implemented as an open-source R package available at
  • https://github.com/weizhouUMICH/SAIGE/

• The GWAS results for 1,403 binary phenotypes (3 days) with the
  PheCodes in UK Biobank using SAIGE are currently available:
  • https://www.dropbox.com/sh/wuj4y8wsqjz78om/AAACfAJK54KtvnzSTAoaZTLma?dl=0

• Michigan PheWeb http://pheweb.sph.umich.edu/UKBiobank

Zhou et al., Nature Genetics, 2018
Shawn Lee & Wei Zhou
Atrial fibrillation – basic mechanisms

• Irregularities in heart beat
• 62% heritability in twin study (Christophersen, Circ Arrhythm Electrophysiol, 2009)
                                                                                        Jonas Nielsen
• As recently as 2010, arrhythmias were primarily thought to be caused by ion channel dysfunction
Meta-analysis for Atrial Fibrillation (Nielsen et al., Nat Genet 2018)

Cohort              Ancestry / location        AF cases    Controls
deCODE              Iceland                    13,471      358,161
DiscovEHR/MyCODE    European ancestry (USA)     6,679       41,803
MGI                 European ancestry (USA)     1,226       11,049
HUNT-MI             Norway                      6,493       63,142
UK biobank          European                   14,820      380,919
AFGen               Mostly European            17,931      115,142
Total                                          60,620      970,216

• 163 independent risk variants at 111 loci
• Prioritized 163 functional candidate genes likely to be involved in AF
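
As a quick arithmetic check, the per-cohort counts on this slide can be summed to confirm the stated totals. This is only a sketch: the dictionary layout and variable names are illustrative, and the numbers are copied from the slide.

```python
# Per-cohort counts copied from the slide: (AF cases, controls).
cohorts = {
    "deCODE": (13_471, 358_161),
    "DiscovEHR/MyCODE": (6_679, 41_803),
    "MGI": (1_226, 11_049),
    "HUNT-MI": (6_493, 63_142),
    "UK biobank": (14_820, 380_919),
    "AFGen": (17_931, 115_142),
}

# Sum each column across cohorts.
total_cases = sum(cases for cases, _ in cohorts.values())
total_controls = sum(controls for _, controls in cohorts.values())

print(f"AF cases: {total_cases:,}")      # 60,620, as on the slide
print(f"Controls: {total_controls:,}")   # 970,216, as on the slide
```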
Atrial fibrillation GWAS (111 loci, 80 novel loci)
PITX2: 7×10^-443

Nielsen et al., Nat Genet 2018
Candidate functional genes by biological function
• Cardiac and Skeletal Muscle Function: AKAP6, COL25A, CFL2, DPT, MYH6, MYH7, MYO18B, MYO1C, MYOCD, MYOT, MYOZ1, MYPN, PKP2, RBM20, SGCA, SSPN, SYNPO2L, TTN, TTN-AS, WIPF1
• TFs cardiac development: EPHA3, GTF2I, HAND2, NAV2, NKX2-5, PITX2, SLIT3, SOX15, SOX5, TBX5, TGFB3
• Intracellular calcium handling in heart: CALU, CAMK2D, CASQ2, PLN, S100A7A
• Cardiac ion channels: GRIK4, KCNC2, KCND3, KCNH2, KCNJ5, KCNN2, KCNN3, SCN10A, SCN5A, SLC9B1
• Angiogenesis: TNFSF12, TNFSF12-TNFSF13
• Hormone signaling: ESR2, IGF1R, JMJD1C, NR3C1, THRB1
• Congenital heart defects: MYH6, NKX2-5, PITX2, TBC1D32, TBX5

Nielsen et al., Nat Genet 2018
PheWAS demonstrates phenotypes correlated with AF

Turn genetic association into biology
• RoadMap Epigenomics and ENCODE have catalogued regions of open chromatin for many tissue types

Jonas Nielsen
Pathways enriched for AF genes include failure of heart looping and abnormal heart development
Results from DEPICT (Nielsen et al., Nat Genet 2018)

AF associated variants show enrichment in regions of open chromatin in fetal heart tissue
Analyses performed using GREGOR and GARFIELD (Nielsen et al., Nat Genet 2018)
AF Association at the MYH6/MYH7 locus

               rs422068, intronic to MYH6
Rabbit hearts with mechanically induced HF demonstrate arrhythmia and increased expression of MYH7

Todd Herron, José Jalife
Life-time risk of AF based on genetic risk score

                                     Nielsen et al., AJHG, 2018
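
The slide does not say how the genetic risk score was built. One common construction, shown here purely as an assumption, is a weighted sum of risk-allele dosages with per-variant weights such as log odds ratios. The function name, the weights, and the genotype below are all hypothetical.

```python
import math
from typing import Sequence


def genetic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0 to 2 per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant weights (log odds ratios), not taken from the talk.
weights = [math.log(1.5), math.log(1.1), math.log(1.2)]

# Hypothetical individual: two risk alleles at variant 1, none at 2, one at 3.
person = [2, 0, 1]

score = genetic_risk_score(person, weights)
print(f"Genetic risk score: {score:.3f}")
```

In practice the score would be computed over many more variants and compared against its distribution in a reference population before translating it into lifetime risk.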
Biobanks allow us to study many phenotypes
Genotypes x phenotypes
• Rare LOF indel: 42% risk of fracture to carriers (32% risk for BMD < 2 SD)
• Estimated Glomerular Filtration Rate (eGFR): 157 loci, 53 new loci, sex-specific effects
• Thyroid Stimulating Hormone: 66 loci (28 novel), pleiotropy
• 12 Liver-related blood traits: 89 coding variants (11 LoF), 17 have impact > 1 SD
Global Lipids Genetics Consortium

Goal: longer, healthier lives (prevent premature death)
Where do I think the future lies?
1. Genetic discovery
   • Dual purposes: pharmaceutical targets and identifying high-risk
     individuals
   • Focus on coding variation has proven useful
2. Functional fine-mapping by encouraging collaboration between
   genetics discovery and molecular biology (iPSC, animal models,
   GTEx, single-cell)
3. Prediction/prevention
4. Return results to participants (large effect variants & polygenic
   risk scores) to improve uptake of lifestyle intervention (?)
How to make UK biobank even more useful!
• Quantify the accuracy of any unusual measurements against standard clinical
  tests (e.g. blood lipid levels, eGFR, heel-estimated bone mineral
  density)
• Self-reported family history information for more phenotypes
• Specialty clinic ascertainment (cardiac surgery, for example, for
  bicuspid aortic valve)
• Imputation of more variants from TOPMed
• Full exome or genome sequencing (and reference sequences for
  imputation into other cohorts)
Acknowledgements
• HUNT-MI: Kristian Hveem, Lars Fritsche, Oddgeir Holmen, Maiken Gabrielsen, Anne Heidi Skogholt
• Working Team: Jonas Nielsen, Wei Zhou, Sarah Graham, Ida Surakka, Brooke Wolford, Hyun Min Kang, Shawn Lee
• DiscovEHR/MyCode: Regeneron (Tanya Teslovich, Aris Baras, Shane McCarthy); Geisinger (David Carey)
• Functional collaborators: Y. Eugene Chen, Bo Yang, Todd Herron, Pepe José Jalife
• MGI: Chad Brummett, Michael Mathis, Sachin Kheterpal
• deCODE: Kari Stefansson, Daniel Gudbjartsson, Hilma Holm, Rosa Thorolfsdottir
• Rabbit Model: Todd Herron, Jose Jalife
• GLGC: Sek Kathiresan, Mike Boehnke, Gina Peloso, Pradeep Natarajan, all the GLGC cohorts
• AFGen Consortium: Patrick Ellinor

cristen@umich.edu
A “proxy case” is an unaffected first- or second-degree relative of a case
[Diagram, analysis schemes A to D: A: F=1 & F=0; B: F=1 & F=0; C: F=0.5 & F=0; D: F=1 & F=0.5 & F=0]

Liu & Pickrell, Nat Genet, 2017 “GWAS by proxy”
Brooke Wolford
Unaffected relatives of cases have
intermediate risk allele frequency
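
A rough way to see why the frequency is intermediate: an allele drawn from a relative is shared with the case by descent with probability equal to the relatedness coefficient (0.5 for first-degree, 0.25 for second-degree relatives), and otherwise comes from the general population. The sketch below makes two simplifying assumptions that are not from the slides: it uses the control frequency as the population frequency, and it ignores the small shift from conditioning on the relative being unaffected. Function names and the example numbers are illustrative.

```python
def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * p_control / (1 - p_control)
    return odds / (1 + odds)


def relative_allele_freq(p_case: float, p_population: float, relatedness: float) -> float:
    """Expected risk-allele frequency in relatives of cases.

    relatedness is 0.5 for first-degree and 0.25 for second-degree relatives.
    """
    return relatedness * p_case + (1 - relatedness) * p_population


p_control = 0.10                           # illustrative allele frequency
p_case = case_allele_freq(p_control, 1.2)  # illustrative odds ratio

first_degree = relative_allele_freq(p_case, p_control, 0.5)
second_degree = relative_allele_freq(p_case, p_control, 0.25)

print(f"controls      {p_control:.4f}")
print(f"2nd-degree    {second_degree:.4f}")
print(f"1st-degree    {first_degree:.4f}")
print(f"cases         {p_case:.4f}")
```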

Power improves after modeling unaffected relatives of cases
1,000 simulations at MAF 0.1
• A - Standard GWAS
• B - Exclude unaffected relatives of cases
• C - Exclude cases (GWAX)
• D - Model unaffected relatives of cases with 0.5 liability
[Figure: power at alpha=5e-8 (y-axis, 0.00 to 1.00) versus odds ratio (x-axis, 1.0 to 1.3), one curve per scheme A to D]
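
The slide reports simulated power. For intuition only, the sketch below uses a closed-form normal approximation for a plain case-control allelic test, which corresponds roughly to scheme A and does not model relatives at all. It is not the simulation behind the figure, and the sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * p_control / (1 - p_control)
    return odds / (1 + odds)


def gwas_power(n_cases: int, n_controls: int, maf: float,
               odds_ratio: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic test (normal approximation)."""
    p_case = case_allele_freq(maf, odds_ratio)
    # Pooled allele frequency, used for the standard error of the difference.
    p_bar = (n_cases * p_case + n_controls * maf) / (n_cases + n_controls)
    se = sqrt(p_bar * (1 - p_bar) * (1 / (2 * n_cases) + 1 / (2 * n_controls)))
    ncp = (p_case - maf) / se                # expected z-score
    z_crit = -STD_NORMAL.inv_cdf(alpha / 2)  # two-sided critical value
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)


# Hypothetical sample sizes; MAF and alpha follow the slide.
for odds_ratio in (1.0, 1.1, 1.2, 1.3):
    power = gwas_power(5_000, 50_000, maf=0.1, odds_ratio=odds_ratio)
    print(f"OR {odds_ratio:.1f}: power {power:.3g}")
```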

Venous thromboembolism (VTE)
• 2,325 Cases
• 65,294 Controls
• Case: Control = 0.036

• BOLT-LMM: Linear Mixed Model, λall = 1.047
• GMMAT: Logistic Mixed Model, λall = 1.015
• SPA-GMMAT: Logistic Mixed Model + SPA tests, λall = 1.015
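
The λ values are genomic inflation factors. A common definition, assumed here because the slide does not spell it out, is the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The sketch below computes it from p-values using only the standard library. The p-values are synthetic, and the last lines reproduce the case-to-control ratio from the slide.

```python
from statistics import NormalDist, median

STD_NORMAL = NormalDist()

# Median of a 1-df chi-square distribution (about 0.455).
MEDIAN_CHI2_1DF = STD_NORMAL.inv_cdf(0.25) ** 2


def genomic_inflation(p_values):
    """Median observed 1-df chi-square statistic over its null median.

    Each p-value must lie in (0, 1].
    """
    # A two-sided p-value p corresponds to z^2 with z = inv_cdf(p / 2).
    chi2 = [STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / MEDIAN_CHI2_1DF


# Synthetic, perfectly uniform p-values: lambda should be 1.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
print(f"lambda (null): {genomic_inflation(null_p):.3f}")

# Synthetic p-values shifted toward zero: lambda rises above 1.
inflated_p = [p ** 1.2 for p in null_p]
print(f"lambda (inflated): {genomic_inflation(inflated_p):.3f}")

# Case-to-control ratio from the slide's counts.
ratio = 2_325 / 65_294
print(f"case:control = {ratio:.3f}")
```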

Simultaneous consideration of lipid-lowering
variants that are protective against liver disease
Figure 1. ZNF529 silencing induces LDLR expression and LDL-C uptake.
LoF carriers and bone mineral density
• Frameshift indel in MEPE
• 0.8% frequency in Norway
• Impact ↓ -0.5 SD on BMD
• ↑ Fracture risk: OR 1.4 – 1.8 (p ~ 10^-5)
• Carrier fracture risk: 42%
