Next Generation Sequencing: An Overview

Page created by Phillip Day
 
DNA Sequencing
• Refers to determining the
  order of nucleotides (G, A, T,
  and C) in a stretch of DNA.
• Useful in biotechnology
  research and discovery,
  diagnostics, and forensics.
What is Sequencing? How does it work?

DNA Sequencing

DNA sequencing = determining the nucleotide sequence (A, T, G, and C)
   of the DNA of a gene.

We are currently using the Sanger method.
Sanger Sequencing
• Utilizes dideoxynucleotide triphosphates (ddNTPs) to terminate DNA chain
  elongation.
• Separation of molecules by gel or capillary electrophoresis and
  detection of dye-labelled terminators.
• Can be used to interrogate the sequence of single samples.
• 96 samples can be run in SBS format with 24-36 runs per day. A
  single instrument can generate 1-2 million bases per day.
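The chain-termination idea above can be sketched in a few lines of Python. This is a toy model, not a description of any instrument's software: the function names, the `ddntp_fraction` value, and the example template are assumptions made for illustration, and the template is written in the order the polymerase reads it.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_fragments(template, copies=2000, ddntp_fraction=0.05, seed=0):
    """Simulate chain-terminated extension products for one template.

    Each copy is extended base by base; at every position there is a small
    chance (ddntp_fraction, an assumed value) that a dye-labelled ddNTP is
    incorporated, which stops elongation. Returns {fragment length: dye label}.
    """
    rng = random.Random(seed)
    fragments = {}
    for _ in range(copies):
        for length, base in enumerate(template, start=1):
            incorporated = COMPLEMENT[base]
            if rng.random() < ddntp_fraction:
                # ddNTP incorporated: the chain terminates at this length
                fragments[length] = incorporated
                break
    return fragments


def read_by_size(fragments):
    """Electrophoresis step: order fragments by length and read the dyes."""
    return "".join(fragments[length] for length in sorted(fragments))


if __name__ == "__main__":
    template = "TACGGATCCGTA"  # hypothetical template
    print(read_by_size(sanger_fragments(template)))
```

Sorting the terminated fragments by length and reading the terminal dye of each one recovers the newly synthesised strand, which is the complement of the template.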
Sanger Sequencing Workflow
Sanger Sequencing Pipeline
• Plasmids: Library Construction → Picking & Growth → Template Prep
  (AxyPrep Mag Plasmid)
• PCR Amplicons: PCR Cycling & CleanUp (AxyPrep Mag PCR Clean up,
  AxyPrep Mag PCR Normalizer)
• Ready to Sequence Templates → Seq. Setup / Cycling (AxyPrep Mag
  DyeClean) → 3730xl Analysis
From Slab Gels to Single Molecule Sequencers

• Late 1980s: Slab gel sequencers using radioactive isotopes
  and later fluorescence chemistry (10 Kb per 4 hr run)
• Late 1990s: Capillary sequencers (50 Kb per 1 hr run)
• 2005: Massively parallel pyrosequencing (20 MB per 5 hr run)
• 2007: Sequencing by synthesis (1 GB per 5 day run)
• 2010: Single molecule sequencing (100 GB per 5 day run)
• 2013: Human genome in 15 minutes

Next Gen Sequencing
• Employs micro- and nanotechnologies to reduce the size of sample
components, reducing reagent costs, and enabling massively parallel sequencing
reactions.
• Highly multiplexed, allowing simultaneous sequencing and analysis of millions of
samples.

(Figure: bases T, G, C, T, A, C read out over multiple cycles)
Sanger vs. Next Gen
Traditional Sequencing vs. Next Generation Sequencing: Data Throughput
• 1 x Illumina GAII vs. 200+ 3730xl instruments
• Days vs. years
The Sequencing Landscape is Changing
Next Generation Sequencing Platforms
Second Generation Sequencing Throughput

Illumina Genome Analyzer IIx
• Sequencing by synthesis using reversible fluorescent dye
  terminators using clonal single molecule array;
• 35 to 50 base read length
• 138 to 336 million shotgun reads per run;
• 4.5 to 36 GB per 2 to 9.5 day run

Roche GS FLX
• Sequencing by synthesis using chemiluminescence detection;
• 400 to 500 base reads
• 1 million fragments of DNA in parallel on picotitre plate
• 400 MB per 10 hr run

Life Technologies SOLiD 3 Plus
• Sequencing by ligation; in vitro sample prep;
• 35 to 2 by 50 base reads;
• 500 million to 1 billion shotgun reads per 2 slide run;
• reference sequence required
• 12 – 48 GB per 3.5 to 14 day 2-slide run.
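To compare the three platforms on one scale, the per-run figures above can be converted to output per day. The sketch below is simple arithmetic under stated assumptions: the low yield is paired with the short run and the high yield with the long run, "GB" and "MB" are taken as decimal units (1 GB = 1000 MB), and the dictionary layout and function name are made up for this example.

```python
# Per-run output and run length as quoted on the slide.
# Tuples are (low, high); the 10 hr GS FLX run is expressed in days.
PLATFORMS = {
    "Illumina Genome Analyzer IIx": {
        "gb_per_run": (4.5, 36.0),
        "days_per_run": (2.0, 9.5),
    },
    "Roche GS FLX": {
        "gb_per_run": (0.4, 0.4),          # 400 MB, assuming 1 GB = 1000 MB
        "days_per_run": (10 / 24, 10 / 24),
    },
    "Life Technologies SOLiD 3 Plus": {
        "gb_per_run": (12.0, 48.0),
        "days_per_run": (3.5, 14.0),
    },
}


def gb_per_day(spec):
    """Return (low, high) output per day of run time.

    Assumption: the low yield belongs to the short run and the high
    yield to the long run. Set-up and library prep time are ignored.
    """
    (low_gb, high_gb) = spec["gb_per_run"]
    (short_run, long_run) = spec["days_per_run"]
    return (low_gb / short_run, high_gb / long_run)


if __name__ == "__main__":
    for name, spec in PLATFORMS.items():
        low, high = gb_per_day(spec)
        print(f"{name}: {low:.2f} to {high:.2f} GB per day of run time")
```

The result only covers instrument run time; library construction and template amplification add days to every platform's real turnaround.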
Differentiating Next Gen technologies

Common workflow: Sheared DNA → Library construction → Clonal template amplification → Sequencing

Platform   Library construction   Clonal template amplification        Sequencing
Illumina   Library Construction   Clonal amplification via bridge      Massively parallel sequencing-by-synthesis
                                  amplification                        of DNA clusters
454        Library Construction   Clonal amplification with emulsion   Massively parallel pyrosequencing of
                                  PCR and enrichment                   bead-bound DNA templates
SOLiD      Library Construction   Clonal amplification with emulsion   Massively parallel ligation-based sequencing
                                  PCR and enrichment                   of bead-bound DNA templates
Comparison of Next Gen Technologies

                             GS FLX                    Genome Analyzer                     SOLiD
Library Construction         Fragment, Mate-Paired     Fragment, Mate-Paired, Paired-End   Fragment, Mate-Paired
Sequencing Chemistry         Sequencing by Synthesis   Sequencing by Synthesis             Sequencing by Ligation
DNA Support                  25-35 µm bead             Flow cell surface                   1 µm bead
Amplification                Emulsion PCR              Cluster amplification               Emulsion PCR
Sequencing Reaction Surface  High density well-plate   8-channel flow cell                 Single slide imaged in panel
Illumina Genome Analyzer (GA)
Illumina Genome Analyzer (GA, GAII, GAIIx)
• DNA libraries bound to an 8-channel flowcell
• Sequencing by synthesis using reversible terminators
• Detection of fluorescently tagged bases
• Read lengths of up to 2 by 100 bases.
(Instruments pictured: cBot, Paired end module, Cluster station)
Illumina workflow
Library construction
1-4 days depending on application

Cluster Amplification
(Cluster station)
Automated, approx 1 hour to set
up, 5 hours run time

Sequencing
(GAII and Paired End module)
 Approx 2-9 days depending on
 read length and number of reads
Illumina GA - Cluster Generation
• Flow cell, reagents and samples loaded onto cluster station
• Aspirates samples and reagents into flow cell
• Automates the formation of amplified clonal clusters from single DNA
  molecules
• Approx 5 hours run time, 1 hour hands on time
• Template amplification: no beads, no emulsions
• “Cluster generation” (walk-away)
Illumina GA - Cluster Generation (cont)
Illumina GA - Sequencing
• Flowcell and reagents loaded onto
Genome Analyzer.
• 1-2 hour loading time
• Walkaway automation
• 2 - 9 days run time depending on
read length
Sequencing By Synthesis (SBS)
• Cycle 1: Add sequencing reagents
  – First base incorporated
  – Remove unincorporated bases
  – Detect signal
  – Deblock and defluor
• Cycle 2-n: Add sequencing reagents and repeat
(Figure: template strand shown 3’ to 5’; each labelled nucleotide is drawn as PPP, Base, and Fluor)
Sequencing by Synthesis - 4 Fluors
• Synthesis of 2nd strand
• Four nucs per cycle
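The cycle described for sequencing by synthesis (add reagents, incorporate one base, wash, detect, deblock) can be sketched as a loop. This is an illustrative model only: the dye colours assigned to each base and the function name are hypothetical, and the template is written in the order it is read.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical dye assignment; real instruments define their own.
FLUOR_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_FLUOR = {fluor: base for base, fluor in FLUOR_FOR_BASE.items()}


def sequence_by_synthesis(template, cycles):
    """Read one base per cycle from a single template (or clonal cluster)."""
    read = []
    for position in range(min(cycles, len(template))):
        # Add sequencing reagents: all four labelled nucleotides are offered,
        # but only the complementary one is incorporated, and its reversible
        # terminator blocks any further extension in this cycle.
        incorporated = COMPLEMENT[template[position]]
        # Remove unincorporated bases, then detect the fluorescent signal.
        signal = FLUOR_FOR_BASE[incorporated]
        read.append(BASE_FOR_FLUOR[signal])
        # Deblock and defluor: the strand is ready for the next cycle.
    return "".join(read)


if __name__ == "__main__":
    print(sequence_by_synthesis("TGCTAC", cycles=4))
```

Because the terminator allows exactly one incorporation per cycle, read length equals the number of cycles run, which is why run time grows with read length.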
Genome Analyzer IIx - Specifications
Roche Genome Sequencer GS FLX
Roche GS FLX (GS20, FLX, Titanium)

• Uses pyrosequencing, a process that
uses chemiluminescence for detection.

• Detection of light signal generated by
luciferase when complementary bases are
incorporated into sequencing strand.

• Libraries are bound to beads which are
deposited onto a PTP (PicoTiterPlate) for
sequencing.

• Long read lengths up to 400 bp.

• Run time approx 10 hours.
GS FLX Workflow
DNA Library Preparation (1 to 3 days) → emPCR and enrichment (1.5 days) → Sequencing (1 day)

DNA Library Preparation
• Fragment DNA through nebulization or other means.
• Attach adapters to DNA fragments.
• Prepare single-stranded DNA library with adapters.
• Recently, a rapid protocol has been introduced which eliminates the
  need for making the adapter-ligated DNA fragments single stranded.

emPCR
• Water-in-oil emulsion
• Fixes adapter-ligated fragments to small DNA-capture beads
• Purification of amplified DNA colonies on beads

Sequencing
• DNA beads placed in pico-titre plate device.
• Uses pyrosequencing chemistry
• Detection using a CCD camera
• 10 hr sequencing run
Emulsion PCR
Mix PCR aqueous phase into a water-in-oil (w/o)
emulsion and carry out emulsion PCR
Enrichment

Beads with amplified DNA are purified using magnetic enrichment beads.
Approximately 1/3 of beads have a product.
GS FLX: Bead deposition
• Empty PicoTiter slide
• DNA beads are loaded into the wells of the PTP.
• DNA beads packed into wells with surrounding beads and sequencing
  enzymes.
GS FLX: Instrument Loading
• Genome is loaded into a PicoTiterPlate.
• Load PicoTiterPlate into instrument.
• Load reagents in a single rack.
• Sequence entire genome at once, in real-time.
GS FLX: Sequencing-by-synthesis
• Simultaneous sequencing of millions of DNA library molecules in
  a pico-titre plate.
• Pyrophosphate signal generation upon complementary nucleotide
  incorporation; dark otherwise.
• DNA capture bead containing millions of copies of a single
  clonal fragment
GS FLX Sequencing-by-synthesis
• Repeated dNTP flow sequence: G T C A
• Process continues until user-defined number of nucleotide
  flow cycles are completed.
(Figure: primer annealed to the template A A T C G G C A T G C T A A A A G T C A;
the complementary strand is extended step by step as each nucleotide is flowed)
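The repeated nucleotide flow can be illustrated with a small flowgram simulation. It is a sketch under simplifying assumptions: the light signal is taken to be exactly proportional to the number of bases incorporated in a flow, the flow order G, T, C, A is the one shown on the slide, and the function names and example reads are hypothetical.

```python
def flowgram(read, flow_order="GTCA", max_flows=400):
    """Simulate pyrosequencing flows for the strand being synthesised.

    Nucleotides are flowed one at a time in a fixed, repeating order.
    A flow gives no light if the next base does not match, and a signal
    proportional to the run length if it matches a homopolymer.
    Returns a list of (nucleotide, signal) pairs.
    """
    signals = []
    position = 0
    flow = 0
    while position < len(read) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        incorporated = 0
        while position < len(read) and read[position] == nucleotide:
            incorporated += 1
            position += 1
        signals.append((nucleotide, incorporated))
        flow += 1
    return signals


def call_bases(signals):
    """Turn an idealised flowgram back into a base sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    for nucleotide, signal in flowgram("GTTCAAAAG"):
        print(nucleotide, signal)
```

A homopolymer such as AAAA is read in a single flow as one signal of height 4, so the base caller must tell a 4x signal from a 3x or 5x one. That is the source of the homopolymer difficulty associated with this chemistry.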
Advantages/Disadvantages
• Advantages
  – Q20 read lengths of 400 bases (99% accuracy at the
    400th base and higher for preceding bases)
  – Significantly higher throughput compared to
    Sanger sequencing
  – Does not rely on cloning efficiency
  – DNA libraries can be barcoded and separated
    during data analysis.
• Disadvantages
  – Difficulty getting through homopolymers
  – Each run is expensive and hence not ideal for
    re-sequencing applications compared to the Genome
    Analyzer and/or SOLiD
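The "Q20" figure in the advantages list is a Phred quality score, defined as Q = -10 log10(p), where p is the probability that a base call is wrong. A minimal sketch of the conversion (function names are chosen for this example):

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score for a base call with error probability p."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30):
        accuracy = 1 - phred_to_error_probability(q)
        print(f"Q{q}: {accuracy:.1%} accuracy")
```

Q20 corresponds to an error probability of 0.01, which is the 99% accuracy quoted for the 400th base.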
Life Technologies SOLiD
• Sequencing by Oligo Ligation and
Detection
• Libraries are bound to beads which
are covalently attached to a glass
support surface after emulsion PCR
• Uses fluorescently labelled oligomers
• Dibase encoding
• Read lengths up to 2 by 50 bp
• Up to 8 samples/slide
Emulsion PCR
Enrichment
• Large polystyrene bead coated with P2 (figure labels: P1, P2, P2’)
• Centrifuge in glycerol gradient
  – Supernatant: captured beads with templates
  – Pellet: beads with no template
Bead Deposition
• 3’-end modification
• Beads attached to glass surface in a random array
Slide deposition and installation
Sequencing by Ligation
Sequence Data Analysis

• 4 dyes to encode 16 2-base combinations
• Each base is interrogated by two probes, in two
  different ligation reactions
• Dual interrogation eases discrimination of errors
  (random or systematic) vs. true polymorphisms (SNPs)
Data is best analyzed in color space
  - Leverages di-base advantages
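Di-base encoding can be sketched as follows. The mapping used here is one common way to write the 2-base colour code (each base as a 2-bit number, each colour as the XOR of two adjacent bases), so that 4 colours cover all 16 dinucleotides. The function names are hypothetical, and the leading base is simply taken from the sequence itself, which is a simplification.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_color_space(sequence):
    """Encode bases as a leading base plus one colour (0-3) per dinucleotide."""
    colors = [
        BASE_TO_BITS[first] ^ BASE_TO_BITS[second]
        for first, second in zip(sequence, sequence[1:])
    ]
    return sequence[0] + "".join(str(color) for color in colors)


def decode_color_space(encoded):
    """Decode back to bases, starting from the known leading base."""
    bases = [encoded[0]]
    for color in encoded[1:]:
        previous = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous ^ int(color)])
    return "".join(bases)


def color_differences(encoded_a, encoded_b):
    """Indices (within the colour string) where two encodings differ."""
    return [
        index
        for index, (a, b) in enumerate(zip(encoded_a[1:], encoded_b[1:]))
        if a != b
    ]


if __name__ == "__main__":
    reference = encode_color_space("ATGGCA")
    variant = encode_color_space("ATGACA")  # single base change
    print(reference, variant, color_differences(reference, variant))
```

Every internal base contributes to two adjacent colours, so a true single-base change alters two adjacent colours, while an isolated measurement error alters only one. This is the basis for separating errors from SNPs when the data stay in colour space.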
SOLiD 3 Plus Specifications
Multiplexing
What is multiplexing?
• Multiplexing: a method to analyze multiple biological samples
  in a single run.
• Barcodes are unique sequence identifiers added to samples
  during library construction.
• Once barcodes are added, multiple libraries can be pooled
  together for emulsion PCR/cluster generation and sequencing.
• Sequence data is then analyzed and traced back to each source.
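Tracing pooled reads back to their source sample can be sketched as a lookup on the barcode. This is an illustration only: it assumes the barcode sits at the start of each read and must match exactly, and the barcode sequences, sample names, and function name are invented for the example.

```python
from collections import defaultdict


def demultiplex(reads, barcodes, barcode_length):
    """Group pooled reads by sample using a leading barcode.

    reads: iterable of read sequences (barcode followed by insert).
    barcodes: dict mapping barcode sequence -> sample name.
    Reads whose barcode is not recognised go to "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_length], read[barcode_length:]
        sample = barcodes.get(tag, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


if __name__ == "__main__":
    barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}  # hypothetical
    pooled = ["ACGTGGGTTT", "TGCAAAACCC", "ACGTCCCAAA", "NNNNGGGGGG"]
    for sample, inserts in demultiplex(pooled, barcodes, 4).items():
        print(sample, inserts)
```

Production pipelines usually tolerate one or more mismatches in the barcode; exact matching is used here to keep the example short.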
Multiplexing
• Simpler workflow, ease-of-use
• Lower running costs
• Higher number of samples per run

Standard Protocol    Multiplexing Protocol
8 Samples            16 Samples      128 Samples
8 Libraries          16 Libraries    128 Libraries
8 Emulsions          1 Emulsion      8 Emulsions
Why is Multiplexing Important?

• Next generation DNA sequencing generates massive amounts of
  sequence data.
• Currently more data is generated per library than is required. To
  overcome this, researchers multiplex multiple libraries into a single lane.
• However, generation of libraries is a bottleneck. Most researchers
  are not able to do this.
• SPRIworks will relieve this bottleneck by enabling researchers to
  make more DNA libraries faster.
Summary of Next Generation Sequencing Platforms
Examples of Next Generation Sequencing Applications

De novo sequencing
 “De novo sequencing is the initial sequencing that results in the primary genetic
sequence of organisms. A detailed genetic analysis of an organism is possible only
after de novo sequencing has been performed.”
- Applied Biosystems website

Re-sequencing: looks for variation between strains or individuals

cDNA Sequencing (Fragmented cDNAs): sequencing of transcribed regions

Amplicon/PCR sequencing: could be targeted re-sequencing or possibly genome
sequencing (e.g. viral).