Principles and challenges of genomewide DNA methylation analysis

Page created by Darren Myers
 
CONTINUE READING
REVIEWS

                                         A p p l i c At i o n s o f n e x t- g e n e r At i o n s e q u e n c i n g

                                   Principles and challenges of genome-
                                   wide DNA methylation analysis
                                   Peter W. Laird
                                   Abstract | Methylation of cytosine bases in DNA provides a layer of epigenetic control in
                                   many eukaryotes that has important implications for normal biology and disease. Therefore,
                                   profiling DNA methylation across the genome is vital to understanding the influence of
                                   epigenetics. There has been a revolution in DNA methylation analysis technology over the
                                   past decade: analyses that previously were restricted to specific loci can now be performed
                                   on a genome-scale and entire methylomes can be characterized at single-base-pair
                                   resolution. However, there is such a diversity of DNA methylation profiling techniques that it
                                   can be challenging to select one. This Review discusses the different approaches and their
                                   relative merits and introduces considerations for data analysis.

Transposons
                                  The phenotype of a cell is primarily determined by its                  DNA methylation information is erased by standard
Mobile DNA elements that can      expression profile and its response to environmental                molecular biology techniques, such as cloning in bacteria
relocate within the genome of     cues. Epigenetics provides stability and diversity to the           and PCR, and it is not revealed by hybridization as the
their hosts.                      cellular phenotype through chromatin marks that affect              methyl group is located in the major groove of DNA rather
Restriction–modification
                                  local transcriptional potential and that are preserved or           than at the hydrogen bonds. Therefore, methylation-
system                            regenerated during cell division. Methylation of DNA                dependent pretreatments of DNA were developed to
A set of enzymes found in many    cytosine residues at the carbon 5 position (5meC) is a              reveal the presence or absence of the methyl group at
bacteria and archaea that         common epigenetic mark in many eukaryotes and is                    cytosine residues. The techniques based on these pre-
protects the host genome from
                                  often found in the sequence context CpG or CpHpG                    treatments were initially restricted to relatively local-
genomic parasites. Restriction–
modification systems consist of
                                  (H = A, T, C). When located at gene promoters, DNA                  ized regions of the genome, but many have now been
sequence-specific restriction     methylation is usually a repressive mark. However, CpG              scaled up to enable DNA methylation analysis on a
endonucleases, which target       DNA methylation is increased in the gene bodies of                  genome-wide scale.
invading DNA, and associated      actively transcribed genes in plants1–3 and mammals4–6.                 Although initial forays into broad DNA methylation
DNA methyltransferases with
similar recognition sequences,
                                  Plant CpHpG methylation is found in non-expressed                   profiling were made with two-dimensional gel electro-
which protect the host            transposons7. Bacteria and archaea also have 5meC, along            phoresis8–13, the era of epigenomics truly took off with the
genome from the action of         with N-4-methylcytosine and N-6-methyladenine, and                  adaptation of microarray hybridization techniques from
the endonucleases.                these modified bases participate in restriction–modification        the gene expression and genomic fields to the profiling of
                                  systems and mismatch repair strand discrimination,                  histone modifications and DNA methylation patterns14–21.
Mismatch repair
A DNA-repair pathway that
                                  among other roles.                                                  The current revolution in sequencing technology has
removes mismatched bases              DNA methylation is laid down by dedicated DNA                   recently opened the door to single-base-pair resolution
and corrects the insertion or     methyltransferases with highly conserved catalytic                  whole-genome DNA methylation analysis 22–24. Many
deletion of short stretches of    motifs. In eukaryotes, usually only a subset of potential           of the high-resolution genome-wide DNA methylation
(repeated) DNA.
                                  target sequences in the genome are methylated, therefore            profiling techniques were pioneered in model organisms
                                  the distribution of methyl marks can convey epigenetic              with small genomes, such as Arabidopsis thaliana22,24, but
USC Epigenome Center,
                                  information by demarcating regions of transcriptional               are now being applied to organisms with larger genomes,
University of Southern
California, Keck School of        silence or transcriptional potential. The delineation               including mammals6,23,25,26. These recent developments
Medicine, 1450 Biggy Street,      of regional DNA methylation patterns, and broader                   in the rapidly evolving panoply of DNA methylation
Room G511B, Los Angeles,          DNA methylation profiles, has important implications                analysis techniques demand an updated synthesis of the
California 90089‑9601, USA.       for understanding why certain regions of the genome                 main principles and an overview of the advantages and
e‑mail: plaird@usc.edu
doi:10.1038/nrg2732
                                  can be expressed in specific developmental contexts                 disadvantages of the various approaches.
Published online                  and how epigenetic changes might enable aberrant                        This Review covers the general principles of DNA
2 February 2010                   expression patterns and disease.                                    methylation analysis, with a particular emphasis on

NATuRE REvIEWs | Genetics                                                                                                       voluME 11 | MARCH 2010 | 191

                                                       © 2010 Macmillan Publishers Limited. All rights reserved
REVIEWS

CpG islands                        genome-scale DNA methylation analysis technologies.                   PCR reaction. This approach would require a thermosta-
In eukaryotic genomes, regions     The relative merits of the different techniques are dis-              ble DNA methyltransferase with very high efficiency
of several hundred base pairs      cussed, and bioinformatic challenges that are unique                  and maintenance fidelity and a complete lack of de novo
that are not depleted of           to DNA methylation analysis are outlined. Although                    methyltransferase activity, and so has not been realized
CpGs by 5-methylcytosine
deamination owing to them
                                   some of the methods covered in this Review rely on                    to date. Therefore, almost all sequence-specific DNA
being unmethylated in the          species-specific arrays, most of the principles could                 methylation analysis techniques rely on a methylation-
germ line. They often overlap      be applied to any organism with 5meC, including bac-                  dependent treatment of the DNA before amplification
transcription start sites. Most    teria and archaea. some of the methods described in                   or hybridization29–32. There are three main approaches:
definitions of CpG islands set a
                                   this Review may be applicable to 5-hydroxymethylcy-                   endonuclease digestion, affinity enrichment and bisul-
minimum length (for example,
200 or 500 bp), a minimum
                                   tosine, which was recently confirmed to be present in                 phite conversion. After genomic DNA has been treated
observed:expected CpG ratio        mammalian cells27,28.                                                 with one of the methylation-dependent steps, various
(for example, greater than                                                                               molecular biology techniques, including probe hybridi-
0.6 or 0.65) and a minimum         Distribution and detection of DnA methylation                         zation and sequencing, can be used to reveal the loca-
GC content (for example,
50% or 55%).
                                   As DNA samples are usually derived from a collection                  tion of the 5meC residues. The combination of different
                                   of cells, which may vary in their DNA methylation                     types of pretreatment followed by different analytical
                                   patterns, the distribution of 5meC in a DNA sample is                 steps has resulted in a plethora of techniques for deter-
                                   complex. Measurements can be made either of the pat-                  mining DNA methylation patterns and profiles29,32–42. In
                                   tern of methylated target sequences along individual                  the following sections I discuss techniques based on the
                                   DNA molecules or as an average methylation level at                   three main approaches, and a summary is provided in
                                   a single genomic locus across many DNA molecules29.                   TABLE 1. Expression profiling of cells treated with DNA
                                   Analysis of DNA methylation is further complicated by                 methyltransferase inhibitors and/or histone deacetylase
                                   the uneven distribution of methylation target sequences,              inhibitors has also been used as a discovery tool for epi-
                                   such as CpG, across the genome. As a consequence of                   genetically silenced genes. However, it is prone to false-
                                   the frequent mutation of 5meC to thymine, these targets               positive and false-negative results and is not considered
                                   are depleted throughout most of the genome but are                    to be a reliable gauge of DNA methylation at a given
                                   maintained in specific regions, such as CpG islands. This             locus, and is therefore not discussed in this Review.
                                   non-uniform distribution is an important considera-
                                   tion in DNA methylation analysis, as discussed below.                 endonuclease digestion
                                   As noted above, 5meC is not readily distinguished from                Restriction endonucleases are such powerful tools in
                                   unmethylated cytosine by hybridization-based methods                  molecular biology that their biological role in restric-
                                   and, as DNA methyltransferases are not present during                 tion–modification systems in bacteria and archaea is
                                   PCR or in biological cloning systems, DNA methyla-                    sometimes overlooked. Each sequence-specific restric-
                                   tion information is erased during amplification. some                 tion enzyme has an accompanying DNA methyl-
                                   investigators have suggested that it could be feasible to             transferase that protects the endogenous DNA from
                                   maintain the pattern of methylation during PCR if an                  the restriction defence system by methylating bases in the
                                   appropriate DNA methyltransferase were present in the                 recognition site. some restriction enzymes are inhibited

                                    Table 1 | Main principles of DnA methylation analysis
                                    Pretreatment          Analytical step
                                                          Locus-specific analysis        Gel-based analysis          Array-based analysis      nGs-based analysis
                                    enzyme                • HpaII-PCR                    • Southern blot             • DMH                     • Methyl–seq
                                    digestion                                            • RLGS                      • MCAM                    • MCA–seq
                                                                                         • MS-AP-PCR                 • HELP                    • HELP–seq
                                                                                         • AIMS                      • MethylScope             • MSCC
                                                                                                                     • CHARM
                                                                                                                     • MMASS
                                    Affinity              • MeDIP-PCR                                                • MeDIP                   • MeDIP–seq
                                    enrichment                                                                       • mDIP                    • MIRA–seq
                                                                                                                     • mCIP
                                                                                                                     • MIRA
                                    sodium                • MethyLight                   • Sanger BS                 • BiMP                    • RRBS
                                    bisulphite            • EpiTYPER                     • MSP                       • GoldenGate              • BC–seq
                                                          • Pyrosequencing               • MS-SNuPE                  • Infinium                • BSPP
                                                                                         • COBRA                                               • WGSBS
                                    AIMS, amplification of inter-methylated sites; BC–seq, bisulphite conversion followed by capture and sequencing; BiMP, bisulphite
                                    methylation profiling; BS, bisulphite sequencing; BSPP, bisulphite padlock probes; CHARM, comprehensive high-throughput arrays
                                    for relative methylation; COBRA, combined bisulphite restriction analysis; DMH, differential methylation hybridization; HELP, HpaII
                                    tiny fragment enrichment by ligation-mediated PCR; MCA, methylated CpG island amplification; MCAM, MCA with microarray
                                    hybridization; MeDIP, mDIP and mCIP, methylated DNA immunoprecipitation; MIRA, methylated CpG island recovery assay;
                                    MMASS, microarray-based methylation assessment of single samples; MS-AP-PCR, methylation-sensitive arbitrarily primed PCR;
                                    MSCC, methylation-sensitive cut counting; MSP, methylation-specific PCR; MS-SNuPE, methylation-sensitive single nucleotide
                                    primer extension; NGS, next-generation sequencing; RLGS, restriction landmark genome scanning; RRBS, reduced representation
                                    bisulphite sequencing; –seq, followed by sequencing; WGSBS, whole-genome shotgun bisulphite sequencing.

192 | MARCH 2010 | voluME 11                                                                                                         www.nature.com/reviews/genetics

                                                          © 2010 Macmillan Publishers Limited. All rights reserved
REVIEWS

                                 by 5meC in the sequence context CpG, so the patterns                4-bp recognition sequences (as opposed to the 6-bp
                                 of cutting by such enzymes can provide a read-out of                recognition sequences of SmaI and XmaI).
                                 DNA methylation.                                                        An alternative approach is differential methylation
                                    The most widely used methylation-sensitive restric-              hybridization (DMH), which involves digestion of one
                                 tion enzymes for DNA methylation studies are HpaII                  pool of genomic DNA with a methylation-sensitive
                                 and SmaI, in part because they each have an isoschizomer            restriction enzyme and mock digestion of another pool.
                                 (MspI for HpaII) or neoschizomer (XmaI for SmaI) that is            This produces parallel DNA pools that are amplified and
                                 not inhibited by CpG methylation. That is not to say that           labelled with different fluorescent dyes for two-colour
                                 these isoschizomers are not methylation-sensitive. For              array hybridization17,18,21,55. The relative fluorescent
                                 example, MspI is inhibited by non-CpG methylation of                signal intensities can be used to extract DNA methyla-
                                 the first cytosine in the recognition sequence (CCGG).              tion information at the corresponding loci on the array.
                                 Therefore, MspI can be used to screen for meCCG                     DMH is one of several techniques that incorporate pre-
                                 in plants43.                                                        cutting of genomic DNA with the restriction enzyme
                                    The first locus-specific DNA methylation analyses in             MseI. The recognition site of MseI is AATT, so it cuts
                                 the late 1970s and early 1980s relied on digestion with             frequently in the genome but relatively rarely within
                                 methylation-sensitive restriction enzymes followed                  CpG islands56. This digestion step is often incorporated
                                 by gel electrophoresis and hybridization on southern                into protocols to deplete the sample of CpG-poor DNA
                                 blots44–47. This method is still applicable for some locus-         before further processing, such as adaptor ligation.
                                 specific studies that require linkage of DNA methylation            small genomes, such as the A. thaliana genome, do not
                                 information across multiple kilobases, either between               require an amplification step and can be directly labelled
                                 CpGs or between a CpG and a genetic polymorphism.                   for hybridization43,57.
                                 Methylation-sensitive restriction digestion followed                    An adaptation of DMH is to use the methylation-
                                 by PCR across the restriction site is a very sensitive              dependent endonuclease McrBC. This provides greater
                                 technique that is still used in some applications today.            sensitivity to densely methylated regions58,59 than using
                                 However, it is extremely prone to false-positive results            a methylation-sensitive enzyme. A further modification,
                                 caused by incomplete digestion for reasons other than               known as Methylscope, is to use McrBC to cut randomly
                                 DNA methylation.                                                    sheared DNA (instead of MseI-digested DNA) 60,61.
                                                                                                     optimization of tiling array design and data process-
                                 Genome-scale approaches. over the past decade, several              ing has resulted in improved performance for McrBC
                                 endonuclease-dependent genome-scale DNA methylation                 digestion-based techniques. The optimized workflow
                                 analysis approaches have been developed. Restriction                is referred to as comprehensive high-throughput arrays
                                 landmark genome scanning (RlGs) was the first tech-                 for relative methylation (CHARM)62. A further varia-
                                 nique for broad DNA methylation profiling. Differences              tion of DMH is to use a cocktail of methylation-sensitive
                                 in methylation are detected as differences in the pattern of        restriction enzymes to digest one pool of DNA and
                                 restriction fragments generated by digestion with a methy-          McrBC to digest the other pool63–65. This is referred to
                                 lation-sensitive restriction enzyme separated by two-               as microarray-based methylation assessment of single
                                 dimensional gel electrophoresis. RlGs has been widely               samples (MMAss)63. MMAss has been used to gener-
                                 used to identify imprinted loci and sites that are methy-           ate libraries for sanger sequencing 65. Another method,
                                 lated in a cancer- or tissue-specific manner 8–13. simpler          known as HpaII tiny fragment enrichment by ligation-
                                 but less comprehensive gel electrophoretic techniques               mediated PCR (HElP), uses ligation-mediated PCR for
                                 include methylation-sensitive arbitrarily primed PCR                the amplification of HpaII or MspI genomic restriction
                                 (Ms-AP-PCR)48 and amplification of inter-methylated                 fragments followed by array hybridization62,66,67.
                                 sites (AIMs)49, which rely on the differential patterns
                                 of arbitrarily primed PCR products generated with                   Sequencing approaches. Restriction enzyme enrich-
                                 methylation-sensitive or -resistant restriction enzyme-             ment techniques are currently being adapted so that the
                                 digested genomic DNA. The use of RlGs, Ms-AP-PCR                    read-out can be obtained by next-generation sequenc-
                                 and AIMs is decreasing as these techniques are replaced             ing techniques instead of array hybridization. sequence-
Isoschizomers                    by methods that are less labour-intensive and that do not           based analysis is more flexible and powerful as it allows
Pairs of structurally distinct   rely on gel electrophoresis.                                        for allele-specific DNA methylation analysis, does not
restriction enzymes with the
same recognition sequence and
                                                                                                     require an appropriately designed microarray, can cover
the same cleavage positions.     Array hybridization. Many different techniques have                 more of the genome with less input DNA and avoids
                                 been developed that couple enzymatic methods to                     hybridization artefacts, although it is still subject to
Neoschizomers                    array-based analysis. one of these is methylated CpG                sequence library biases. Next-generation sequencing has
Pairs of structurally distinct
                                 island amplification (MCA), which uses the differential             been used to analyse the output of the HElP assay 67.
restriction enzymes with
the same recognition             methylation sensitivities and cutting behaviours of SmaI            sequencing-by-synthesis of libraries constructed from
sequence but with different      and its neoschizomer XmaI. It is often combined with                size-fractionated HpaII or MspI digests that are com-
cleavage positions.              representational difference analysis (RDA) and, more                pared with randomly sheared fragments68 is known as
                                 recently, with array hybridization (when it is known as             Methyl–seq; sequence-based analysis of HpaII diges-
Imprinted
A locus with monoallelic
                                 MCAM)50–54. However, MCA provides lower resolution                  tion followed by the use of a flanking cut with a type-IIs
expression determined by the     coverage than other enzyme-based techniques that rely               restriction enzyme (MmeI) and adaptor ligation is known
parental origin of the allele.   on pools of restriction enzymes or on enzymes with                  as methylation-sensitive cut counting (MsCC)6,41.

NATuRE REvIEWs | Genetics                                                                                                     voluME 11 | MARCH 2010 | 193

                                                      © 2010 Macmillan Publishers Limited. All rights reserved
REVIEWS

                                  Affinity enrichment                                                 have since been developed, including the quantitative
                                  Chromatin immunoprecipitation (ChIP) followed by micro-             direct sanger sequencing of PCR products100 and the
                                  array hybridization (ChIP–chip) or next-generation                  highly automated application of this approach102.
                                  sequencing (ChIP–seq) has proven to be a particularly
                                  useful technique for genome-wide studies of histone                 Array hybridization. Bisulphite genomic sequencing
                                  modifications69–74. similarly, affinity enrichment of               excels at producing base-pair resolution DNA meth-
                                  methylated regions using antibodies specific for 5meC (in           ylation information, but bisulphite-based methods are
                                  the context of denatured DNA) or using methyl-binding               not easily adapted to array-hybridization techniques
                                  proteins with affinity for methylated native genomic                and so, until recently, were rarely used for genome-
                                  DNA are proving to be particularly powerful tools for               scale DNA methylation analysis. With the exception of
                                  comprehensive profiling of DNA methylation in com-                  5meC residues, bisulphite-treated DNA is comprised
                                  plex genomes. Affinity purification of methylated DNA               of three different bases instead of four. This reduced
                                  was first demonstrated with the methyl-binding protein              sequence complexity, and therefore greater sequence
                                  MECP2 (REF. 56). Enrichment of methylated regions by                redundancy, results in decreased hybridization spe-
                                  immunoprecipitation of denatured genomic DNA with                   cificity. Hybridization of bisulphite-converted DNA
                                  an antibody specific for methylated cytosine75, followed            either requires dedicated arrays based on the bisulphite-
                                  by hybridization to either a tiling array or to a feature           converted genome or must allow for substantial
                                  microarray, such as a CpG island array, is referred to as           mismatches in the hybridization.
                                  MeDIP76,77, mDIP78 or mCIP1. These techniques have                      The first mammalian hybridization arrays for ana-
                                  been widely used to explore the methylomes of plant 1,2,            lysing bisulphite-treated DNA required the ampli-
                                  mouse75,79–81 and human62,76–78,82–88 cells. Confusingly, the       fication of individual regions of the genome before
                                  term MCIp has also been used to describe affinity puri-             hybridization to a dedicated oligonucleotide array 103,104
                                  fication of methylated native DNA by a recombinant                  and were therefore, in essence, modestly scaled-up
                                  polypeptide containing the methyl-binding domain of                 locus-specific assays. smaller genomes, such as that of
                                  human methyl-CpG-binding-domain protein 2 (MBD2)                    A. thaliana, can be hybridized to oligonucleotide arrays
                                  fused to the Fc tail of human immunoglobulin G89–91.                after whole-genome amplification using random tetra-
                                  More recently, approaches have been developed that                  nucleotide primers. This method is referred to as bisul-
                                  use higher affinity methyl-binding proteins, including              phite methylation profiling (BiMP)105. It is worth noting
                                  multimerized MBD1 domains92 and protein complexes                   that this approach relies on microarrays developed for
                                  that contain the short isoform of MBD2 (MBD2b) and                  non-bisulphite-converted DNA. Therefore, as a result
                                  MBD3l1 (the latter approach is called the methylated                of mismatches caused by the conversion of unmethy-
                                  CpG island recovery assay (MIRA)93–95). Also, ChIP                  lated cytosine residues, the overall hybridization signal
                                  with an antibody specific for native MBD proteins has               is low, both within and outside potential methylation
                                  been used as an indirect measure of the distribution of             target sequences. Regions with dense cytosine meth-
                                  genomic methylcytosine75,96.                                        ylation are least affected — as they retain more Cs —
                                      These affinity-enrichment methods have mostly                   and yield a relatively strong signal105. Therefore, BiMP
                                  been combined with array hybridization, in which the                is only applicable to methylation-dense regions of
                                  input DNA and enriched DNA are labelled with differ-                small genomes.
                                  ent fluorescent dyes. As with other array-based analyses,               Illumina has adapted its GoldenGate BeadArray
                                  methylcytosine affinity-enrichment techniques are now               technology to interrogate DNA methylation in human
                                  rapidly shifting to analysis by next-generation sequenc-            genomic DNA samples. Multiplexed methylation-
                                  ing techniques88. Affinity-based methods allow for rapid            specific primer extension of bisulphite-converted DNA
                                  and efficient genome-wide assessment of DNA meth-                   at up to 1,536 different CpG sites is performed using
                                  ylation, but they do not yield information on individual            primers that are specific for methylated and unmethy-
                                  CpG dinucleotides and require substantial experimental              lated sequences at each site106–108. The primers for the
                                  or bioinformatic adjustment for varying CpG density at              two different methylation states are labelled with dif-
                                  different regions of the genome.                                    ferent fluorescent dyes and the products are hybridized
Chromatin                                                                                             to bead arrays containing approximately 30 beads per
immunoprecipitation               Bisulphite conversion                                               CpG site. The current standard implementation for the
A technique that is used to
identify the location of
                                  The discovery that treatment of denatured genomic                   human genome is the GoldenGate Methylation Cancer
DNA-binding proteins and          DNA with sodium bisulphite chemically deaminates                    Panel I, which covers 1,505 CpG sites selected from
epigenetic marks in the           unmethylated cytosine residues much more rapidly                    807 genes, but custom arrays can also be designed109.
genome. Genomic sequences         than methylated cytosines97,98 spurred a revolution in              The GoldenGate assay can be performed on 96 sam-
containing the mark of interest
                                  DNA methylation analysis in the 1990s99,100. This chem-             ples simultaneously and is therefore well suited for the
are enriched by binding soluble
DNA chromatin extracts            ical treatment of DNA effectively turns an epigenetic               profiling of large numbers of specimens109–116. Illumina
(complexes of DNA and protein)    difference into a genetic difference — unmethylated                 has also adapted its more comprehensive Infinium plat-
to an antibody that recognizes    Cs are converted to Ts (by uracil) — thereby enabling               form to DNA methylation analysis68,117,118. The Infinium
the mark. Related techniques      many new DNA methylation detection and analysis                     platform incorporates a whole-genome amplification
— such as methylated DNA
immunoprecipitation — use
                                  techniques29,32,34–42. Analysis of bisulphite-converted             step after bisulphite conversion, which is followed
antibodies to recognize DNA       DNA was initially done by sanger sequencing of cloned               by fragmentation and hybridization of the sample to
modifications directly.           PCR products from single loci99,101. Many enhancements              methylation-specific DNA oligomers that are linked

194 | MARCH 2010 | voluME 11                                                                                                 www.nature.com/reviews/genetics

                                                       © 2010 Macmillan Publishers Limited. All rights reserved
REVIEWS

                                   to individual bead types. Each bead type corresponds to            representation than amplification with locus-specific
                                   a specific DNA CpG site and methylation state. The                 primers. several groups have combined padlock capture
                                   current implementation of the Illumina Infinium                    of mammalian DNA with next-generation sequenc-
                                   assay for DNA methylation analysis (known as the                   ing to achieve 90–99% target specificity after pooling
                                   HumanMethylation27 DNA Analysis BeadChip) inter-                   tens of thousands of probes for bisulphite-converted
                                   rogates 27,578 CpG sites from 14,495 protein-coding                DNA in a technique called bisulphite padlock probes
                                   gene promoters and 110 microRNA gene promoters117.                 (BsPP)6,25,41,123. However, as capture occurs after bisul-
                                                                                                      phite conversion, there is again the concern that the
                                   Sequencing approaches. Although the adaptation of                  methylation state could influence capture efficiency and
                                   bisulphite-converted DNA to array hybridization has                therefore distort the DNA methylation measurements,
                                   been challenging, bisulphite-converted DNA is par-                 even if all methylation permutations are included in
                                   ticularly well suited for sequencing-based approaches              the capture probes. Alternatively, capture probes can
                                   and is now enjoying a resurgence thanks to the appli-              be designed to avoid CpG dinucleotides in organisms
                                   cation of next-generation sequencing platforms. ultra-             in which DNA methylation is largely restricted to
                                   deep sequencing of a limited number of loci has been               CpGs. However, this design approach severely limits
                                   achieved by direct pyrosequencing of PCR products119               the comprehensive capture of CpG-rich regions, such
                                   and by sequencing of more than 100 PCR products in                 as CpG islands.
                                   a single run at an average coverage of more than 1,600                The ultimate comprehensive single-base-pair
                                   reads per locus on the Roche 454 platform120.                      resolution DNA methylation analysis technique is
                                       The main challenge in sequencing bisulphite-                   whole-genome bisulphite sequencing. Whole-genome
                                   converted DNA arises from its low sequence com-                    shotgun bisulphite sequencing (WGsBs) has been
                                   plexity. Reduced representation bisulphite sequencing              achieved on the Illumina Genome Analyzer platform
                                   (RRBs) was introduced to reduce sequence redun-                    for small eukaryotic genomes, such as A. thaliana22,24,
                                   dancy by selecting only some regions of the genome                 and for mammalian DNA23. Increased read lengths and
                                   for sequencing by size-fractionation of DNA frag-                  paired-end sequencing strategies have aided the
                                   ments after BglII digestion121 or after MspI digestion26.          implementation of WGsBs124–126, although approxi-
                                   These choices of restriction enzymes enrich for CpG-               mately a tenth of the CpG dinucleotides in the mam-
                                   containing segments of the genome but do not target                malian genome remain refractory to alignment of
                                   specific regions of interest in the genome. Targeting has          bisulphite-converted reads.
                                   been accomplished by array capture or padlock capture
                                   before sequencing. Targeted capture on fixed arrays or             Other approaches. An alternative approach to sequenc-
                                   by solution hybrid selection can enrich for sequences tar-         ing or hybridization is detection by matrix-assisted
                                   geted by a library of DNA or RNA oligonucleotides and              laser desorption ionization time-of-flight (MAlDI-
                                   can be performed before or after bisulphite conversion.            ToF) mass spectrometry 127–129. Although this approach
                                   The advantage of capture before bisulphite conversion              requires gene-specific amplification, and should there-
Array capture
                                   is that a standard genomic array can be used and the               fore be considered a candidate gene method, it is ame-
A method for enriching whole
genomic DNA for many               methylation status of the region does not influence                nable to automation, as implemented in the EpiTYPER
regions of interest by             the capture. However, array capture is a relatively inef-          platform developed by sequenom, which relies on
hybridization to an array          ficient process, particularly for GC-rich target regions           gene-specific amplification of bisulphite-treated DNA
containing RNA or DNA              such as CpG islands, which are of particular interest in           followed by in vitro transcription, base-specific RNA
sequences complementary
to the regions of interest.
                                   DNA methylation studies. Targeted capture requires                 cleavage and MAlDI-ToF analysis128,129. Although it is
                                   relatively large quantities of native genomic DNA or               not a genome-wide technology, it can provide quantita-
Padlock capture                    amplification before hybridization, which would erase              tively accurate results at multiple CpG dinucleotides for
A method for simultaneously        DNA methylation information in native genomic DNA.                 hundreds of gene loci128,130 and can be reliably applied
capturing and amplifying large
                                   solution hybrid selection is more efficient but allows             to pooled DNA samples to obtain group averages for
numbers of regions of interest
from whole genomic DNA.            complex cross-hybridization structures, resulting in               valuable samples130.
Each padlock probe has two         more enrichment of off-target sequences.
complementary oligonucleotide          Alternatively, array capture after bisulphite conver-          relative merits of different approaches
sequences that flank a region of   sion and adaptor-mediated PCR amplification provides               A straightforward comparison of DNA methylation
interest. The sequences are
joined by a loop of DNA that
                                   sufficient DNA for constructing a sequencing library               analysis methods is hampered by the complexity and
ensures efficient joint            (referred to here as bisulphite conversion followed                diversity of the various techniques. Many approaches
hybridization and contains         by capture and sequencing (BC–seq))122. Although                   have competing strengths and weaknesses. The choice
sequences for PCR with             this involves small amounts of input DNA, it requires              of technique will be influenced by the number of sam-
universal primers.
                                   multiple permutations of capture oligonucleotides to               ples and the quality and quantity of DNA, as well as the
Solution hybrid selection          reflect different methylation states and is subject                desired coverage and resolution. Also, it is necessary to
A method for enriching whole       to potential measurement error caused by varying capture           take into account the organism that is being studied:
genomic DNA for many regions       efficiency of different methylation states122.                     array-based analysis requires that a suitable array for
of interest by hybridization           Padlock capture provides improved enrichment                   the species of interest is available, whereas sequence-
to a complex library of RNA
or DNA sequences in solution,
                                   efficiency by combining the increased annealing spe-               based analyses are generally applicable to any species for
followed by retrieval of the       cificity of two tethered probes, and subsequent amplifi-           which a reference genome exists. A summary of features
annealed hybrids.                  cation with universal primers allows for a more uniform            and potential sources of bias is shown in TABLE 2.

NATuRE REvIEWs | Genetics                                                                                                      voluME 11 | MARCH 2010 | 195

                                                       © 2010 Macmillan Publishers Limited. All rights reserved
REVIEWS

 Table 2 | features and sources of bias for various techniques
 technology               Features                                                                                                                                                                     Potential sources of bias

                          Unambiguous identification

                                                                                                                                Good coverage of regions

                                                                                                                                                                                                                                                                                              Cross-hybridization bias
                                                                                                       measurement capability
                                                                                 Non-CpG methylation

                                                                                                                                                           amounts of input DNA

                                                                                                                                                                                                                                              Incomplete bisulphite
                                                                                                                                with low CpG density

                                                                                                                                                           Compatible with low
                                                         In cis co-methylation

                                                                                                                                                                                                                                                                      Bisulphite PCR bias
                                                                                                                                                                                  Full repeat-masked

                                                                                                                                                                                                                         Fragment size bias
                                                                                                                                                                                  genome coverage
                          of CpG measured

                                                                                                                                                                                                                                                                                                                         DNA methylation

                                                                                                                                                                                                                                                                                                                                                             CpG density bias
                                                                                                                                                                                                                                                                                                                                           GC content bias
                                                                                                                                                                                                                                              conversion bias
                                                                                                       Allele-specific

                                                                                                                                                                                                       Copy-number
                                                                                                                                                                                                       variation bias
                                                         information

                                                                                 information

                                                                                                                                                                                                                                                                                                                         status bias
 Infinium                        (•)                                                                                                                               •                                                                                  •                    •                         •
 Enzyme–chip                     (•)                           (•)                                                                    (•)                                                                                     •                                                                      •                                         •
 MeDIP–chip                                                                                                                                                                              •                  •                                                                                        •                                         •                 •
 BSPP                               •                            •                      •                      •                                                                                                                                      •                    •                                                   •
 BC–seq                             •                            •                      •                      •                                                                                                                                      •                    •                                                   •
 RRBS                               •                            •                      •                      •                                                   •                                                                                  •                    •
 Enzyme–seq                         •                            •                                             •                      (•)                          •                                                          •
 MeDIP–seq                                                                                                     •                                                                         •                  •                                                                                                                                  •                 •
 WGSBS                              •                            •                      •                      •                         •                         •                     •                                                            •                    •
 ‘•’ indicates that the method has this feature or potentially has this bias; ‘(•)’ indicates that the method has this feature to a limited extent or in some circumstances.
 BC–seq, bisulphite conversion followed by capture and sequencing; BSPP, bisulphite padlock probes; –chip, followed by microarray; MeDIP, methylated DNA
 immunoprecipitation; RRBS, reduced representation bisulphite sequencing; –seq, followed by sequencing; WGSBS, whole-genome shotgun bisulphite sequencing.

                                                       Sample requirements. The many DNA methylation                                                                                                                        MeDIP and bisulphite-based techniques both use
                                                       analysis techniques differ vastly in sample requirements.                                                                                                        ssDNA and are therefore compatible with DNA sam-
                                                       some techniques, such as RlGs, require in excess of                                                                                                              ples that have been previously denatured. Enzyme-based
                                                       2 μg of high-purity, high-molecular-weight genomic                                                                                                               techniques and MIRA both act on dsDNA. This has the
                                                       DNA. others, such as the Illumina GoldenGate tech-                                                                                                               potential advantage that these methods could in princi-
                                                       nology, are compatible with degraded DNA extracted                                                                                                               ple be applied to native DNA before purification — for
                                                       from formalin-fixed, paraffin-embedded samples131. As                                                                                                            example, they could be used to detect tumour-derived
                                                       a general rule, methods based on methylation-sensitive                                                                                                           methylated DNA in serum or plasma.
                                                       or methylation-dependent endonuclease treatment tend                                                                                                                 Bisulphite treatment not only requires DNA dena-
                                                       to require DNA of high purity, quantity and integrity,                                                                                                           turation before treatment but also causes substantial
                                                       although adaptations of these techniques can reduce                                                                                                              DNA degradation, and purification is needed to remove
                                                       sample quantity requirements67. Affinity-enrichment                                                                                                              the sodium bisulphite. For these reasons, input DNA for
                                                       techniques are more tolerant of DNA impurity and                                                                                                                 many bisulphite-based methods can be of low purity
                                                       integrity, but many require substantial quantities of                                                                                                            and integrity. In fact, crude lysis without subsequent
                                                       input genomic DNA to produce sufficient enriched                                                                                                                 purification can be sufficient preparation of samples for
                                                       output DNA. The ratio of input DNA to affinity reagent                                                                                                           bisulphite-based analysis.
                                                       can affect the enrichment efficiencies of regions with                                                                                                               PCR amplification is used in many locus-specific
                                                       varying methylcytosine density and should be carefully                                                                                                           applications of bisulphite-treated DNA. It is common
                                                       considered when making adjustments to the amount of                                                                                                              practice in PCR to use duTP during amplification
                                                       input DNA in affinity-enrichment methods. The fairly                                                                                                             and to pretreat subsequent samples with uracil DNA
                                                       high DNA quantity and quality requirements of many                                                                                                               glycosylase to remove carry-over PCR products from
                                                       enzyme-dependent and affinity-enrichment meth-                                                                                                                   prior reactions. This quality-assurance step is incom-
                                                       ods have resulted in the use of cell lines, as opposed                                                                                                           patible with standard bisulphite conversion methods,
                                                       to primary tissues, in many early mammalian profiling                                                                                                            as the bisulphite conversion deaminates unmethylated
                                                       studies. low quantities of input DNA can be mitigated                                                                                                            cytosines to uracil, which would lead to degradation of
                                                       by amplification methods, such as PCR and whole-                                                                                                                 the bisulphite-converted genomic DNA by the uracil
                                                       genome amplification, but only if amplification is                                                                                                               DNA glycosylase. Bisulphite deamination without
                                                       implemented after the methylation-dependent step,                                                                                                                desulphonation before uracil DNA glycosylase treatment
                                                       otherwise the methylation information will be erased.                                                                                                            can resolve this incompatibility 132,133.
                                                       For example, the Illumina Infinium platform incorpo-
                                                       rates whole-genome amplification after bisulphite con-                                                                                                           Sample throughput. sample throughput is generally
                                                       version and as a result can profile 27,578 CpG sites using                                                                                                       limited by the cost of labour and reagents. Highly auto-
                                                       less than 1 μg of input genomic DNA.                                                                                                                             mated techniques have low labour costs but are often

196 | MARCH 2010 | voluME 11                                                                                                                                                                                                                                                                www.nature.com/reviews/genetics

                                                                                                            © 2010 Macmillan Publishers Limited. All rights reserved
REVIEWS

                                                        Number of CpGs analysed per sample genome                                      The coverage and resolution of enzyme-based meth-
                                   1       10           102         103       104         105        106           107      108    ods are closely linked to the distribution of recognition
                                                                                                                                   sequences throughout the genome. Techniques relying
                                         MethyLight
                             104                                                                                                   on HpaII/MspI sites (HpaII and MspI are isoschizomers)
                                                EpiTYPER                                                                           can assess up to 98.5% of CpG islands and 91.1% of
                                                                   GoldenGate                                                      NCBI Reference sequence (Refseq) promoters in the
                                                                                                                                   human genome, but at modest resolution within each
Number of samples analysed

                             103            MSP                              Infinium
                                                                                                                                   locus as a result of the average distance between HpaII/
                                          COBRA                                      Enzyme–chip                                   MspI sites within CpG islands67. The low-specificity of
                             102
                                       Pyrosequencing                                     MeDIP–chip                               the recognition site for McrBC (RmeC…RmeC, in which R
                                        Southern blot                      BSPP                                                    is A or G) suggests that McrBC-based analysis can reveal
                                                                              BC–seq                                               DNA methylation at almost any methylated CpG island
                             10                   Sanger BS
                                                                                                                                   in the genome, but the requirement of two methylated
                                                                                                Enzyme–seq
                                                                                                                                   cytosines within a restricted distance from each other
                                                                                                           RRBS                    in cis (40–3,000 bp) results in a nonlinear relationship
                                                                                                           MeDIP–seq               between DNA methylation density and cutting efficiency.
                               1
                                                                                                                  WGSBS            The effective coverage and resolution of enzyme-based
                                                                                                                                   array hybridization techniques is a function of the dis-
  Figure 1 | sample throughput versus genome coverage. A plot of sample throughput                                                 tribution of potential cleavage sites and the composition
                                                                Nature
  against genome coverage for various DNA methylation techniques.      Reviews | Genetics
                                                                    Throughput    is                                               of the hybridization array 62. likewise, the coverage and
  determined by the number of samples that can be analysed per experiment, based on                                                resolution of affinity methods depends on the genomic
  large eukaryotic genomes. Coverage is determined by the number of CpGs in the                                                    distribution of potential affinity targets (for example,
  genome that can be analysed per experiment. BC–seq, bisulphite conversion followed
                                                                                                                                   CpG) and array composition. using next-generation
  by capture and sequencing; BS, bisulphite sequencing; BSPP, bisulphite padlock
  probes; –chip, followed by microarray; COBRA, combined bisulphite restriction
                                                                                                                                   sequencing instead of microarrays to generate data
  analysis; MeDIP, methylated DNA immunoprecipitation; MSP, methylation-specific                                                   output from enzyme-, affinity- and bisulphite-based
  PCR; RRBS, reduced representation bisulphite sequencing; –seq, followed by                                                       methods eliminates some of the limitations that are
  sequencing; WGSBS, whole-genome shotgun bisulphite sequencing.                                                                   attributed to array composition. However, the base-
                                                                                                                                   composition and fragment-size biases of sequencing
                                                                                                                                   platforms can introduce new coverage limitations and/or
                                                              reagent-intensive. The labour involved in many of the                measurement error 134.
                                                              current enzyme-based and affinity-enrichment meth-                       Furthermore, the sequence context of the methyl-
                                                              ods precludes the processing of large numbers of samples.            cytosine residue needs to be considered when selecting
                                                              Most studies that have used array-hybridization or next-             an appropriate technique. A brief summary of these
                                                              generation sequencing applications have reported results             considerations is provided in BOX 1.
                                                              for just a few samples at most. By contrast, semi-automated
                                                              bisulphite-based technologies, such as the sequenom                  Accuracy and reproducibility. DNA methylation profil-
                                                              EpiTYPER and Illumina GoldenGate platforms, excel                    ing methods vary in their accuracy, reproducibility, types
                                                              at characterizing large numbers of samples, and stud-                of bias and ability to detect in-cis methylation patterns, as
                                                              ies involving in excess of 100 primary human samples                 well as the extent to which they are influenced by target
                                                              have been reported109–111,114,115. Cost, sample through-             sequence density or DNA methylation density (TABLE 2).
                                                              put and coverage are interdependent. FIG. 1 compares                 Methylation-sensitive restriction enzyme-based methods
                                                              throughput with genome coverage (see also below).                    are able to resolve methylation differences in low-CpG-
                                                                                                                                   density regions, whereas McrBC-based and affinity-based
                                                              Genome coverage and resolution. The concept of reso-                 methods perform better for CpG-rich regions62. Many
                                                              lution is not straightforward when comparing various                 of the enzyme-based techniques reveal DNA methyla-
                                                              technologies with differing DNA fragmentation sizes                  tion information by DNA fragment length differences.
                                                              and non-uniform spacing of targeted regions. As an                   Fragment length can influence array hybridization
                                                              example, bisulphite sanger sequencing has been used to               efficiency, but is particularly problematic in sequence-
                                                              assess approximately 40,000 different CpG dinucleotides              based analysis, in which library construction efficiency
                                                              located on 2,524 different amplicons102, whereas the                 is strongly influenced by fragment length62,134.
                                                              Illumina Infinium DNA methylation platform has been                      Next-generation sequencing is often used differently
                                                              used to interrogate 27,578 different CpG dinucleotides               in affinity or enzyme pretreatment experiments and
                                                              located at 14,495 different gene promoters68. As the                 bisulphite genomic sequencing experiments. In affin-
                                                              methylation states of CpG dinucleotides located within               ity or enzyme pretreatment, short-read sequencing is
                                                              the same amplicon are highly correlated102, the number               used to determine the prevalence of different regions
                                                              of independent epigenomic data points obtained on the                of the genome in the enriched sample by counting the
                                                              Illumina platform is much higher (14,495 loci versus                 number of reads that are uniquely alignable to regions of
                                                              2,524) than in the sanger sequencing example, even                   the genome. By contrast, bisulphite sequencing extracts the
                                                              though the total number of CpG dinucleotides assessed                alignment and DNA methylation information from
                                                              in the sanger sequencing example is slightly higher                  the sequence itself. The read count methods are prone
                                                              (40,000 versus 27,578).                                              to sources of bias, such as GC content, fragment size and

  NATuRE REvIEWs | Genetics                                                                                                                                  voluME 11 | MARCH 2010 | 197

                                                                                    © 2010 Macmillan Publishers Limited. All rights reserved
REVIEWS

                                  Box 1 | considerations for detection of methylation in different sequence contexts
                                  Of the approaches for detecting DNA methylation discussed in this Review, enzyme-based methods are the most
                                  narrowly confined by the sequence context of the 5-methylcytosine (5meC) residues. Most techniques that rely on
                                  enzymes have been designed for either an meCpG context (in the case of restriction enzymes) or an RmeC (R = A or G)
                                  sequence context (in the case of McrBC). Therefore, enzyme-based techniques will tend to be oblivious to meC residues
                                  in other sequence contexts.
                                    Most affinity-based methods will enrich for 5meC regardless of sequence context. However, the array hybridization or
                                  sequencing used subsequently for analysis will not be able to distinguish between CpG and non-CpG methylation.
                                  Bisulphite sequencing is the most reliable method for identifying meC in any sequence context. However, verification
                                  following more extensive bisulphite treatment and/or recurrent observation in independent samples is recommended
                                  to distinguish apparent CpH (H = A, T, C) methylation from incomplete bisulphite conversion.
                                    Hydroxymethylcytosine is best detected using thin-layer chromatography, high-performance liquid chromatography
                                  and/or mass spectrometry27,28. None of these techniques is well-suited for locus-specific analysis. Hydroxymethylcytosine
                                  is likely to be indistinguishable from 5meC in most restriction enzyme-based techniques and in bisulphite conversion
                                  techniques. It remains to be seen whether affinity-based methods show cross-affinity to hydroxymethylcytosine.

                                 copy-number variations in the source DNA, that affect               In sequence-based analysis, aside from flagging known
                                 the likelihood that a particular region is included in the          sNPs, the standard method for discriminating between
                                 sequenced fragments134. The bisulphite-based methods                evolutionary and bisulphite-induced deamination is to
                                 are also subject to these effects, but they do not influ-           resequence the non-bisulphite-converted experimen-
                                 ence the DNA methylation measurement itself, as this                tal genome. However, a more efficient method relies
                                 information is extracted from the sequence.                         on the recognition that a sNP caused by evolutionary
                                    Bisulphite-based methods tend to be fairly accu-                 deamination of meC to T will have been propagated on
                                 rate and reproducible. The major sources of bias and                the opposing DNA strand as an A, whereas bisulphite
                                 measurement error are incomplete bisulphite conver-                 deamination of an unmethylated cytosine will leave the
                                 sion and differential PCR efficiency for methylated                 G on the opposing strand unaffected137. sequencing of
                                 versus unmethylated versions of the same sequence135.               both DNA strands of bisulphite-converted DNA can
                                 Incomplete bisulphite conversion can arise from incom-              therefore discriminate between a CpG sNP and an
                                 plete denaturation before bisulphite treatment or rean-             unmethylated CpG without the need to resequence the
                                 nealing during the bisulphite conversion. Bisulphite                non-bisulphite converted genome. sufficient coverage
                                 conversion destroys the self-complementarity of DNA.                can resolve heterozygotes and randomly hemimethylated
                                 Therefore, repeated denaturation cycles during the bisul-           states. Interestingly, the methylation state of CpG dinu-
                                 phite conversion process are usually sufficient to ensure           cleotides can be associated with cis-linked sNPs138.
                                 near complete conversion, as reannealing becomes pro-               Therefore, sequence-based technologies that can pro-
                                 gressively less likely. The completion of bisulphite con-           vide allele-specific DNA methylation information are
                                 version can be monitored with assays that are specifically          preferable to allele-agnostic methylation assays.
                                 designed to detect incomplete conversion136 by monitor-                 Enzyme-based analyses and enrichment techniques
                                 ing the conversion of spiked DNA controls or by reten-              both rely on the relative enrichment or depletion of
                                 tion of non-target sequence cytosine dinucleotides, such            regions of the genome. The occurrence of aneuploidy
                                 as CpH in most mammalian genomes, with the caveat                   in cancer cells can lead to inaccuracies in methylation
                                 that native CpH methylation may exist, particularly in              measurements if copy-number alterations are not appro-
                                 embryonic stem cells23,26,120. It should be noted that over-        priately controlled. Enzyme-based methods that rely on
                                 treatment with bisulphite not only degrades DNA but                 the balanced measurement of the ratio between methyl-
                                 can also lead to an increased incidence of methylated               ated and unmethylated versions of a sequence are less
                                 cytosines converting to thymine residues98, which results           prone to measurement error caused by copy-number
                                 in under-reporting of DNA methylation. This can also                alterations than methods that rely on unbalanced
                                 be monitored by spiked methylated DNA controls.                     enrichment for either just methylated or just unmethyl-
                                                                                                     ated sequences. Affinity-based methods are particularly
                                 Allele specificity, polymorphisms and copy-number var-              prone to measurement error caused by copy-number
Hemimethylated                   iation. CpG dinucleotides are common sites of polymor-              alterations, as they are oblivious to the unmethylated ver-
Methylation of a residue         phism, owing to deamination of methylated cytosines                 sion of a sequence. The inclusion of a control produced
on one strand within a           during evolution. The deamination event can be present              from the experimental sample, such as DNA artificially
palindromic target sequence
                                 in the reference genome or in the experimental genome,              methylated by M.SssI84 or digested with MspI67, can be
but not of the corresponding
residue within the palindromic   and the polymorphism is indistinguishable from bisul-               used to normalize copy number variations. Bisulphite-
target sequence on the           phite-induced deamination of an unmethylated cytosine               based methods, such as the Illumina GoldenGate plat-
complementary DNA strand.        on one strand of the experimental genome. The high                  form, are less susceptible to copy-number variation139.
Not be confused with             frequency of CpG polymorphisms can result in errors
monoallelic methylation,
in which one allele of a
                                 in DNA methylation measurement. Indeed, polymor-                    Sensitivity. Genome-scale DNA methylation profiling
locus is methylated in           phisms in Illumina GoldenGate BeadArray primers can                 techniques are not particularly well suited for the detec-
a diploid organism.              masquerade as sample-specific epigenetic variation110.              tion of low-frequency DNA methylation states in a DNA

198 | MARCH 2010 | voluME 11                                                                                                 www.nature.com/reviews/genetics

                                                      © 2010 Macmillan Publishers Limited. All rights reserved
REVIEWS

 Table 3 | Bioinformatic resources
Resource                    Purpose                                                    URL                                                                 Refs
Batman                      MeDIP DNA methylation analysis tool                        http://td-blade.gurdon.cam.ac.uk/software/batman                      88
BDPC                        DNA methylation analysis platform                          http://biochem.jacobs-university.de/BDPC                             153
BSMAP                       Whole-genome bisulphite sequence mapping                   http://code.google.com/p/bsmap                                       154
CpG Analyzer                Windows-based program for bisulphite DNA                   -                                                                    155
CpGcluster                  CpG island identification                                  http://bioinfo2.ugr.es/CpGcluster                                    156
CpGFinder                   Online program for CpG island identification               http://linux1.softberry.com                                             -
CpG Island Explorer         Online program for CpG Island identification               http://bioinfo.hku.hk/cpgieintro.html                                157
CpG Island Searcher         Online program for CpG Island identification               http://cpgislands.usc.edu                                            158
CpG PatternFinder           Windows-based program for bisulphite DNA                   -                                                                    159
CpG Promoter                Large-scale promoter mapping using CpG islands             http://www.cshl.edu/OTT/html/cpg_promoter.html                       160
CpG ratio and GC            Online program for plotting the                            http://mwsross.bms.ed.ac.uk/public/cgi-bin/cpg.pl                       -
content Plotter             observed:expected ratio of CpG
CpGviewer                   Bisulphite DNA sequencing viewer                           http://dna.leeds.ac.uk/cpgviewer                                     161
CyMATE                      Bisulphite-based analysis of plant genomic DNA             http://www.gmi.oeaw.ac.at/en/cymate-index/                           162
EMBOSS CpGPlot/             Online program for plotting CpG-rich regions               http://www.ebi.ac.uk/Tools/emboss/cpgplot/index.html                    -
CpGReport/Isochore
Epigenomics Roadmap         NIH Epigenomics Roadmap Initiative homepage                http://nihroadmap.nih.gov/epigenomics                                   -
Epinexus                    DNA methylation analysis tools                             http://epinexus.net/home.html                                           -
MEDME                       Software package (using R) for modelling MeDIP             http://espresso.med.yale.edu/medme                                   163
                            experimental data
methBLAST                   Similarity search program for bisulphite-modified          http://medgen.ugent.be/methBLAST                                     164
                            DNA
MethDB                      Database for DNA methylation data                          http://www.methdb.de                                            165–168
MethPrimer                  Primer design for bisulphite PCR                           http://www.urogene.org/methprimer                                    169
methPrimerDB                PCR primers for DNA methylation analysis                   http://medgen.ugent.be/methprimerdb                                  164
MethTools                   Bisulphite sequence data analysis tool                     http://www.methdb.de                                                 170
MethyCancer Database        Database of cancer DNA methylation data                    http://methycancer.psych.ac.cn                                       171
Methyl Primer Express       Primer design for bisulphite PCR                           http://www.appliedbiosystems.com/                                       -
                                                                                       methylprimerexpress
Methylumi                   Bioconductor package for analysing DNA                     http://www.bioconductor.org/packages/bioc/html/                         -
                            methylation data from Illumina platforms                   methylumi.html
Methylyzer                  Bisulphite DNA sequence visualization tool                 http://ubio.bioinfo.cnio.es/Methylyzer/main/index.html                  -
mPod                        Genome-wide DNA methylation viewer integrated http://www.compbio.group.cam.ac.uk/Projects/                                      172
                            with the Ensembl genome browser               p4meth.html
PubMeth                     Database of DNA methylation literature                     http://www.pubmeth.org                                               173
QUMA                        Quantification tool for methylation analysis               http://quma.cdb.riken.jp                                             174
TCGA Data Portal            Database of TCGA DNA methylation data                      http://cancergenome.nih.gov/dataportal                                  -
 BDPC, Bisulphite Sequencing Data Presentation and Compilation; BSMAP, Bisulphite Sequence Mapping Program; CyMATE, Cytosine Methylation Analysis Tool for
 Everyone; EMBOSS, European Molecular Biology Open Software Suite; MeDIP, methylated DNA immunoprecipitation; MEDME, Modelling Experimental Data with
 MeDIP Enrichment; NIH, US National Institutes of Health; QUMA, Quantification Tool For Methylation Analysis; TCGA, The Cancer Genome Atlas.

                              sample, as sensitivity is generally a function of sequenc-              Bisulphite genomic sequencing can provide DNA
                              ing depth. Therefore, it is relatively expensive to achieve          methylation information from individual DNA mole-
                              sensitive detection of low-abundance DNA methylation                 cules. This feature can be used to investigate cell lineages140
                              patterns. The introduction of next-generation sequenc-               and DNA–protein interactions141. Analysis of single
                              ing has opened the door to efficient deep sequencing                 molecules was initially achieved by subcloning of PCR
                              of bisulphite PCR products. For example, ultra-deep                  products99,100 but has more recently been adapted to the
                              sequencing on the Roche 454 pyrosequencing platform                  analysis of digital PCR products by limiting dilution142–144.
                              can generate thousands of reads per locus, therefore                 This digitization of bisulphite PCR can be used to elimi-
                              providing the potential for the detection of rare DNA                nate bias caused by the different amplification efficiencies
                              methylation variants119,120.                                         of methylated and unmethylated sequences144.

NATuRE REvIEWs | Genetics                                                                                                     voluME 11 | MARCH 2010 | 199

                                                    © 2010 Macmillan Publishers Limited. All rights reserved
REVIEWS

                                     epigenomic bioinformatics                                            analysis (which for affinity-based methods might
                                     The development of computational tools and resources                 incorporate read direction). several important statisti-
                                     for DNA methylation analysis is accelerating rapidly 145.            cal considerations that are specific to DNA methylation
                                     TABLE 3 provides a list of some available resources.                 measurements are highlighted in BOX 2.
β distribution
                                     However, the issues related to bioinformatic analysis
A continuous probability             are complex and are not discussed comprehensively                    conclusions and future directions
distribution with an interval        here. The many considerations for hybridization-based                DNA methylation analysis has undergone a veritable
between 0 and 1. Two positive        techniques include image and scanning artefacts, back-               revolution over the past decade. We are on the verge
parameters, α and β, are used
                                     ground correction, batch and array normalization                     of witnessing a deluge of sequence-based DNA meth-
to define β distributions.
                                     and adjustments for GC content and CpG density 88.                   ylation data, along with other epigenomic and genomic
Median absolute deviation            sequence-based analyses involve alignment to a                       data types. This data production will be coordinated
A measure of statistical             reference genome, collapsing of clonal reads, read                   in national and international consortia146,147. The bot-
dispersion that is less              counts or bisulphite-based analysis23, and further data              tleneck in DNA methylation advances will increasingly
influenced by outliers and
extreme values than standard
deviation. It is defined as the
median of the collection of
                                      Box 2 | An introduction to statistical issues in DnA methylation analysis
absolute deviations from the
data set’s median.                    DNA samples are generally derived from mixtures of cell populations with heterogeneous DNA methylation profiles.
                                      Bisulphite sequencing-based approaches can provide a discrete DNA methylation pattern corresponding to a single
Quantile normalization
                                      original DNA molecule. However, most techniques provide an average measurement across the sampled DNA molecules
A method for equalizing the
                                      for a particular locus or CpG dinucleotide. For biological and historical reasons, this is usually expressed as a fraction or
total signal intensities and
distributions of probe signal         percentage methylation of the total molecules assessed. For most platforms, DNA methylation measurements represent
strengths among arrays or             absolute measurements for a given sample, whereas gene expression measurements are usually expressed as a
among colour channels on an           differential comparison between samples. Usually, strand-specific DNA methylation or hemimethylation is not
array. It sorts all probes by         considered, although monoallelic methylation — including, but not restricted to, imprinted regions — does occur in vivo.
signal strength and then              The resulting measurement scale is therefore 0 to 1, or 0 to 100%, with 0 indicating that no methylated molecules were
matches probes at each rank           identified and 1 or 100% indicating that all identified molecules were methylated. The fraction is calculated as M/(M + U),
position among arrays and             in which M represents the signal for methylated molecules and U the signal for unmethylated molecules.
forces the values at each rank
                                         It is important to note that this is a finite scale (β distribution) that has different statistical properties to the infinite
position to be equal. An
                                      scale that is commonly used in gene expression array analysis. For example, β-distributed DNA methylation
identical distribution of
probe signal strengths                measurements are not normally distributed and the variance of measurements within a finite scale is influenced by the
among the arrays or colour            mean of the measurements; the variance of measurements with a mean near to the middle of the range can be much
channels is obtained.                 larger than the variance of measurements with a mean close to the limits (0 and 1). Therefore, sorting features (for
                                      example, genes or probes) by standard deviation will result in a bias towards features with mean methylation in the
LOESS normalization                   middle of the range. Reducing the number of features by selecting probes with high standard deviation or median
A computationally intensive           absolute deviation is a common step in unsupervised analyses of microarray data. Therefore, variance-stabilizing data
method in which a polynomial          transformations or selection of probes based on a different metric should be considered. The behaviour of the distance
regression is fitted to each
                                      metrics used to compare measurements across samples (such as fold-change, log-ratio or simple subtraction) is
point in the data and more
                                      different for β-distributed fractions or percentages in DNA methylation measurements than for infinite ratio scales.
weight is given to data nearer
the point of interest. It is often    This should be given careful consideration when selecting the most appropriate metric. Clustering and partitioning
applied to hybridization array        methods used to identify subgroups of samples with similar DNA methylation profiles are being developed for
data to remove differences in         β-distributed DNA methylation data115.
global signal intensity among           An alternative method is to report the ratio of methylated to unmethylated molecules for a particular locus (M/U),
data sets or colour channels.         usually as a log2(M/U) ratio62,152. This has gained acceptance with the increased use of microarrays in DNA methylation
                                      analysis62, and has the advantage that many of the tools developed for gene expression data can be readily applied.
MA plot                               However, several points should be considered. First, many data normalization methods for gene expression data
A representation of microarray
                                      assume that many genes are not expressed and that most genes are not differentially expressed; similar assumptions
data in which M (vertical axis)
                                      cannot be made for DNA methylation measurements. More importantly, M and U are generally not independent, which
is the intensity ratio between
the red (R) and green (G) colour      violates an assumption of many of the statistical approaches for gene expression microarrays. M and U are biologically
channels (M=log(R/G)) and A           inversely correlated, but many DNA methylation platforms show them to be positively correlated if signal strength
(horizontal axis) is the mean         is strongly influenced by genomic location or by probe sequence — that is, if M and U are both derived from either a
intensity (A=(logR+logG)/2).          strongly or weakly hybridizing region. Furthermore, the M/U ratio may be inappropriate for situations in which the
This representation is often          platform is measuring across multiple CpG dinucleotides. For example, if two CpGs are being measured and each CpG
used as a basis for normalizing       is methylated at 10%, M/U will equal 0.1 if the platform assesses each CpG independently, but will equal (0.1 × 0.1)/
microarray data, with the             (0.9 × 0.9) = 0.01 if the platform only registers methylation when both CpGs are methylated and the absence of
underlying assumptions that
                                      methylation when both CpGs are unmethylated. As the number of locally grouped CpGs increases, these distortions
dye bias is dependent on signal
                                      become quite pronounced.
intensity, that the majority of
probes do not have very                 An important distinction between DNA methylation measurements and gene expression measurements is that the
different signal intensities          total amount of CpG methylation can differ substantially among samples. Therefore, normalization methods, such as
among channels and that               quantile normalization and LOESS normalization (which is often applied to data represented on MA plots), that assume
approximately the same                similar total signal across samples can remove real biological signal. It is important to note that the fluorescent dyes in
number of probes in each              the Infinium DNA methylation assay do not correspond to DNA methylation states. Therefore, normalization algorithms
channel have signal intensities       based on dye channel comparisons cannot be applied to Infinium DNA methylation data without modification.
that are stronger than the              DNA methylation bioinformatics, biostatistics and computational biology are fertile areas of research that are under
equivalent probes in
                                      rapid development. TABLE 3 lists several resources for DNA methylation data analysis that are currently available.
the other channel.

200 | MARCH 2010 | voluME 11                                                                                                       www.nature.com/reviews/genetics

                                                           © 2010 Macmillan Publishers Limited. All rights reserved
You can also read