Centromeres under Pressure: Evolutionary Innovation in Conflict with Conserved Function - MDPI

Review
Centromeres under Pressure: Evolutionary
Innovation in Conflict with Conserved Function
Elisa Balzano 1       and Simona Giunta 2, *
 1   Dipartimento di Biologia e Biotecnologie “Charles Darwin”, Sapienza Università di Roma, 00185 Roma, Italy;
     elisa.balzano@uniroma1.it
 2   Laboratory of Chromosome and Cell Biology, The Rockefeller University, 1230 York Avenue,
     New York, NY 10065, USA
 *   Correspondence: simona.giunta@cantab.net
                                                                                                      
 Received: 7 July 2020; Accepted: 4 August 2020; Published: 10 August 2020                            

 Abstract: Centromeres are essential genetic elements that enable spindle microtubule attachment
 for chromosome segregation during mitosis and meiosis. While this function is preserved across
 species, centromeres display an array of dynamic features, including: (1) rapidly evolving DNA;
 (2) wide evolutionary diversity in size, shape and organization; (3) evidence of mutational processes to
 generate homogenized repetitive arrays that characterize centromeres in several species; (4) tolerance
 to changes in position, as in the case of neocentromeres; and (5) intrinsic fragility derived from sequence
 composition and secondary DNA structures. Centromere drive underlies rapid centromere DNA
 evolution due to the “selfish” pursuit to bias meiotic transmission and promote the propagation
 of stronger centromeres. Yet, the origins of other dynamic features of centromeres remain unclear.
 Here, we review our current understanding of centromere evolution and plasticity. We also detail
 the mutagenic processes proposed to shape the divergent genetic nature of centromeres. Changes
 to centromeres are not simply evolutionary relics, but ongoing shifts that on one side promote
 centromere flexibility, but on the other can undermine centromere integrity and function with
 potential pathological implications such as genome instability.

 Keywords: centromere; repetitive DNA; mutagenesis; centromere evolution; HORs; chromosome instability

1. An Introduction to Centromere Diversity
      In 1882, Walter Flemming observed the central structure that forms the primary constriction
on mitotic chromosomes [1], later named the centromere [2]. Despite its early cytological discovery,
the centromere remains a fascinating and rather mysterious region of the genome. A hundred
years after Flemming’s observation, the smallest centromere, suitably named the “point centromere”,
was characterized by Louise Clarke and John Carbon in the budding yeast Saccharomyces cerevisiae [3];
it consists of a single centromere-specific nucleosome [4]. Already from these early studies, two key and
apparently contrasting aspects of centromere biology emerged: great heterogeneity in centromere
DNA size, organization and structure across species [5,6], alongside an essential and evolutionarily
conserved function in enabling chromosome segregation [7]. Centromeres can be broadly classified into
different types (Table 1) based on relative size: (1) point centromeres, which are rare and only found
in fungi; (2) regional centromeres, the most common type, where a specific
genomic region defines the centromere location (because regional centromeres can vary widely in
size, a further sub-classification has been proposed between short (<40 kb) and long (>40 kb) regional
centromeres [8]); (3) holocentric centromeres, which are diffuse and encompass the entire chromosome
(recently, single base pair resolution data have shown that holocentric organisms like C. elegans in
reality carry hundreds of budding yeast-like point centromeres in a “polycentric” set up); and (4)
meta-polycentric centromeres, a recently added, rare category in which the centromeres
alternate and thus extend to cover a section of the chromosome. These categories, which highlight the
genetic diversity of centromeres, are summarized in Table 1 and described in detail below.

Genes 2020, 11, 912; doi:10.3390/genes11080912                                     www.mdpi.com/journal/genes

                                   Table 1. Centromere structure in different species.

    Centromere Type               Taxon            Species                       Size               References
    Point centromere              fungi            Saccharomyces cerevisiae      ~125 bp            [4]
    Short regional centromere     fungi            Candida albicans              ~3–5 kb            [9,10]
                                                   Schizosaccharomyces pombe     ~35–110 kb
    Long regional centromere      viridiplantae    Arabidopsis thaliana          ~400 kb–1.4 Mb     [11–13]
                                                   Oryza sativa                  ~65 kb–2 Mb
                                                   Zea mays                      ~180 kb
                                  metazoa          Drosophila melanogaster       ~420 kb            [14–16]
                                                   Mus musculus                  ~1 Mb
                                                   Homo sapiens                  ~0.5–5 Mb
    Meta-polycentric centromere   tracheobionta    Pisum sativum                 ~69–107 Mb         [17]
    Holocentromere                viridiplantae    Luzula nivea                  ~100 Mb            [18]
                                  metazoa          Bombyx mori                   ~8–21 Mb           [19–21]
                                                   Caenorhabditis elegans        ~14–21 Mb

      A unified consensus for the centromere can be reached when describing its conserved and essential
role: centromeres are necessary for the correct inheritance of genetic material by enabling chromosome
attachment to the spindle microtubules during each round of cell division [22,23]. Centromeres as
conditio sine qua non for genome inheritance are highlighted by the quest to engineer human artificial
chromosomes (HACs). HACs require centromeric DNA, or centromere chromatin, in order to be stably
transmitted over cellular generations [24].
      Centromere specialization is primarily determined by a unique chromatin environment founded on
the presence of a centromere-specific nucleosome containing the histone H3 variant protein centromere
protein A (CENP-A) that serves as a docking template for centromere factor binding and mitotic
kinetochore assembly, and epigenetically encodes the transgenerational inheritance and propagation of
the centromeric locus [25]. Underscoring its essential and evolutionarily conserved function, homologs
for CENP-A are found in many species throughout evolution and are studied in a variety of laboratory
model organisms (Table 2) [26,27].

                 Table 2. H3-like centromeric protein A homologues in different model organisms.

        H3-Like Centromeric Protein A Homologues                     Model Organism                      Size
              Chromosome segregation 4 (Cse4)                   Saccharomyces cerevisiae [28]     ~26 kDa [29]
           Centromere-specific histone H3 (Cnp1)               Schizosaccharomyces pombe [30]     ~13 kDa [31]
                 Centromere identifier (Cid)                    Drosophila melanogaster [32]      ~25 kDa [33]
               Centromeric histone 3 (CenH3)                      Arabidopsis thaliana [34]       ~19 kDa [35]
         Histone H3-like centromeric protein (HCP-3)             Caenorhabditis elegans [36]      ~32 kDa [37]
               Centromeric protein A (Cenpa)                         Mus musculus [38]            ~15 kDa [39]
              Centromeric protein A (CENP-A)                        Homo sapiens [40,41]          ~15 kDa [42]

      The centromere histone is preserved in flies as the centromere identifier (Cid) [32,43], in worms as
histone H3-like centromeric protein (HCP-3) [36], in plants and fungi as centromeric histone 3 (CenH3),
in fission yeast as centromere-specific histone H3 (Cnp1) [30], in budding yeast as chromosome
segregation 4 (Cse4) [28], in mouse as Cenp-a [38] and in human as CENP-A [40,41], as well as in other
species (Table 2). The ubiquity and conservation of the centromere-specific histone variant prompted
the suggestion of a common designation, CenH3/CENP-A [44]. As more model organisms are
studied, our understanding of centromere epigenetic specification and its diversity broadens.
Recent work in the garden pea Pisum sativum shows that it contains multiple copies of CenH3 protein to
generate an extended primary constriction, defined as a “meta-polycentric centromere” with alternating
CenH3 domains [16,45]. Similar scattered features are also seen in other organisms [46]. CENP-A is
interspersed with the canonical H3-containing nucleosome in a way that is conserved from flies to
humans [46], and forms high-density islands/sub-domains of CENP-A across the human centromeres,
as reported from the analysis of stretched chromatin fibers [47]. The peculiar expanded centromere
structure of Pisum was also found in Lathyrus, another genus of the same legume tribe, which has an additional
copy of the CENH3 gene that was not seen in other phylogenetically related species [48], underscoring
centromere genetic and epigenetic diversity even across closely related species. While a structural
relationship exists between human centromere proteins that mark the functional “centrochromatin” [49],
CENP-A and other centromere proteins are amongst the fastest changing during evolution, with
hypervariable regions, divergence in length (Table 2) and divergence in overall sequence and domain
composition (Figure 1).

      Figure 1. CenH3 protein alignments, conservation and diversity across species. The structural elements
      of CenH3 proteins are illustrated, with conserved residues in blue. The histogram above the sequences
      shows the conserved regions: the carboxyl terminal domain and its components (L1 and α-helix)
      are highly preserved across eukaryotes. The shared CENP-A Targeting Domain (CATD) drives the
      association between proteins and centromeres [50]. Despite the variability of the amino terminal tail,
      this domain contains a phosphorylatable serine for CenH3 mitotic function [51]. This image is courtesy
      of Damien Goutte-Gattat [52].

     Intriguingly, new evidence has demonstrated the absence of the largely conserved centromere
histone in some organisms. CenH3-independent centromeres were found in the African sleeping
sickness parasite Trypanosoma brucei [53] and in four lineages of insects, underscoring an ancient
transition associated with a switch from regional or point centromeres to holocentric centromeres that
was accompanied by loss of the centromere-specific histone [54]. This raises the question as to why some
holocentric organisms retain a centromere-specific histone while others do not. Partly, it may relate
to the conservation of kinetochore proteins present among holocentric and monocentric centromeres
even in species where CenH3 is lost [55]. Retaining kinetochore assembly is the ultimate goal to enable
centromere activity [56]. In the case of specific insect lineages, the holocentric centromeres devoid of
CenH3 still present canonical kinetochore proteins, especially the outer part where the kinetochore
interfaces with microtubules [54,57]. Trypanosoma brucei remains to date an exception, showcasing
extremely divergent outer kinetochore components defined as an “unconventional” kinetochore, which
is made up of 20 apomorphic kinetoplastid kinetochore proteins (KKT1–20) not conserved across
the other flagellated members of the monophyletic group of Euglenozoa [53,58]. The Trypanosoma
“exception” challenges the assumption that centromere function is founded on its epigenetic specification.
Other systems may exist where chromosome segregation is free from the imposed presence of CenH3,
or even from “canonical” kinetochore constraints [59]. Further investigations into CenH3 divergent evolution,
the holocentric condition and cases that lack epigenetic specification for centromeres will shed light
on essential and universal requirements for chromosome segregation.
     The wide diversity of centromere proteinaceous constituents is paralleled by the progressive
mutability of underlying centromere DNA [60]. At the genetic level, centromere sequences are
characterized by repetitive DNA, often rich in A/T nucleotides and arranged in tandem units as found
in many organisms. The high representation of repeats across species implies a bias for reiterated
DNA in supporting centromere formation and function [61]. Yet the finding by Voullaire et al. (1993)
of an ectopic human centromere, a so-called neocentromere, on a marker chromosome derived from
chromosome 10 and devoid of repetitive DNA brought the requirement for DNA repeats at centromeres under scrutiny [62].
Neocentromeres seem to have a sequence-independent formation [63], underscoring the epigenetic
foundation of centromeres [64–66]. Alphoid-less centromeres likely originated from neocentromeres.
An absence of satellite repeats was seen in the horse centromere on chromosome 11 (Equus caballus
11, ECA11) [67], in zebra for chromosomes 2, 5, 7, 13, 18–21 [68] and in the donkey centromeres 11 and
16 [69]. These satellite-free centromeres form primary constrictions and still guarantee segregation
fidelity [70]. In particular, ECA11 is well conserved in the syntenic region in other mammals and its
two internal regions of 136 and 99 kb both bind CENP-A and CENP-B [71], respectively, suggesting
robust propagation even in the absence of satellite DNA repeats.
     A reconciliation regarding the functionality of repetitive centromere sequences was offered by
recent data pointing to a role for CENP-B in fulfilling centromere specification by stabilizing and partly
recruiting CENP-C directly to the centromere in human cells depleted of CENP-A [72,73]. CENP-B is
recruited to a specific consensus sequence, the CENP-B box present within human α-satellite repeats [74].
Thus, CENP-B-containing centromeres are specified by a concerted contribution of both CENP-A
loading, in a sequence-independent manner, and CENP-B recruitment to the CENP-B box [75]. So,
while epigenetically CENP-A is necessary and sufficient to establish a centromere in proliferating
somatic cells [76], whether on a HAC [24], in an ectopic location [63,77] or on a lactose operon
(LacO) array [78], recent evidence shows that CENP-B may be able to fully compensate for CENP-A in
enabling centromere specification, formation, positioning and transgenerational inheritance [72,73]
(Daniele Fachinetti and Sebastian Hoffman; personal communication).

      Cis-acting α-satellite sequences are not sufficient to define a functional centromere. Indeed,
“non-alphoid centromeres” have been found in plants [79,80], in birds [81], among Equidae subspecies
(e.g., speciation between horse and donkey) [69,82,83], in different primate species [84] and in
humans [85]. This means that new centromere sites are generated without a corresponding alteration
in DNA organization and that they are still undergoing repositioning. Indeed, new centromere formation
could represent a way to introduce inter- and intra-species diversity [86–88].
      Ectopic centromere formation represents an opportunity to re-localize the centromere to a new
position outside the endogenous site, giving rise to a functional neocentromere which enables cell
division upon disruption of the endogenous centromere. The configuration of the neocentromere
can occur at a distance from the endogenous centromere, as found within inverted duplications
between a breakpoint and a telomere end [89]. The ability for kinetochore protein assembly on the
new locus is assisted by CENP-A recruitment to the neocentromere [90]. Interestingly, chromosomes
containing active neocentromeres can be maintained over generations, implying that the chromosomal
positioning of the centromere region retains flexibility in its localization and can promote sister
chromatid separation even when decentered or greatly shifted from the endogenous locus. Thus,
the pliability in accommodating centromere functionality over diverse sequences and variable overall
size also extends to adaptability to different locations along the chromosome [91]. Similarly to gene
duplication being the first step toward divergence and functional innovation, the establishment of a new,
competent centromere site outside of the endogenous locus offers flexibility and sustained functionality.
Amongst the many plausible mechanisms for neocentromere formation, the recently reported ectopic
loading of CENP-A [92] and/or its transient binding to DNA double strand breaks (DSBs) [93] may
mark favorable sites for the initiation of neocentromere formation, for the establishment of a functional
de novo centromere [94–97] and for its stabilization during subsequent generations [98]. Leo et al.
offer a detailed review of the different models of neocentromere formation in this Genes Centromere
Stability special issue [99].
      Following the evolutionary footsteps of centromere sequences and proteins can help unravel
some of the aforementioned riddles and paradoxes in centromere biology. Here, we delve into
the conflict between evolutionary and ongoing mutagenesis in centromere DNA and whether these
processes may impact the conserved and essential functions of centromeres. How these seemingly
detrimental mechanisms converge to undermine centromere function while also being important
contributors to centromere biology and evolution will be discussed (Section 2).

2. Centromere Organizational Diversity in Light of Evolution
     From the smallest and simplest centromere of Saccharomyces cerevisiae to the large and complex
ones found in higher eukaryotes, including human megabase-sized ones, the evolutionary compulsion
to sustain variability in order to exploit this locus for chromosome segregation is evident [100].
     A case in point is the fast evolving “point” centromere of budding yeast S. cerevisiae with as little
as ~125 bp (base pair) consensus AT-rich sequences [4,28,101].
     Each centromere has three centromere DNA elements (centromere determining elements, CDEs) for
the association of the centromere DNA binding protein complex: CDEI (~8 bp), CDEII (~78–86 bp) and
CDEIII (~25 bp) (Figure 2A) [102–106]. Cse4 maps on the CDEII DNA element and forms a modified histone
octamer, with different studies proposing a variety of models for this nucleosome: homotypic tetrasomes
(Cse4/H4)2 [107], hexasomes with non-histone proteins (Cse4-H4/Scm3)2 [108], asymmetric/mixed
octasomes (Cse4/H3/(H4/H2B/H2A) [109] and single right-handed hemisomes (CenH3/H4/H2A/H2B)
wrapping the ~80 bp of DNA centromeric sequence [110,111].
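As a quick arithmetic check, the approximate element lengths quoted above can be tallied against the ~125 bp size given for the whole point centromere. The minimal Python sketch below does only that; the dictionary and function names are illustrative and not taken from the cited studies, and all lengths are the approximate values stated in the text.

```python
# Approximate CDE lengths in bp as (minimum, maximum), taken from the text above.
# The names CDE_BP and combined_span are illustrative, not from the cited studies.
CDE_BP = {
    "CDEI": (8, 8),
    "CDEII": (78, 86),
    "CDEIII": (25, 25),
}


def combined_span(elements):
    """Return the (minimum, maximum) total length in bp of all elements."""
    shortest = sum(bounds[0] for bounds in elements.values())
    longest = sum(bounds[1] for bounds in elements.values())
    return shortest, longest


print(combined_span(CDE_BP))  # (111, 119)
```

The three elements alone add up to 111–119 bp, slightly below the ~125 bp quoted for the whole centromere; both figures are approximate.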

    Figure 2. Centromere structures in different eukaryotes. (A) The S. cerevisiae point centromere is
    125 bp in size and it is composed of three centromere DNA elements (CDEs): CDEI, CDEII and
    CDEIII. (B) The S. pombe centromere is made of inner (ImrL and ImrR) and outer (dg and dh) inverted
    repetitive sequences that flank a central unique sequence (Cnt). (C) The two main satellite domains
    (AATAT and AAGAG) of the D. melanogaster centromere are interspersed with transposable elements
    (black lines). (D) A. thaliana has a 180 bp repeat unit intermingled with retrotransposons (black
    lines). (E) The mouse centromere is made up of major satellite sequences (MaSat) of 234 bp monomers
    (spanning ~6 Mb; green arrows) and minor satellite sequences (MiSat) of 120 bp monomers (spanning
    ~600 kb; blue arrows). (F) Human centromeres contain tandem repeats of α-satellite 171 bp monomers
    organized head to tail into higher order repeats (HORs). (G) The meta-polycentric centromere of P.
    sativum is a very long centromere of 13 families of satellite DNA repeats and one family of Ty3/gypsy
    retrotransposons, organized into 3–5 domains containing CenH3. (H) The polycentric or holocentric
    centromere of C. elegans covers the entire length of chromosome on which there are several points for
    microtubule attachment. In spite of this great diversity, all these centromeres perform faithful roles in
    chromosome segregation.

     In fission yeast Schizosaccharomyces pombe, the centromeric region is large relative to the total
genome size, spanning 35–110 kb, of which ~4 kb represents a unique central sequence (cnt) flanked by
two inverted repetitive sequences (ImrL and ImrR) (Figure 2B) [10,112].
     Next based on overall size is the 420 kb repetitive centromere of Drosophila melanogaster, composed
of over 85% satellite DNA interrupted by the presence of transposable elements (TE) (Figure 2C) [113].

      A very similar composition of satellite DNA and centromeric transposable elements was also
found in plants, such as Arabidopsis thaliana [114], Oryza sativa [115] and Zea mays [13]. Elements of
diversity in these plant satellite DNA are displayed by the size of the basic unit present and number of
reiterations of these units which make up the centromeres, ranging from 400 kb to 1.4 Mb. For instance,
the Arabidopsis centromere has a 180 bp monomer (Figure 2D) [11,116], rice has a 155 bp satellite
CentO unit [12] and maize contains a 156 bp satellite unit named CentC [13]. These repeated units,
while divergent, all specifically bind well-characterized centromere proteins. Satellite sequences found
in the mouse centromere also contain repetitive domains with distinct unit sizes [15,117]. The mouse
centromere is organized into minor satellite DNA with a 120 bp homogenized unit that constitutes
the core centromere region, and flanking major satellite DNA of pericentromeric heterochromatin
that is made up of less-ordered 234 bp units (Figure 2E) [15,118]. In humans, the centromere is also
distinct from the flanking pericentromere. The former is made up of tandemly organized repeats, called
α-satellite DNA, while the latter is made of monomeric α-satellite units and other types of repeats.
Within the core centromere, the 171 bp monomeric units of α-satellite DNA arranged in tandem share
between 50% and 70% sequence identity. Several repeat units form a higher order repeat (HOR) block
that is reiterated with a similarity of 97–100% to make up a homogenized array spanning several
megabases, usually 2–5 Mb (Figure 2F) [16,61]. Notably, each human chromosome has a different
number of monomers that make up its HOR, with some chromosome-specific sequences contained
within the homogenized array. Thus, sequence diversity is not only found across species but also
within species, across the karyotype.
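The two-level organization described above (divergent 171 bp monomers inside a nearly identical, reiterated HOR block) can be illustrated with a small sketch. It uses random synthetic sequences, not real α-satellite DNA; the function names and the divergence rates `monomer_div` and `hor_div` are assumptions chosen only so that the resulting identities fall near the ranges quoted in the text.

```python
import random
from itertools import combinations

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Replace each base, with probability `rate`, by a different base."""
    out = []
    for b in seq:
        if rng.random() < rate:
            out.append(rng.choice([x for x in BASES if x != b]))
        else:
            out.append(b)
    return "".join(out)

def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mean_pairwise_identity(seqs):
    """Average identity over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(identity(a, b) for a, b in pairs) / len(pairs)

def build_hor_array(n_monomers=12, n_hor_copies=20, monomer_len=171,
                    monomer_div=0.2, hor_div=0.01, seed=0):
    """Build a toy HOR array (all parameter values are illustrative)."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(monomer_len))
    # Monomers diverge independently from a shared ancestral monomer.
    monomers = [mutate(ancestor, monomer_div, rng) for _ in range(n_monomers)]
    hor = "".join(monomers)
    # The HOR block is reiterated, with only slight divergence per copy.
    copies = [mutate(hor, hor_div, rng) for _ in range(n_hor_copies)]
    return monomers, copies

if __name__ == "__main__":
    monomers, copies = build_hor_array()
    print("monomer vs monomer identity:", round(mean_pairwise_identity(monomers), 3))
    print("HOR copy vs HOR copy identity:", round(mean_pairwise_identity(copies), 3))
```

Under these assumed rates, monomers within one HOR are far less similar to each other than whole HOR copies are to one another, which is the pattern the text describes for human centromeres.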
      In addition to the aforementioned regional centromeres, large or small (which we categorized as
short and long regional centromeres, as in Table 1), there are other kinds of centromere genetic structures
with less common organization, including organisms that have multiple or diffused centromeres.
A striking example of a centromere that is intermediate between a monocentric (single) centromere
and a polycentric one is that of the garden pea, P. sativum. Similarly to other species equipped with satellite DNA,
the P. sativum centromere is built on tandemly repeated domains of 13 individual families of
satellite DNA and one family of Ty3/gypsy retrotransposons (Figure 2G). The Pisum meta-polycentric
centromere is made up of 1–5 such domains. This is reminiscent of the multiple centromeric arrays found
in human chromosomes, where, however, only one array represents the active centromere that forms the kinetochore.
Notably, the garden pea centromere is considered polycentric because multiple active arrays contribute
to a linear-like kinetochore [17], unlike other centromeres where only one of the repetitive arrays
is functional [119].
      In addition to the monocentromere and meta-polycentric centromere described above with a
defined site for each chromosome, the holocentromere is dispersed along the total length of the chromosome
with a non-localized kinetochore. The holocentric condition is found in several phyla, implying
multiple distinct and independent occurrences during evolution [120]. Caenorhabditis elegans
is a prime example of a holocentric organism, where the centromere encompasses the full
length of the chromosome (14–21 Mb) (Figure 2H) [21], yet it is still dependent on the H3-like centromere
histone HCP-3 for chromosome segregation during mitosis [36,121]. Through the evolutionary lens,
centromere organization looks somewhat stochastic, with different species having evolved their own
particular way to adapt a centromere locus for chromosome segregation. Importantly, while centromeres
can exist in different forms and arrangements, their purpose to achieve accurate division of genetic
material is always accomplished [22,122].
      Indeed, primary constriction size appears invariant and with a constant scale of magnitude
from yeast to human [123]. Thus, despite the great evolutionary diversity and organization across
eukaryotes, centromere function in chromosome segregation remains conserved.

Centromere Drive: From Conflicts to Benefits
      The rapid and heterogeneous evolution of centromere components across eukaryotes is at
odds with the vital and conserved function of the centromere [7,124]. Yet, these mutagenic changes must
be in accord with a synchronized shift of centromeric elements that provides an evolutionary advantage.
A plausible reason for this fast centromere evolution–adaptation paradox is elegantly provided by the
“centromere drive” hypothesis formulated by Malik and Henikoff [7,125], where centromere DNA and
protein components co-evolve under genetic conflict [126]. Centromere drive sees centromeres not
only as essential regions of the genome during cell division, but also as “selfish genetic elements” that
have an opportunity to play tug-of-war during the first asymmetric division (MI) in female meiosis
and bias their transmission [126,127]. In fact, in the centromere drive model, the stronger centromeres
segregate successfully with respect to the competitors. Their ability to exploit the asymmetry of oocyte
meiosis, overthrowing Mendelian genetic laws [127], means that there is a Darwinian selection between
centromeric variants for their transmission to the gametes and consequently for their inheritance,
which underlies the constant genetic changes as a continued quest toward improved strength and
favored inheritance. There are several examples demonstrating the validity of the centromere drive
hypothesis. Recent elegant proofs were provided by the Lampson lab using crosses between mouse
strains with different amounts of centromere proteins. The “stronger” centromere was preferentially
inherited during female meiosis due to increased levels of kinetochore proteins contributing to the
likelihood of transmission to the egg [128]. The presence of mutational changes in centromeric
sequences is reconciled with simultaneous conformational changes in centromeric proteins, generating
more microtubule attachment sites [129–131]. Lampson and collaborators set up a system to investigate
the implication of changes in satellite DNA in recruiting the kinetochore complex. They found a
6–10-fold increment of minor satellite mouse centromeric repeats in “strong” centromeres compared
to the “weaker” centromere mouse strain [129,132]. The size difference translates into increased
retention of CENP-B protein on its DNA binding motif, the CENP-B box present on the minor satellite,
which consequently recruits additional CENP-A proteins [133] and, in turn, is responsible for the
robust assembly of the outer kinetochore for stable attachment to the asymmetric meiotic spindle [129].
The stronger centromeres are able to orient towards the egg pole and remain in the mature oocyte,
winning a spot in self-propagation [128,133]. In addition to centromere DNA changes, meiosis can also
be biased by other features, including spindle asymmetries [128].
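The logic of centromere drive can be made concrete with a minimal population-genetic sketch. It is not taken from the cited studies: it assumes random mating, no fitness cost, and a single parameter `k` (hypothetical name) giving the probability that a heterozygous female transmits the "strong" centromere to the egg, with `k = 0.5` corresponding to Mendelian segregation. Male meiosis is assumed to be unbiased.

```python
def drive_trajectory(p0=0.01, k=0.6, generations=200):
    """Frequency of a 'strong' centromere variant over generations.

    p0: starting frequency in both eggs and sperm (assumed equal).
    k:  probability that a heterozygous female passes the strong
        centromere to the egg; 0.5 means no drive.
    """
    egg = sperm = p0
    freqs = [p0]
    for _ in range(generations):
        # Zygote genotypes from random union of eggs and sperm.
        hom = egg * sperm                              # strong/strong
        het = egg * (1 - sperm) + sperm * (1 - egg)    # strong/weak
        # Female meiosis is biased in heterozygotes; male meiosis is not.
        egg = hom + het * k
        sperm = hom + het * 0.5
        freqs.append((egg + sperm) / 2)
    return freqs

if __name__ == "__main__":
    for k in (0.5, 0.55, 0.6):
        final = drive_trajectory(k=k)[-1]
        print(f"k={k}: frequency after 200 generations = {final:.4f}")
```

In this toy model, any `k` above 0.5 lets the stronger variant spread even though it confers no benefit to the organism, which is the sense in which the text calls centromeres "selfish genetic elements".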
      Even though this evidence elucidates the advantage of centromere evolutionary changes,
deleterious effects must also be taken into consideration, including unbalanced segregation that
could generate incompatible post-zygotic hybrids contributing to speciation [124,134].
      Centromere rearrangements are protagonists in karyotypic divergence, as in the case of the horse
and donkey. Changes in centromere repositioning created chromosomal structural variations that
act like a “genetic barrier” between these two species due to the odd rate of meiotic chromosome
recombination, which causes the gametogenic failure in mules [135].
      To counter this constraint, CenH3 gene duplications are positively evolving, with the vast
majority becoming pseudogenes and fixing in the population as they are able to adapt to the selection
imposed by changes in centromeric sequences [136]. For instance, Mimulus aurantiacus displays many
CenH3 duplication events under a divergent process in which paralogs differentiate with distinct
sub-specialized functions [136]. CenH3 duplication and divergence are also seen in Drosophila where
five duplications of the Cid gene correlate with tissue-specific expression [60,137,138].
      Thus, similarly to other evolutionary changes, centromere DNA and centromeric genes use
duplications as a mechanism to mitigate rapid mutagenesis. Notably, this rapid evolution of centromere
sequences and/or proteins is an irreversible process and on some occasions, it might turn into
chromosomal instability [139].
      In addition to the issue of speciation, changes at the centromere are not simply evolutionary
relics that are now settled, but ongoing shifts in the context of centromere drive. Centromeres may
be unstable regions of the genome not just on an evolutionary timescale, but also within the cellular
lifetime [140]. Indeed, recombination and rearrangements were found to happen within a single cell
cycle in human primary epithelial cells [141]. In Section 3, we will review the mutagenic processes
that occurred to form the peculiar genetic structures of centromeres during evolution, and that may
continue to undermine centromere stability during cell division.

3. Mapping Mutagenic Mechanisms by Following Their Evolutionary Footsteps on Centromere DNA
     Centromere DNA is one of the fastest evolving sequences found within the eukaryotic genome.
The repetitive nature of centromeres, often in head-to-tail orientation, implies that the repeat units were
subjected to expansion and reiteration, followed by other rounds of mutagenesis, to enable formation
of the region as we observe it today. To reconstruct the repetitive array, several simulations have
been proposed to understand how mutagenesis acts on centromeres to shape their genetic structure.
Recombination at the centromere seems obvious yet has remained counter-intuitive. Over the past 80
years, evidence has accumulated demonstrating the negative effects of meiotic
recombination within the centromere region [142,143] in different organisms [144]. A reduced level of
recombination events at centromeric and immediately flanking sequences during meiosis has long been
established, giving a reputation to centromeres as “cold” spots to recombination, as described by Andy
Choo, who asked the question: “Why is the centromere so cold (to recombination)?” [145]. Highly
condensed chromatin has been thought to repress recombination in order to avoid instability within
centromere DNA repeats [146], as well as DNA methylation [147]. Extreme linkage disequilibrium for
single nucleotide polymorphisms (SNPs) found at centromeres is another indicator of a low rate of
recombination and crossing over events [148–150]. Yet, centromere DNA structure and the high degree
of homology between satellites across chromosomes are strongly indicative of recombination-driven
homogenization and evolution. In addition to evolutionary processes, recombination has been shown
to happen to centromeres at relatively high levels during a single cellular generation, with specific
factors contributing to its (at least partial) suppression [141,147]. Sister chromatid exchanges were
detected in mouse [147] and in human cells [141] using a technique called Centromere-Chromosome
Orientation-Fluorescent in situ Hybridization (Cen-CO-FISH) [151], and centromere proteins including
human CENP-A contribute to repressing centromere rearrangements [141]. Intriguingly, recombination
and other mutagenic processes may be promoted by intrinsic features of centromere repetitive DNA.
Given the exceptional flexibility of centromeric repeats, altered topological conformation and secondary
structures are likely to occur [142,152,153]. Emerging roles for centromere chromatin in mitigating
centromere instability by reducing recombination [141] and transposition events, and possibly by suppressing
DNA damage formation, indicate an interesting balance between intrinsic or programmed mutagenesis
and epigenetic stabilization at the centromere.
     On an evolutionary timescale, homogenization of centromeric repeats has been speculated to
emerge precisely through short and long-range stochastic unequal exchange (Figure 3A) between
sister chromatids. These were described in the Smith model [154] by a non-reciprocal recombination
between homologous sequences that are neutral to selection [155–157]. Similarly, the mechanism of
gene conversion (GC) (Figure 3B) [158] is a unidirectional transfer of genetic information from an
intact to a broken strand, and can readily account for centromere expansion driven by DNA damage.
Depending on the length of GC tracts, they can be called short-tract gene conversions (STGC) for DNA
segments ranging between 50 and 200 bp [159,160] or long-tract gene conversions (LTGC) for segments
over 1 kb [161,162], with LTGC likely playing a role at large centromeres.
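The unidirectional nature of gene conversion described above can be sketched on plain strings. This is an illustrative toy, not a model from the cited literature: it assumes equal-length repeats, and the function names are hypothetical. The tract-length classes use only the thresholds given in the text (50 to 200 bp for STGC, over 1 kb for LTGC); lengths in between are left unclassified.

```python
def gene_conversion(repeats, donor, recipient, start, tract_len):
    """Copy a tract from the donor repeat into the recipient repeat.

    The donor is left unchanged, so the transfer is non-reciprocal.
    Repeats are assumed to have equal length (a simplification).
    """
    d, r = repeats[donor], repeats[recipient]
    end = min(start + tract_len, len(d))
    out = list(repeats)
    out[recipient] = r[:start] + d[start:end] + r[end:]
    return out

def classify_tract(tract_len):
    """Classify a conversion tract using the size ranges quoted in the text."""
    if 50 <= tract_len <= 200:
        return "STGC"
    if tract_len > 1000:
        return "LTGC"
    return "unclassified"

if __name__ == "__main__":
    repeats = ["A" * 300, "C" * 300]
    converted = gene_conversion(repeats, donor=0, recipient=1,
                                start=100, tract_len=150)
    print(converted[1].count("A"), "bases converted;", classify_tract(150))
```

Repeated rounds of such copying make the recipient progressively more similar to the donor, which is how gene conversion could contribute to homogenization of an array.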

      Figure 3. Mutagenic processes that may operate at centromere sequences and have contributed to
      their repetitive origins. (A) Unequal exchange following recombination can cause gain or loss of
      tandem repeats and DNA rearrangements. (B) Gene conversion causes the unidirectional transfer
      of genetic information among homologous repetitive DNA sequences and can result in reciprocal or
      non-reciprocal exchange (the latter is depicted). (C) Replication slippage on misalignment repeated
      DNA strands during replication is thought to induce centromere expansion or contraction depending
      on whether the hairpin (depicted)/distortion is found on the newly synthesized strand (blue repeats)
      or the bulge (depicted)/distortion is on the template DNA (green repeats). (D) Break-induced
      replication (BIR) repairs one-ended double-stranded break (DSB) substrate, produced by replication
      fork collapse. (E) Rolling circle replication occurs when the 3′ end circularizes, and its replication
      produces repeated concatemers. (F) Single strand annealing (SSA) repairs DSBs through the annealing
      of complementary ssDNA strands succeeded by DNA tail end digestion and ligation. These repair
      pathways are essential for maintaining genome stability, yet when operating on repetitive sequences
      (especially arranged in tandem and sharing high degree of sequence homology like at the
      centromere), they may result in mutagenic variability as a way for ongoing DNA evolution and
      shaping.

     Generally, homology tracts are templates for the resolution of double Holliday junctions (HJ)
and synthesis-dependent strand annealing (SDSA) during gene conversion. Both these intermediates
are implicated as down-stream processing for the resolution of DNA double stranded breaks (DSB)
through DNA damage repair (DDR) pathways. The origins of DSBs within centromere repeats
remain unknown. We speculate that stochastic damage can be exacerbated by the intrinsic fragility
of centromeres [140]. Another interesting source of DNA damage is represented by transposons.
The occurrence of non-allelic gene conversion between duplicated TEs has been demonstrated [163,164]
and, while CENP-A nucleosomes seem to play a role in suppressing these TE-mediated mutagenic
events, they are thought to retain an active role that impacts the centromere genomic landscape [165,166].
The insertion of TEs and post-insertion events are thought to produce the homogenization of arrays
seen among non-homologous chromosomes within the same cell [165]. Indeed, recent evidence in
Monopterus albus shows that two TEs, the GYPSY5-ZM_I retrotransposable element of Zea mays and the
MuDR-13_VV DNA transposable element of Vitis vinifera, gave rise to the Monopterus albus satDNA
repeats MALREP (MALREP-A, MALREP-B, and MALREP-C) through unequal crossing-over [167].
The same mechanism was previously observed in the P. sativum tandem repeat satellite PisTR-A,
in which the long terminal repeats (LTRs) of the Ty3/gypsy Ogre retrotransposons represent the template
for the amplification of satDNA arrays [168] and, thus, contribute to the origin of species-specific
centromeric satellites [48,169]. Generation of a new centromere site has also been correlated with
the pervasive transcription of TEs that recruit CENP-A through small RNAs called centromere
repeat-associated short interacting RNAs (crasiRNAs) [166,169]. Given the recently appreciated role of
centromere transcripts and transcription in centromere function [170], it is possible that TEs operate
by inducing breaks and/or by exerting the induction of transcription, and both these processes may
converge to promote centromere formation.
     Gene conversion events are overrepresented in palindromic and inverted
repetitive sequences [164]. DNA palindromes appear to be a feature of centromeres and pericentromeres
in different species [171–173]. Palindromes also have the intrinsic potential to adopt non-canonical
(non-B-DNA) helix conformations, including Z-DNA, triplex, quadruplex and cruciform structures [174], again suggesting
a multi-step challenge associated with DNA-based transactions like replication, transcription and repair
processes at centromere repeats [175]. In addition to palindromes, there is a multitude of alternative
DNA secondary structures that centromere repetitive DNA can assume, including non-B-DNA [153],
triplexes and G-quadruplexes (G4) [176–178], i-motifs [179,180], hairpins [181] and loops found at human
α-satellites [152]. These and other possible arrangements for three-dimensional DNA folding are
expected to directly hinder the replication process as physical barriers. These impediments can also
lead to a lower affinity of DNA polymerase for the newly synthesized strand, causing out-of-register
“replication slippage” (Figure 3C) [182]. Replication slippage has been speculated to contribute to
centromere repeat amplification, and can provoke either replication fork stalling or collapse, generating
a DSB and further promoting mutagenesis [183,184]. DSBs can be repaired through different pathways
with specialized protein cascades and diverse outcomes. While DSB repair pathways have been
extensively detailed, information on centromeric DSB repair is still lacking. Generally, non-homologous
end joining (NHEJ) is a primary pathway of repair utilized throughout the cell cycle that promotes the
rapid re-ligation of broken DNA ends without requiring extensive processing. NHEJ comprises
canonical NHEJ (c-NHEJ) and alternative NHEJ (a-NHEJ). The latter can utilize micro-homology between
the two broken ends for alignment between sequences of 1–16 nucleotides before rejoining [185].
NHEJ represents an error-prone repair solution, which leaves behind a mutational scar, but such
a signature is not obviously observed within the available centromere sequences. Only once a
homologous sequence is available after replication can the damaged locus be repaired by homologous
recombination (HR). In S-phase and G2, approximately half of all DSBs become substrates for HR
using the sister template. To date, it is unclear how the suppression of HR in G1 occurs to prevent
centromere recombination with homologous sequences in other chromosomes or within the same
chromatid. Activation of HR relies on the generation of single stranded DNA as the DSB is resected.
HR or homology-directed repair (HDR) encompasses different sub-pathways but commonly initiates
with DNA resection, followed by strand invasion mediated by RecA (in bacteria) or Rad51 (in eukaryotes), which
leads to the formation of a displacement loop (D-loop) and, subsequently, the Holliday junction. A conservative
form of HDR is synthesis-dependent strand annealing (SDSA) [186]. SDSA fills DSBs and inhibits
crossing over [187]. Because centromeres actively undergo recombination during the mitotic cell
cycle [141,147,188] and short- and long-range recombination events are speculated to drive centromere
formation and evolution, HR likely represents an active mode of repair for centromere damage.
However, this poses important questions on how faithful recognition of the true sister sequence is
accomplished, differentiating the many identical and matching sequences within the same chromatid
or across chromosomes. Aberrant recombination would give rise to non-allelic exchanges, as we
reviewed previously [140].
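The DNA palindromes mentioned in this section are sequences that equal their own reverse complement, which is what allows a single strand to fold back into a hairpin or cruciform. A minimal sketch of how such sites could be located is given below. It is a simplification under stated assumptions: it detects only perfect, spacer-free palindromes, whereas real analyses typically tolerate mismatches and a central loop, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def find_palindromes(seq, min_len=6):
    """Return (start, length) for maximal perfect palindromes.

    Each position between two bases is tried as a center, and the
    window is extended outward while the flanking bases are
    complementary to each other.
    """
    hits = []
    for center in range(1, len(seq)):
        left, right = center - 1, center
        while (left >= 0 and right < len(seq)
               and COMPLEMENT[seq[left]] == seq[right]):
            left -= 1
            right += 1
        length = right - left - 1
        if length >= min_len:
            hits.append((left + 1, length))
    return hits

if __name__ == "__main__":
    seq = "CCGAATTCAT"
    for start, length in find_palindromes(seq):
        site = seq[start:start + length]
        print(start, site, site == reverse_complement(site))
```

The `min_len` threshold is an arbitrary illustrative choice; longer palindromes are more likely to form stable secondary structures.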
      There are other forms of DNA damage repair whose mutational signatures have been associated
with centromere DNAs. Replication fork failure, regression into so-called chicken foot structures and
other stalled/collapsed fork conformations can also produce unusual HR substrates, where resolution of
the one-ended DSB can be achieved through activation of break-induced replication (BIR) (Figure 3D) [189],
or microhomology-mediated break-induced replication (MMBIR) in case of non-sister templates [190,191].
BIR pathway activation on repetitive sequences can cause an out-of-register invasion and the resolution
of the D-loop leads to expansions and/or contractions of repeat arrays [192]. Centromere sequences
seem to carry a mutational signature compatible with BIR according to a recent report [183].
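How an out-of-register invasion changes array length can be shown with a deliberately simple sketch. It assumes a tandem array represented as a list of repeat units, an identical sister chromatid as template, and synthesis that proceeds from the invasion point to the end of the template; `bir_repair`, `break_after` and `invade_at` are hypothetical names introduced for illustration only.

```python
def bir_repair(array, break_after, invade_at):
    """Toy model of one-ended DSB repair by BIR in a tandem array.

    The broken chromatid keeps array[:break_after]. Its end invades the
    sister template at index `invade_at`, and synthesis copies the
    template from there to its end.
    """
    n = len(array)
    if not (0 <= break_after <= n and 0 <= invade_at <= n):
        raise ValueError("indices must lie within the array")
    return array[:break_after] + array[invade_at:]

if __name__ == "__main__":
    array = [f"r{i}" for i in range(10)]
    for invade_at in (5, 3, 8):
        repaired = bir_repair(array, break_after=5, invade_at=invade_at)
        print(f"invade_at={invade_at}: {len(array)} -> {len(repaired)} repeats")
```

In this model an in-register invasion restores the original array, an invasion upstream of the break re-copies units and expands the array, and an invasion downstream skips units and contracts it.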
      As an alternative, circular 3′ ssDNA (single stranded DNA) templates generated at the D-loop
lead to the induction of rolling circle replication (RCR) (Figure 3E) which occurs preferentially within
inverted repeats arrays, generating concatemers [193]. As a result, DNA repair protein RAD51 homolog
1 (RAD51) plays a central role in processing the HJ loop [194], principally with the aim to inhibit
single-strand annealing (SSA), an error-prone mechanism that anneals the homologous DNA sequence
at the break without a gap, causing a sequence deletion (Figure 3F) [187,195]. SSA results in loss of
DNA where the 25-nucleotide strand annealing is followed only by polymerase filling and intermediate
ligation [196,197]. Because many of these repair pathways are error-prone, they induce mutagenesis
that may favor the evolution of centromere DNA (Figure 3).
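The sequence loss caused by SSA can be sketched with string operations. This is a simplified illustration under assumptions: the homologous repeat is supplied by the caller, the nearest copies on either side of the break are annealed, and resection, flap removal and ligation are collapsed into a single step. The function name is hypothetical.

```python
def ssa_repair(seq, repeat, break_pos):
    """Toy model of single-strand annealing at a DSB.

    Finds the nearest copy of `repeat` lying entirely to the left of
    the break and the nearest copy starting at or after the break,
    then joins them so that one repeat copy and the intervening
    sequence are deleted. Returns None if no flanking pair exists.
    """
    left = seq.rfind(repeat, 0, break_pos)
    right = seq.find(repeat, break_pos)
    if left == -1 or right == -1:
        return None
    # Keep everything before the left copy, then resume at the right copy.
    return seq[:left] + seq[right:]

if __name__ == "__main__":
    seq = "AAA" + "GATTACA" + "CCCC" + "GATTACA" + "TTT"
    repaired = ssa_repair(seq, "GATTACA", break_pos=12)
    print(len(seq), "->", len(repaired), repaired)
```

Because the product always retains a single copy of the annealed repeat, SSA can only shorten a tandem array, consistent with its description in the text as a length-eroding pathway.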
      Indeed, Rice [183] assigned a contribution to both BIR and SSA pathways in the plasticity of
HORs. Contrary to Smith [146], intermingled alternation of CENP-A-enriched/centric core expansion
by the BIR pathway during replication, and the length-eroding SSA pathway during the repair of
DSBs have converged to enable the formation of homogenized HORs. The latter repair pathway (SSA)
appears quite infrequently in centromeric and pericentromeric regions [183]. The large size of the
HORs underscores this expansion [189,198]. Furthermore, there is a corresponding increase in CENP-A
with expansion of HOR sequence arrays, which in turn leads to increased CENP-A deposition in the
form of a positive feedback loop [199,200].
      The aforementioned processes cause amplification, expansion and large-scale remodeling of
the genomic landscape at the centromere. However, they must also be intersected by localized
mutagenesis, including that which triggers divergence between monomers. In the example of the
human centromere, individual monomers of α-satellites share only 50–70% sequence identity between
each other, while HOR blocks are nearly-identical. Thus, large-scale processes may be rarer and have
operated on a wider timescale than small-scale changes and micro-mutations that may continue to
shape and diverge centromeres. Notably, BIR seems sufficient to create mutations within the replicated
sequences (at a frequency around 1000-fold higher than DNA replication without out-of-register forks [183,201])
and results in both long- and short-range changes.
      A supplementary mechanism to accomplish concomitant mutagenizing and homogenizing of
the centromeric repeats is based on inter-chromosomal translocations guided by the organization and
proximity of spatial repeats. A high percentage of translocation events has been demonstrated in
centromeric homology inverted repeats (HIRs) of common progenitors of C. albicans and C. tropicalis,
in which the loss of these inverted repeats provokes the formation of a new centromere. When the
essential function of centromeric HIRs is missing, a CENP-A-rich zone influences the seeding of
evolutionary new centromeres (ENCs) in order to reestablish the eroded centromere region [202].
The plasticity of the centromere in establishing itself at a completely new location adds another layer of
complexity in tracking sequence generation through mutagenic processes, where sequences may be
originating from diverse and changing ancestral seeding DNA. Yet, these fitting simulations represent
important points of reflection to gain a more profound and complete appreciation of the complexity in
sustaining centromere evolution and maintenance. Much needed empirical evidence will uncover
which of these processes operate within the repetitive satellites through current sequencing efforts.
Because mechanisms to suppress processes like HR are emerging [141,147], mutagenic processes,
along with their mitigating pathways, will reveal how centromere DNA stability and evolution
are maintained.

Formation of Human Centromeres through Evolutionary Mutagenesis
      The DNA organization at human centromeres is a notable example of repeat amplification,
homogenization and mutagenesis. One of the first studies on the evolution of human satellite DNA
was advanced by Smith in 1976 [153], with the unequal sister crossover model used to describe the
dynamic mutability shown by α-satellite repeats. In this model, the diversity of these
repetitive sequences is governed by the balance between the rate of mitotic sister chromatid
recombination (r), the base-pair mutation rate (u), and the minimum match length (m) required for
unequal crossover [203,204].
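
To make the roles of r, u and m concrete, the following is a minimal toy simulation in the spirit of the unequal sister crossover model, not a reimplementation of the published model. Several simplifications are assumptions made here for illustration: exchanges occur only at repeat-unit boundaries, only one of the two reciprocal products is followed, array length is capped by a hypothetical `max_units` parameter, and all default parameter values are arbitrary.

```python
import random

BASES = "ACGT"

def longest_match_run(a, b):
    """Length of the longest run of identical bases at aligned positions."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best

def mutate(unit, u, rng):
    """Substitute each base with probability u (always to a different base)."""
    out = []
    for base in unit:
        if rng.random() < u:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def unequal_crossover(array, m, rng, max_units):
    """Attempt one out-of-register exchange between identical sister arrays.

    Unit i on one chromatid pairs with unit j on its sister. The exchange is
    accepted only if the two units share an aligned matching run of >= m bases.
    The product followed is array[:i] + array[j:], so j > i deletes units and
    j < i duplicates them (simplification: breakpoints at unit boundaries).
    """
    i = rng.randrange(len(array))
    j = rng.randrange(len(array))
    if i == j or longest_match_run(array[i], array[j]) < m:
        return array
    product = array[:i] + array[j:]
    if not 1 <= len(product) <= max_units:
        return array  # illustrative cap, not part of the original model
    return product

def simulate(n_units=20, unit_len=30, r=0.5, u=0.001, m=10,
             generations=200, max_units=200, seed=0):
    """Evolve an initially homogeneous array; returns the list of repeat units."""
    rng = random.Random(seed)
    founder = "".join(rng.choice(BASES) for _ in range(unit_len))
    array = [founder] * n_units
    for _ in range(generations):
        array = [mutate(unit, u, rng) for unit in array]
        if rng.random() < r:
            array = unequal_crossover(array, m, rng, max_units)
    return array
```

In this sketch, raising u relative to r lets units diverge until they no longer share a run of m matching bases, at which point exchanges stop; raising r relative to u lets duplication and deletion spread or remove variants before they diverge.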
      More recent advances in methodologies and sequencing allowed the construction of centromere
phylogenies to compare centromeres among different organisms, as well as within the same species.
Intra- and inter-species analyses are a very helpful tool for the recognition of ancestral and new
properties of centromere repeats, exposing evolutionary constraints and adaptive changes over different
timescales [201]. In fact, even if the base substitution rate between chimpanzees and humans is only
1.2% in non-centromeric regions (whether in repeated or non-repeated DNA [205]),
there is a continuous rapid divergence that has been demonstrated through the hybridization of
human centromeric DNA probes on the orthologous chimpanzee centromere sequences, suggesting that
centromeres have a higher degree of divergence [206–208]. α-satellite DNA has been found in Old
World Monkeys [209–211], in New World Monkeys [212,213] and in prosimians [214,215], where it
maintains a monomeric, more disordered α-satellite organization [216–218]. By contrast, α-satellite
higher-order structure (as found in human centromeres) is shared with our Great Ape relatives, such as
chimpanzees, gorillas [218,219], and orangutans [218,220]. This may reflect a very recent evolution
of monomeric satellites into a higher-level organization through homogenized HORs [221]. This is
particularly interesting as pericentromeres retain monomeric, seemingly ancestral, α-satellite DNA
interspersed with Long interspersed nuclear elements (LINEs), Short interspersed nuclear elements
(SINEs) and other repetitive elements, suggesting that monomeric α-satellites served as an early
template for the HOR homogenization that followed.
      Alexandrov and colleagues advanced a very interesting model about the formation of HOR in
Great Apes from an old ancestral monomer in lower primates [209]. Supposedly, the divergence of old
monomers prior to the split among human, chimpanzee and gorilla gave rise to a monomer type able to
bind CENP-B, creating three supra-chromosomal families (SF) in which both the old and new monomers
are alternated [222,223]. In Great Apes, the new type of monomer is present in all chromosomes
with some exceptions (e.g., the Y chromosome in humans), although these peculiar cases also have
condensed structural organization [224]. In this model, HOR expansion and homogenization could be
driven by two different mechanisms: improper replication with the creation of multiple copies (such as
rolling circle replication, Figure 3E) [225] and unequal crossovers/gene conversion events (Figure 3A,B)
([154] and [226], respectively). Given the shared layers of α-satellites between chromosomes, it is
possible that the newest-born centromere within an old centromere pushes the old monomers to the
sides [227]. New SF arrays, homogenized in chromosome-specific HORs, may facilitate
the maintenance of higher-order structure through the concomitant recruitment of DNA-binding
proteins [228]. The integration of the CENP-B box within the HOR array could facilitate kinetochore
assembly, yet the reason for its absence from the Y chromosome remains unclear [229]. The kinetochore-associated
recombination machine (KARM) is proposed to have a role in homogenizing functional centromeres
through topoisomerase II-induced breaks that are subsequently repaired by recombination [227].
      While evolutionary processes underlying centromere divergence remain unclear [7], a new
attractive model was recently provided by Rice [183] by assigning a contribution to all cellular
processes involved in the plasticity of HORs, as if HORs have their own molecularly encoded life cycle.
The steady drafting of HOR array extension and organization promotes a continued expansion, rather
than shrinkage, to generate megabases of homogenized HORs, while SSA contributes to diversity
between the individual units [183]. For the longest centromere, the overall size can reach up to
8 Mb [230]. This rapid increase in HOR size cannot be explained solely through antiparallel and
unbalanced exchanges between sister chromatids, first due to the exceptional variation found in
sex chromosomes and second due to the conserved head-to-tail orientation in all centromeric HORs.
Their homogenization seems principally due to replication-associated repair processes that contribute
to length diversification and homogenization of the HOR array [183].
      The model’s structural framework is based on the spatial organization of three types of ~170 bp
monomeric repeat units [231,232] that are predicted to influence centromere strength (i.e., the level of
outer kinetochore proteins): (1) one with a protein-binding sequence at its 5′ end (the 17 bp b-box that
binds CENP-B), (2) a second that is identical to the first except that the CENP-B box is mutated so that
it no longer binds CENP-B, and (3) a third lacking a CENP-B docking site altogether [193].
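
The three monomer types can be illustrated with a short classification sketch. Everything specific in it is an assumption introduced here rather than taken from the model: the essential-position motif `NTTCGNNNNANNCGGGN` is the commonly cited core of the 17 bp CENP-B box, the box is checked only in the first 17 bases of each monomer, and the threshold separating a mutated b-box from no b-box (`min_partial`) is arbitrary.

```python
# Commonly cited essential positions of the 17 bp CENP-B box (N = any base).
# Assumption for illustration; not specified in the text above.
CENPB_CORE = "NTTCGNNNNANNCGGGN"

def essential_matches(window, motif=CENPB_CORE):
    """Count essential (non-N) motif positions matched by the window."""
    return sum(1 for base, ref in zip(window.upper(), motif)
               if ref != "N" and base == ref)

def classify_monomer(monomer, motif=CENPB_CORE, min_partial=6):
    """Assign a monomer to one of the three types described above.

    'b-box'          all essential positions match at the 5' end
    'mutated b-box'  at least min_partial (arbitrary threshold) but not all match
    'no b-box'       fewer than min_partial match
    """
    n_essential = sum(1 for ref in motif if ref != "N")
    window = monomer[:len(motif)]
    if len(window) < len(motif):
        return "no b-box"
    hits = essential_matches(window, motif)
    if hits == n_essential:
        return "b-box"
    if hits >= min_partial:
        return "mutated b-box"
    return "no b-box"
```

For example, a hypothetical monomer beginning with `CTTCGTTGGAAACGGGA` would be classified as `b-box`, whereas the same monomer with two essential bases substituted would fall into the `mutated b-box` class under these assumed thresholds.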
      Among these three monomeric repetitive units, intra-array competition exists. It is based on the
capability of centromeric core repeats to extend and migrate towards the flanking heterochromatin
region, counteracting it. Thus, this new and interesting model highlights the opposing forces and high
level of evolution caused by the amplification (BIR process), shrinking (SSA process) and homogenization
of HORs [183].
      Inside human HORs, the number of monomers ranges from two (as in chromosome 1 [233]) to
34 monomers (as in chromosome Y) [224,234]. The sequence of monomers has up to 35% variability
among chromosomes and within the same chromosome [235], indicating that the formation of HOR
followed a different mutagenic process than HOR amplification through homogenization. Despite the
human HOR on the Y chromosome possessing alphoid DNA sequences, it differs from the other HORs
on autosomes and X chromosomes because it lacks CENP-B boxes [235], indicating that CENP-B is not
essential for a functional centromere [72,219]. Notably, some younger HORs with more homogenized
monomers [236] that have yet to accumulate additional mutations and SNPs are shared among
non-homologous autosomes [237], as for the chromosome groups 1, 5 and 19; 13 and 21; and 14 and 22 [202].
Some of these sequences are regarded as “pan-centromeric” and are often used for the rapid detection
of multiple centromeres in different chromosomes. The fact that we can distinguish between younger
and older HORs based on mutational burden implies that either: (1) centromeres are exposed to genetic
changes at a high rate, or (2) mechanisms that protect centromeres mitigate these events yet are not
foolproof, leading to the progressive accumulation of mutations.
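
The idea of ordering HORs by mutational burden can be sketched as a comparison of mean pairwise identity among copies of the same HOR unit. This is only an illustration: it assumes the copies are already aligned and of equal length, uses simple per-position identity rather than any published distance measure, and treats higher homogeneity as a proxy for a younger array. The function names are hypothetical.

```python
from itertools import combinations

def identity(a, b):
    """Fraction of identical positions between two pre-aligned sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mean_pairwise_identity(copies):
    """Average identity over all pairs of HOR copies in one array."""
    pairs = list(combinations(copies, 2))
    if not pairs:
        raise ValueError("need at least two copies")
    return sum(identity(a, b) for a, b in pairs) / len(pairs)

def rank_by_homogeneity(arrays):
    """Array names sorted from most to least homogeneous.

    'arrays' maps an array name to a list of aligned HOR copies.
    """
    return sorted(arrays,
                  key=lambda name: mean_pairwise_identity(arrays[name]),
                  reverse=True)
```

With real data, the aligned copies would come from an assembled array; here any short made-up sequences serve only to show that an array with fewer accumulated differences ranks first.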
      While chromosomes can contain more than one centromere array with its own set of HORs [238],
Sullivan and colleagues have highlighted the striking example of metastable epialleles found on
chromosome 17, where three contiguous unique Chr17-specific α-satellite HOR arrays (D17Z1,
D17Z1-B, and D17Z1-C) are found within the centromeric region, but only one array is active at
any given time [239]. This helps to prevent errors in nucleating the kinetochore and segregating
chromosomes during cell division. Interestingly, all arrays still have the ability to recruit CENP-A,
acting like epialleles. Yet in the majority of individuals across the human population, the active
centromere forms on the main array containing less inter-HOR variation [239]. These data indicate that
the homogenization of HOR is functionally important to support centromere function [119,154,239].
As the homogenization of HORs relies on replication fork collapse and re-initiation of replication