The Effective Population Sizes of the Anthropoid Ancestors of the Human-Chimpanzee Lineage Provide Insights on the Historical Biogeography of the ...

Page created by Ruby Jones
 
CONTINUE READING
The Effective Population Sizes of the Anthropoid Ancestors of
the Human–Chimpanzee Lineage Provide Insights on the
Historical Biogeography of the Great Apes
Carlos G. Schrago*,1
1
 Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
*Corresponding author: E-mail: carlos.schrago@gmail.com, guerra@biologia.ufrj.br.
Associate editor: Asger Hobolth

Abstract
The recent development of methods that apply coalescent theory to phylogenetic problems has enabled the study of the

                                                                                                                                                                             Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019
population-level phenomena that drove the diversification of anthropoid primates. Effective population size, Ne, is one of
the main parameters that constitute the theoretical underpinning of these new analytical approaches. For this reason, the
ancestral Ne of selected primate lineages has been thoroughly investigated. However, for some of these lineages, the
estimates of ancestral Ne reported in several studies present significant variation. This is the case for the common
ancestor of humans and chimpanzees. Moreover, several ancestral anthropoid lineages have been ignored in the studies
conducted so far. Because Ne is fundamental to understand historic species demography, it is a crucial component of a
complete description of the historical scenario of primate evolution. It also provides information that is helpful for dif-
ferentiating between competing biogeographical hypotheses. In this study, the effective population sizes of the anthro-
poid ancestors of the human–chimp lineage are inferred using data sets of coding and noncoding sequences. A general
pattern of a serial decline of population sizes is found between the ancestral lineage of Anthropoidea and that of Homo
and Pan. When the theoretical distribution of gene trees was derived from the parametric estimates obtained, it closely
corresponded to the empirical frequency of inferred gene trees along the genome. The most abrupt decrease of Ne was
found between the ancestors of all great apes and those of the African great apes alone. This suggests the occurrence of a
genetic bottleneck during the evolution of Homininae, which corroborates the origin of African apes from a Eurasian
ancestor.
Key words: species tree, primate evolution, speciation, coalescence.

Introduction                                                                           amount of sequence data available at the time. The coming
                                                                                       of the genomics era has renewed interest on the gene tree/
The molecular phylogenetics of anthropoids, especially of
                                                                                       species tree debate, and several methodological tools have
the lineage leading to humans and chimpanzees, has been
                                                                                       recently been developed (Edwards et al. 2007; Liu et al.
extensively studied since the birth of the field of molecular

                                                                                                                                                                          Article
                                                                                       2008, 2009; Edwards 2009). Most of these new methods rely
evolution (Sarich and Wilson 1967). From the start, molecular
data have challenged long-held assumptions on the chronol-                             on multispecies coalescence and, therefore, explicitly estimate
ogy of anthropoid diversification and also presented a novel                           population parameters such as effective population sizes
interpretation of the speciation process of our primate ances-                         (Rannala and Yang 2003; Heled and Drummond 2010).
tors (Goodman 1967; Goodman et al. 1972). The theoretical                                  In this sense, effective population sizes (Ne) are necessary
developments that followed the introduction of coalescent                              to understand the probability of gene trees given a species
theory in the early 1980s (Kingman 1982) were also immedi-                             tree because the distribution of gene trees relies on knowl-
ately applied to the problem of anthropoid evolution (Tajima                           edge of the number of coalescent units in internal branches,
1983; Takahata and Nei 1985; Pamilo and Nei 1988). In                                  which are measured in 2Ne generations (Degnan and Salter
particular, the problem of the phylogenetic relationship of                            2005). Thus, a deeper understanding of primate phylogenetics                       Fast Track
the Homo, Pan, and Gorilla genera was an extensively analyzed                          relies on accurate estimates of population-level parameters.
topic (Satta et al. 2000). It became clear that the topological                        With this aim, several analyses based on multiple loci as well
inconsistencies found among individual gene trees are not                              as whole genomes have been conducted within the last few
caused only by stochastic and systematic errors of gene tree                           years. The estimates obtained have exhibited a large variance
reconstruction (Wu 1991; Hudson and Wu 1992). The actions                              among studies. For example, the effective population size of
of population-level phenomena, such as incomplete lineage                              the ancestor of humans and chimpanzees has been inferred
sorting, also result in differences between gene trees and the                         to be as low as 12,000 (Yang 2002) and as high as 96,000
species tree (Nei 1987).                                                               (Chen and Li 2001) (table 1). This discrepancy is frequently
   This observation was largely ignored by molecular phylo-                            a consequence of the absolute substitution rates or genera-
geneticists in the 1990s, perhaps because of the limited                               tion times assumed when transforming the scaled effective

ß The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: journals.permissions@oup.com

Mol. Biol. Evol. 31(1):37–47 doi:10.1093/molbev/mst191 Advance Access publication October 11, 2013                                                                   37
Schrago et al. . doi:10.1093/molbev/mst191                                                                                     MBE
Table 1. Published Estimates of the Population Parameters of the Homo/Pan Ancestor Using Multilocus Data.
Study                           h or Ne               s                  l or Calibration Point              g                Method
Takahata et al. (1995)       0.005              4.6 (0.0046)        10 9                                  15          Likelihood
Chen and Li (2001)           52,000–96,000      4.8–6.4             Orangutan divergence at 12–16 Ma      15–20       Topological mismatch
Yang (2002)                  12,000–21,000      5.2–4.9             10 9                                  20          Likelihood
Rannala and Yang (2003)      0.002–0.0026       4.8                 10 9                                  20          Bayesian
Hobolth et al. (2007)        65,000             4.1                 Orangutan divergence at 18 Ma         25          HMM
Burgess and Yang (2008)      0.0061             0.0039              10 9                                 15          Bayesian
Scally et al. (2012)                            3.7 (5.5–7.0)       10 9                                  20–25       HMM
Hobolth et al. (2011)        47,000             4                   10 9                                  20          HMM

                                                                                                                                              Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019
population size  in the effective number of individuals Ne              (fig. 1). The Gorilla divergence occurred at approximately
(Langergraber et al. 2012).                                              9 Ma, preceded by Pongo (~18 Ma), Nomascus (20 Ma), and
   Aside from the common ancestor of Homo and Pan, the                   Macaca (28 Ma). The root node, the separation between cat-
effective population size of the ancestor of ((Homo, Pan),               arrhines and platyrrhines, was dated at approximately 54 Ma.
Gorilla), that is, the African great apes, has also been investi-        The mean substitution rate of the data set with all codon
gated using multilocus approaches, and compared with the                 positions was estimated at 0.5  10 9 substitutions/site/year
ancestor of Homo/Pan, the estimates are relatively more                  (s/s/y), that for third codon positions of coding sequences
stable among studies (Rannala and Yang 2003; Hobolth                     was 0.9  10 9 s/s/y and for introns 1.0  10 9 s/s/y. The
et al. 2007; Burgess and Yang 2008). Estimates of the effective          ancestral state reconstruction of generation times indicated
population size of the ancestor of both Asian and African                that generation times increased gradually along the lineage
great apes were also obtained using multiloci and genomic                that led to humans and chimpanzees (fig. 1). The generation
data (Rannala and Yang 2003; Hobolth et al. 2011). On the                time of the Homo/Pan ancestor was 26.3 years. In the ((Homo,
other hand, the ancestor of the hominoids, catarrhines, and              Pan), Gorilla) ancestral population it was inferred to be
anthropoids are rare (Burgess and Yang 2008). Such estimates             21.2 years, whereas the generation time of the ancestor of
are necessary for the evaluation of primate phylogenetics in             the great apes was estimated to be 15.2 years and that of the
light of the multispecies coalescent process. In fact, phyloge-          Nomascus/great apes ancestral lineage 13.5 years. The gener-
netics is not the only research field that would gain insights           ation times of the ancestral stock that gave rise to hominoids
from the knowledge of ancestral effective population sizes.              and cercopithecoids and to platyrrhines and catarrhines
Other pattern-based areas of investigation, such as historical           were inferred to be 10.5 and 8.7 years, respectively. The
biogeography, could also potentially resolve conflicting                 lengths of the credibility intervals of the estimates of ancestral
hypotheses using population-level parameters. For example,               generation times were wide. As expected by a Brownian
it was the union of phylogeny and evolutionary demography                motion process, the variance of the estimates increased
that elucidated the evolution of human populations                       from the shallowest to the root node of tree (fig. 2).
(Novembre et al. 2008; Stoneking and Krause 2011).
   In this study, the effective population sizes of the lineages
                                                                         Ancestral Effective Population Sizes
leading to humans and chimpanzees are estimated using
inferred evolutionary rates and ancestral generation times.              In the data set with all codon positions, the scaled effective
This strategy avoids the adoption of simplistic values of stan-          size of the population of the ancestor of Homo and Pan
dard substitution rates and generation times. The evolution-             was estimated to be 0.00142 for coding regions (table 2).
ary demography of the anthropoid ancestors of the Homo/                  The value rose to 0.00274 and 0.00457 with third codon
Pan lineage exhibited a general pattern of a continuous                  positions and introns, respectively. This increase is expected,
decrease of effective population sizes. The estimates of ances-          as the rate of evolution is higher in third codon positions
tral demography resulted in a close match between the                    and in introns. When multiplied by mean substitution rates
expected and empirical distribution of gene trees along the              and the inferred ancestral generation time (26.3 years),
genome. Finally, the use of ancestral population sizes to shed           the effective population size of the Homo/Pan ancestor
light on the historical biogeography of the great apes is                was calculated to range from 27,716 individuals using all
discussed.                                                               codons of the coding region to 41,263 using only intron
                                                                         sequences (table 3). The average number of substitutions
Results                                                                  per site in sequences since the complete genetic isolation
                                                                         of the human and chimpanzee lineages was inferred
Evolutionary Rates and Inference of Ancestral                            at 0.002 (complete coding regions), 0.0038 (third codon
Generation Times                                                         positions), and 0.00363 (introns) (table 4), which corresponds
The phylogenetic analysis implemented in BEAST rendered a                to calendar times of 4.1 Ma (complete and third positions)
timescale of anthropoid evolution that estimated the Homo/               and 3.6 Ma (introns) if the mean mutation rates are assumed
Pan separation at approximately 6.5 Ma in all three data sets            (table 5).

38
The Effective Population Sizes of Ancestral Anthropoids . doi:10.1093/molbev/mst191                                                        MBE

                                                                                                                                                           Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019
FIG. 1. Chronogram of the evolution of selected anthropoids. The timescale is the average of the estimates inferred in BEAST from the three data sets
studied. Bars on nodes are the 95% highest probability intervals of the age estimates. Numbers on nodes are the estimates of ancestral generation times,
also performed in BEAST.

    The scaled effective population size of the ((Homo, Pan),                  hominoids and Nomascus and twice the size of the basal
Gorilla) ancestor varied from 0.00215 (all) to 0.00405 (third)                 hominoid ancestor. Finally, the effective size of the anthropoid
(table 2). When mutation rates and ancestral generation                        ancestral population, the root node, was inferred to be close
times are used, the effective number of individuals was esti-                  to 1,000,000 individuals and the speciation between catar-
mated at approximately 51,000 individuals in coding regions                    rhines and platyrrhines to have occurred between 31.4 and
(all and third codon positions). In introns, this value decreased              39.0 Ma (table 5).
to 32,346 (table 3). After the divergence from Homo/Pan, the                      A global evaluation of the estimates indicated that, in
Gorilla lineage accumulated from 0.00284 (all) to 0.00579                      the majority of cases, the 95% probability intervals of the
(third) substitutions per site on average (table 4), which cor-                posterior distributions of effective population sizes obtained
responds to a time interval of approximately 6.0 Ma (table 5).                 from the three data sets presented overlaps (fig. 3). This is
    In the lineage ancestral to all great apes, the estimates of               most evident for the estimate of the population size of the
the scaled effective population size varied from 0.00677 (all)                 ancestor of great apes and Nomascus (fig. 3c). With the
to 0.01721 (introns) (table 2), which corresponds to an effec-                 exception of the ancestral Ne of the ((Homo, Pan), Gorilla)
tive size of 144,483 to 272,098 individuals (table 3). Speciation              lineage (fig. 3e), third codon positions resulted in the lowest
times between the Pongo lineage and other great apes varied                    estimates, followed by those obtained from the entire coding
from 10.9 to 13.8 Ma (table 5).                                                sequences. The only comparison that presented nonoverlap-
    Interestingly, the effective population size of the basal                  ping posterior distributions was the estimates from third
hominoid lineage, which represents the ancestral stock of                      codon positions and introns of the ancestral Ne of anthro-
the gibbon (Nomascus) and the great apes, was inferred                         poids (fig. 3a).
to be smaller than that of the ancestor of the great apes
alone. The scaled effective population size varied from
0.00297 to 0.00747 (table 2), which yielded from 108,333 to                    Theoretical Distribution of Gene Trees
133,014 individuals (table 3). Speciation times varied from                    The theoretical distribution of gene trees was obtained from a
14.7 to 16.9 Ma.                                                               species tree in which the ancestral branch of humans and
    For the ancestor of hominoids and cercopithecoids, the                     chimps measured 1.3 coalescent units. The ancestral branch
effective size of the ancestral population at this branch was                  of the African great apes was 3.7 coalescent units and was
estimated at 284,619 to 465,250 individuals (table 3), which is                receded by branches with 0.7, 2.5, and 2.2 coalescent units for
approximately three times larger than that of the ancestor of                  the ancestor of the great apes, the ancestor of hominoids and

                                                                                                                                                     39
Schrago et al. . doi:10.1093/molbev/mst191                                                                                                     MBE

                                                                                                                                                                Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019
FIG. 2. Posterior distributions of ancestral generation times inferred in BEAST. The solid black lines represent the data set of third codon positions of the
coding sequences. The dashed black lines are the results from the data set with the entire coding sequence, and the solid gray lines represent the intron
data set. (a) Ancestor of Callithrix/other anthropoids; (b) ancestor of Macaca/Hominoidea; (c) ancestor of Nomascus/great apes; (d) ancestor of Pongo/
African great apes; (e) ancestor of gorillas/humans–chimpanzees; and (f) ancestor of humans and chimpanzees.

Table 2. Scaled Effective Population Size (h = 4Nelg) Obtained from               Table 3. Effective Population Sizes (Ne), Assuming the Inferred
the Three Data Sets Analyzed.                                                     Evolutionary Rates and Ancestral Generation Times.
                    All                  Third               Introns                                 All                  Third             Introns
Homo/Pan     0.00142 ± 0.00016     0.00274 ± 0.00038    0.00457 ± 0.00028         Homo/Pan      27,716 ± 3,088        28,352 ± 3,929     41,263 ± 5,079
Gorilla      0.00215 ± 0.00010     0.00405 ± 0.00020    0.00311 ± 0.00010         Gorilla       51,948 ± 2,393        51,925 ± 2,598     32,346 ± 3,421
Pongo        0.00677 ± 0.00077     0.00807 ± 0.00111    0.01721 ± 0.00118         Pongo        228,204 ± 25,919      144,483 ± 19,937   272,098 ± 34,631
Nomascus     0.00297 ± 0.00014     0.00538 ± 0.00028    0.00747 ± 0.00022         Nomascus     112,774 ± 5,430       108,333 ± 5,560    133,014 ± 14,846
Macaca       0.00671 ± 0.00030     0.01098 ± 0.00055    0.01989 ± 0.00054         Macaca       327,601 ± 14,456      284,619 ± 14,267   465,250 ± 46,526
Callithrix   0.02007 ± 0.00058     0.02788 ± 0.00101    0.06499 ± 0.00120         Callithrix 1,182,362 ± 33,931      871,899 ± 31,705 1,595,221 ± 153,377

the ancestor of Catarrhini primates, respectively. Using the                      followed by the topology that clustered Nomascus with the
parametric estimates obtained in BEAST and BPP, the most                          ((Homo, Pan), Gorilla) clade with the exclusion of Pongo
likely gene tree presented the same topology as the species                       (11.7%). The observed frequencies of these topologies
tree (table 6). This same topology was also the most fre-                         were 7.7% and 7.2%, respectively. The topologies associated
quently recovered among the individual loci (30.7%). The                          with alternative phylogenetic relationships within the
next four most likely gene trees also matched the empirically                     ((Homo, Pan), Gorilla) clade were equally likely (4.9%)
estimated topologies. Apart from the correct topology, the                        based on theory. This was also observed in the empirical
theoretical distribution predicted that the topology with the                     data, in which the frequencies of the topologies were 7%
(Pongo, Nomascus) clade would be the most likely (11.8%),                         and 6.9%.

40
The Effective Population Sizes of Ancestral Anthropoids . doi:10.1093/molbev/mst191                                                         MBE
Table 4. Scaled Ages of the Genetic Isolation of Lineages (s = Tl).             Discussion
                    All                  Third               Introns
                                                                                Our results showed that the effective population sizes of the
Homo/Pan     0.00200 ± 0.00005     0.00380 ± 0.00013    0.00363 ± 0.00007
                                                                                lineages leading to humans and chimpanzees exhibited a gen-
Gorilla      0.00284 ± 0.00004     0.00552 ± 0.00010    0.00579 ± 0.00004
                                                                                eral trend of sequential reduction of the effective number of
Pongo        0.00230 ± 0.00012     0.01271 ± 0.00024    0.01091 ± 0.00012
                                                                                individuals, from approximately 1,200,000 to 30,000 (fig. 4).
Nomascus     0.00766 ± 0.00006     0.01549 ± 0.00014    0.01469 ± 0.00008
                                                                                However, this trend was interrupted by a population expan-
Macaca       0.01041 ± 0.00005     0.02218 ± 0.00023    0.02159 ± 0.00012
                                                                                sion that occurred in the ancestral lineage of the great apes
Callithrix   0.01532 ± 0.00004     0.03584 ± 0.00032    0.03377 ± 0.00019
                                                                                (fig. 4). The basal anthropoid lineage, that is, the ancestor of
                                                                                platyrrhines and catarrhines, was inferred to be 3.4 times
                                                                                larger than the lineage that gave rise to extant catarrhines
Table 5. Ages of Genetic Isolation (in Ma), Assuming l = 4.9  10 10            alone (the divergence between Macaca and hominoids),
s/s/y for All Codon Positions, l = 9.0  10 10 s/s/y for Third Codon            which was approximately three times larger than the subse-

                                                                                                                                                            Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019
Positions and l = 1.0  10 9 for Introns.                                       quent ancestral population (the ancestor of Nomascus and
                         All                 Third               Introns
                                                                                the great apes). There was then an expansion of 1.8 times in
Homo/Pan               4.1 ± 0.1            4.1 ± 0.1            3.6 ± 0.1
                                                                                the ancestral lineage of the great apes followed by a drastic
Gorilla                5.8 ± 0.1            6.0 ± 0.1            5.8 ± 0.1
                                                                                reduction of 4.7 times for the ancestor of humans, chimps,
Pongo                 11.9 ± 0.1           13.8 ± 0.2           10.9 ± 0.1
                                                                                and gorillas. Finally, the effective population size of the ances-
Nomascus              15.7 ± 0.1           16.9 ± 0.1           14.7 ± 0.1
                                                                                tral lineage of Homo/Pan was approximately 70% the size of
Macaca                21.3 ± 0.1           24.1 ± 0.2           21.6 ± 0.1
                                                                                the ((Homo, Pan), Gorilla) ancestral population.
Callithrix            31.4 ± 0.2           39.0 ± 0.3           33.8 ± 0.2
                                                                                    Of the parameters analyzed in this study, the size of the
                                                                                ancestral population of the Homo/Pan lineage is by far the

FIG. 3. Posterior distributions of absolute ancestral population sizes Ne, obtained using the inferred evolutionary rates and generation times. The solid
black lines represent the data set of third codon positions of the coding sequences. The dashed black lines are the results from the data set with the
entire coding sequence, and the solid gray lines represent the intron data set. (a) Ancestor of Callithrix/other anthropoids; (b) ancestor of Macaca/
Hominoidea; (c) ancestor of Nomascus/great apes; (d) ancestor of Pongo/African great apes; (e) ancestor of gorillas/humans–chimpanzees; and
(f) ancestor of humans and chimpanzees.

                                                                                                                                                      41
Schrago et al. . doi:10.1093/molbev/mst191                                                                                               MBE
         Table 6. Theoretical Probability of the Most Likely Gene Trees and Their Empirical Frequencies along the Loci Studied.
         Gene Tree Topology                                                           Theoretical Probability (%)   Empirical Frequency (%)
         ((((((Homo, Pan), Gorilla), Pongo), Nomascus),   Macaca),   Callithrix)                 46.1                         30.7
         (((((Homo, Pan), Gorilla), (Pongo, Nomascus)),   Macaca),   Callithrix)                 11.8                          7.7
         ((((((Homo, Pan), Gorilla), Nomascus), Pongo),   Macaca),   Callithrix)                 11.7                          7.2
         ((((((Homo, Gorilla), Pan), Pongo), Nomascus),   Macaca),   Callithrix)                   4.9                         7
         ((((((Pan, Gorilla), Homo), Pongo), Nomascus),   Macaca),   Callithrix)                   4.9                         6.9

most studied. The parametric estimates obtained here, which                        coalescence times of the loci studied (Edwards and Beerli
ranged from 27,000 to 41,000, are in agreement with previous                       2000). Thus, it is possible that the estimates of times and

                                                                                                                                                      Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019
estimates of approximately 20,000 obtained by Rannala and                          rates were impacted by this difference. However, the sequen-
Yang (2003) and 47,000 by Hobolth et al. (2011). As those                          tial reduction of the effective population sizes of anthropoids
previous studies adopted different substitution rates and                          is actually independent of BEAST estimates, as the scaled
generation times for the ancestor of humans and chimps,                            effective population sizes also displayed this trend (fig. 4).
different parametric estimates are expected to be found.                           It might be argued that the times estimated in BEAST,
The estimates of the effective population size of the ancestral                    which were used as priors for the coalescence analyses in
lineage of humans, chimps, and gorillas were also similar to                       BPP, may have biased the estimates of . The large difference
those obtained in the multilocus analyses of Yang (2002) and                       between the posterior distributions of s and their priors
Rannala and Yang (2003) as well as the genomic analysis of                         obtained from BEAST indicates that the posteriors were
Hobolth et al. (2007). Thus, it appears that the effective sizes                   robust to prior assumptions.
of ancestral lineages within the Homininae have not under-                             When estimates of ancestral population sizes are con-
gone drastic population changes. The estimates of Ne                               trasted with the historical biogeography of anthropoids, an
obtained here for the ancestor of orangutans and the                               interesting association emerges. The drastic 5-fold decline in
African great apes were very large. This is also corroborated                      population size between the ancestral lineage of the great
by the recent analysis of Hobolth et al. (2011), which esti-                       apes and the ancestral lineage of the Homininae coincides
mated this parameter at 187,000. Therefore, a substantial re-                      with a possible dispersal event from Eurasia to Africa in the
duction of population size between the common ancestor of                          late Miocene. Such a hypothesis evidently assumes that the
orangutans and African great apes and the ancestor of the                          geographical distribution of the ancestor of the great apes was
Homininae is independently validated.                                              Eurasian. This is supported by the recent analysis of Springer
    The estimates of the effective population size at the                          et al. (2012), which is corroborated by earlier studies (Stewart
Gorilla/(Homo, Pan) ancestral branch are intriguing, because                       and Disotell 1998; Begun 2010). It has also been proposed that
the coding sequences presented an absolute Ne estimate                             great apes originated in Africa and posteriorly dispersed to
greater than the estimate obtained with intron sequences                           Asia (Moya-Sola et al. 1999). Although we cannot discard this
(fig. 3e). This result appears inconsistent, because introns                       hypothesis, the estimates of ancestral population sizes make it
are expected to be under relaxed selective pressures                               unlikely because a strong reduction of population size is not
(Charlesworth 2009). Notice, however, that, in agreement                           expected if the ancestor of great apes was also African.
with the theoretical expectation, the scaled effective popula-                     Moreover, because the Pongidae (represented in our analysis
tion size, , of intron sequences at this branch was greater                       by Pongo) and the Hylobatidae (represented by Nomascus)
than the value inferred for coding sequences (table 2). The                        are geographically distributed in Asia and the fossil record of
theoretically anomalous estimate was then obtained when                            the apes extends from Europe to Asia (Fleagle 2013), it seems
transforming  into absolute Ne by employing an incorrect                          reasonable to assume that the ancestor of Homininae dis-
evolutionary rate at this branch (generation times were fixed).                    persed from Eurasia to Africa. Other events of population
In fact, the intron sequences used in this study were not                          decline/expansion are not easily linked to historical biography
subjected to the molecular clock test because they were                            because of the absence of a clearer picture of the spatial
expected to evolve neutrally. As a result, the variance of evo-                    evolution of anthropoids.
lutionary rates along branches was greater in the intron data                          It is worth mentioning that, although the parametric
set ( 2 = 0.009) when compared with coding sequences                              estimates obtained from the different data sets differed, the
( 2 = 0.006) and third codon positions ( 2 = 0.003).                             general scenario of a sequential reduction of Ne along the
Therefore, it is likely that this unexpected result was due to                     lineage leading to humans and chimps was replicated
the fact that the mean rate of the intron data set was not                         (fig. 4). This finding shows that the use of coding regions
representative for every branch.                                                   to estimate past dynamics of population sizes of ancestral
    Regarding the inference of evolutionary rates and absolute                     species may not suffer from biases resulting from the action
times in BEAST, it should be remarked that the absolute times                      of natural selection. The correspondence between the
inferred are not equivalent to the speciation times (the                          frequency of the inferred gene trees and the theoretical dis-
parameter), because they were estimated from the mean                              tribution also suggests that the general scenario proposed
genetic distances among species, which are, in fact, mean                          here is a good approximation of the historical changes in

42
The Effective Population Sizes of Ancestral Anthropoids . doi:10.1093/molbev/mst191                                                        MBE

                                                                                                                                                           Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019

FIG. 4. Evolutionary demography of Anthropoidea. The upper graph depicts the box plots of the posterior distribution of the scaled effective population
sizes  along the anthropoid lineages leading to humans and chimpanzees. The black box plots represent the data set of third codon positions; estimates
from the entire coding sequence and from introns are shown in light and dark gray, respectively. The lower graph displays the chronogram of
anthropoid evolution scaled by the speciation times . The width of the branches of the lineages leading to Homo/Pan is proportional to their respective
Ne estimates, which are also shown alongside the branches. All Ne values were approximated. The geographical distributions of terminal taxons are also
shown. For the sake of intelligibility, only the ancestral distribution of Homo was shown.

                                                                                                                                                     43
Schrago et al. . doi:10.1093/molbev/mst191                                                                                 MBE

FIG. 5. Summary of the methodological procedure used in this study.

                                                                                                                                          Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019
effective population sizes. The theoretical distribution of gene      phylogenetic method implemented in BEAST was used to
trees assumes that each gene tree is an independent realiza-          obtain prior distributions of node ages (million years ago,
tion of the multicoalescent process (Degnan and Salter 2005).         Ma) and evolutionary rates (in s/s/y) at branches. BEAST
This assumption is not met in real data because genes may be          was also used to infer the ancestral generation times (years)
physically linked. Moreover, a comparison between empirical           (step I). Step II consisted of the estimation of ancestral effec-
and theoretical distributions of gene trees does not account          tive population sizes in the BPP software using the multi-
for the errors associated with gene tree inference, a problem         species coalescent method of Rannala and Yang (2003).
that is also found in methods that estimate ancestral effective       Ancestral effective population size inference is performed
population sizes using topological incongruence (Yang 2002).          using evolutionary distances; therefore, Ne estimates are
Thus, differences between theoretical and empirical frequen-          scaled by the substitution rate  per generation g – the 
cies are expected to occur.                                           parameter ( = 4Neg). To obtain the effective number of
   As reported by Rannala and Yang (2003), the estimation             individuals, Ne, absolute evolutionary rates per generation
of ancestral Ne is complicated by the high correlation                must be provided. A summary of the entire analysis is given
between the scaled effective population size  and speciation         in figure 5.
time . In this study, the correlation between those parame-
ters was indeed high, varying from 0.8 between Homo/pan
and  Homo/pan (third codon positions) to 0.5 between Pongo
and  Pongo (all coding sequences). Such high correlation values      Data Set
could be alleviated by the adoption of strong prior distribu-         A total of 9,225 orthologous genes of Homo, Pan, Gorilla,
tions. However, this would also assume a large a priori knowl-        Pongo, Macaca, Nomascus, and Callithrix were downloaded
edge of the parameters we are interested in estimating. This          from the OrthoMam database. Only coding sequences (CDS)
approach sounds contradictory. This is why the posterior              were downloaded from OrthoMam. Introns of the same
distributions of branch lengths, obtained by the phylogenetic         9,225 genes were independently obtained by parsing the
method, were used.                                                    genomes available in Ensembl using a Perl script. The estima-
   In conclusion, this study indicated that the lineage from          tion of ancestral effective population sizes assumes that
the ancestor of all anthropoid primates to the ancestor of            evolutionary rates are homogeneous along the branches of
humans and chimps has undergone a serial reduction of ef-             the phylogeny, that is, the locus evolves under the molecular
fective population sizes. This pattern is corroborated by the         clock. Thus, alignments of all coding sequences were sub-
frequency of tree topologies inferred from the coding regions         jected to a likelihood ratio test (LRT) to verify which loci
sampled across the primate genomes. Of the inferred past              rejected the molecular clock hypothesis (Felsenstein 1988).
population scenarios, the most drastic change was the ap-             When testing for the molecular clock, the best fit evolutionary
proximately 5-fold reduction in effective population size be-         model was assigned to each alignment according to the
tween the ancestor of the Asian and African great apes and            model test implementation of HyPhy (Pond et al. 2005).
the ancestor of African great apes alone, suggesting that the         In coding sequences, only genes that failed to reject the
Homininae diversified after a dispersal event from a Eurasian         strict clock at 5% significance level were further analyzed
ancestor. Finally, the results presented here demonstrate the         (1,401 genes). An additional data set containing only third
relevance of phylogenetic approaches that explicitly incorpo-         codon positions of the clock-like genes was assembled.
rate the coalescent process to investigate long-term evolu-           This was done to reduce the effect of selection on sequences.
tionary events.                                                       The alignments of intronic regions were not submitted to
                                                                      molecular clock tests. Thus, evolutionary parameters were
Materials and Methods                                                 inferred for three data sets: 1) clock-like coding sequences,
To obtain estimates of the absolute ancestral population sizes        2) third codon positions of clock-like coding sequences, and
of Anthropoidea, a two-step strategy was adopted. First, the          3) introns of all genes analyzed.

44
The Effective Population Sizes of Ancestral Anthropoids . doi:10.1093/molbev/mst191                                     MBE
Step I—Estimation of Node Ages, Evolutionary Rates,                2010), which assumes a Brownian diffusion model that ac-
and Ancestral Generation Times                                     commodates branch-specific variation in dispersal rates.
Divergence time estimation was performed in BEAST v1.7.3           Thus, for example, instead of assuming that the generation
(Drummond et al. 2012) using the Bayesian relaxed molecular        time of the ancestor of Homo and Pan was a simple arith-
clock. Clock-like genes were concatenated in supermatrices         metic average between the values at the terminals (29.1 and
with 1,673,634 bp (all codons) and 557,878 (third codon po-        24.63), the model weights the ancestral generation time
sitions). The supermatrix of concatenated introns was              inference by the rates associated with the both branches
4,417,078 bp long. In all supermatrices, the model of nucleo-      that split from the ancestor node. Therefore, the use of
tide substitution was the GTR + G4 + I model, which was            arithmetic averages would imply equal probability of dis-
chosen by LRT in HyPhy (Pond et al. 2005). The prior distri-       persal of the ancestral generation time to the value of the
bution of rates along branches was modeled by the uncorre-         tips, while the relaxed random walk permits the dispersal of
lated lognormal distribution, and the prior distribution of        the ancestral value to be proportional to the rates associated

                                                                                                                                       Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019
node ages was modeled by the Yule process. To approximate          with each branch.
the posterior density of parameters, BEAST relies on the
Markov chain Monte Carlo (MCMC) algorithm. The
MCMC was run for 50,000,000 generations, and chains were
                                                                   Step II—Estimation of Effective Population Sizes
sampled every 1000th cycle. The analysis was run twice to          The scaled effective population sizes of ancestral hominoids
check for convergence.                                             were estimated with BPP 2.1 software, which implements
                                                                   the method of Rannala and Yang (2003) and Burgess
Calibration Information                                            and Yang (2008). Coding sequences were analyzed individ-
To calibrate the tree, four prior distributions were assigned      ually. Introns of the same coding sequence were concate-
to node ages. 1) The Homo/Pan split was calibrated by a            nated to compose a single intronic sequence per gene.
normal prior with mean = 8.25 Ma and standard deviation            The decision of whether a locus was autosomal, mitochon-
(SD) = 0.9. This prior was based on Benton and Donoghue            drial or from the X or Y chromosome was based on the
(2007); the mean was the average of the maximum and                position of the marker in Homo sapiens. The variation of
minimum ages for split, and the SD was set so as to contain        substitution rates among loci was incorporated into the
the minimum and maximum values within the 99% highest              analysis by using relative rates, as implemented in Burgess
probability density (HPD) interval. 2) The divergence              and Yang (2008).
between Pongo and the great apes was constrained by a                  The BPP software program also conducts a Bayesian
gamma prior with shape = 15 and scale = 1, corresponding           inference with a MCMC algorithm. Markov chains were
to an HPD interval from 9.2 to 21.9 Ma. This calibration was       run for 50,000,000 generations and sampled every 1,000th
based on the fossil records of Lufengpithecus, Sivapithecus,       cycle. The analysis was replicated to check for convergence.
and Khoratpithecus from the Miocene of Asia (Hartwig               BPP requires the definition of prior distributions for the age
2002; Chaimanee et al. 2003, 2004). A gamma prior was              of the root (), that is, the age of the split between Callithrix
used on the Pongo split because of the large uncertainty           and catarrhines, and the  parameter. The parameter  is a
associated with the maximum age of the node. 3) The                measure of the speciation time per se, that is, the time of
Cercopithecoid/Hominoid split, a normal prior with                 the separation between populations, and not the mean ge-
mean = 28.45 and SD = 3.4, was entered based on Benton             netic divergence between sister lineages, that is, the mean
and Donoghue (2007) and followed the same rationale                coalescence time of interspecies gene pairs, which is the
applied to the Homo/Pan split. 4) Finally, the Platyrrhini/        genetic distance traditionally obtained in usual phylogenetic
Catarrhini divergence was modeled by a normal prior with           analyses (Edwards and Beerli 2000). Nevertheless,  is also
mean = 40 Ma and SD = 7 Ma. This is a flexible prior that          measured in number of substitutions per site. Because the
permits the age of the basal anthropoid to vary from               mean number of substitutions per site in a given branch is
27 Ma, the age of Branisella boliviana, the earliest               the product of the substitution rate per site per year and the
Platyrrhini fossil (Takai and Anaya 1996), to the Paleocene        time duration of the branch in years, the prior for  was
(~60 Ma)—the age of the early Primates (Fleagle 2013).             obtained by calculating the product of the posterior distri-
Reconstruction of Ancestral Generation Times                       butions of absolute substitution rates and the age of the
The reconstruction of ancestral generation times was per-          divergence between Callithrix and catarrhines (the root
formed simultaneously with the divergence time estimation          node) inferred in BEAST. This posterior distribution, the
in BEAST. For the terminal taxa, the values were obtained          result of the product of times and rates, was approximated
from the literature. The generation times of Homo (29.1            by a gamma distribution using the MASS package of the
years), Pan (24.63 years), and Gorilla (19.28 years) were set      R programming environment. This strategy rendered the
according to the recent work of Langergraber et al. (2012).        following gamma priors: G ( = 464.06, = 17859.16) for
The generation times of Pongo (15.4 years), Nomascus (8.4          the analysis of all codon positions, G ( = 918.32,
years), Macaca (3.4 years), and Callithrix (1.4 years) were          = 18,207.95) for third codon positions, and G ( = 963.71,
borrowed from Shumaker et al. (2008) and Ernest (2003).              = 16,199.36). It should be noted that the strategy used to
Ancestral state reconstruction was performed using a mul-          obtain the prior for  did not actually measure speciation
tivariate continuous relaxed random walk (Lemey et al.             times; it is a measure of the mean coalescence time T of the

                                                                                                                                 45
Schrago et al. . doi:10.1093/molbev/mst191                                                                                           MBE
genes (in s/s/y). The larger the ancestral effective population          Degnan JH, Rosenberg NA. 2008. Gene tree discordance, phylogenetic
size, the larger is the expected difference between T and .                 inference, and the multispecies coalescent. Trends Ecol Evol. 24:
                                                                             332–340.
In the absence of an informative prior for , the distribution           Degnan JH, Salter LA. 2005. Gene tree distributions under the coalescent
of the average coalescence time is the best guess from the                   process. Evolution 59:24–37.
data available. In both data sets, a gamma distribution G                Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylo-
( = 2, = 200) was assigned as prior for .                                   genetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 29:1969–1973.
                                                                         Edwards S, Beerli P. 2000. Perspective: gene divergence, population
                                                                             divergence, and the variance in coalescence time in phylogeographic
Calculation of Gene Tree Probabilities                                       studies. Evolution 54:1839–1854.
                                                                         Edwards SV. 2009. Is a new and general theory of molecular systematics
When the estimates of ancestral effective population sizes,                  emerging? Evolution 63:1–19.
speciation times, and generation times are known, the cal-               Ernest SKM. 2003. Life history characteristics of placental nonvolant
culation of the probabilities of gene trees given the species                mammals. Ecology 84:3402.
                                                                         Edwards SV, Liu L, Pearl DK. 2007. High-resolution species trees without

                                                                                                                                                     Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019
tree can be carried out (Degnan and Salter 2005). The the-
                                                                             concatenation. Proc Natl Acad Sci U S A. 104:5936–5941.
oretical probability distribution of gene trees for a given              Felsenstein J. 1988. Phylogenies from molecular sequences: inference
species tree can then be compared with the collection of                     and reliability. Annu Rev Genet. 22:521–565.
gene trees inferred from several loci across the genome. If              Fleagle JG. 2013. Primate adaptation and evolution. San Diego:
the estimates of ancestral population sizes are correct, we                  Academic Press.
expect the theoretical distribution of gene trees to closely             Goodman M. 1967. Deciphering primate phylogeny from macromolec-
                                                                             ular specificities. Am J Phys Anthropol. 26:255–276.
match the empirically determined gene trees. The theoret-                Goodman M, Barnabas J, Moore GW. 1972. Rates of molecular evolution
ical distribution of gene trees was obtained using COAL 2.1,                 in primate phylogeny. Am J Phys Anthropol. 37:437.
which requires as an input a species tree with internal                  Hartwig WC. 2002. The primate fossil record. New York: Cambridge
branches measured in coalescent units (2N generations).                      University Press.
To create such a tree, the time duration of the branches                 Heled J, Drummond AJ. 2010. Bayesian inference of species trees from
                                                                             multilocus data. Mol Biol Evol. 27:570–580.
(in generations) must be divided by twice the effective pop-             Hobolth A, Christensen OF, Mailund T, Schierup MH. 2007. Genomic
ulation size (Degnan and Rosenberg 2008). Therefore, the                     relationships and speciation times of human, chimpanzee, and
ancestral population sizes and speciation times previously                   gorilla inferred from a coalescent hidden Markov model. PLoS
estimated in BPP, as well as the ancestral generation times,                 Genet. 3:294–304.
were used. Effective population sizes and speciation times               Hobolth A, Dutheil JY, Hawks J, Schierup MH, Mailund T. 2011.
                                                                             Incomplete lineage sorting patterns among human, chimpanzee,
were averaged across the three data sets.                                    and orangutan suggest recent orangutan speciation and widespread
                                                                             selection. Genome Res. 21:349–356.
Acknowledgments                                                          Hudson RR, Wu CI. 1992. Gene trees, species trees and the segregation
                                                                             of ancestral alleles. Genetics 131:509–513.
The author thanks the three anonymous reviewers and the                  Kingman JFC. 1982. On the genealogy of large populations. J Appl
editor for comments that significantly improved the manu-                    Probab. 19A:27–43.
script. This work was supported by the National Research                 Langergraber KE, Prufer K, Rowney C, et al. (20 co-authors). 2012.
                                                                             Generation times in wild chimpanzees and gorillas suggest earlier
Council of Brazil (CNPq) grants 308147/2009-0 and 307982/                    divergence times in great ape and human evolution. Proc Natl Acad
2012-2 and by FAPERJ grants 110.838/2010, 110.028/2011 and                   Sci U S A. 109:15716–15721.
111.831/2011 to C.G.S.                                                   Lemey P, Rambaut A, Welch JJ, Suchard MA. 2010. Phylogeography
                                                                             takes a relaxed random walk in continuous space and time. Mol
                                                                             Biol Evol. 27:1877–1885.
References                                                               Liu L, Pearl DK, Brumfield RT, Edwards SV. 2008. Estimating species trees
Begun DR. 2010. Miocene hominids and the origins of the African apes         using multiple-allele DNA sequence data. Evolution 62:2080–2091.
   and humans. Annu Rev Anthropol. 39:67–84.                             Liu L, Yu L, Kubatko L, Pearl DK, Edwards SV. 2009. Coalescent methods
Benton MJ, Donoghue PC. 2007. Paleontological evidence to date the           for estimating phylogenetic trees. Mol Phylogenet Evol. 53:320–328.
   tree of life. Mol Biol Evol. 24:26–53.                                Moya-Sola S, Kohler M, Alba DM. 1999. Primate evolution—in and out
Burgess R, Yang Z. 2008. Estimation of hominoid ancestral                    of Africa—Comments. Curr Biol. 9:R547–R548.
   population sizes under Bayesian coalescent models incorporating       Nei M. 1987. Molecular evolutionary genetics. New York: Columbia
   mutation rate variation and sequencing errors. Mol Biol Evol. 25:         University Press.
   1979–1994.                                                            Novembre J, Johnson T, Bryc K, et al. (12 co-authors). 2008. Genes mirror
Chaimanee Y, Jolly D, Benammi M, Tafforeau P, Duzer D, Moussa I,             geography within Europe. Nature 456:98–101.
   Jaeger JJ. 2003. A middle Miocene hominoid from Thailand and          Pamilo P, Nei M. 1988. Relationship between gene trees and species
   orangutan origins. Nature 422:61–65.                                      trees. Mol Biol Evol. 5:568–583.
Chaimanee Y, Suteethorn V, Jintasakul P, Vidthayanon C, Marandat B,      Pond SLK, Frost SDW, Muse SV. 2005. HyPhy: hypothesis testing using
   Jaeger JJ. 2004. A new orang-utan relative from the Late Miocene of       phylogenies. Bioinformatics 21:676–679.
   Thailand. Nature 427:439–441.                                         Rannala B, Yang Z. 2003. Bayes estimation of species divergence times
Charlesworth B. 2009. Fundamental concepts in genetics: effective            and ancestral population sizes using DNA sequences from multiple
   population size and patterns of molecular evolution and variation.        loci. Genetics 164:1645–1656.
   Nat Rev Genet. 10:195–205.                                            Sarich VM, Wilson AC. 1967. Immunological time scale for hominid
Chen FC, Li WH. 2001. Genomic divergences between humans and                 evolution. Science 158:1200–1203.
   other hominoids and the effective population size of the              Satta Y, Klein J, Takahata N. 2000. DNA archives and our nearest rela-
   common ancestor of humans and chimpanzees. Am J Hum                       tive: the trichotomy problem revisited. Mol Phylogenet Evol. 14:
   Genet. 68:444–456.                                                        259–275.

46
The Effective Population Sizes of Ancestral Anthropoids . doi:10.1093/molbev/mst191                                                       MBE
Shumaker RW, Wich SA, Perkins L. 2008. Reproductive life history traits         Tajima F. 1983. Evolutionary relationship of DNA-sequences in finite
    of female orangutans (Pongo spp.). Interdiscip Top Gerontol. 36:                populations. Genetics 105:437–460.
    147–161.                                                                    Takahata N, Nei M. 1985. Gene genealogy and variance of interpopula-
Scally A, Dutheil JY, Hillier LW, et al. (71 co-authors). 2012. Insights into       tional nucleotide differences. Genetics 110:325–344.
    hominid evolution from the gorilla genome sequence. Nature 483:             Takahata N, Satta Y, Klein J. 1995. Divergence time and population size
    169–175.                                                                        in the lineage leading to modern humans. Theor Popul Biol. 48:
Springer MS, Meredith RW, Gatesy J, et al. (12 co-authors). 2012.                   198–221.
    Macroevolutionary dynamics and historical biogeography of pri-              Takai M, Anaya F. 1996. New specimens of the oldest fossil platyrrhine,
    mate diversification inferred from a species supermatrix. PLoS One              Branisella boliviana, from Salla, Bolivia. Am J Phys Anthropol. 99:
    7:e49521.                                                                       301–317.
Stewart CB, Disotell TR. 1998. Primate evolution—in and out of Africa.          Wu CI. 1991. Inferences of species phylogeny in relation to segregation
    Curr Biol. 8:R582–R588.                                                         of ancient polymorphisms. Genetics 127:429–435.
Stoneking M, Krause J. 2011. Learning about human population                    Yang Z. 2002. Likelihood and Bayes estimation of ancestral population
    history from ancient and modern genomes. Nat Rev Genet. 12:                     sizes in hominoids using data from multiple loci. Genetics 162:
    603–614.                                                                        1811–1823.

                                                                                                                                                          Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019

                                                                                                                                                    47
You can also read