The Effective Population Sizes of the Anthropoid Ancestors of the Human-Chimpanzee Lineage Provide Insights on the Historical Biogeography of the ...
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
The Effective Population Sizes of the Anthropoid Ancestors of the Human–Chimpanzee Lineage Provide Insights on the Historical Biogeography of the Great Apes Carlos G. Schrago*,1 1 Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil *Corresponding author: E-mail: carlos.schrago@gmail.com, guerra@biologia.ufrj.br. Associate editor: Asger Hobolth Abstract The recent development of methods that apply coalescent theory to phylogenetic problems has enabled the study of the Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019 population-level phenomena that drove the diversification of anthropoid primates. Effective population size, Ne, is one of the main parameters that constitute the theoretical underpinning of these new analytical approaches. For this reason, the ancestral Ne of selected primate lineages has been thoroughly investigated. However, for some of these lineages, the estimates of ancestral Ne reported in several studies present significant variation. This is the case for the common ancestor of humans and chimpanzees. Moreover, several ancestral anthropoid lineages have been ignored in the studies conducted so far. Because Ne is fundamental to understand historic species demography, it is a crucial component of a complete description of the historical scenario of primate evolution. It also provides information that is helpful for dif- ferentiating between competing biogeographical hypotheses. In this study, the effective population sizes of the anthro- poid ancestors of the human–chimp lineage are inferred using data sets of coding and noncoding sequences. A general pattern of a serial decline of population sizes is found between the ancestral lineage of Anthropoidea and that of Homo and Pan. When the theoretical distribution of gene trees was derived from the parametric estimates obtained, it closely corresponded to the empirical frequency of inferred gene trees along the genome. The most abrupt decrease of Ne was found between the ancestors of all great apes and those of the African great apes alone. This suggests the occurrence of a genetic bottleneck during the evolution of Homininae, which corroborates the origin of African apes from a Eurasian ancestor. Key words: species tree, primate evolution, speciation, coalescence. Introduction amount of sequence data available at the time. The coming of the genomics era has renewed interest on the gene tree/ The molecular phylogenetics of anthropoids, especially of species tree debate, and several methodological tools have the lineage leading to humans and chimpanzees, has been recently been developed (Edwards et al. 2007; Liu et al. extensively studied since the birth of the field of molecular Article 2008, 2009; Edwards 2009). Most of these new methods rely evolution (Sarich and Wilson 1967). From the start, molecular data have challenged long-held assumptions on the chronol- on multispecies coalescence and, therefore, explicitly estimate ogy of anthropoid diversification and also presented a novel population parameters such as effective population sizes interpretation of the speciation process of our primate ances- (Rannala and Yang 2003; Heled and Drummond 2010). tors (Goodman 1967; Goodman et al. 1972). The theoretical In this sense, effective population sizes (Ne) are necessary developments that followed the introduction of coalescent to understand the probability of gene trees given a species theory in the early 1980s (Kingman 1982) were also immedi- tree because the distribution of gene trees relies on knowl- ately applied to the problem of anthropoid evolution (Tajima edge of the number of coalescent units in internal branches, 1983; Takahata and Nei 1985; Pamilo and Nei 1988). In which are measured in 2Ne generations (Degnan and Salter particular, the problem of the phylogenetic relationship of 2005). Thus, a deeper understanding of primate phylogenetics Fast Track the Homo, Pan, and Gorilla genera was an extensively analyzed relies on accurate estimates of population-level parameters. topic (Satta et al. 2000). It became clear that the topological With this aim, several analyses based on multiple loci as well inconsistencies found among individual gene trees are not as whole genomes have been conducted within the last few caused only by stochastic and systematic errors of gene tree years. The estimates obtained have exhibited a large variance reconstruction (Wu 1991; Hudson and Wu 1992). The actions among studies. For example, the effective population size of of population-level phenomena, such as incomplete lineage the ancestor of humans and chimpanzees has been inferred sorting, also result in differences between gene trees and the to be as low as 12,000 (Yang 2002) and as high as 96,000 species tree (Nei 1987). (Chen and Li 2001) (table 1). This discrepancy is frequently This observation was largely ignored by molecular phylo- a consequence of the absolute substitution rates or genera- geneticists in the 1990s, perhaps because of the limited tion times assumed when transforming the scaled effective ß The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com Mol. Biol. Evol. 31(1):37–47 doi:10.1093/molbev/mst191 Advance Access publication October 11, 2013 37
Schrago et al. . doi:10.1093/molbev/mst191 MBE Table 1. Published Estimates of the Population Parameters of the Homo/Pan Ancestor Using Multilocus Data. Study h or Ne s l or Calibration Point g Method Takahata et al. (1995) 0.005 4.6 (0.0046) 10 9 15 Likelihood Chen and Li (2001) 52,000–96,000 4.8–6.4 Orangutan divergence at 12–16 Ma 15–20 Topological mismatch Yang (2002) 12,000–21,000 5.2–4.9 10 9 20 Likelihood Rannala and Yang (2003) 0.002–0.0026 4.8 10 9 20 Bayesian Hobolth et al. (2007) 65,000 4.1 Orangutan divergence at 18 Ma 25 HMM Burgess and Yang (2008) 0.0061 0.0039 10 9 15 Bayesian Scally et al. (2012) 3.7 (5.5–7.0) 10 9 20–25 HMM Hobolth et al. (2011) 47,000 4 10 9 20 HMM Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019 population size in the effective number of individuals Ne (fig. 1). The Gorilla divergence occurred at approximately (Langergraber et al. 2012). 9 Ma, preceded by Pongo (~18 Ma), Nomascus (20 Ma), and Aside from the common ancestor of Homo and Pan, the Macaca (28 Ma). The root node, the separation between cat- effective population size of the ancestor of ((Homo, Pan), arrhines and platyrrhines, was dated at approximately 54 Ma. Gorilla), that is, the African great apes, has also been investi- The mean substitution rate of the data set with all codon gated using multilocus approaches, and compared with the positions was estimated at 0.5 10 9 substitutions/site/year ancestor of Homo/Pan, the estimates are relatively more (s/s/y), that for third codon positions of coding sequences stable among studies (Rannala and Yang 2003; Hobolth was 0.9 10 9 s/s/y and for introns 1.0 10 9 s/s/y. The et al. 2007; Burgess and Yang 2008). Estimates of the effective ancestral state reconstruction of generation times indicated population size of the ancestor of both Asian and African that generation times increased gradually along the lineage great apes were also obtained using multiloci and genomic that led to humans and chimpanzees (fig. 1). The generation data (Rannala and Yang 2003; Hobolth et al. 2011). On the time of the Homo/Pan ancestor was 26.3 years. In the ((Homo, other hand, the ancestor of the hominoids, catarrhines, and Pan), Gorilla) ancestral population it was inferred to be anthropoids are rare (Burgess and Yang 2008). Such estimates 21.2 years, whereas the generation time of the ancestor of are necessary for the evaluation of primate phylogenetics in the great apes was estimated to be 15.2 years and that of the light of the multispecies coalescent process. In fact, phyloge- Nomascus/great apes ancestral lineage 13.5 years. The gener- netics is not the only research field that would gain insights ation times of the ancestral stock that gave rise to hominoids from the knowledge of ancestral effective population sizes. and cercopithecoids and to platyrrhines and catarrhines Other pattern-based areas of investigation, such as historical were inferred to be 10.5 and 8.7 years, respectively. The biogeography, could also potentially resolve conflicting lengths of the credibility intervals of the estimates of ancestral hypotheses using population-level parameters. For example, generation times were wide. As expected by a Brownian it was the union of phylogeny and evolutionary demography motion process, the variance of the estimates increased that elucidated the evolution of human populations from the shallowest to the root node of tree (fig. 2). (Novembre et al. 2008; Stoneking and Krause 2011). In this study, the effective population sizes of the lineages Ancestral Effective Population Sizes leading to humans and chimpanzees are estimated using inferred evolutionary rates and ancestral generation times. In the data set with all codon positions, the scaled effective This strategy avoids the adoption of simplistic values of stan- size of the population of the ancestor of Homo and Pan dard substitution rates and generation times. The evolution- was estimated to be 0.00142 for coding regions (table 2). ary demography of the anthropoid ancestors of the Homo/ The value rose to 0.00274 and 0.00457 with third codon Pan lineage exhibited a general pattern of a continuous positions and introns, respectively. This increase is expected, decrease of effective population sizes. The estimates of ances- as the rate of evolution is higher in third codon positions tral demography resulted in a close match between the and in introns. When multiplied by mean substitution rates expected and empirical distribution of gene trees along the and the inferred ancestral generation time (26.3 years), genome. Finally, the use of ancestral population sizes to shed the effective population size of the Homo/Pan ancestor light on the historical biogeography of the great apes is was calculated to range from 27,716 individuals using all discussed. codons of the coding region to 41,263 using only intron sequences (table 3). The average number of substitutions Results per site in sequences since the complete genetic isolation of the human and chimpanzee lineages was inferred Evolutionary Rates and Inference of Ancestral at 0.002 (complete coding regions), 0.0038 (third codon Generation Times positions), and 0.00363 (introns) (table 4), which corresponds The phylogenetic analysis implemented in BEAST rendered a to calendar times of 4.1 Ma (complete and third positions) timescale of anthropoid evolution that estimated the Homo/ and 3.6 Ma (introns) if the mean mutation rates are assumed Pan separation at approximately 6.5 Ma in all three data sets (table 5). 38
The Effective Population Sizes of Ancestral Anthropoids . doi:10.1093/molbev/mst191 MBE Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019 FIG. 1. Chronogram of the evolution of selected anthropoids. The timescale is the average of the estimates inferred in BEAST from the three data sets studied. Bars on nodes are the 95% highest probability intervals of the age estimates. Numbers on nodes are the estimates of ancestral generation times, also performed in BEAST. The scaled effective population size of the ((Homo, Pan), hominoids and Nomascus and twice the size of the basal Gorilla) ancestor varied from 0.00215 (all) to 0.00405 (third) hominoid ancestor. Finally, the effective size of the anthropoid (table 2). When mutation rates and ancestral generation ancestral population, the root node, was inferred to be close times are used, the effective number of individuals was esti- to 1,000,000 individuals and the speciation between catar- mated at approximately 51,000 individuals in coding regions rhines and platyrrhines to have occurred between 31.4 and (all and third codon positions). In introns, this value decreased 39.0 Ma (table 5). to 32,346 (table 3). After the divergence from Homo/Pan, the A global evaluation of the estimates indicated that, in Gorilla lineage accumulated from 0.00284 (all) to 0.00579 the majority of cases, the 95% probability intervals of the (third) substitutions per site on average (table 4), which cor- posterior distributions of effective population sizes obtained responds to a time interval of approximately 6.0 Ma (table 5). from the three data sets presented overlaps (fig. 3). This is In the lineage ancestral to all great apes, the estimates of most evident for the estimate of the population size of the the scaled effective population size varied from 0.00677 (all) ancestor of great apes and Nomascus (fig. 3c). With the to 0.01721 (introns) (table 2), which corresponds to an effec- exception of the ancestral Ne of the ((Homo, Pan), Gorilla) tive size of 144,483 to 272,098 individuals (table 3). Speciation lineage (fig. 3e), third codon positions resulted in the lowest times between the Pongo lineage and other great apes varied estimates, followed by those obtained from the entire coding from 10.9 to 13.8 Ma (table 5). sequences. The only comparison that presented nonoverlap- Interestingly, the effective population size of the basal ping posterior distributions was the estimates from third hominoid lineage, which represents the ancestral stock of codon positions and introns of the ancestral Ne of anthro- the gibbon (Nomascus) and the great apes, was inferred poids (fig. 3a). to be smaller than that of the ancestor of the great apes alone. The scaled effective population size varied from 0.00297 to 0.00747 (table 2), which yielded from 108,333 to Theoretical Distribution of Gene Trees 133,014 individuals (table 3). Speciation times varied from The theoretical distribution of gene trees was obtained from a 14.7 to 16.9 Ma. species tree in which the ancestral branch of humans and For the ancestor of hominoids and cercopithecoids, the chimps measured 1.3 coalescent units. The ancestral branch effective size of the ancestral population at this branch was of the African great apes was 3.7 coalescent units and was estimated at 284,619 to 465,250 individuals (table 3), which is receded by branches with 0.7, 2.5, and 2.2 coalescent units for approximately three times larger than that of the ancestor of the ancestor of the great apes, the ancestor of hominoids and 39
Schrago et al. . doi:10.1093/molbev/mst191 MBE Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019 FIG. 2. Posterior distributions of ancestral generation times inferred in BEAST. The solid black lines represent the data set of third codon positions of the coding sequences. The dashed black lines are the results from the data set with the entire coding sequence, and the solid gray lines represent the intron data set. (a) Ancestor of Callithrix/other anthropoids; (b) ancestor of Macaca/Hominoidea; (c) ancestor of Nomascus/great apes; (d) ancestor of Pongo/ African great apes; (e) ancestor of gorillas/humans–chimpanzees; and (f) ancestor of humans and chimpanzees. Table 2. Scaled Effective Population Size (h = 4Nelg) Obtained from Table 3. Effective Population Sizes (Ne), Assuming the Inferred the Three Data Sets Analyzed. Evolutionary Rates and Ancestral Generation Times. All Third Introns All Third Introns Homo/Pan 0.00142 ± 0.00016 0.00274 ± 0.00038 0.00457 ± 0.00028 Homo/Pan 27,716 ± 3,088 28,352 ± 3,929 41,263 ± 5,079 Gorilla 0.00215 ± 0.00010 0.00405 ± 0.00020 0.00311 ± 0.00010 Gorilla 51,948 ± 2,393 51,925 ± 2,598 32,346 ± 3,421 Pongo 0.00677 ± 0.00077 0.00807 ± 0.00111 0.01721 ± 0.00118 Pongo 228,204 ± 25,919 144,483 ± 19,937 272,098 ± 34,631 Nomascus 0.00297 ± 0.00014 0.00538 ± 0.00028 0.00747 ± 0.00022 Nomascus 112,774 ± 5,430 108,333 ± 5,560 133,014 ± 14,846 Macaca 0.00671 ± 0.00030 0.01098 ± 0.00055 0.01989 ± 0.00054 Macaca 327,601 ± 14,456 284,619 ± 14,267 465,250 ± 46,526 Callithrix 0.02007 ± 0.00058 0.02788 ± 0.00101 0.06499 ± 0.00120 Callithrix 1,182,362 ± 33,931 871,899 ± 31,705 1,595,221 ± 153,377 the ancestor of Catarrhini primates, respectively. Using the followed by the topology that clustered Nomascus with the parametric estimates obtained in BEAST and BPP, the most ((Homo, Pan), Gorilla) clade with the exclusion of Pongo likely gene tree presented the same topology as the species (11.7%). The observed frequencies of these topologies tree (table 6). This same topology was also the most fre- were 7.7% and 7.2%, respectively. The topologies associated quently recovered among the individual loci (30.7%). The with alternative phylogenetic relationships within the next four most likely gene trees also matched the empirically ((Homo, Pan), Gorilla) clade were equally likely (4.9%) estimated topologies. Apart from the correct topology, the based on theory. This was also observed in the empirical theoretical distribution predicted that the topology with the data, in which the frequencies of the topologies were 7% (Pongo, Nomascus) clade would be the most likely (11.8%), and 6.9%. 40
The Effective Population Sizes of Ancestral Anthropoids . doi:10.1093/molbev/mst191 MBE Table 4. Scaled Ages of the Genetic Isolation of Lineages (s = Tl). Discussion All Third Introns Our results showed that the effective population sizes of the Homo/Pan 0.00200 ± 0.00005 0.00380 ± 0.00013 0.00363 ± 0.00007 lineages leading to humans and chimpanzees exhibited a gen- Gorilla 0.00284 ± 0.00004 0.00552 ± 0.00010 0.00579 ± 0.00004 eral trend of sequential reduction of the effective number of Pongo 0.00230 ± 0.00012 0.01271 ± 0.00024 0.01091 ± 0.00012 individuals, from approximately 1,200,000 to 30,000 (fig. 4). Nomascus 0.00766 ± 0.00006 0.01549 ± 0.00014 0.01469 ± 0.00008 However, this trend was interrupted by a population expan- Macaca 0.01041 ± 0.00005 0.02218 ± 0.00023 0.02159 ± 0.00012 sion that occurred in the ancestral lineage of the great apes Callithrix 0.01532 ± 0.00004 0.03584 ± 0.00032 0.03377 ± 0.00019 (fig. 4). The basal anthropoid lineage, that is, the ancestor of platyrrhines and catarrhines, was inferred to be 3.4 times larger than the lineage that gave rise to extant catarrhines Table 5. Ages of Genetic Isolation (in Ma), Assuming l = 4.9 10 10 alone (the divergence between Macaca and hominoids), s/s/y for All Codon Positions, l = 9.0 10 10 s/s/y for Third Codon which was approximately three times larger than the subse- Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019 Positions and l = 1.0 10 9 for Introns. quent ancestral population (the ancestor of Nomascus and All Third Introns the great apes). There was then an expansion of 1.8 times in Homo/Pan 4.1 ± 0.1 4.1 ± 0.1 3.6 ± 0.1 the ancestral lineage of the great apes followed by a drastic Gorilla 5.8 ± 0.1 6.0 ± 0.1 5.8 ± 0.1 reduction of 4.7 times for the ancestor of humans, chimps, Pongo 11.9 ± 0.1 13.8 ± 0.2 10.9 ± 0.1 and gorillas. Finally, the effective population size of the ances- Nomascus 15.7 ± 0.1 16.9 ± 0.1 14.7 ± 0.1 tral lineage of Homo/Pan was approximately 70% the size of Macaca 21.3 ± 0.1 24.1 ± 0.2 21.6 ± 0.1 the ((Homo, Pan), Gorilla) ancestral population. Callithrix 31.4 ± 0.2 39.0 ± 0.3 33.8 ± 0.2 Of the parameters analyzed in this study, the size of the ancestral population of the Homo/Pan lineage is by far the FIG. 3. Posterior distributions of absolute ancestral population sizes Ne, obtained using the inferred evolutionary rates and generation times. The solid black lines represent the data set of third codon positions of the coding sequences. The dashed black lines are the results from the data set with the entire coding sequence, and the solid gray lines represent the intron data set. (a) Ancestor of Callithrix/other anthropoids; (b) ancestor of Macaca/ Hominoidea; (c) ancestor of Nomascus/great apes; (d) ancestor of Pongo/African great apes; (e) ancestor of gorillas/humans–chimpanzees; and (f) ancestor of humans and chimpanzees. 41
Schrago et al. . doi:10.1093/molbev/mst191 MBE Table 6. Theoretical Probability of the Most Likely Gene Trees and Their Empirical Frequencies along the Loci Studied. Gene Tree Topology Theoretical Probability (%) Empirical Frequency (%) ((((((Homo, Pan), Gorilla), Pongo), Nomascus), Macaca), Callithrix) 46.1 30.7 (((((Homo, Pan), Gorilla), (Pongo, Nomascus)), Macaca), Callithrix) 11.8 7.7 ((((((Homo, Pan), Gorilla), Nomascus), Pongo), Macaca), Callithrix) 11.7 7.2 ((((((Homo, Gorilla), Pan), Pongo), Nomascus), Macaca), Callithrix) 4.9 7 ((((((Pan, Gorilla), Homo), Pongo), Nomascus), Macaca), Callithrix) 4.9 6.9 most studied. The parametric estimates obtained here, which coalescence times of the loci studied (Edwards and Beerli ranged from 27,000 to 41,000, are in agreement with previous 2000). Thus, it is possible that the estimates of times and Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019 estimates of approximately 20,000 obtained by Rannala and rates were impacted by this difference. However, the sequen- Yang (2003) and 47,000 by Hobolth et al. (2011). As those tial reduction of the effective population sizes of anthropoids previous studies adopted different substitution rates and is actually independent of BEAST estimates, as the scaled generation times for the ancestor of humans and chimps, effective population sizes also displayed this trend (fig. 4). different parametric estimates are expected to be found. It might be argued that the times estimated in BEAST, The estimates of the effective population size of the ancestral which were used as priors for the coalescence analyses in lineage of humans, chimps, and gorillas were also similar to BPP, may have biased the estimates of . The large difference those obtained in the multilocus analyses of Yang (2002) and between the posterior distributions of s and their priors Rannala and Yang (2003) as well as the genomic analysis of obtained from BEAST indicates that the posteriors were Hobolth et al. (2007). Thus, it appears that the effective sizes robust to prior assumptions. of ancestral lineages within the Homininae have not under- When estimates of ancestral population sizes are con- gone drastic population changes. The estimates of Ne trasted with the historical biogeography of anthropoids, an obtained here for the ancestor of orangutans and the interesting association emerges. The drastic 5-fold decline in African great apes were very large. This is also corroborated population size between the ancestral lineage of the great by the recent analysis of Hobolth et al. (2011), which esti- apes and the ancestral lineage of the Homininae coincides mated this parameter at 187,000. Therefore, a substantial re- with a possible dispersal event from Eurasia to Africa in the duction of population size between the common ancestor of late Miocene. Such a hypothesis evidently assumes that the orangutans and African great apes and the ancestor of the geographical distribution of the ancestor of the great apes was Homininae is independently validated. Eurasian. This is supported by the recent analysis of Springer The estimates of the effective population size at the et al. (2012), which is corroborated by earlier studies (Stewart Gorilla/(Homo, Pan) ancestral branch are intriguing, because and Disotell 1998; Begun 2010). It has also been proposed that the coding sequences presented an absolute Ne estimate great apes originated in Africa and posteriorly dispersed to greater than the estimate obtained with intron sequences Asia (Moya-Sola et al. 1999). Although we cannot discard this (fig. 3e). This result appears inconsistent, because introns hypothesis, the estimates of ancestral population sizes make it are expected to be under relaxed selective pressures unlikely because a strong reduction of population size is not (Charlesworth 2009). Notice, however, that, in agreement expected if the ancestor of great apes was also African. with the theoretical expectation, the scaled effective popula- Moreover, because the Pongidae (represented in our analysis tion size, , of intron sequences at this branch was greater by Pongo) and the Hylobatidae (represented by Nomascus) than the value inferred for coding sequences (table 2). The are geographically distributed in Asia and the fossil record of theoretically anomalous estimate was then obtained when the apes extends from Europe to Asia (Fleagle 2013), it seems transforming into absolute Ne by employing an incorrect reasonable to assume that the ancestor of Homininae dis- evolutionary rate at this branch (generation times were fixed). persed from Eurasia to Africa. Other events of population In fact, the intron sequences used in this study were not decline/expansion are not easily linked to historical biography subjected to the molecular clock test because they were because of the absence of a clearer picture of the spatial expected to evolve neutrally. As a result, the variance of evo- evolution of anthropoids. lutionary rates along branches was greater in the intron data It is worth mentioning that, although the parametric set ( 2 = 0.009) when compared with coding sequences estimates obtained from the different data sets differed, the ( 2 = 0.006) and third codon positions ( 2 = 0.003). general scenario of a sequential reduction of Ne along the Therefore, it is likely that this unexpected result was due to lineage leading to humans and chimps was replicated the fact that the mean rate of the intron data set was not (fig. 4). This finding shows that the use of coding regions representative for every branch. to estimate past dynamics of population sizes of ancestral Regarding the inference of evolutionary rates and absolute species may not suffer from biases resulting from the action times in BEAST, it should be remarked that the absolute times of natural selection. The correspondence between the inferred are not equivalent to the speciation times (the frequency of the inferred gene trees and the theoretical dis- parameter), because they were estimated from the mean tribution also suggests that the general scenario proposed genetic distances among species, which are, in fact, mean here is a good approximation of the historical changes in 42
The Effective Population Sizes of Ancestral Anthropoids . doi:10.1093/molbev/mst191 MBE Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019 FIG. 4. Evolutionary demography of Anthropoidea. The upper graph depicts the box plots of the posterior distribution of the scaled effective population sizes along the anthropoid lineages leading to humans and chimpanzees. The black box plots represent the data set of third codon positions; estimates from the entire coding sequence and from introns are shown in light and dark gray, respectively. The lower graph displays the chronogram of anthropoid evolution scaled by the speciation times . The width of the branches of the lineages leading to Homo/Pan is proportional to their respective Ne estimates, which are also shown alongside the branches. All Ne values were approximated. The geographical distributions of terminal taxons are also shown. For the sake of intelligibility, only the ancestral distribution of Homo was shown. 43
Schrago et al. . doi:10.1093/molbev/mst191 MBE FIG. 5. Summary of the methodological procedure used in this study. Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019 effective population sizes. The theoretical distribution of gene phylogenetic method implemented in BEAST was used to trees assumes that each gene tree is an independent realiza- obtain prior distributions of node ages (million years ago, tion of the multicoalescent process (Degnan and Salter 2005). Ma) and evolutionary rates (in s/s/y) at branches. BEAST This assumption is not met in real data because genes may be was also used to infer the ancestral generation times (years) physically linked. Moreover, a comparison between empirical (step I). Step II consisted of the estimation of ancestral effec- and theoretical distributions of gene trees does not account tive population sizes in the BPP software using the multi- for the errors associated with gene tree inference, a problem species coalescent method of Rannala and Yang (2003). that is also found in methods that estimate ancestral effective Ancestral effective population size inference is performed population sizes using topological incongruence (Yang 2002). using evolutionary distances; therefore, Ne estimates are Thus, differences between theoretical and empirical frequen- scaled by the substitution rate per generation g – the cies are expected to occur. parameter ( = 4Neg). To obtain the effective number of As reported by Rannala and Yang (2003), the estimation individuals, Ne, absolute evolutionary rates per generation of ancestral Ne is complicated by the high correlation must be provided. A summary of the entire analysis is given between the scaled effective population size and speciation in figure 5. time . In this study, the correlation between those parame- ters was indeed high, varying from 0.8 between Homo/pan and Homo/pan (third codon positions) to 0.5 between Pongo and Pongo (all coding sequences). Such high correlation values Data Set could be alleviated by the adoption of strong prior distribu- A total of 9,225 orthologous genes of Homo, Pan, Gorilla, tions. However, this would also assume a large a priori knowl- Pongo, Macaca, Nomascus, and Callithrix were downloaded edge of the parameters we are interested in estimating. This from the OrthoMam database. Only coding sequences (CDS) approach sounds contradictory. This is why the posterior were downloaded from OrthoMam. Introns of the same distributions of branch lengths, obtained by the phylogenetic 9,225 genes were independently obtained by parsing the method, were used. genomes available in Ensembl using a Perl script. The estima- In conclusion, this study indicated that the lineage from tion of ancestral effective population sizes assumes that the ancestor of all anthropoid primates to the ancestor of evolutionary rates are homogeneous along the branches of humans and chimps has undergone a serial reduction of ef- the phylogeny, that is, the locus evolves under the molecular fective population sizes. This pattern is corroborated by the clock. Thus, alignments of all coding sequences were sub- frequency of tree topologies inferred from the coding regions jected to a likelihood ratio test (LRT) to verify which loci sampled across the primate genomes. Of the inferred past rejected the molecular clock hypothesis (Felsenstein 1988). population scenarios, the most drastic change was the ap- When testing for the molecular clock, the best fit evolutionary proximately 5-fold reduction in effective population size be- model was assigned to each alignment according to the tween the ancestor of the Asian and African great apes and model test implementation of HyPhy (Pond et al. 2005). the ancestor of African great apes alone, suggesting that the In coding sequences, only genes that failed to reject the Homininae diversified after a dispersal event from a Eurasian strict clock at 5% significance level were further analyzed ancestor. Finally, the results presented here demonstrate the (1,401 genes). An additional data set containing only third relevance of phylogenetic approaches that explicitly incorpo- codon positions of the clock-like genes was assembled. rate the coalescent process to investigate long-term evolu- This was done to reduce the effect of selection on sequences. tionary events. The alignments of intronic regions were not submitted to molecular clock tests. Thus, evolutionary parameters were Materials and Methods inferred for three data sets: 1) clock-like coding sequences, To obtain estimates of the absolute ancestral population sizes 2) third codon positions of clock-like coding sequences, and of Anthropoidea, a two-step strategy was adopted. First, the 3) introns of all genes analyzed. 44
The Effective Population Sizes of Ancestral Anthropoids . doi:10.1093/molbev/mst191 MBE Step I—Estimation of Node Ages, Evolutionary Rates, 2010), which assumes a Brownian diffusion model that ac- and Ancestral Generation Times commodates branch-specific variation in dispersal rates. Divergence time estimation was performed in BEAST v1.7.3 Thus, for example, instead of assuming that the generation (Drummond et al. 2012) using the Bayesian relaxed molecular time of the ancestor of Homo and Pan was a simple arith- clock. Clock-like genes were concatenated in supermatrices metic average between the values at the terminals (29.1 and with 1,673,634 bp (all codons) and 557,878 (third codon po- 24.63), the model weights the ancestral generation time sitions). The supermatrix of concatenated introns was inference by the rates associated with the both branches 4,417,078 bp long. In all supermatrices, the model of nucleo- that split from the ancestor node. Therefore, the use of tide substitution was the GTR + G4 + I model, which was arithmetic averages would imply equal probability of dis- chosen by LRT in HyPhy (Pond et al. 2005). The prior distri- persal of the ancestral generation time to the value of the bution of rates along branches was modeled by the uncorre- tips, while the relaxed random walk permits the dispersal of lated lognormal distribution, and the prior distribution of the ancestral value to be proportional to the rates associated Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019 node ages was modeled by the Yule process. To approximate with each branch. the posterior density of parameters, BEAST relies on the Markov chain Monte Carlo (MCMC) algorithm. The MCMC was run for 50,000,000 generations, and chains were Step II—Estimation of Effective Population Sizes sampled every 1000th cycle. The analysis was run twice to The scaled effective population sizes of ancestral hominoids check for convergence. were estimated with BPP 2.1 software, which implements the method of Rannala and Yang (2003) and Burgess Calibration Information and Yang (2008). Coding sequences were analyzed individ- To calibrate the tree, four prior distributions were assigned ually. Introns of the same coding sequence were concate- to node ages. 1) The Homo/Pan split was calibrated by a nated to compose a single intronic sequence per gene. normal prior with mean = 8.25 Ma and standard deviation The decision of whether a locus was autosomal, mitochon- (SD) = 0.9. This prior was based on Benton and Donoghue drial or from the X or Y chromosome was based on the (2007); the mean was the average of the maximum and position of the marker in Homo sapiens. The variation of minimum ages for split, and the SD was set so as to contain substitution rates among loci was incorporated into the the minimum and maximum values within the 99% highest analysis by using relative rates, as implemented in Burgess probability density (HPD) interval. 2) The divergence and Yang (2008). between Pongo and the great apes was constrained by a The BPP software program also conducts a Bayesian gamma prior with shape = 15 and scale = 1, corresponding inference with a MCMC algorithm. Markov chains were to an HPD interval from 9.2 to 21.9 Ma. This calibration was run for 50,000,000 generations and sampled every 1,000th based on the fossil records of Lufengpithecus, Sivapithecus, cycle. The analysis was replicated to check for convergence. and Khoratpithecus from the Miocene of Asia (Hartwig BPP requires the definition of prior distributions for the age 2002; Chaimanee et al. 2003, 2004). A gamma prior was of the root (), that is, the age of the split between Callithrix used on the Pongo split because of the large uncertainty and catarrhines, and the parameter. The parameter is a associated with the maximum age of the node. 3) The measure of the speciation time per se, that is, the time of Cercopithecoid/Hominoid split, a normal prior with the separation between populations, and not the mean ge- mean = 28.45 and SD = 3.4, was entered based on Benton netic divergence between sister lineages, that is, the mean and Donoghue (2007) and followed the same rationale coalescence time of interspecies gene pairs, which is the applied to the Homo/Pan split. 4) Finally, the Platyrrhini/ genetic distance traditionally obtained in usual phylogenetic Catarrhini divergence was modeled by a normal prior with analyses (Edwards and Beerli 2000). Nevertheless, is also mean = 40 Ma and SD = 7 Ma. This is a flexible prior that measured in number of substitutions per site. Because the permits the age of the basal anthropoid to vary from mean number of substitutions per site in a given branch is 27 Ma, the age of Branisella boliviana, the earliest the product of the substitution rate per site per year and the Platyrrhini fossil (Takai and Anaya 1996), to the Paleocene time duration of the branch in years, the prior for was (~60 Ma)—the age of the early Primates (Fleagle 2013). obtained by calculating the product of the posterior distri- Reconstruction of Ancestral Generation Times butions of absolute substitution rates and the age of the The reconstruction of ancestral generation times was per- divergence between Callithrix and catarrhines (the root formed simultaneously with the divergence time estimation node) inferred in BEAST. This posterior distribution, the in BEAST. For the terminal taxa, the values were obtained result of the product of times and rates, was approximated from the literature. The generation times of Homo (29.1 by a gamma distribution using the MASS package of the years), Pan (24.63 years), and Gorilla (19.28 years) were set R programming environment. This strategy rendered the according to the recent work of Langergraber et al. (2012). following gamma priors: G ( = 464.06, = 17859.16) for The generation times of Pongo (15.4 years), Nomascus (8.4 the analysis of all codon positions, G ( = 918.32, years), Macaca (3.4 years), and Callithrix (1.4 years) were = 18,207.95) for third codon positions, and G ( = 963.71, borrowed from Shumaker et al. (2008) and Ernest (2003). = 16,199.36). It should be noted that the strategy used to Ancestral state reconstruction was performed using a mul- obtain the prior for did not actually measure speciation tivariate continuous relaxed random walk (Lemey et al. times; it is a measure of the mean coalescence time T of the 45
Schrago et al. . doi:10.1093/molbev/mst191 MBE genes (in s/s/y). The larger the ancestral effective population Degnan JH, Rosenberg NA. 2008. Gene tree discordance, phylogenetic size, the larger is the expected difference between T and . inference, and the multispecies coalescent. Trends Ecol Evol. 24: 332–340. In the absence of an informative prior for , the distribution Degnan JH, Salter LA. 2005. Gene tree distributions under the coalescent of the average coalescence time is the best guess from the process. Evolution 59:24–37. data available. In both data sets, a gamma distribution G Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylo- ( = 2, = 200) was assigned as prior for . genetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 29:1969–1973. Edwards S, Beerli P. 2000. Perspective: gene divergence, population divergence, and the variance in coalescence time in phylogeographic Calculation of Gene Tree Probabilities studies. Evolution 54:1839–1854. Edwards SV. 2009. Is a new and general theory of molecular systematics When the estimates of ancestral effective population sizes, emerging? Evolution 63:1–19. speciation times, and generation times are known, the cal- Ernest SKM. 2003. Life history characteristics of placental nonvolant culation of the probabilities of gene trees given the species mammals. Ecology 84:3402. Edwards SV, Liu L, Pearl DK. 2007. High-resolution species trees without Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019 tree can be carried out (Degnan and Salter 2005). The the- concatenation. Proc Natl Acad Sci U S A. 104:5936–5941. oretical probability distribution of gene trees for a given Felsenstein J. 1988. Phylogenies from molecular sequences: inference species tree can then be compared with the collection of and reliability. Annu Rev Genet. 22:521–565. gene trees inferred from several loci across the genome. If Fleagle JG. 2013. Primate adaptation and evolution. San Diego: the estimates of ancestral population sizes are correct, we Academic Press. expect the theoretical distribution of gene trees to closely Goodman M. 1967. Deciphering primate phylogeny from macromolec- ular specificities. Am J Phys Anthropol. 26:255–276. match the empirically determined gene trees. The theoret- Goodman M, Barnabas J, Moore GW. 1972. Rates of molecular evolution ical distribution of gene trees was obtained using COAL 2.1, in primate phylogeny. Am J Phys Anthropol. 37:437. which requires as an input a species tree with internal Hartwig WC. 2002. The primate fossil record. New York: Cambridge branches measured in coalescent units (2N generations). University Press. To create such a tree, the time duration of the branches Heled J, Drummond AJ. 2010. Bayesian inference of species trees from multilocus data. Mol Biol Evol. 27:570–580. (in generations) must be divided by twice the effective pop- Hobolth A, Christensen OF, Mailund T, Schierup MH. 2007. Genomic ulation size (Degnan and Rosenberg 2008). Therefore, the relationships and speciation times of human, chimpanzee, and ancestral population sizes and speciation times previously gorilla inferred from a coalescent hidden Markov model. PLoS estimated in BPP, as well as the ancestral generation times, Genet. 3:294–304. were used. Effective population sizes and speciation times Hobolth A, Dutheil JY, Hawks J, Schierup MH, Mailund T. 2011. Incomplete lineage sorting patterns among human, chimpanzee, were averaged across the three data sets. and orangutan suggest recent orangutan speciation and widespread selection. Genome Res. 21:349–356. Acknowledgments Hudson RR, Wu CI. 1992. Gene trees, species trees and the segregation of ancestral alleles. Genetics 131:509–513. The author thanks the three anonymous reviewers and the Kingman JFC. 1982. On the genealogy of large populations. J Appl editor for comments that significantly improved the manu- Probab. 19A:27–43. script. This work was supported by the National Research Langergraber KE, Prufer K, Rowney C, et al. (20 co-authors). 2012. Generation times in wild chimpanzees and gorillas suggest earlier Council of Brazil (CNPq) grants 308147/2009-0 and 307982/ divergence times in great ape and human evolution. Proc Natl Acad 2012-2 and by FAPERJ grants 110.838/2010, 110.028/2011 and Sci U S A. 109:15716–15721. 111.831/2011 to C.G.S. Lemey P, Rambaut A, Welch JJ, Suchard MA. 2010. Phylogeography takes a relaxed random walk in continuous space and time. Mol Biol Evol. 27:1877–1885. References Liu L, Pearl DK, Brumfield RT, Edwards SV. 2008. Estimating species trees Begun DR. 2010. Miocene hominids and the origins of the African apes using multiple-allele DNA sequence data. Evolution 62:2080–2091. and humans. Annu Rev Anthropol. 39:67–84. Liu L, Yu L, Kubatko L, Pearl DK, Edwards SV. 2009. Coalescent methods Benton MJ, Donoghue PC. 2007. Paleontological evidence to date the for estimating phylogenetic trees. Mol Phylogenet Evol. 53:320–328. tree of life. Mol Biol Evol. 24:26–53. Moya-Sola S, Kohler M, Alba DM. 1999. Primate evolution—in and out Burgess R, Yang Z. 2008. Estimation of hominoid ancestral of Africa—Comments. Curr Biol. 9:R547–R548. population sizes under Bayesian coalescent models incorporating Nei M. 1987. Molecular evolutionary genetics. New York: Columbia mutation rate variation and sequencing errors. Mol Biol Evol. 25: University Press. 1979–1994. Novembre J, Johnson T, Bryc K, et al. (12 co-authors). 2008. Genes mirror Chaimanee Y, Jolly D, Benammi M, Tafforeau P, Duzer D, Moussa I, geography within Europe. Nature 456:98–101. Jaeger JJ. 2003. A middle Miocene hominoid from Thailand and Pamilo P, Nei M. 1988. Relationship between gene trees and species orangutan origins. Nature 422:61–65. trees. Mol Biol Evol. 5:568–583. Chaimanee Y, Suteethorn V, Jintasakul P, Vidthayanon C, Marandat B, Pond SLK, Frost SDW, Muse SV. 2005. HyPhy: hypothesis testing using Jaeger JJ. 2004. A new orang-utan relative from the Late Miocene of phylogenies. Bioinformatics 21:676–679. Thailand. Nature 427:439–441. Rannala B, Yang Z. 2003. Bayes estimation of species divergence times Charlesworth B. 2009. Fundamental concepts in genetics: effective and ancestral population sizes using DNA sequences from multiple population size and patterns of molecular evolution and variation. loci. Genetics 164:1645–1656. Nat Rev Genet. 10:195–205. Sarich VM, Wilson AC. 1967. Immunological time scale for hominid Chen FC, Li WH. 2001. Genomic divergences between humans and evolution. Science 158:1200–1203. other hominoids and the effective population size of the Satta Y, Klein J, Takahata N. 2000. DNA archives and our nearest rela- common ancestor of humans and chimpanzees. Am J Hum tive: the trichotomy problem revisited. Mol Phylogenet Evol. 14: Genet. 68:444–456. 259–275. 46
The Effective Population Sizes of Ancestral Anthropoids . doi:10.1093/molbev/mst191 MBE Shumaker RW, Wich SA, Perkins L. 2008. Reproductive life history traits Tajima F. 1983. Evolutionary relationship of DNA-sequences in finite of female orangutans (Pongo spp.). Interdiscip Top Gerontol. 36: populations. Genetics 105:437–460. 147–161. Takahata N, Nei M. 1985. Gene genealogy and variance of interpopula- Scally A, Dutheil JY, Hillier LW, et al. (71 co-authors). 2012. Insights into tional nucleotide differences. Genetics 110:325–344. hominid evolution from the gorilla genome sequence. Nature 483: Takahata N, Satta Y, Klein J. 1995. Divergence time and population size 169–175. in the lineage leading to modern humans. Theor Popul Biol. 48: Springer MS, Meredith RW, Gatesy J, et al. (12 co-authors). 2012. 198–221. Macroevolutionary dynamics and historical biogeography of pri- Takai M, Anaya F. 1996. New specimens of the oldest fossil platyrrhine, mate diversification inferred from a species supermatrix. PLoS One Branisella boliviana, from Salla, Bolivia. Am J Phys Anthropol. 99: 7:e49521. 301–317. Stewart CB, Disotell TR. 1998. Primate evolution—in and out of Africa. Wu CI. 1991. Inferences of species phylogeny in relation to segregation Curr Biol. 8:R582–R588. of ancient polymorphisms. Genetics 127:429–435. Stoneking M, Krause J. 2011. Learning about human population Yang Z. 2002. Likelihood and Bayes estimation of ancestral population history from ancient and modern genomes. Nat Rev Genet. 12: sizes in hominoids using data from multiple loci. Genetics 162: 603–614. 1811–1823. Downloaded from https://academic.oup.com/mbe/article-abstract/31/1/37/1049819 by guest on 10 January 2019 47
You can also read