Reconstructing the Human Genetic History of Mainland Southeast Asia: Insights from Genome-Wide Data from Thailand and Laos
Wibhu Kutanan,*,†,1 Dang Liu,†,2 Jatupol Kampuansai,3,4 Metawee Srikummool,5 Suparat Srithawong,1 Rasmi Shoocongdej,6 Sukrit Sangkhano,7 Sukhum Ruangchai,8 Pittayawat Pittayaporn,9 Leonardo Arias,2,10 and Mark Stoneking*,2

1 Department of Biology, Faculty of Science, Khon Kaen University, Khon Kaen, Thailand
2 Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
3 Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand
4 Research Center in Bioresources for Agriculture, Industry and Medicine, Chiang Mai University, Chiang Mai, Thailand
5 Department of Biochemistry, Faculty of Medical Science, Naresuan University, Phitsanulok, Thailand
6 Department of Archaeology, Faculty of Archaeology, Silpakorn University, Bangkok, Thailand
7 School of Public Health, Walailak University, Nakhon Si Thammarat, Thailand
8 Department of Physics, Faculty of Science, Khon Kaen University, Khon Kaen, Thailand
9 Department of Linguistics and Southeast Asian Linguistics Research Unit, Faculty of Arts, Chulalongkorn University, Bangkok, Thailand
10 Centre for Linguistics, Faculty of Humanities, Leiden University, Leiden, The Netherlands
† These two authors are co-first authors and contributed equally to this work.
*Corresponding authors: E-mails: wibhu@kku.ac.th; stoneking@eva.mpg.de.
Associate editor: Bing Su
Downloaded from https://academic.oup.com/mbe/article/38/8/3459/6255759 by Max Planck Institut Fuer Evolutionaere Anthropologie user on 23 September 2021

Abstract
Thailand and Laos, located in the center of Mainland Southeast Asia (MSEA), harbor diverse ethnolinguistic groups encompassing all five language families of MSEA: Tai-Kadai (TK), Austroasiatic (AA), Sino-Tibetan (ST), Hmong-Mien (HM), and Austronesian (AN).
Previous genetic studies of Thai/Lao populations have focused almost exclusively on uniparental markers and there is a paucity of genome-wide studies. We therefore generated genome-wide SNP data for 33 ethnolinguistic groups, belonging to the five MSEA language families from Thailand and Laos, and analyzed these together with data from modern Asian populations and SEA ancient samples. Overall, we find genetic structure according to language family, albeit with heterogeneity in the AA-, HM-, and ST-speaking groups, and in the hill tribes, that reflects both population interactions and genetic drift. For the TK speaking groups, we find localized genetic structure that is driven by different levels of interaction with other groups in the same geographic region. Several Thai groups exhibit admixture from South Asia, which we date to 600–1000 years ago, corresponding to a time of intensive international trade networks that had a major cultural impact on Thailand. An AN group from Southern Thailand shows both South Asian admixture and overall affinities with AA-speaking groups in the region, suggesting an impact of cultural diffusion. Overall, we provide the first detailed insights into the genetic profiles of Thai/Lao ethnolinguistic groups, which should be helpful for reconstructing human genetic history in MSEA and selecting populations for participation in ongoing whole genome sequence and biomedical studies.

Key words: genome-wide, Mainland Southeast Asia, population interaction, South Asian admixture, cultural diffusion.

© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Mol. Biol. Evol. 38(8):3459–3477 doi:10.1093/molbev/msab124 Advance Access publication April 27, 2021

Introduction
Mainland Southeast Asia (MSEA), consisting of Myanmar, Cambodia, Vietnam, western Malaysia, Laos, and Thailand, is a region of enormous diversity, with a population of 263 million people speaking 229 languages belonging to five major language families: Tai-Kadai (TK), Austroasiatic (AA), Sino-Tibetan (ST), Hmong-Mien (HM), and Austronesian (AN) (Eberhard et al. 2020). Thailand and Laos are in the center of MSEA and are characterized by a diverse landscape involving highlands and lowlands, long coastlines, and many rivers. North-versus-south movements are facilitated by several rivers, including the Mekong, Chao Phraya, and Salaween, which are considered to be a key factor for population movement from southern China and upper MSEA to lower MSEA. In addition, the Malay Peninsula to the south acts as a crossroad, facilitating east-versus-west
movement by sea and by the narrow width of the Kra Isthmus (the narrowest part of the Malay Peninsula).

The geographic heterogeneity of Thailand and Laos is reflected in the ethnolinguistic diversity of the region. There are 68.6 million people in Thailand and 6.8 million in Laos, speaking 159 languages belonging to all five major MSEA language families (Eberhard et al. 2020). TK languages are widespread in southern China and MSEA and are quite prevalent in present-day Thailand and Laos, spoken by 89.4% of Thais and 65.7% of Laotians. The major TK speaking groups in northern, northeastern, central, and southern Thailand are known as Khonmueang, Lao Isan, Central Thai, and Southern Thai or Khon Tai, respectively (Eberhard et al. 2020). AA languages are next in predominance, spoken by 4.0% of Thais and 26.2% of Laotians. In addition, this area is also inhabited by historical migrants who speak ST, HM, and AN languages (frequencies of 3.2%, 0.2%, and 2.8%, respectively, in Thailand; and 2.9%, 4.7%, and 0% in Laos) (Eberhard et al. 2020). The AA, HM, and ST languages are spoken mainly by highlanders (the hill tribes) in northern and western Thailand, and in midland and upland regions in Laos, although AA languages are also spoken by some lowland groups, for example, the Mon. AN-speaking groups, such as the Thai Malay (SouthernThai_AN), are distributed in the Southern Provinces of Thailand, bordering Malaysia.

Archaeological records document a long history of human occupation of the area, with modern human remains dated to 46–63 thousand years ago (kya) in northern Laos (Demeter et al. 2012). In addition, cultural remains of SEA hunter-gatherers (e.g., flake stone tools of the Hoabinhian culture) have been found in northern Thailand dating to 35–40 kya (Shoocongdej 2006), and in southern Thailand dating to 27–38 kya (Anderson 1990). The transition from a hunter-gatherer tradition to a Neolithic agricultural lifestyle occurs 4 kya all across Thailand and Laos (Higham and Thodsarat 2012; Higham 2014); agriculture in MSEA probably has its origins in the valley of the Yangtze River in China (Higham and Thodsarat 2012), and ancient DNA evidence indicates that present-day AA speaking groups in MSEA are most closely related to Neolithic agricultural communities (McColl et al. 2018; Lipson et al. 2018).

However, the common languages shared by Thais and Laotians are TK languages, not AA languages. The origin of the TK languages is thought to be in what is now southern or southeastern China, and they probably spread to MSEA during the Iron Age (Pittayaporn 2014). Whether the spread of TK languages occurred via demic diffusion (an expansion of people that brought both their genes and their language) or cultural diffusion (language spread with at most minor movement of people) has been debated (Sangvichien 1966; Nakbunlung 1994; Pittayaporn 2014). Previous genetic studies of uniparental lineages have generally supported demic diffusion for the maternal side but cultural diffusion from the AA people for the paternal side for major Thai/Lao TK groups (Kutanan et al. 2017, 2018b, 2019). Archaeological evidence suggests other population contacts in the region, for example, objects from India that appear during the late Bronze Age and Iron Age and involve the AA-speaking Khmer and Mon (Higham and Thodsarat 2012; Higham 2014). Moreover, the HM- and ST-speaking hill tribes in the mountainous areas of northern Thailand, northern Myanmar, northern Laos, and southern China migrated to the region during historical times, 200 years ago (ya) (Schliesinger 2000; Penth and Forbes 2004). Taken together, the archaeological and linguistic evidence suggests a complex population structure and history of the ethnolinguistic groups of Thailand and Laos.

This population structure and history remain largely unexplored by genetic studies, which have almost exclusively analyzed autosomal short tandem repeat (STR) loci, mitochondrial DNA (mtDNA), and male-specific Y chromosome (MSY) sequences. These studies revealed the relative genetic heterogeneity of the AA groups and homogeneity of TK groups (Kutanan et al. 2014, 2017, 2019; Srithawong et al. 2015, 2020; Kampuansai et al. 2017, 2020) and contrasting male and female genetic histories in the region, especially for the matrilocal versus patrilocal hill tribes (Oota et al. 2001; Besaggio et al. 2007; Kutanan et al. 2018a, 2019, 2020). While genome-wide data provide much richer insights into population structure and genetic history, previous genome-wide studies of Thai/Lao populations are either primarily from northern populations (HUGO Pan-Asian SNP Consortium 2009; Xu et al. 2010; Lipson et al. 2018) or do not provide any information on ethnolinguistic background (Wangkumhang et al. 2013; Lazaridis et al. 2014). Therefore, we here generated genome-wide SNP data for 452 individuals from 33 ethnolinguistic groups from Thailand and Laos, including two southern Thai groups that have not been involved in any previous genetic studies, speaking languages that encompass all five language families in MSEA. We analyzed the allele and haplotype sharing within and between the Thai/Lao groups and compared them with both modern Asian populations and nearby SEA ancient samples. Our results provide several new insights into the genetic prehistory of MSEA through the lens of populations from Thailand and Laos.

Results and Discussions

Overview of Genetic Structure and Allele Sharing
We generated genome-wide SNP data for 452 individuals from 32 populations from Thailand and 1 population from Laos; when combined with previously published data from 3 Thai populations (Lipson et al. 2018; Lazaridis et al. 2014), there are 482 Thai/Lao samples belonging to 36 populations (fig. 1). We also merged our data with data from modern Asian populations generated on the same platform and SEA ancient samples (supplementary table 1, Supplementary Material online; supplementary fig. 1, Supplementary Material online). We began with principal component analysis (PCA) to investigate the overall population structure of the merged data set and identify any outliers (supplementary fig. 2, Supplementary Material online). After outliers were removed, PC1 separates South Asian (SA) from East Asian (EA) groups, with the Kharia (#44), Onge (#45), and Uygur (#65) located in between (fig. 2A; supplementary fig. 3, Supplementary Material online). PC2 separates Northeast
Asian (NEA) groups from SEA groups. With respect to the major MSEA linguistic groups, ST and HM groups are generally separated from the AA, TK, and AN groups on PC2, while the latter three overlap one another (fig. 2B). Exceptionally, the Karen speaking ST groups (Karen_ST; #7–9) also overlap the AA, TK, and AN groups (fig. 2B), while the ST-speaking Lahu from Thailand (#6) and China (#56) and the HM-speaking IuMien (#3) are grouped with the AA-speaking Kinh (#52) and close to the northern Thai TK groups (N_TK; #21–26). Strikingly, four Thai groups from this study, that is, the AA-speaking Mon (Monic_AA; #20), AN-speaking SouthernThai_AN (#4), and TK-speaking CentralThai (C_TK; #34) and SouthernThai_TK (#35), as well as the previously published Thai-HO (#36; this population is from the Human Origins data set of Lazaridis et al. 2014, with no further details available), Mamanwa (#46) and Cambodian (#51), all show additional affinity toward the SA populations (fig. 2A and B). Interestingly, the AN-speaking group from Thailand (SouthernThai_AN; #4) is not close to the AN groups from Taiwan (Amis and Atayal) or Indonesia (Semende and Borneo; #47–48), but rather they are near the AN-speaking Negrito group Mamanwa (#46) from the Philippines, and the Monic_AA, C_TK and S_TK groups. When the PCA was performed on only SEA individuals, four poles were observed: those groups showing additional affinity to SA populations; the Khmuic/Katuic AA speaking groups (Khmu_Katu_AA); the Lahu ST speaking groups; and the HM groups; which implies additional admixture or drift has happened in these groups relative to the other SEA groups (supplementary fig. 4, Supplementary Material online).

FIG. 1. Map showing the location of the 36 Thai/Lao ethnolinguistic groups analyzed in this study, color-coded according to language family.

[Figure 2: panels (A) and (B) are scatter plots of PC1 versus PC2 with individuals labeled by population number, and an inset of eigenvalues for PC1 to PC10; panel (C) shows ADMIXTURE bar plots for K = 5 and K = 6 for Thai/Lao populations, comparative modern populations, and comparative ancient populations, topped by color bars for country (CT), language family (LF), and subgroup (SG). Subgroup key: Hmong_HM, Karen_ST, Palaungic_AA, Khmu_Katu_AA, Monic_AA, N_TK, NE_TK, C_TK, S_TK, Other.]

FIG. 2. Population structure analyses. (A) Plot of PC1 versus PC2 for the SNP data for individuals from South Asia, Northeast Asia, and Southeast Asia. Individuals are numbered according to population, as indicated in supplementary table 1, Supplementary Material online and in the population labels in panel (C). Thai/Lao groups are colored by language family according to the key at the bottom of panel (C) while other groups are in gray (see supplementary fig. 3, Supplementary Material online for the same PC plot with all samples colored by country and by language family). The eigenvalues from PC1 to PC10 are shown on the bottom left side. (B) Plot focusing on Southeast Asian and Chinese populations speaking AA, AN, HM, ST, and TK languages, zoomed-in from (A). Thai/Lao groups are colored according to subgroup while other groups are in gray. (C) ADMIXTURE results for K = 5 and K = 6. Each individual is represented by a bar, which is partitioned into K colored segments that represent the individual's estimated membership fractions in each of the K ancestry components. Populations are separated by black lines for modern populations and excavation sites and time periods are separated by black lines for ancient samples. The three colored bars at the top of the plot indicate the country (top), language family (middle), and subgroup (bottom) for each sample, according to the key at the bottom. The PCA analysis was performed on the pruned data set of 842 individuals and 153,191 SNPs, while the ADMIXTURE analysis was performed on the pruned data set of 895 individuals (including 10 Mbuti, 10 French, and 33 ancient individuals) and 158,772 SNPs; the highly drifted modern populations (Onge, Mlabri, and Mamanwa) and ancient samples were projected in ADMIXTURE analyses (see PCA with ancient samples projected in supplementary fig. 3, Supplementary Material online).

We then performed ADMIXTURE analysis to further investigate population structure. The lowest cross validation error occurred at K = 5 and K = 6 (supplementary fig. 5, Supplementary Material online); corresponding results are shown in fig. 2C. For K = 5, there is a brown component associated with Mbuti, a pink component appearing in French and Indian groups, a purple component enriched in NEA groups, a black component dominant in AN-speaking Amis and Atayal from Taiwan, and a blue component enriched in Khmu_Katu_AA groups from Thailand. Most of the Thai/Lao TK-speaking groups show two major sources (black and blue) with the purple component as a minor source, except that the C_TK and S_TK groups and Thai-HO have a substantial fraction of the pink component, as do the Monic_AA and SouthernThai_AN. This indication of potential relatedness with SA groups is consistent with the PCA results (fig. 2A and B). At K = 6, there appears a green component that separates French from South Asian populations (fig. 2C). This green component substantially reduces the pink component in the NEA groups but has a negligible effect on the SA-related Thai groups. Although increasing K values are associated with higher cross-validation errors, the
additional new components reveal additional population structure (supplementary fig. 6, Supplementary Material online). At K = 7, 8, and 9, the Lahu from Thailand and China, the HM-speaking Hmong (Hmong_HM), and Karen_ST groups from Thailand are enriched for their own sources, respectively. At K = 11, the Soa and Bru (Katuic speaking populations of the Khmu_Katu_AA group) stand out with a light brown component.

To analyze population relationships based on allele sharing, we calculated outgroup f3-statistics of the form f3(X, Y; outgroup) that measure the shared drift between populations X and Y since their divergence from the outgroup (Mbuti). Higher outgroup f3 values indicate more shared drift between populations. The SouthernThai_AN, Monic_AA, C_TK, and S_TK groups and Thai-HO exhibit the lowest f3-values with other populations/ancient samples and also with each other (fig. 3), while the HM speaking populations show the strongest sharing with each other. TK populations exhibit close genetic affinity with each other, except for the C_TK, S_TK, and Thai-HO groups, and also share alleles with the HM speaking populations, consistent with results of the ADMIXTURE analysis at K = 8 (supplementary fig. 6, Supplementary Material online). There is higher sharing between the Thai/Lao groups and other SEA and southern Chinese groups (i.e., TK, HM, and non-NEA ST Chinese groups) than with SA and NEA groups (fig. 3). The highest sharing was between Thai Lahu and Chinese Lahu. The Amis and Atayal share more alleles with the TK groups than with the SouthernThai_AN group from Thailand (fig. 3), in agreement with ADMIXTURE results (fig. 2C; supplementary fig. 6, Supplementary Material online).

When ancient samples are included in the analyses of genetic structure and allele sharing, the two Hoabinhian samples (#69–70) are projected close to the Onge on PCA (supplementary fig. 3, Supplementary Material online), while most of the Neolithic samples (#71–79) fall with the AA and AN groups. However, the N-Oakaie sample (#78) from Myanmar is closer to ST and HM groups. Most of the Bronze/Iron Age samples (#80–82) cluster with the TK and AA samples except for the BA-NuiNap samples (#80) from Vietnam, which are close to the Neolithic samples. With respect to the ADMIXTURE result at K = 5 (fig. 2C), the Hoabinhian samples show a major pink component with minor blue and purple components, while all of the Neolithic samples exhibit a major blue component with minor black, pink, and purple components, except that the purple component is enriched in the N-Oakaie sample from Myanmar, and reduced/lacking in the N-GuaChaCave samples from Malaysia and the N-TamPaLing and N-TamHang samples from Laos. The purple component is also enriched in the IA-LongLongRak Iron Age samples from Thailand. The black component is substantially increased in the Bronze Age and historical samples, such as the BA-NuiNap and Hi-HonHaiCoTien samples from Vietnam and the Hi-SupuHujung and Hi-Kinabatagan samples from Malaysia (a similar pattern is seen in the Thai/Lao TK groups). In the outgroup f3 result (fig. 3), the ancient samples N-TamPaLing and N-TamHang share more with the Khmu_Katu_AA and NortheasternThai_TK (NE_TK) groups, but N-Oakaie shares more with the ST-speaking Lisu and Lahu groups and HM-speaking Hmong and IuMien groups. The Iron Age samples show overall less allele-sharing with Thai/Lao groups, whereas the Bronze Age and historical samples from Vietnam and Malaysia show higher sharing with the Thai/Lao TK and HM groups, in agreement with the ADMIXTURE results (fig. 2C). Our results support previous findings (Lipson et al. 2018; McColl et al. 2018): the Hoabinhian samples are genetically related to Andamanese Onge; the Neolithic samples share ancestry with the AA populations (except for the N-Oakaie sample from Myanmar, which shares ancestry with ST speaking populations); and most Bronze/Iron Age samples are genetically related to both AN and TK speaking populations. However, the inclusion of many more ethnolinguistic groups in our study brings additional insights; for example, not all AA populations (Mon and Palaung) are equally related to Neolithic samples, suggesting genetic heterogeneity and the complexity of SEA prehistory.

Based on the overview provided by the PCA, ADMIXTURE, and outgroup f3 results, we focus on the following aspects of the data: genetic structure and heterogeneity of Austroasiatic speaking groups; genetic structure of the hill tribes; differences among the four major TK speaking groups according to geographic region; and South Asian-related admixture.

Genetic Structure and the Heterogeneity of Austroasiatic Speaking Groups
AA speakers (comprising 102 million people speaking 167 languages) are widespread across Asia, from South Asia (Bangladesh and India) to southern China and MSEA (Eberhard et al. 2020). There are two competing hypotheses of AA origins that are related to rice cultivation, namely South Asian versus Southeast Asian origins (Diffloth 2005; Chaubey et al. 2011); the latter is supported by genetic evidence (Chaubey et al. 2011). The AA people in SEA are most likely related to farmers who cultivated rice and millet and moved from their homeland, probably located near the Yangtze River, to the coast and then down the rivers of mainland China to SEA 4 kya (Weber et al. 2010; van Driem 2017; Lipson et al. 2018; McColl et al. 2018). However, prior to the movement of prehistoric AA-related groups southward, present-day MSEA (both upland and lowland) was home to hunter-gatherers whose descendants are genetically related to groups in southern Thailand and west Malaysia, such as the Maniq and Jehai (Jinam et al. 2012). The Neolithic farmer expansion did not completely replace the hunter-gatherers but admixed with some of them, as reflected by both ancient and modern DNA studies (Lipson et al. 2018; McColl et al. 2018; Kutanan et al. 2017; Liu et al. 2020).

Previous genetic and linguistic evidence suggested heterogeneity of the Thai AA people (Xu et al. 2010; Kampuansai et al. 2017; Kutanan et al. 2017; Eberhard et al. 2020) but further genetic groupings have not yet been investigated. We obtained data for 11 AA speaking populations which can be clustered into four linguistic groups: Monic branch
FIG. 3. Population allele sharing profiles based on f3 statistics. Heatmap of outgroup f3 statistics (Thai/Lao groups, X; Mbuti) among Thai/Lao groups (upper panel), and between Thai/Lao and other comparative modern Asian populations and ancient samples (lower panel). Black blocks denote missing values. The two colored bars at the top of the plot indicate the country (top) and language family (bottom) for each comparative population; and those on the side indicate language family (left) and subgroup (right) for each Thai/Lao group, according to the key at the right.
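As a rough illustration of what each cell in the heatmap above measures, the following is a minimal sketch of a naive outgroup f3 estimate, assuming population allele frequencies are already available. The function name and the toy frequencies are hypothetical, and the sketch omits the finite-sample bias correction and block-jackknife standard errors that dedicated software applies, so it should not be read as the authors' pipeline.

```python
def outgroup_f3(freqs_a, freqs_b, freqs_outgroup):
    """Naive outgroup f3: mean over SNPs of (a - o) * (b - o).

    Larger values mean populations A and B share more drift
    after their split from the outgroup (here standing in for Mbuti).
    """
    n = len(freqs_outgroup)
    if n == 0 or len(freqs_a) != n or len(freqs_b) != n:
        raise ValueError("frequency lists must be non-empty and of equal length")
    total = sum((a - o) * (b - o)
                for a, b, o in zip(freqs_a, freqs_b, freqs_outgroup))
    return total / n


# Hypothetical allele frequencies at four SNPs (illustration only).
outgroup = [0.5, 0.5, 0.5, 0.5]
pop_a = [0.7, 0.3, 0.6, 0.4]
pop_b = [0.7, 0.3, 0.5, 0.5]   # drifted together with pop_a at SNPs 1 and 2
pop_c = [0.5, 0.5, 0.6, 0.4]   # shares less drift with pop_a

f3_ab = outgroup_f3(pop_a, pop_b, outgroup)
f3_ac = outgroup_f3(pop_a, pop_c, outgroup)
print(f3_ab > f3_ac)
```

In this toy case the pair (pop_a, pop_b) yields the larger value, which corresponds to a warmer cell in a heatmap of this kind.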
FIG. 4. Haplotype sharing profiles as inferred by the ChromoPainter and IBD analyses. The color bars at the top denote the countries and language families while the color bars at the left denote countries and subgroups, according to the keys. (A) Heatmap of ChromoPainter results in which the recipient Y (Thai/Lao groups) is painted by donor X (Thai/Lao and other modern Asian populations), with Y denoted by each row and X denoted by each column. The heatmap is scaled by the average length in centimorgans of the summed painted chromosomal chunks of the recipient individuals from the donor individuals. (B) Heatmap of IBD sharing among Thai/Lao comparisons and between Thai/Lao and other modern Asian populations. The heatmap is scaled by the average length in centimorgans of summed IBD blocks shared between individuals from the two groups. Black blocks denote missing values.

(Mon); Khmuic branch (HtinMal, HtinPray, Mlabri, and Khmu); Katuic branch (Soa and Bru); and Palaungic branch (Lawa_Eastern, Lawa_Western, Palaung and Blang) (Diffloth 2005; Sidwell 2014). However, based on the PCA (fig. 2B), the Thai AA speaking groups can be roughly divided into three groups: Palaungic_AA (Lawa_Western, Lawa_Eastern, Palaung and Blang; #10–13); Khmu_Katu_AA (Khmu, HtinPray, HtinMal, Mlabri, Soa and Bru; #14–19); and Monic_AA (Mon; #20). The ADMIXTURE results at K = 5 also indicated that the AA-speaking groups can be clustered into three groups: the Palaungic_AA group exhibits two major sources (blue and purple) with the black component as a minor source; the Monic_AA group possesses the pink component; and the Khmu_Katu_AA group has a reduced frequency of the purple component (fig. 2C).

To further investigate the genetic structure of AA groups, we carried out haplotype-based analyses, namely ChromoPainter and sharing of segments that are identical by descent (IBD) (fig. 4; supplementary figs. 7–9, Supplementary Material online). These revealed some finer structure within the AA groups: the Mon_AA group shows excess sharing with Indian donors (discussed in more detail below); Khmu_Katu_AA groups show strong intragroup sharing but less sharing with other groups except for between the Soa and most NE_TK groups; and Palaungic_AA groups show various sharing patterns, for example, a broad sharing profile of the Blang with several other groups versus strong self-painting only of the Palaung, and strong sharing among the Lawa_Eastern, Lawa_Western, Karen_ST groups, and TK-speaking Shan.

We next computed f4-statistics of the form f4(group 1, group 2; group 3, Mbuti), where group 1 and group 2 are different AA groups while group 3 is from a different language family/subgroup. By convention, a Z-score > 3 or < −3 indicates that group 3 shares significant excess ancestry with group 1 or 2, respectively; nonsignificant Z-scores indicate that groups 1 and 2 form a clade and share equivalent amounts of ancestry with group 3. The AA groups show a
very heterogeneous profile, for example, when compared to the AA-Palaung, most of the other AA groups have additional affinity to TK and AN groups (and some with the ST groups), whereas the Palaung shows excess sharing with all the other groups compared to the Mon (supplementary fig. 10A, Supplementary Material online; supplementary table 2, Supplementary Material online). Consistent with the haplotype sharing profile of the Palaungic_AA group, the Blang has a broader excess sharing with all other subgroups except for the TK-speaking Khuen, the ST-speaking Lisu, and the HM groups, and the Lawa groups seem to have additional affinity to the Khmu_Katu_AA and the Karen_ST groups (supplementary fig. 11A, Supplementary Material online; supplementary table 3, Supplementary Material online). However, within the Khmu_Katu_AA group, the Khmuic branch groups tend to show excess sharing with the Palaungic_AA and the ST groups compared to the Katuic branch groups (supplementary fig. 11B, Supplementary Material online; supplementary table 3, Supplementary Material online).

We further investigated the groupings among AA Thai/Lao groups by f4-statistics of the form f4(East Asian group, Han Chinese; AA Thai/Lao group, Mbuti), to see if any of the AA groups showed different affinities with any East Asian groups in comparison with Han Chinese (supplementary table 3, Supplementary Material online). Based on the allele and haplotype sharing profiles (figs. 3 and 4), we used Atayal, Dai, Cambodian, Miao, and Naxi as representative East Asian groups speaking AN, TK, AA, HM, and ST languages, respectively. The grouping among AA Thai/Lao groups was also supported by this test; the Monic_AA show excess sharing only with the Dai, while the Khmu_Katu_AA and Palaungic_AA groups are distinguished by the former sharing excess ancestry with Atayal and having no significant Z-scores with Cambodian versus Han, while the latter have no significant Z-scores with Atayal and share excess ancestry with Han when compared with Cambodian. These results suggest more AN/TK and AA related ancestry in the Khmu_Katu_AA group, and more Han related ancestry in the Palaungic_AA group.

We finally built admixture graphs using AdmixtureBayes, and then further investigated these admixture graphs with qpGraph. To begin with, we built a backbone admixture graph with the outgroup Mbuti, N_Indian, and the following representative East Asian groups: AA-speaking Cambodian, AN-speaking Atayal, TK-speaking Dai, HM-speaking Miao, and ST-speaking Naxi (fig. 5A). Another f4 test with Amis, She, and Yi as alternative AN, HM, and ST representative groups, respectively, was performed to verify that our choice of representative groups is not biased in distinguishing the fine-scale relationships within each language family (supplementary fig. 13, Supplementary Material online). In the backbone graph, the first split separates the N_Indian from the East Asian groups, then the Naxi are separated from the other groups. The ancestor of Atayal and Dai is admixed from ancestors of N_Indian and Miao with 6% and 94% ancestry, respectively. The ancestor of Cambodian is admixed with 73% ancestry from the ancestor of Dai and 27% from the ancestor of all East Asian groups. The graph of AA groups (fig. 5B) includes several admixture events, and indicates that the Khmu_Katu_AA and Palaungic_AA subgroups are more closely related, while the Monic_AA subgroup is distinguished from these by N-Indian-related ancestry, in agreement with the results of other analyses (figs. 2 and 4A).

Overall, the genetic evidence indicates that the Thai AA speaking populations fall into three primary groups: Monic_AA, Khmu_Katu_AA and Palaungic_AA (figs. 2–4; supplementary fig. 12, Supplementary Material online). The language of Mon is in the Monic branch, the sister clade of Aslian and Nicobarese, while the linguistic branches of the Khmu_Katu_AA groups are Khmuic for HtinMal, HtinPray, Mlabri, and Khmu, and Katuic for Soa and Bru; the Palaungic branch includes languages of the Lawa_Eastern, Lawa_Western, Palaung, and Blang. In contrast to linguistic studies placing Khmuic and Palaungic languages in the same clade (Diffloth 2005), we find a closer relationship between populations who speak Khmuic and Katuic, which might be explained by the concept of center of gravity (Blench 2015). This idea proposes that after the Neolithic expansion of AA ancestors from southern China to MSEA, early AA speakers were concentrated along the middle Mekong in present-day northern Laos. Some groups subsequently moved westward and were the ancestors of Palaungic and Monic groups, and during this process they came into contact with different linguistic groups (e.g. Mon with Burmese ancestors, Lawa_Eastern and Lawa_Western with Karen_ST, and Palaung with ST groups from NEA), as shown by population structure and relationship analyses and f4 tests (figs. 2–4; supplementary fig. 11, Supplementary Material online; supplementary table 3, Supplementary Material online). These different contact histories would promote subsequent differentiation of the Palaungic and Monic groups from their Khmuic and Katuic ancestors. Meanwhile, the Khmuic and Katuic ancestors might have moved up and down the Mekong and had more contact with each other, thus accounting for their closer genetic relationship with each other. In this region, the Khmuic and Katuic speaking people may have also interacted with TK groups in Laos and Northeastern Thailand, promoting their genetic affinity (figs. 2B, 3, and 4; supplementary table 3, Supplementary Material online). However, some differentiation between the Khmuic and Katuic groups can be seen in the haplotype sharing (fig. 4) and ADMIXTURE results for K = 10 (supplementary fig. 6, Supplementary Material online). Additional studies of AA groups from Thailand (e.g. Pearic and Khmer speaking groups) and other MSEA countries are needed to provide more insights into the genetic structure of AA-speaking people.

Genetic Structure of the Hill Tribes
Consisting of 700,000 people, there are nine officially recognized hill tribes in Thailand: the AA-speaking Lawa (Lawa_Eastern and Lawa_Western), Htin (HtinMal and HtinPray) and Khmu; the HM-speaking Hmong (HmongNjua and HmongDaw) and IuMien; and the ST-speaking Karen (KarenPwo, KarenPadaung, and KarenSkaw), Lahu, Lisu, and Akha (Schliesinger 2000, 2001;
FIG. 5. Admixture graphs for the Thai/Lao groups, for each language family. The node r denotes the root. White nodes denote backbone populations. Backbone population labels and Thai/Lao nodes are colored according to language family. Dashed arrows represent admixture edges, while solid arrows are drift edges reported in units of FST × 1,000. (A) backbone populations (worst-fitting Z = 0.861). (B) AA groups (worst-fitting Z = 2.101). (C) HM groups (worst-fitting Z = 2.028). (D) ST groups (worst-fitting Z = 2.873). (E) TK groups (worst-fitting Z = 2.270). (F) AN group (worst-fitting Z = 1.713).
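The Z-score convention stated in the text for tests of the form f4(group 1, group 2; group 3, Mbuti) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, the Z-score is taken as the estimate divided by its standard error, and applying the same |Z| < 3 cutoff to the worst-fitting Z values in the caption above is an assumption made here for illustration.

```python
def f4_z_score(f4_estimate, standard_error):
    """Z-score of an f4 estimate, assuming a (e.g. block-jackknife) standard error."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return f4_estimate / standard_error


def interpret_f4(z, threshold=3.0):
    """Apply the convention from the text to f4(group 1, group 2; group 3, Mbuti)."""
    if z > threshold:
        return "group 3 shares excess ancestry with group 1"
    if z < -threshold:
        return "group 3 shares excess ancestry with group 2"
    return "groups 1 and 2 are consistent with a clade relative to group 3"


# Hypothetical estimate and standard error (illustration only).
z = f4_z_score(0.0021, 0.0005)
print(interpret_f4(z))

# Worst-fitting Z values as printed in the Fig. 5 caption (panels A to F).
caption_worst_z = [0.861, 2.101, 2.028, 2.873, 2.270, 1.713]
# Assumption: a graph is treated as acceptable when the worst |Z| is below 3.
print(all(abs(value) < 3.0 for value in caption_worst_z))
```

Keeping the threshold as a parameter makes it easy to see how borderline cases, such as the ST graph with a worst-fitting Z of 2.873, depend on the chosen cutoff.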
Penth and Forbes 2004). Living in a remote and isolated region of Thailand, the hill tribes are of interest for their cultural variation in postmarital residence patterns, that is, patrilocality versus matrilocality (Oota et al. 2001; Besaggio et al. 2007; Kutanan et al. 2019, 2020). Most of the hill tribes are isolated from the lowlanders and from each other, which enhances genetic drift and inbreeding, as found in previous studies of autosomal STR (Kampuansai et al. 2017) and mtDNA and MSY variation (Kutanan et al. 2020).

Here, we investigated eight of the official hill tribes (all but the Akha) and the Mlabri, who are not officially regarded as a hill tribe but live in the mountainous area. All of them exhibit high within-population IBD sharing (supplementary fig. 8, Supplementary Material online), as expected given the results of previous studies that suggested high levels of isolation and strong genetic drift. The Mlabri in particular show the greatest levels by far of within-group IBD sharing, in agreement with their enhanced self-painting in the ChromoPainter analysis (fig. 4A). In the ADMIXTURE results at K = 10, four groups stand out with their own ancestry components (supplementary fig. 6, Supplementary Material online): Lahu (light green), Karen_ST (gray), Htin (Mal and Pray) and Khmu (mint), and Hmong_HM (peach). In contrast, the Lawa (Eastern and Western), IuMien, and Lisu do not stand out in the ADMIXTURE analysis, and they have relatively less within-group IBD sharing compared to other hill tribes (supplementary fig. 8, Supplementary Material online). This was further revealed by excess allelic sharing with many other populations in the f4 results (supplementary tables 2 and 3, Supplementary Material online) and haplotype sharing with other groups (fig. 4A; supplementary fig. 7, Supplementary Material online).

The Lawa belong to the Palaungic_AA group, which in the admixture graph for AA groups receives ancestry from the ST-Naxi (fig. 5B). We further built admixture graphs for the HM and ST hill tribes. For the HM groups (fig. 5C), there is a divergence between the Dai and a Miao-Hmong clade, while the IuMien are admixed with 29% ancestry from an ancestor of the Hmong and 71% from an ancestor of the Dai. The additional TK-related ancestry in IuMien is consistent with haplotype-sharing and f4 results (fig. 4; supplementary fig. 12, Supplementary Material online). The graph of ST groups indicates that Lisu, Lahu and Naxi form a clade, while the Karen_ST have additional Cambodian-related ancestry (fig. 5D); this AA-related admixture in the Karen is in agreement with the haplotype-sharing results (fig. 4), and the division of Lahu/Lisu versus Karen_ST groups is also supported by f4 results (supplementary fig. 10C, Supplementary Material online).

These results indicate that not all hill tribes can be characterized simply by high degrees of isolation and genetic drift; the Lawa, IuMien, and Lisu instead seem to have had more interactions with other groups, and so we will focus further discussion on these three hill tribes. The Lawa (Eastern and Western) are the native groups of northern Thailand and inhabited lowland areas before some of them moved to the highlands (Lawa_Western) while others remained in the lowlands or mid-lands (Lawa_Eastern) (Nahhas 2007). By contrast, the Karen in Thailand are refugees who claim to be the first settlers in Myanmar before the arrival of Mon and Burmese people, and moved from Myanmar beginning around 1750 A.D. due to the growing influence of the Burmese (Kuroiwa and Verkuyten 2008; Gravers 2012). The Lawa share ancestry with the Karen_ST (fig. 4; supplementary fig. 5, Supplementary Material online), in agreement with previous findings of shared MSY haplotypes (Kutanan et al. 2020). Genetic relatedness between Karen and Lawa groups was also reported in a previous genome-wide study (Xu et al. 2010). In northern Thailand, Lawa and Karen had been in contact with one another since around the 13th century A.D., during the Lanna Period (Lewis and Lewis 1984). Because the languages of AA-speaking Lawa and ST-speaking Karen are different, geographic proximity along the border between northern/northwestern Thailand and Myanmar is the most likely factor that promoted admixture between these groups.

The IuMien and Hmong are descended from proto-HM groups from central and southern China (Wen et al. 2005) and are linguistically related; there is no significant sharing of ancestry between HM and non-HM groups in the f4 analyses (supplementary fig. 10B, Supplementary Material online; supplementary table 2, Supplementary Material online). However, they still behave differently in many analyses (figs. 3–5; supplementary figs. 6 and 12, Supplementary Material online). The Hmong show genetic signatures of isolation, such as higher IBD sharing within groups (supplementary fig. 8, Supplementary Material online), in agreement with a previous study of uniparental markers (Kutanan et al. 2020), whereas the IuMien show affinities not only with the Hmong but also with TK speaking groups and ST speaking Lahu from both Thailand and China (fig. 4). The differential affinities of HM groups to TK and ST groups have also been shown in two recent genome-wide studies (Liu et al. 2020; Xia et al. 2019). In addition, the sharing of features between IuMien (but not Hmong_HM) and Sinitic languages (Blench 2008) indicates that IuMien similarities with other East Asian populations are evident both genetically and linguistically. The higher genetic isolation of the Hmong could reflect cultural isolation arising from a strong preference for marriage within Hmong groups, while the lower genetic isolation of the IuMien could reflect the pronounced IuMien cultural preference for adoption (Schliesinger 2000; Jonsson 2005; Besaggio et al. 2007).

The Lisu and the Lahu are originally from southern China and speak closely related languages that belong to the Loloish branch of ST (Bradley 1997). Shared genetic ancestry between Lisu and Lahu is evident in the haplotype sharing and admixture graph results (figs. 4 and 5D), although there are differences: Lisu have mixed ancestries probably due to Sinicization in southern China before movement to Thailand (Schliesinger 2000) or interactions with northern Thai lowlanders after settlement in Thailand (Penth and Forbes 2004), while the Lahu are more isolated, as seen in the ADMIXTURE result for K = 7 (supplementary fig. 6, Supplementary Material online) and the IBD sharing results (supplementary fig. 8, Supplementary Material online), in agreement with a previous study of uniparental markers (Kutanan et al. 2020). There
is strong ancestry sharing between the Thai Lahu and Chinese Lahu (figs. 3 and 4), and the Chinese Lahu are moreover genetically similar to Vietnamese Lahu (Liu et al. 2020), indicating a close relationship among Lahu from MSEA and China.

Finally, though the Mlabri are not officially regarded as a hill tribe, this minority group is of interest due to their unique hunting-gathering lifestyle, enigmatic origin, and very small census size (400 individuals) (Eberhard et al. 2020). The Mlabri language belongs to the Khmuic branch of AA languages that is also spoken by their neighbors, Htin (Mal and Pray subgroups) and Khmu, suggesting shared common ancestry, and oral tradition indicates that the Htin are the ancestors of the Mlabri (Oota et al. 2005). A previous genome-wide study also supported genetic affinities between the Mlabri and the HtinMal (Xu et al. 2010), while uniparental studies indicate paternal relationships among Mlabri, HtinMal, HtinPray, and Khmu and an oral tradition, versus maternal genetic relationships among Mlabri and Katuic-speaking Soa and Bru from northeastern Thailand (Kutanan et al. 2018a). Our present results also support genetic relatedness among Mlabri, Htin (Mal and Pray), Khmu, Soa, and Bru within the Khmu_Katu_AA group (fig. 2B; supplementary figs. 6 and 7, Supplementary Material online). The Mlabri, Htin, Khmu, Soa, and Bru all migrated from Laos about 100–200 years ago (Schliesinger 2000), thus close relatedness among them might reflect gene flow among various groups in Laos before their independent migrations to Thailand. However, the Mlabri stand out among these groups in exhibiting extremely high levels of within-group IBD sharing (supplementary fig. 8, Supplementary Material online), indicating strong genetic drift and isolation, consistent with previous investigations of mtDNA, Y chromosome, and autosomal diversity (Oota et al. 2005; Xu et al. 2010; Kutanan et al. 2018a). Both the small census size and recent origin within the past 1,000 years (Oota et al. 2005), combined with geographic isolation, could account for the very low genetic diversity of this group.

Differences among the Four Major TK Speaking Groups According to Geographic Region
With an origin from south/southeastern China (Sun et al. 2013; Pittayaporn 2014), the TK language family comprises around 95 languages spoken by 80 million people in northeast India, southern China, Vietnam, Myanmar, Cambodia, Thailand, and Laos (Eberhard et al. 2020). A common origin of TK and AN language families in southern China was suggested previously based on linguistic and genetic evidence (Thurgood 1994; Sagart 2004; Kutanan et al. 2018b; Yang et al. 2020). The TK languages spread to MSEA around 1–2 kya (Pittayaporn 2014), and previous genetic studies estimated an expansion time for TK groups of 2 kya (Kutanan et al. 2019) and found relatedness between modern TK populations and ancient Iron Age samples (McColl et al. 2018). MtDNA and MSY data indicate contrasting genetic variation and genetic differences between major TK groups in the North, Northeast, and Central regions of Thailand (Kutanan et al. 2019), suggesting different migration routes of TK groups that expanded from China. A previous genome-wide study also reported substructure of Thais in each region (Wangkumhang 2013); however, these previous studies did not investigate this substructure in detail.

In this study, we investigated one TK population from Laos and 15 TK populations from Thailand that can be grouped by geographic region: northern Thailand (N_TK), northeastern Thailand (NE_TK), Central Thailand (C_TK), and Southern Thailand (S_TK). Based on the PCA (fig. 2B), the TK groups from different geographic regions in Thailand show different relationships; the N_TK groups are close to the Palaungic_AA groups, AA-speaking Kinh, AN groups from Taiwan (#49–50) and the Philippines (#46), while the northeastern Thai TK groups (NE_TK; Black Tai, Lao Isan, Phutai, Nyaw, Saek, and Kalueang; #27 and #29–33) are close to the Khmu_Katu_AA groups. The TK speaking Laotian (#28) are grouped with the NE_TK groups. The central and southern Thai TK groups (C_TK and S_TK; CentralThai and SouthernThai_TK; #34 and #35) and Thai-HO (#36) are close to the Monic_AA groups. In accordance with the PCA results, ADMIXTURE results at K = 11 also show that the different TK-speaking groups can be distinguished: the blue component is now enriched mostly in the N_TK group, the additional light brown component is enriched in the NE_TK group, and the C_TK and S_TK groups possess the additional pink component as mentioned previously (supplementary fig. 6, Supplementary Material online).

Some finer structure within the Thai TK groups is revealed by ChromoPainter analysis (fig. 4A; supplementary fig. 7, Supplementary Material online): N_TK populations show strong sharing with each other and the Dai, though the Shan show additional sharing with the Lawa_Eastern and Karen_ST groups. The NE_TK groups show strong sharing with the Khmu_Katu_AA group, Cambodian, Borneo, and Dai. Notably, the Laotian show a relatively broader sharing profile and high sharing with the HM groups, whereas the BlackTai show a strong self-painting profile. In addition to strong sharing with Khmu_Katu_AA groups, the C_TK group shows an excess sharing with the Indian donors, which is similar to the profile of Thai-HO. The S_TK group shows a profile similar to that of C_TK but with additional sharing with the AN-speaking Mamanwa, Borneo, and Semende, which is similar to the profile of the SouthernThai_AN (who show even stronger and broader sharing with the other AN groups).

The results of f4-statistics of the form f4(TK group 1, TK group 2; non-TK group 3, Mbuti) show that, in particular, the NE_TK and N_TK groups have strong excess sharing with each other and the HM groups, followed by ST and AA groups (supplementary fig. 11C–E, Supplementary Material online; supplementary table 3, Supplementary Material online). Many of the highest Z-scores come from comparisons involving the Laotian population (supplementary figs. 10D and 11C, Supplementary Material online; supplementary tables 2 and 3, Supplementary Material online), in agreement with their broader haplotype sharing profiles (fig. 4). In addition, we found that Thai-HO and CentralThai form a clade in all the tests (Z scores within ±1.5), suggesting their close relationship in agreement with