
okarinaofsteiner's East Eurasian megathread [reposts]
#1
Link to the Proboards thread this one is copied from

Copying + pasting blurbs from my Anthrogenica megathread here:

First post from June 2020:
Quote:Hello Anthrogenica! Some of you may know me as that person from Anthroscape who posted a lot about how East Eurasians score on GEDmatch calculators. Not proud of being associated with the "race realism" aspect of Anthroscape (I was always uncomfortable with the association between "phenotype classification" and racial supremacist ideas), but I find population genetics/history interesting, and I stayed on the forum to participate in the community of Overseas Asians that developed there. There was a lot of discussion of geography, economics, immigration, pop culture, and diaspora Asians' relationships with their native and host cultures in the "Nordsinid/Mittelsinid/Sudsinid" megathread.

Anyway, I thought I'd make a thread here for re-sharing the content I posted on Anthroscape as a hobby. I never intended for my content to be associated with racialist ideas or ideologies, and thought this could be a better site for hosting that information, since it's more focused on human population genetics.

My first few posts in the megathread were reviews/summaries of existing autosomal genetics papers that I knew of before I ordered my 23andMe test in 2017.


Analysis of East Asia Genetic Substructure Using Genome-Wide SNP Arrays from 2008

Quote:The populations including those from the HGDP, HapMap, the I-control database, a Korean sample set and East Asian Americans. For all but the East Asian American and Korean samples set, genotypes were available from online databases. These included HapMap subjects (44 CHB and 44 JPT) and HGDP subjects (10 Cambodian, 10 Dai, 24 Hazara, 9 Hezhen, 27 Japanese, 10 Miaozu, 7 Naxi, 8 Oroqen, 10 She, 10 Tu, 10 Tujia, 8 Xibo, 13 Yakut and 44 Han Chinese) from the I-ControlDB (www.illumina.com/iControlDB, Illumina, San Diego, CA). Genotypes from other HGDP subjects (10 Daur, 8 Lahu, 9 Mongola, 10 Uygur, 10 Yi,) were from the NIH Laboratory of Neurogenetics (http://neurogenetics.nia.nih.gov/paperdata/public/).

The EAS American samples were individuals born in the respective EAS country and were from Vietnam (22 subjects), Philippines (17 subjects) and different regions of the Peoples Republic of China (23 subjects) and Taiwan (9 subjects)... 32 Chinese American samples were recruited in Houston TX. Of the Chinese American participants (CHA), 28 also indicated their general origin from regions within China (6 north, 10 south, 3 central and 9 subjects Taiwan).

[Image: fPfENR0.png]

CHB is a sample of Han Chinese people from Beijing, CHA is a sample of Chinese Americans from Texas, and TWN is a subset of CHA referred to as "Chinese Americans from Taiwan". The PCA shows clear Northern Han and Southern Han clusters, although the southern cluster is more diffuse. The distribution of the CHA samples corresponds with their regional origin within China; 7 of the 23 CHA samples fall within the Northern Han cluster, which roughly corresponds to the 6 Chinese Americans who indicated they were of Northern Chinese origin. The "Southern Han" CHA samples also form two separate clusters which appear to correspond to Central and South China.


Genetic Structure of the Han Chinese Population Revealed by Genome-wide SNP Variation from 2009

Quote:Abstract

Population stratification is a potential problem for genome-wide association studies (GWAS), confounding results and causing spurious associations. Hence, understanding how allele frequencies vary across geographic regions or among subpopulations is an important prelude to analyzing GWAS data. Using over 350,000 genome-wide autosomal SNPs in over 6000 Han Chinese samples from ten provinces of China, our study revealed a one-dimensional ‘‘north-south’’ population structure and a close correlation between geography and the genetic structure of the Han Chinese. The north-south population structure is consistent with the historical migration pattern of the Han Chinese population. Metropolitan cities in China were, however, more diffused ‘‘outliers,’’ probably because of the impact of modern migration of peoples.

At a very local scale within the Guangdong province, we observed evidence of population structure among dialect groups, probably on account of endogamy within these dialects. Via simulation, we show that empirical levels of population structure observed across modern China can cause spurious associations in GWAS if not properly handled. In the Han Chinese, geographic matching is a good proxy for genetic matching, particularly in validation and candidate-gene studies in which population stratification cannot be directly accessed and accounted for because of the lack of genome-wide data, with the exception of the metropolitan cities, where geographical location is no longer a good indicator of ancestral origin. Our findings are important for designing GWAS in the Chinese population, an activity that is expected to intensify greatly in the near future.
North-South Structure among Han Chinese. a) Selected Chinese provinces. b) CHB (Beijing university students, purple), Shanghai (orange), and Singapore (yellow). c) Guangdong dialect groups- Cantonese, Hakka, and Teochew.
[Image: 1-s2.0-S0002929709004716-gr1.jpg]

Map of selected provinces and where they plot on the PCA
[Image: 1-s2.0-S0002929709004716-gr2.jpg]

Another visualization of the north-south differences among Han subgroups. Red = SEA-like, yellow = Japanese-like, brown = "Continental"
[Image: 1-s2.0-S0002929709004716-gr3.jpg]


TL;DR:

  1. Most of the regional genetic differences among Han Chinese occur along a north-south axis.
  2. Due to the uneven geography of the country and a history of north->south expansion, Northern Chinese are more homogeneous than Southern Chinese.
  3. Southern China is more mountainous than the north, which resulted in strong founder effects among different subgroups separated by mountain ranges. This resulted in greater linguistic and genetic diversity among and within different Southern Chinese provinces.
  4. This means any rigorous estimate of how "northern" or "southern" different Chinese subgroups are needs to take native topolect and known regional ancestry into account, especially for Southern Chinese provinces/groups.



Quick note on the 1000 Genomes reference populations/datasets:
[Image: MdlDtjH.png]


CHB (n = 103) -> Chinese university students studying in Beijing (cosmopolitan origin as indicated by their PCA clustering).

CHS (n = 105) -> Han Chinese sample of Hunan and Fujian origin (source: http://anthropogenesis.kinshipstudies.or...t-of-asia/)

Quote:“We next tested certain obvious predictions of the out of East Asia model. First, the model predicts lower diversity in people directly associated with the original AMH and higher diversity in people resulting from admixture of AMH with archaic humans. We calculated the hom PGD in slow SNPs as well as het numbers for each of the 25 groups totaling 2534 individuals in 1kGP. The lowest hom PGD level was found in LWK followed by slightly higher level in CHS (Supplementary Figure S8-A). However, LWK has significantly higher numbers of het than CHS (Supplementary Figure S8-B ). As high level heterozygosity indicates high genetic diversity and would reduce hom distance, it is likely that CHS has lower genetic diversity than LWK. We further found that within CHS (made of 72 individuals from Hunan and 36 from Fujian), Hunan samples have lower hom PGD and het numbers than Fujian samples (Supplementary Figure S8-CD). These results indicate that CHS, in particular Hunan people, have lowest genetic diversity levels among the 25 groups in 1kGP. Given that known admixed groups such as MXL and PUR showed the highest genetic diversity or PGD (Supplementary Figure S8-A), it may be inferred that CHS or Hunan people may have the least amount of admixture and hence represent the original AMH group, at least among the 25 groups sampled here.”

The other populations of interest are
* JPT (Japanese from Tokyo)
* KHV (Kinh/ethnic Vietnamese from Ho Chi Minh City)
* CDX (Chinese Dai from Xishuangbanna in Yunnan province)

1000 Genomes paper: https://www.nature.com/articles/nature11632
anti-racist on here for kicks and giggles

“If you want to grant your own wish, then you should clear your own path to it”
― Okabe Rintarou

“Never doubt that a small group of thoughtful, committed citizens can change the world. Indeed, it is the only thing that ever has.”
― Margaret Mead
#2
2017 paper on regional autosomal genetic differences among Han Chinese within China. This paper inspired me to do independent amateur research on how Han Chinese from different regions of Greater China cluster on GEDmatch calculators, because I wanted to see how I compared to other Chinese people haha.

These samples were taken from hospitals throughout the country, and samples are labeled according to place of birth. (All the Zhejiang samples are of people born in Zhejiang province, all the Shaanxi samples are of people born in Shaanxi province, etc.) I was a bit surprised to see how much more northern Fujian was than Guangdong. I figured Fujian would be more southern-shifted considering how linguistically divergent the Min branch (native to Fujian) of spoken Chinese is from other Chinese varieties that are more directly descended from Middle Chinese.

https://www.biorxiv.org/content/10.1101/162982v1
A comprehensive map of genetic variation in the world’s largest ethnic group - Han Chinese
Charleston W. K. Chiang, Serghei Mangul, Christopher R. Robles, Warren W. Kretzschmar, Na Cai, Kenneth S. Kendler, Sriram Sankararaman, Jonathan Flint
doi: https://doi.org/10.1101/162982
Now published in Molecular Biology and Evolution doi: 10.1093/molbev/msy170

https://www.tapatalk.com/groups/anthrosc...74452.html
Quote:Chinese and East Asian users may be interested in this tweet and study:

https://twitter.com/CharlestonCWKC/status/885540817649246208

Shows population structure in PCA between provinces of China:
Quote:This is cool because they found an East-West dimension beyond the simple North Chinese-South Chinese difference.
[Image: ZKn8GBdl.jpg]
Due to sampling issues, this is effectively just a plot of Shaanxi, Liaoning, Jiangsu, Zhejiang, Sichuan, and Guangdong samples. You could remove all the other provinces and the plot would look exactly the same aside from provincial averages.

Few cities in their supplement:
[Image: CIFmIALl.png]

Beijing mostly falls within the Northern China cluster, Shanghai mostly falls within the Jiangzhe (Yangtze Delta) cluster, and Chongqing mostly falls within the range you see with Sichuan samples. Beijing and Shanghai both seem to have a few "southern" outliers, but only a few of the Shanghai samples (and virtually none of the Beijing samples) overlap with the Guangdong samples.

N-count for each province in the survey. Heilongjiang (HLJ) and Liaoning (LN) in the Northeast are wildly overrepresented relative to population size, as are Shaanxi in the northwest and Jiangsu and Zhejiang in the Yangtze Delta (near Shanghai). Meanwhile, Anhui (AH), Hunan (HUN), Fujian (FJ), and Guangxi (GX)- all southern rice-growing provinces- are underrepresented.

[Image: 0RS8myCl.jpg]
[Image: uSbqgGhl.png]

(Razib Khan’s analysis) https://www.gnxp.com/WordPress/2017/08/0...-of-china/

Quote:The most important thing about this preprint is not that the sample size is large enough that they could detect low frequency variants and add to the catalog. No, for me, it is that they sampled so many of the provinces. As you can see in the figure up top just like Europe China’s Han population recapitulate the map of China. That is, populations arrange themselves spatially when projected onto a principle components analysis plot in the same manner that they do geographically. This is a new finding in some ways because previous sampling strategies had not been robust enough to detect the east-west cline (though to be honest if you looked at the Chinese samples in the 1000 Genomes there was suggestion of this).
#3
https://anthrogenica.com/showthread.php?20657-okarinaofsteiner-s-East-Eurasian-GEDmatch-megathread&p=756414&viewfull=1#post756414 (on page 3 of the original thread, I linked the GenoPlot archive)

There was an interesting article on Han Chinese population genetics released sometime in 2018, based on low-pass, noninvasive prenatal testing samples from various women who were Chinese nationals.

Among other things they compared the allele distribution of Han women from various provinces with 5 of the 1000 Genomes populations: CHB (Han Chinese from Beijing), CHS (Southern Han Chinese), JPT (ethnic Japanese from Tokyo), CDX (ethnic Dai from Xishuangbanna Yunnan), and KHV (Kinh Vietnamese from Ho Chi Minh City). This basically shows how genetically similar different Han Chinese subgroups are to those populations.

Screenshots of allele sharing patterns between Han and 1000 Genomes East Asian groups:
[Image: CtmgSybm.png] [Image: sdm5Bhjm.png] [Image: yCMJi2Zm.png]
[Image: g9wlMjBm.png] [Image: pZr6ND7m.jpg]


ARTICLE| VOLUME 175, ISSUE 2, P347-359.E14, OCTOBER 04, 2018
Genomic Analyses from Non-invasive Prenatal Testing Reveal Genetic Associations, Patterns of Viral Infections, and Chinese Population History
Open Archive DOI:https://doi.org/10.1016/j.cell.2018.08.016


Quote:Highlights
• Genome sequencing from low-pass noninvasive prenatal testing samples
• GWAS of 141,431 low-pass genomes reveals 16 unknown genetic associations
• Patterns of clinically relevant viral infection in maternal plasma
• Insights into the genetic structure and history of the Chinese population

Summary
We analyze whole-genome sequencing data from 141,431 Chinese women generated for non-invasive prenatal testing (NIPT). We use these data to characterize the population genetic structure and to investigate genetic associations with maternal and infectious traits. We show that the present day distribution of alleles is a function of both ancient migration and very recent population movements. We reveal novel phenotype-genotype associations, including several replicated associations with height and BMI, an association between maternal age and EMB, and between twin pregnancy and NRG1. Finally, we identify a unique pattern of circulating viral DNA in plasma with high prevalence of hepatitis B and other clinically relevant maternal infections. A GWAS for viral infections identifies an exceptionally strong association between integrated herpesvirus 6 and MOV10L1, which affects piwi-interacting RNA (piRNA) processing and PIWI protein function. These findings demonstrate the great value and potential of accumulating NIPT data for worldwide medical and genetic analyses.

2022 commentary from me: (linked the GenoPlot archive)

This paper was referenced in an earlier 2022 Quora answer on genetic diversity among Han Chinese vs Uyghurs. Some of the clinically significant genetic differences are highlighted in the paper, which I never really looked at before.
[Image: g1MMOnP.png]

"Figure 3. Genetic Adaptation in Han Chinese Population

(A) Manhattan plot showing the detected selection signals in Han Chinese population across the first principal component. VEP annotated names of the gene loci under selection are displayed.
(B–G) Derived allele frequency per Chinese administrative division for the lead SNP in loci under selection across latitude. Shown is the derived allele frequency distribution of the lead SNP...
(H–O) Allele frequency per administrative division for the ClinVar pathogenic variants with a significant difference of allele frequencies across North, Central, and South regions...
 
B = [CR1 loci](https://en.wikipedia.org/wiki/Complement_receptor_1) (immune system). Derived allele more common in the south, especially away from the Yangtze Delta.
C = [FADS2](https://en.wikipedia.org/wiki/FADS2) (fatty acid metabolism). Derived allele allegedly linked to increased intelligence in breast-fed babies in some studies but has not been replicated (i.e. probably not real). More common in the north, then the Yangtze basin, then the South China Sea coast.
D = [ELK2AP-MIR4507](https://www.ecosia.org/search?q=ELK2AP-M...%20cluster) (cancer related?) Derived allele more common in the north, but the north-south transition in frequency occurs further north
E = [ABCC11](https://en.wikipedia.org/wiki/ABCC11) (earwax/body odor, this one is more commonly known). Derived allele = dry earwax, north-south frequency transition occurs further south than C such that it's basically Guangxi vs everywhere else
F = [DOCK9](https://en.wikipedia.org/wiki/Dock9) ([mutations of this have been connected to bipolar disorder](https://zenodo.org/record/1234913)). Derived allele is most common in Guangxi, less common in the north, and much rarer in Qinghai than the rest of the north. (https://www.nature.com/articles/ejhg2011139). I wonder if this is in any way related to stereotypes of southern Chinese being less "direct" and more group-oriented than northern Chinese, which is arguably more of a hallmark of rice culture vs wheat culture.
G = [LILRA3](https://en.wikipedia.org/wiki/LILRA3) (immune system).
 
The rest of the ones were said to have stronger north-south regional differences

H = [Meckel syndrome (type 2)](https://en.wikipedia.org/wiki/Meckel%E2%...r_syndrome). Lethal rare genetic disease. Derived allele concentrated in Gansu and Hainan, but also Guangdong, Fujian, and Jilin
I = [Complement component 9](https://en.wikipedia.org/wiki/Complement_component_9) (immune system). Derived allele has uniform frequency across most of China proper, but more common in Guangxi, Guangdong, Shanghai, and Liaoning
J = Deafness. Most common in Guangxi, Guangdong, Fujian, and Hunan. Relatively rare in Hainan, Guizhou, and the north.
K = [Aceruloplasminemia](https://en.wikipedia.org/wiki/Aceruloplasminemia) (copper deficiency in brain, causes adult neurological problems). Derived allele most common in the inland north, also somewhat common in Hainan and Yungui (Yunnan + Guizhou)
L = [Usher syndrome](https://en.wikipedia.org/wiki/Usher_syndrome) (deafblindness). Derived allele more common in south than north, most common in Hainan, also more common in Xinjiang
M = Albinism (red pigmentation). Derived allele map is basically a map of West Eurasian admixture among Han Chinese lol (Qinghai, Gansu, Shanxi most frequent, much less common in the coastal north and nonexistent south of the Qinling-Huaihe line)
N = non-syndromic genetic deafness. Derived allele is more common everywhere in the south than anywhere in the north, but more common in Shaanxi, Beijing, and Liaoning than other parts of the north.
O = [G6PD CANTON](https://en.wikipedia.org/wiki/Glucose-6-...deficiency) (ELI5- metabolism issue, can't eat fava beans). This variant is most common in Shanxi, Guangdong, Guangxi, Yunnan, and to a lesser extent Fujian- but not Hainan, and is rare elsewhere. [Seems different from the shared higher rates of esophageal cancer among Shanxi and Fujian/Chaoshan Han](https://pubmed.ncbi.nlm.nih.gov/20559544/)

Too lazy to reformat my Reddit markdown (link) for current forum formatting at the moment



Will start copy + pasting my reposted GEDmatch MDLP K23b analyses in the next few days.
#4
Tried to make Imgur albums for my Anthrogenica photo albums. My pre-2021 (Anthroscape era) Imgur albums weren't super organized so these may not be exactly the same.


[2018] MDLP K23b results for East Asians (https://anthrogenica.com/album.php?albumid=167)
collection of MDLP K23b plots for East Asian GEDmatch samples
https://imgur.com/a/YqLuTir


[2019] DNAConnect.org Chinese adoptees MDLP K23b (https://anthrogenica.com/album.php?albumid=193 )
Source: http://research-china.weebly.com/gedmatc...igins.html
Samples were gathered from GEDmatch.com in early 2019
https://imgur.com/a/B7mC7gt


[2019] Speculative MDLP K23b Han averages for Chinese provinces (https://anthrogenica.com/album.php?albumid=195)
based on the DNAConnect.org dataset (which is almost entirely from the rice-growing provinces) + private individuals of known regional Chinese ancestry
https://imgur.com/a/cn9vBpQ


[2023] Speculative MDLP K23b Han averages for Chinese provinces (https://anthrogenica.com/album.php?albumid=239)
based on 23mofang averages listed here:
https://i.imgur.com/sYXNCyH.jpg
and my attempt to model 23mofang ancestry components in terms of MDLP K23b reference populations

Quote:N_Han = 0.08 Amerind, 0.08 Ancestral_Altaic, 0.12 S_C_Asian, 0.14 Arctic, 0.09 S_Indian, 0.03 Australoid, 18.43 Austronesian, 0.09 Caucasian, 0 Archaic_Human, 0.01 E_Afr, 0.59 E_Siberian, 0.04 EEF, 0 Khoisan, 0.18 Melano_Polynesian, 0.01 Archaic_Afr, 0.06 Near_East, 0.04 N_Afr, 0.25 Paleo_Sib, 0 Afr_Pygmy, 46.17 S_E_Asia, 0 SSA, 33.55 Tungus_Altaic, 0.04 EHG

S_Han = [4 parts Han_ (HGDP) + 2 parts Korean (avg) + 2 parts Vietnamese + 2 parts Chinese_Dai + 1 part Tujia + 1 part She + 1 part Ami + 1 part Naxi]

Daic (Tai-Kradai) = 5/16 "Chinese_Dai" + 1/16 part "Tai_Lue" + 2/16 parts "Jiamao" + 2/16 parts "Zhuang" + 5/16 parts "Dai" + 1/16 part "Ami_Taiwan"
Tungusic = average of: "Xibo" + "Oroqen" + "Hezhen" + "Daur"
Hmong-Mien = average of: "Yao" + "Miao" + "Hmong_Miao" + "Hmong"
Japanese = "Japanese" (the one that scores highest on T_A)
Korean = average of: "Korean_" + "Korean_KR"
Lahu = "Lahu"
Buryat = "Buryat"
Yakut = "Yakut" (doesn't include the other "Yakut" reference pop)
Uyghur = average of: "Uygur" + "Uygur-Han"
Tibeto-Burman = average of: "Tibetian_Madou" + "Tibetian_TTR" + "Naxi" + "Yi"
https://imgur.com/a/XzyouJR
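For anyone who wants to reproduce composite averages like the ones above, here is a minimal sketch of the "N parts of this + M parts of that" arithmetic. The part counts are copied from the recipes above; the function name `weighted_mix`, the dict layout, and the toy profiles are my own assumptions for illustration, and the real MDLP K23b reference values would have to be filled in from the calculator's spreadsheet.

```python
from typing import Dict

Profile = Dict[str, float]  # ancestry component name -> percentage


def weighted_mix(profiles: Dict[str, Profile], parts: Dict[str, float]) -> Profile:
    """Blend reference-population profiles, treating `parts` as relative weights."""
    total = sum(parts.values())
    components = sorted({c for name in parts for c in profiles[name]})
    return {
        c: sum(parts[name] * profiles[name].get(c, 0.0) for name in parts) / total
        for c in components
    }


# Part counts copied from the S_Han recipe above (14 parts in total).
S_HAN_PARTS = {
    "Han_": 4, "Korean": 2, "Vietnamese": 2, "Chinese_Dai": 2,
    "Tujia": 1, "She": 1, "Ami": 1, "Naxi": 1,
}

# Part counts copied from the Daic recipe above (sixteenths).
DAIC_PARTS = {
    "Chinese_Dai": 5, "Tai_Lue": 1, "Jiamao": 2,
    "Zhuang": 2, "Dai": 5, "Ami_Taiwan": 1,
}

# Made-up two-component profiles, for illustration only (not real K23b values).
toy_profiles = {"PopA": {"X": 60.0, "Y": 40.0}, "PopB": {"X": 20.0, "Y": 80.0}}
toy_mix = weighted_mix(toy_profiles, {"PopA": 3, "PopB": 1})
print(toy_mix)  # {'X': 50.0, 'Y': 50.0}
```

The plain "average of" groups (Tungusic, Hmong-Mien, etc.) are the same calculation with every part set to 1.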
#5
https://genoplot.com/discussions/topic/2...thread/283 (https://genoplot.com/discussions/post/643802)

[Image: s9roVJz.png]

My commentary on ph2ter's map-
Quote:In the most recent version, we can see that

1. The Non-East Eurasian component of Aeta/PH Negritos is modeled as Leang Panninge.
2. Non-Igorot Filipinos and Koreans are modeled as having a little Jomon (but not Taiwanese aborigines).
3. The Non-East Eurasian component in Wallaceans is modeled as Papuan/Australian aborigine-like. A good chunk of the East Eurasian ancestry in many Wallaceans is Austroasiatic-like, certainly more than in Filipinos.
4. Malay, Lao, and Siamese have both Simulated_AASI and IRN_Shahr-i-Sokhta. Cambodian and Khmer only seem to have the first one.
5. The Non-East Eurasian component of MY Negritos is mostly Hoabinhian-like and Simulated_AASI-like, but is also modeled with some Leang Panninge and Papuan-like component.
6. The South Asian groups that have Afanesievo also have Steppe_MBLA
7. The West-East cline for Chokhopani vs Upper_YR_LN among Himalayan-speaking groups in Mainland SEA is interesting.
8. Mongolic groups in China have a lot of Himalayan (Sinitic-related) ancestry
#6
Repost of my Anthrogenica repost of my Anthroscape post, part 1

As some of you may know I'm a huge fan of the MDLP K23b calculator. It has a lot more ancestry components than many other GEDmatch calculators, which gives it higher resolution. But what makes MDLP K23b really useful is its high number of reference populations (620). Having multiple reference populations for larger ethnic groups is useful for comparing different subgroups within an ethnicity. I don't know of any other calculators that have different Han Chinese reference populations for different language groups or countries like MDLP K23b does. So as an East Asian and ethnic Chinese who's interested in population genetics, this tool has been very useful in learning about how subgroups of different Asian ethnicities differ from each other genetically.

The MDLP K23b calculator has 23 different ancestry components, 21 of which are shown in this graph I found on Anthrogenica. (Some have slightly different names; the ones that aren't shown are Archaic_African and Archaic_Human.)

Using this
[Image: BmZRlTQl.jpg]

and this
[Image: ywwKBS2l.jpg]
I was able to create these global PCA plots of the MDLP K23b ancestry components and reference populations.
[Image: k58pS5xl.png]

This global PCA shows where the East Asian and South Asian reference populations (green) lie on this PCA plot. SE Asian populations are shifted to the left and slightly down compared to East Asians, while Siberian populations are more to the right and also slightly down. The South Asian populations are shown as a cline between "South_Central_Asian/ANI" and "South_Indian/ASI", even though South Asians typically have significant amounts of "EEF", "Caucasian", and "Ancestral_Altaic/ANE" on MDLP K23b as well.
#7
Repost of my Anthrogenica repost of my Anthroscape post(s), part 2


Description of East Eurasian ancestry components:
• "Arctic": indigenous peoples of the Arctic. Peaks in Eskimo/Inuit (~90%+) and Chukchi (~60%) in Asia
• "Paleo_Siberian": northeast Russian Far East. Peaks in Itelmen, Koryak (~90%), and Chukchi (~40%)
• "East_Siberian": general native Siberian. Peaks in Nganassan (~97%) and Nenets (60-70%)
• "Tungus_Altaic": general "NE Asian/CJK". Peaks in Ulchi (62%) and Daur + Hezhen (~50%)
• "South_East_Asia": general Continental East Asia + Tibeto-Burman. Peaks in Naga (~70%), Naxi + Yi, Lahu, and Tibetans (~63%)
• "Austronesian": general Southern Mongoloid. Peaks in Taiwanese aborigines and in Igorot (~100%)

"Australoid": Peaks in the Papuan highlands. Non-Papuan Eastern Indonesians and Island SE Asian Negritos generally score 10-20%.

"Melano-Polynesian": Peaks in the Melanesian islands east of Papua. Non-Papuan Eastern Indonesians and Island SE Asian Negritos generally score ~10%.
Australian aborigines seem to be a mix of both but slightly more Australoid (~50% Australoid vs ~40% Melano_Polynesian); SE Asians are generally more "Australoid" than "Melano-Polynesian"

"Amerindian": Native American. Peaks in Asia among Selkup and certain Central Asian groups (~2%).

"South_Indian": AASI, Onge (~55%), and "East Veddoid". Negritos score 15-20%.

For the purposes of this thread and in my spreadsheet I defined "East Eurasian" as Paleo_Siberian, East_Siberian, Tungus_Altaic, South_East_Asia, and Austronesian. Arctic and Amerindian are technically still "Mongoloid" in the genetic sense but I wanted to focus on East Asia-specific ancestry components.
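As a quick worked example of that definition, the sketch below sums the five "East Eurasian" components for the N_Han profile posted earlier in this thread. The component keys follow the abbreviations used in that N_Han line; the exact column labels in GEDmatch output may differ, so treat the names as assumptions.

```python
# The five components counted as "East Eurasian" in this thread.
EAST_EURASIAN = (
    "Paleo_Sib", "E_Siberian", "Tungus_Altaic", "S_E_Asia", "Austronesian",
)


def east_eurasian_total(profile):
    """Sum the East Eurasian components of a K23b-style profile (percentages)."""
    return sum(profile.get(component, 0.0) for component in EAST_EURASIAN)


# N_Han values as posted earlier in the thread (only the relevant components).
n_han = {
    "Paleo_Sib": 0.25,
    "E_Siberian": 0.59,
    "Tungus_Altaic": 33.55,
    "S_E_Asia": 46.17,
    "Austronesian": 18.43,
}

print(round(east_eurasian_total(n_han), 2))  # 98.99
```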


One of the first things I noticed when I started examining my own GEDmatch Genesis results in 2018, along with those of my GEDmatch One-to-Many DNA relatives, was that the Chinese results (and the Han Chinese reference populations) all tended to score similar levels of "South_East_Asia" in MDLP K23b, with most of the variance being in "Tungus_Altaic" and "Austronesian".

I noticed that Han Chinese have a T_A/Austronesian cline; this basically functions as a North-South cline since almost all Han Chinese score between 45-50% S_EAsian, and the three components T_A, AN, and S_EAsian comprise 95%+ of most individuals’ ancestry in MDLP K23b. I classified samples with more T_A than Austronesian as "Northern" and vice versa for "Southern" based on how the Han Chinese reference populations scored. I'm pretty sure Han_ and Han_North are from the HGDP dataset (https://i.imgur.com/QUADHzr.png), and I think the other South Chinese ones are from the HUGO Pan-Asia SNP Consortium study (https://i.imgur.com/7bYxW0c.png).

* Han_North (HGDP): 52.17% S_EA, 15.43% AN, 29.95% T_A, 98.39% East Eurasian
* Han-Mandarin: 43.51% S_EA, 23.73% AN, 25.29% T_A, 94.59% East Eurasian
* Hakka (HUGO TW-HB?): 45.23% S_EA, 32.66% AN, 19.67% T_A, 98.89% East Eurasian
* Chinese_Taiwan (HUGO TW-HA?): 44.18% S_EA, 33.42% AN, 19.59% T_A, 98.6% East Eurasian
* Han_Singapore (HUGO SG-CN?): 45.54% S_EA, 33.14% AN, 17.67% T_A, 97.93% East Eurasian
* Han_ (HGDP): 50.37% S_EA, 32.72% AN, 16.41% T_A, 99.82% East Eurasian
* Cantonese (HUGO CN-GA?): 45.37% S_EA, 38.27% AN, 13.23% T_A, 97.87% East Eurasian
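The "more T_A than Austronesian = Northern" rule can be sketched in a few lines, applied here to the reference values listed above. The function name `classify` and how ties are handled (a tie falls on the "Southern" side) are my own arbitrary choices for the sketch.

```python
# (S_EA, AN, T_A) percentages copied from the reference list above.
HAN_REFERENCES = {
    "Han_North": (52.17, 15.43, 29.95),
    "Han-Mandarin": (43.51, 23.73, 25.29),
    "Hakka": (45.23, 32.66, 19.67),
    "Chinese_Taiwan": (44.18, 33.42, 19.59),
    "Han_Singapore": (45.54, 33.14, 17.67),
    "Han_": (50.37, 32.72, 16.41),
    "Cantonese": (45.37, 38.27, 13.23),
}


def classify(an, t_a):
    """Label a sample 'Northern' if Tungus_Altaic exceeds Austronesian, else 'Southern'."""
    return "Northern" if t_a > an else "Southern"


for name, (s_ea, an, t_a) in HAN_REFERENCES.items():
    # T_A - AN is positive for northern-shifted profiles, negative for southern ones.
    print(f"{name}: {classify(an, t_a)} (T_A - AN = {t_a - an:+.2f})")
```

With these numbers only Han_North and Han-Mandarin come out "Northern", and Han-Mandarin sits close to the boundary.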


Out of curiosity, I decided to collect Chinese and other East Asian MDLP K23b results from the GEDmatch One-to-Many database to see how they compared to my own. Won't elaborate on my methodology, but I think a lot of the samples I collected have been removed from public view by now.

At first glance, there seemed to be a "break" in the Chinese North-South cline, around where there were equal amounts of "Tungus_Altaic" and "Austronesian". I assumed the cluster with more Austronesian broadly represented Han Chinese from south of the Qinling Mountains and Lower Yangtze River, while the cluster with more Tungus_Altaic broadly represented Han Chinese from north of the Qinling Mountains and the Lower Yangtze.
[Image: lUQxR2nl.png]

Upon further observation, I noticed that actual Chinese samples didn't completely overlap with the reference populations. For example, the "Northern China" like cluster was more Tungus_Altaic and less South_East_Asia than the Han_North and Han-Mandarin reference populations. When I tried splitting my early dataset into 4 smaller clusters, I got:

1) a "northern" cluster with more T_A than AN closer to Han_North,
2) a "central" cluster with roughly equal amounts of T_A and AN positioned closer to Han-Mandarin,
3) a "south-central" cluster that extended from 2 to the Taiwanese/Hakka/Singapore reference populations,
4) a "farther south" cluster that extended from the Taiwanese/Hakka/Singapore reference populations to the Cantonese reference population
[Image: aosEKk6l.png]

For further analysis/ease of displaying information, I created a "North-South" scale for East Asian ancestry, based on the relative proportion of "Tungus_Altaic", "South_East_Asian", and "Austronesian" within the East Asian ancestry component. I did this so it'd be easier to compare different Chinese subgroups to each other, and Chinese to other East/Southeast/Central Asians. It was originally scaled in such a way that a Han Chinese person who was 100% "East Asian" on MDLP K23b and scored equal amounts of "T_A" and "AN" would score 0.5 on my scale. Later I added "East_Siberian" and "Paleo_Siberian" to my scale, weighted more "negatively" than "Tungus_Altaic".
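The exact weights of the scale aren't spelled out here, so the following is only a sketch of one weighting that satisfies the stated constraints (equal T_A and AN gives 0.5; the Siberian components pull further "north" than T_A). The weights in `WEIGHTS`, the direction (higher = more southern), and the function name are all assumptions for illustration, not the actual formula behind the charts.

```python
# Hypothetical weights: NOT the actual formula used for the charts in this thread.
# Higher score = more "southern"; lower score = more "northern".
WEIGHTS = {
    "Austronesian": 1.0,
    "S_E_Asia": 0.5,
    "Tungus_Altaic": 0.0,
    "E_Siberian": -0.5,  # assumed to weigh more "negatively" than Tungus_Altaic
    "Paleo_Sib": -0.5,
}


def north_south_score(profile):
    """Weighted average of East Asian components, normalized by their total."""
    total = sum(profile.get(c, 0.0) for c in WEIGHTS)
    if total == 0:
        raise ValueError("profile has no East Asian components")
    return sum(w * profile.get(c, 0.0) for c, w in WEIGHTS.items()) / total


# Equal T_A and AN with 100% East Asian ancestry lands exactly on the midpoint.
balanced = {"Tungus_Altaic": 25.0, "S_E_Asia": 50.0, "Austronesian": 25.0}
print(north_south_score(balanced))  # 0.5

# Reference values quoted earlier in this post.
han_north = {"S_E_Asia": 52.17, "Austronesian": 15.43, "Tungus_Altaic": 29.95}
cantonese = {"S_E_Asia": 45.37, "Austronesian": 38.27, "Tungus_Altaic": 13.23}
print(north_south_score(han_north) < north_south_score(cantonese))  # True
```

Normalizing by the East Asian total means a sample with some non-East Asian ancestry is still scored only on how its East Asian portion is distributed.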

Below is a chart of the "North-South" scale versus the "East Asian" (T_A + S_EA + AN + East_Sib + Paleo_Sib) percentage for the earlier version of my Han Chinese GEDmatch sample dataset. The "online" samples are from Chinese (PRC national) netizens who shared their results on the WeGene forums. They are considerably more northern-shifted than the GEDmatch samples, and probably more representative of the Han Chinese in Mainland China.
[Image: F1fql7Il.png]
anti-racist on here for kicks and giggles

“If you want to grant your own wish, then you should clear your own path to it”
― Okabe Rintarou

“Never doubt that a small group of thoughtful, committed, citizens can change the world. Indeed, it is the only thing that ever has.”.
― Margaret Mead
#8
Repost of my Anthrogenica repost of my Anthroscape post(s), part 3.1


Posting my full East Eurasian dataset from Sept 2018 now. All MDLP K23b data samples were obtained through the One-to-Many DNA Comparison tool on GEDmatch Genesis during 2018. I verified the ethnicity/nationality of each sample by checking their listed name, email address (incl. foreign country email domains), and cross-checking their calculator results with the reference populations.



I can't guarantee that all of the gathered samples are of the ethnic background I assigned them as, since I didn't contact any of the sample providers directly. I also can't guarantee that these are all unrelated individuals, even though One-to-Many lets you see if any matches are close relatives.


GEDmatch One-to-Many Sample count:
7 Tibet/Nepal/Bhutan
4 (South) Chinese/Japanese mix
13 Other Mixed East Asian
16 Thailand/Laos/Myanmar/Hmong
14 Cambodia
22 Nusantara (*probably mostly native Indonesians, but could also be MY, SG, BN, Moros, or Thai Malays)
25 Overseas SEA Chinese (*one person is LatinAm mixed)
31 Japanese
52 Korean
94 Northern Chinese (assigned based on MDLP K23b results, not known ancestry)
160 Southern Chinese (assigned based on MDLP K23b results, not known ancestry)
80 Vietnamese
68 Filipino
@redwine, @shazouzu and @jortita (found them on GEDmatch)
12 Euro + EastAsian mixes
Various non-East Asian samples for the global PCA plots (incl. 6 Latin American mestizos and 10 Central Asians provided by @tsakhur)


General comments on the GEDmatch samples:
  1. It was harder to find non-mixed Japanese samples. Many have ASI/Papuan/Melanesian-like "noise" that may reflect Jomon ancestry, which is why there are a lot of "outliers" on the PCA plot.
  2. Koreans are the most homogeneous and have the least non-East Eurasian admixture. They form a tight cluster on the PCA that partially overlaps with the Japanese but not with the Northern Han.
  3. The Northern Han (labeled as "N_Chinese") have more non-East Eurasian admixture than the Southern Han; many have noticeable East_Siberian, Arctic, and West Eurasian admixture. This is shown on the PCA plots as a downward blue trail from the main CJK line.
  4. The Southern Han are labeled as "Chinese" because most of the Chinese samples I found on GEDmatch were Southern Han and the distribution of Northern Han on the PCA plot stood out to me. Aside from South Chinese from Guangdong and Fujian, they don't have any ASI-like admixture and are about as Papuan/Melanesian-admixed as the Northern Han.
    --> Mainlander outliers aside, Taiwanese generally cluster with the Minnan and Hakka reference populations and HK/Canto generally cluster with the Cantonese and Hakka reference populations. Taiwanese tend to be Austronesian shifted but are also less divergent from the main Chinese cluster on the PCA, while HK/Canto tend to be more non-East Eurasian admixed and Viet/SE Asian shifted.
  5. @redwine was right about the Viet reference populations being off on Tungus_Altaic and Austronesian; they're accurate in terms of predicting where Kinh Vietnamese typically plot on the global PCA though. Non-Hoa Viets are shifted away from the South Chinese on the PCA due to being 1-3% South_Indian, but aside from the Hmong and the more "pure Austronesian" Filipinos they're the most autosomally "CJK East Asian" SE Asians.
  6. @tsakhur and @sage75 are both right about Filipinos' autosomal DNA. Filipinos are more Australoid and Melano_Polynesian admixed than Mainland SE Asians but most are at least 90% East Eurasian. The ones with native-sounding names tend to score higher on Austronesian; some score over 60%. I'm guessing the Filipinos scoring more than 10% Tungus_Altaic are Chinese mixed. Many are Hispanic/LatinAm mixed and can score 3-5% European/Amerindian/SSA.
  7. Cambodians and Nusantarans are further away from the "CJK East Asian" cluster on the PCA than Filipinos. Autosomally Indonesians seem to be a hybrid of Filipinos and Cambodians, albeit slightly shifted towards Papuans and Melanesians.
  8. Most Thais, Laotians, Cambodians, Burmese, and West Malaysians/Indonesians score 10-15% "South_Indian"/ASI, which is a significant amount of non-"Mongoloid" admixture. Even Vietnamese and Filipinos tend to score 2-3% "South_Indian" (Filipinos also typically score 2-5% "Australoid" + "Melano_Polynesian"). The only East Asians who commonly scored more than ~1% on any of those components are a few of the Japanese samples, which likely reflects the calculator picking up on Jomon-derived SNPs that are exclusive to Japanese.
All of these differences shift "Mongoloid" SE Asians farther apart from CJK East Asians than the north-south differences in East Eurasian ancestry among CJK East Asians, because the East Eurasian ancestry components are all very close to each other on the global PCA, whereas "South_Indian" and "Australoid" + "Melano_Polynesian" are quite a bit farther away.


254 "Han Chinese" GEDmatch samples, with data points of known regional ancestry singled out. T_A vs AN plot. This gave me some idea of what the regional structure of Han Chinese population genetics was. The range and position of the "Guangdong" samples was consistent with where the "Chinese_Taiwan" TW-HA, "Hakka" TW-HB, "Cantonese" CN-GZ, and Chinese Singaporean reference populations plot. All of the northern Chinese samples cluster around the "northernmost" end of the Han Chinese range. (The light green one is of a Chinese immigrant to the US who is of Fuzhounese ancestry, while the red one is from southern Shaanxi province, south of the Qinling mountain range.)

  [Image: YJ5Fh0z.png]

Another version of the above, with ~32 samples of suspected Cantonese speakers labeled "HK/Canto" and ~27 samples suspected to be Taiwanese or ROC-affiliated labeled "TW [ROC]". The "HK/Canto" samples generally fell between the "Cantonese (CN-GZ)" and "Hakka (TW-HB)" ref pops, while the "TW [ROC]" samples that weren't obviously of waishengren ancestry tended to be slightly more T_A shifted than the "Chinese_Taiwan (TW-HA)" ref pop.
[Image: pgf6sEW.png]

Speculative regional clusters for Han Chinese (I made this in late 2018, this was guesswork)
[Image: wmiH4rR.png]


CJKVP East Asia- T_A vs Austronesian graph
[Image: d4B4OSF.png]

CJKVP East Asia- % East Eurasian vs N-S cline graph
[Image: nknDDTn.png]

Means and Medians- T_A vs Austronesian graph
[Image: lcIt3cR.png]

Means and Medians- % East Eurasian vs N-S cline graph
[Image: 8jr4Ni6.png]

East Asians vs Jomon vs Hoabinhian- it was interesting to see how Jomon and Hoabinhian scored on this calculator. This doesn't literally mean Jomon is ~70% "East Asian" or Hoabinhian is 30% "East Asian". Rather, Jomon and Hoabinhian share a certain amount of drift with Proto-East Asian that is distinct from AASI or Papuan proper. Although Jomon seems to share more drift, since it scores higher on MDLP K23b "East Asian".
[Image: dQ8PjWa.png]
#9
Repost of my Anthrogenica repost of my Anthroscape post(s), part 3.2

I also made some plots of what these GEDmatch samples would look like on a global PCA for MDLP K23b


East Eurasian samples in reference to the East Asian, "Australoid" and "Melano-Polynesian" (left), "Arctic" (@ 11.00, 5.00), and "Amerindian" (@ 9.60, 3.60) ancestry components
[Image: u69QxrS.png]

East Eurasian samples, zoomed in so you can see the different ethnic groups more clearly
[Image: JQssrvs.png]

East Eurasian samples, density plot
[Image: fQUjgu7.png]

same as above but with "TW [ROC]" and "HK/Canto" Chinese samples singled out
[Image: HRP5fU8.png]

All East Eurasian samples, with smaller groups singled out
[Image: 61TKLIC.png]

All East Eurasian samples, with SEA Chinese individuals singled out
[Image: KJRiFdI.png]

All East Eurasian samples, with obviously mixed-ethnic Asians singled out (labels are guesses to their ancestry based on their GEDmatch Oracle results)
[Image: ucN5DAk.png]



Global PCA of North Asian reference populations
[Image: ZsC8g1R.png]

Global PCA of North Asian reference populations, with GEDmatch samples included
[Image: ctHQrYJ.png]
#10
Repost of my Anthrogenica repost of my Anthroscape post(s), part 3.3


Global MDLP K23b plots of GEDmatch samples. All were made sometime in 2018, maybe early 2019

Global PCA plot of East Eurasian GEDmatch samples, including 12 Central Asians (Kazakh, Altaian, Mongol, Tuva, Kyrgyz), and 6 mostly Amerindian Latin American mestizos. The Central Asian and Latin American samples were courtesy of @tsakhur [a.k.a. Kheshigten and u/Xamzarqan]
[Image: RWGfSz4.png]

Global PCA plot of GEDmatch samples of Euro to Asia ancestry
[Image: YrFbaHV.png]

Global PCA plot with global GEDmatch samples
[Image: QTRctxT.png]


Feel free to message me if you want access to the private GEDmatch MDLP K23b results I used for these graphs!
#11
Repost of my Anthrogenica repost of my Anthroscape post(s), part 4?

(10-30-2023, 08:07 PM)okarinaofsteiner Wrote: Repost of my Anthrogenica repost of my Anthroscape post(s), part 3.1


GEDmatch One-to-Many Sample count:
Cambodia (n = 14)
Nusantara (n = 22)
Japanese (n = 31)
Korean (n = 52)
Northern Chinese (n=94) (assigned based on MDLP K23b results, not known ancestry)
Southern Chinese (n = 160) (assigned based on MDLP K23b results, not known ancestry)
Vietnamese (n = 80)
Filipino (n = 68)

Means and Medians for my East Asian population samples | N-S cline vs % East Eurasian. I think the grey dots are "normalized" medians, calculated from the aggregate results (sum of all the ancestry components), whereas the black dots are calculated from finding the median of each ancestry component.
[Image: 8jr4Ni6.png]
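To illustrate the difference between the two kinds of median dots, here is a small Python sketch under one possible reading of the description above: take the median of each component separately (these generally won't sum to 100%), then optionally rescale them so they do. The toy samples are made up for illustration and are not from my dataset, and the function names are hypothetical.

```python
from statistics import median

# Hypothetical toy samples (percent per component), NOT real GEDmatch data.
samples = [
    {"T_A": 20.0, "S_EA": 50.0, "AN": 30.0},
    {"T_A": 30.0, "S_EA": 45.0, "AN": 25.0},
    {"T_A": 10.0, "S_EA": 47.0, "AN": 43.0},
]


def per_component_median(rows):
    """Median of each ancestry component taken independently."""
    return {name: median(row[name] for row in rows) for name in rows[0]}


def normalize(profile, total=100.0):
    """Rescale a profile so that its components sum to `total`."""
    current = sum(profile.values())
    return {name: value * total / current for name, value in profile.items()}


raw = per_component_median(samples)   # components need not sum to 100
norm = normalize(raw)                 # same proportions, forced to sum to 100

print(raw, sum(raw.values()))
print(norm, sum(norm.values()))
```

Because each component's median can come from a different individual, the raw medians describe no single real sample, which is why a rescaled version can plot in a slightly different spot.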

Means and Medians for my East Asian population samples | Global PCA cline. Means are shifted away from the East Eurasian ancestry components because of outliers with more "non-East Eurasian" noise and/or more Hoabinhian/Papuan/South Asian-like ancestry.
[Image: hQAsXT0.png]


Japanese mean- 42.12% T_A, 35.13% S_EA, 19.51% AN, 0.84% Paleo_Sib, 0.60% E_Sib, 0.46% Amerind, 0.42% S_Indian. 98.20% East Eurasian, 0.3702
Japanese median- 42.07% T_A, 35.09% S_EA, 19.47% AN, 0.72% Paleo_Sib, 0.14% E_Sib, 0.12% Amerind, 0% S_Indian. 98.24% East Eurasian, 0.3687

Korean mean- 40.73% S_EA, 38.24% T_A, 18.49% AN, 1.68% E_Sib, 0.21% Paleo_Sib, 0.19% Amerind. 99.36% East Eurasian, 0.3816
Korean median- 40.87% S_EA, 38.21% T_A, 18.35% AN, 1.67% E_Sib, 0% Paleo_Sib, 0% Amerind. 99.80% East Eurasian, 0.3798

"N"_Chinese mean- 46.39% S_EA, 29.94% T_A, 21.22% AN, 0.91% E_Sib, 0.18% Paleo_Sib, 0.21% Amerind. 98.62% East Eurasian, 0.4448
"N"_Chinese median- 46.25% S_EA, 29.95% T_A, 21.06% AN, 0.36% E_Sib, 0% Paleo_Sib, 0% Amerind. 99.31% East Eurasian, 0.4456

"S"_Chinese mean- 46.99% S_EA, 33.15% AN, 18.85% T_A, 0.91% E_Sib, 0.10% Paleo_Sib, 0.21% Amerind. 99.25% East Eurasian, 0.5694
"S"_Chinese median- 46.93% S_EA, 32.59% AN, 19.51% T_A, 0% E_Sib, 0% Paleo_Sib, 0.21% Amerind. 99.39% East Eurasian, 0.5607

TW [ROC] mean- 46.71% S_EA, 31.13% AN, 21.18% T_A, 0.08% S_Ind, 0.11% Aus, 0.17% E_Sib, 0.14% Paleo_Sib, 0.07% Amerind. 99.34% East Eurasian, 0.5485
TW [ROC] median- 47.06% S_EA, 32.17% AN, 20.37% T_A, 0% S_Ind, 0% Aus, 0% E_Sib, 0% Paleo_Sib, 0% Amerind. 99.69% East Eurasian, 0.5583

HK/Canto mean- 46.50% S_EA, 33.62% AN, 18.47% T_A, 0.14% S_Ind, 0.11% Aus, 0.23% E_Sib, 0.09% Paleo_Sib, 0.17% Amerind. 98.92% East Eurasian, 0.5733
HK/Canto median- 46.45% S_EA, 35.62% AN, 16.74% T_A, 0% S_Ind, 0% Aus, 0% E_Sib, 0% Paleo_Sib, 0% Amerind. 99.15% East Eurasian, 0.5983

Viet mean- 45.04% S_EA, 40.61% AN, 10.14% T_A, 2.05% S_Ind, 0.45% Aus, 0.17% Mel-Poly, 0.13% E_Sib, 0.22% Paleo_Sib, 0.17% Amerind. 96.14% East Eurasian, 0.6549
Viet median- 45.06% S_EA, 41.07% AN, 9.69% T_A, 1.93% S_Ind, 0.24% Aus, 0% Mel-Poly, 0% E_Sib, 0% Paleo_Sib, 0% Amerind. 96.45% East Eurasian, 0.6601

Filipino mean- 52.51% AN, 29.88% S_EA, 7.46% T_A, 2.96% S_Ind, 1.25% Aus, 2.44% Mel-Poly, 0.27% Amerind. 90.11% East Eurasian, 0.7471
Filipino median- 52.67% AN, 29.21% S_EA, 7.15% T_A, 2.78% S_Ind, 1.08% Aus, 2.28% Mel-Poly, 0% Amerind. 90.06% East Eurasian, 0.7544



For reference, this is how La368 (Hoabinhian sample from Laos) and the Onge reference populations score:

La368: 42.70% S_Ind, 8.40% Australoid, 4.91% Melano_Polynesian, 12.16% AN, 11.79% S_EA. 29.07% East Eurasian (~30% shared drift with "East Eurasian") and the N-S cline for the East Eurasian ancestry is 0.5760.

Onge: 54.36% S_Ind, 13.17% Australoid, 0.98% Melano_Polynesian, 13.57% AN, 13.21% S_EA. 28.55% East Eurasian (~30% shared drift with "East Eurasian") and the N-S cline for the East Eurasian ancestry is 0.6882.

[Image: dQ8PjWa.png]

It's worth pointing out that MDLP K23b models modern-day Onge and prehistoric Hoabinhian as being "part East Asian" and not just some combination of AASI-like "S_Indian", "Australoid", and "Melano_Polynesian". It means AASI is an imperfect proxy for the non-Papuan, non-Australian aborigine ancestry/genetic drift in this population.

Modern-day SE Asians generally score more "S_Ind" than "Australoid" + "Melano_Polynesian" combined, often by similar ratios as in La368 and Onge. This is probably more true for Vietnamese than other SE Asian groups which might have non-negligible amounts of actual South Asian/subcontinental ancestry.
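The ratio comparison can be checked directly from the numbers quoted in this post. Below is a quick Python sketch using only the La368, Onge, Vietnamese mean, and Filipino mean values listed above; the dictionary layout and function name are my own for illustration.

```python
# Values copied from the percentages quoted in this post.
profiles = {
    "La368":         {"S_Ind": 42.70, "Aus": 8.40,  "Mel_Poly": 4.91},
    "Onge":          {"S_Ind": 54.36, "Aus": 13.17, "Mel_Poly": 0.98},
    "Viet mean":     {"S_Ind": 2.05,  "Aus": 0.45,  "Mel_Poly": 0.17},
    "Filipino mean": {"S_Ind": 2.96,  "Aus": 1.25,  "Mel_Poly": 2.44},
}


def s_ind_ratio(profile):
    """S_Indian divided by (Australoid + Melano_Polynesian)."""
    return profile["S_Ind"] / (profile["Aus"] + profile["Mel_Poly"])


ratios = {name: s_ind_ratio(p) for name, p in profiles.items()}
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

On these figures the Vietnamese mean ratio sits in the same range as La368 and Onge (all a bit above 3), while the Filipino mean comes out below 1, in line with the earlier note that Filipinos carry more "Australoid" and "Melano_Polynesian".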
#12
Repost of my Anthrogenica repost of my Anthroscape post(s), part 5?

More Global PCA plots for your viewing pleasure...
[Image: RWGfSz4.png]

Global PCA of all of my original East Eurasian samples (CJKVP + Nusantara + other samples that aren't obviously part Euro), plotted with some Central Asian and Latin American GEDmatch samples provided by @tsakhur.
[Image: t4gef7Q.png]

Same as the above, but with La368 (Hoabinhian) and Ikawazu Jomon plotted for reference. Note how La368 (Hoabinhian) is close to a straight line between "South_Indian" and "Australoid" + "Melano_Polynesian", but slightly closer to the former than the latter. By contrast, the Jomon samples are close to the modern-day Cambodians.



At this point, my interest shifted towards trying to figure out what kind of regional/spatial structure there was within the Han Chinese results I had. This wasn't possible for my Japanese/Korean/Vietnamese/Filipino samples because I had fewer samples on hand, and no way to determine where their regional ancestry within their country(ies) of origin might be from. However, many of the Chinese samples I found were associated with certain cities or provinces- either because they had names like "Guangxi girl", or because their names used Taiwan/Hong Kong/non-Mainland China specific romanization schemes.
[Image: Zi8TKkQ.png]

This is a global PCA plot of some of the "Han Chinese" samples in my original dataset, plus some additional samples I stumbled upon some time later, in early 2019. I know the labels look a bit confusing and all over the place, but I wanted to see if there were any obvious regional patterns on the global PCA.

The labeled samples are individuals of known regional/provincial ancestry, and they didn't seem to fall into clear regional clusters due to having variable levels of noise in their MDLP K23b results. But based on how the "northern" Chinese, "Taiwanese [ROC]", and "HK/Canto" subsets scored, I was able to guess where I might expect Han Chinese from certain regions to plot on the global PCA relative to other East Asians.

1) Some northern Chinese (possibly inland?)- shifted left on PC1 and down on PC2 from the "main series". (9.80, 5.60) to (10.00, 5.85)
2) Many northern/eastern Chinese (coastal)- rightmost end on PC1 of the "main series", closest to the Korean samples. (10.00, 5.90) to (10.05, 5.95)
3) Northern shifted end of the "Taiwanese" cluster (probably south-central?)- right end of TW/ROC cluster [red circles]. (9.90, 5.90) to (9.96, 5.95)
4) Southern shifted end of the "Taiwanese" cluster (around Fujian and Guangdong??)- left end of TW/ROC cluster [red circles]. (9.85, 5.90)
5) Most southern-shifted portion of the "main series". Yue-speaking part of Guangdong? Most of the HK/Canto cluster [triangles]. (9.80, 5.85) to (9.90, 5.90)
#13
Anthrogenica repost part 6?

[Image: Ei529Fd.png]


Expanded N-S cline vs % East Eurasian plot of almost all of my East Eurasian samples that I classified as not ethnically mixed + several additional GEDmatch kits that @Tomenable asked me to look at in early 2019. This one includes the "Chinese" samples that I specified as "Taiwan [ROC]" and "HK/Canto" in clear red and yellow boxes, with additional "SEA Chinese" samples as clear maroon boxes. It also includes 3 Hmong samples and several Tibetan/Himalayan samples from my original dataset.

Tomenable's requested samples include:
32 or 33 Cambodians
1 Cambodian/Laotian
11 Nusantarans (mostly Indonesian)
10 Thai nationals
2 confirmed Laotians
6 additional Island SE Asians
1 Thai sample from HGDP

I labeled the country of origin clusters to give a better idea of where Laotians, Cambodians, native Indonesians, etc.- basically non-Vietnamese, non-Filipino SE Asians score relative to my Chinese, Japanese, Korean, Vietnamese, and Filipino samples.

We can see that:
  1. the Nusantara samples are still mostly in between the Filipino and Cambodian samples. Which checks out with our current knowledge of Indonesians and native Malaysians having both Austronesian and Austroasiatic ancestry.
  2. Burmese are more West Eurasian/Indian-shifted AND more NE Asian-shifted than other SE Asians
  3. Laotians are more distant from Han Chinese AND more Indian + Hoabinhian shifted than Vietnamese, but still more "pure East Eurasian" than most Thais who do not have significant Chinese ancestry.
  4. Thais seem to have a lot of variance, but generally fall in between Burmese, Khmers, Malays/Indonesians, Lao, Kinh (Vietnamese), and SEA Chinese (who have varying levels of admixture with native SE Asians).
  5. Hmong are about as "southern/northern" and as "pure East Eurasian" as Guangdong Han, but have lower levels of "Austronesian" or "Tungus_Altaic".
  6. Tibetans and Himalayans are somewhat more NE Asian-shifted than Northern Han Chinese, but also have significantly more Onge/Hoabinhian-like ancestry. Although with Nepalese much of this "non-East Eurasian" ancestry is probably actual South Asian (steppe + Indus Valley + AASI) ancestry.
  7. Mainland SE Asians almost never score >0.75 on my N-S cline. 0.75 is where Dai falls on my cline (50% S_EA 50% AN in MDLP K23b), which is probably a good proxy for how "southern" the "pure East Eurasian" component in Austroasiatic is.
  8. Island SE Asians usually score >0.75 on my cline, which makes sense because they almost always score higher on "AN" than "S_EA" + "T_A". We can think of modern-day Island SE Asians as being on a cline between "Dai" (continental/Austroasiatic) and "Igorot" (insular/'pure' Austronesian).
#14
Huang Xiufeng et al. 2020: "Genomic Insights into the Demographic History of Southern Chinese"; doi: 10.1101/2020.11.08.373225
[Image: XdI1FGw.png]
Fig 1A labeled- with special emphasis on the Han Chinese regional clusters. The position of "Thai" and "Vietnamese [Kinh]" relative to the Guangdong/Guangxi Han clusters is interesting. And the "Sichuan/Chongqing" cluster does seem pulled towards more Austroasiatic-like groups in ways that the (much more SEA-shifted) Fujian, Guangdong, and Guangxi-like clusters are not.

The edited PCA I posted to Anthrogenica that made its way onto Wikipedia.


First-half-of-2021 context for posting this in my megathread-

Quote:Just saw this Razib Khan blog post- A new paper, Genomic insights into population history and biological adaptation in Oceania, is worth reading. I’m going to sidestep the new inference that Austronesian expansion may predate the movement out of Taiwan.

https://www.nature.com/articles/s41586-021-03236-5
Quote:We infer that the East Asian ancestors of Pacific populations may have diverged from Taiwanese Indigenous peoples before the Neolithic expansion, which is thought to have started from Taiwan around 5,000 years ago [2,3,4]
https://www.pnas.org/content/118/13/e2026132118
Quote:Here, we report ∼2.3 million genotypes from 1,028 individuals representing 115 indigenous Philippine populations and genome-sequence data from two ∼8,000-y-old individuals from Liangdao in the Taiwan Strait... The ancestors of Cordillerans diverged from indigenous peoples of Taiwan at least ∼8,000 y ago, prior to the arrival of paddy field rice agriculture in the Philippines ∼2,500 y ago, where some of their descendants remain to be the least admixed East Asian groups carrying an ancestry shared by all Austronesian-speaking populations. These observations contradict an exclusive “out-of-Taiwan” model of farming–language–people dispersal within the last four millennia for the Philippines and Island Southeast Asia.
This actually explains why Amis, Atayal, and Igorot score ~100% Austronesian in MDLP K23b, while "mainstream" lowland Filipinos like Ilocanos and Tagalogs usually score less than 60%. Most rice-farming Austronesian groups have additional "non-pure Austronesian" East Eurasian ancestry from the southward expansion of rice agriculture, which MDLP K23b probably models as "South_East_Asian" due to being distinct from the Cordilleran "pure Austronesian" ancestry of Igorot and Amis.

This raises questions of how lowland Plains Taiwanese aborigines would have scored on MDLP K23b. I'm guessing they would be less pure "Austronesian" and more "South_East_Asian" than Amis and Atayal, although I have no idea if they'd be a good proxy for the donor population that introduced "non-pure Austronesian" ancestry to modern-day Filipinos.

Quote:Hmm, thanks for linking this. So would they be closer to Liangdao-type population?

And if so which ancient population would be closest to Yangtze river rice farmers? Would they be more northern than Austronesians?

Quote:I have absolutely no idea about the Yangtze rice farmers (I’m guessing they would be somewhere below the Austronesians), but it looks like Neolithic Fujian clusters around where modern-day Tagalogs do (who are more NEA-shifted than Visayans and Ilocanos), while Neolithic Coastal Fujian + Taiwan cluster with Visayan and Dusun/Murut? In any case Neolithic Fujian/Taiwan was clearly Austronesian-related.
#15
(10-01-2023, 06:16 AM)okarinaofsteiner Wrote: Early 2019 estimates for MDLP K23b based on the samples I had encountered (whether in my private dataset or not). This was before I compiled a separate dataset of Chinese adoptees of known regional ancestry (based on orphanage location):

2019_05_27 DNAConnect.org Chinese adoptees (n = 356)

During the first half of 2019, I looked through the DNAConnect.org Chinese adoptee results on GEDmatch, which were labeled according to regional origin (which orphanage they were adopted/registered from) on a separate website when I did this project. The regional distribution is similar to what's shown in this map, in that they're almost all from the southern half of China and mostly from areas further away from the Tier 1 cities. I'm pretty sure my DNAConnect.org dataset contains more samples.

The Guangdong samples are mostly from the westernmost part of the province, while the Chongqing samples were almost all from the area furthest away from the actual city, near the Hubei and Hunan borders.
[Image: 9o52t9zaitf21.png]

I figured adoptees would be a better proxy for regional ancestry patterns, since they're probably more representative of the general (Han) Chinese population for their regions of origin. The regions in the charts and graphs I posted on Anthrogenica are as follows:

"North China" (6) = Henan (4) + Shaanxi (2)
"East China" (27) = Anhui (13) + Jiangsu (8) + Zhejiang (6)
"Central China 1" (192) = Jiangxi (136) + Hunan (51) + Hubei (5)
"Central China 2" (39) = Chongqing (28) + Guizhou (11)
"South China" (92) = Guangdong (81) + Guangxi (6) + Fujian (5)
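As a sanity check on the grouping above, the province counts can be rolled up into regions like this (Python; the numbers are the ones listed above, the variable names are just for illustration):

```python
# Province-level adoptee counts, grouped into the regions listed above.
regions = {
    "North China":     {"Henan": 4, "Shaanxi": 2},
    "East China":      {"Anhui": 13, "Jiangsu": 8, "Zhejiang": 6},
    "Central China 1": {"Jiangxi": 136, "Hunan": 51, "Hubei": 5},
    "Central China 2": {"Chongqing": 28, "Guizhou": 11},
    "South China":     {"Guangdong": 81, "Guangxi": 6, "Fujian": 5},
}

# Sum the provinces within each region, then across all regions.
region_totals = {name: sum(provinces.values()) for name, provinces in regions.items()}
grand_total = sum(region_totals.values())

for name, count in region_totals.items():
    print(f"{name}: {count} ({count / grand_total:.1%} of all adoptees)")
print("Total:", grand_total)
```

The regional totals add up to the n = 356 given in the dataset title, and the breakdown makes it obvious how heavily the set leans on Jiangxi, Guangdong, and Hunan.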

Graph of MDLP K23b Austronesian vs Tungus_Altaic for Guangxi (6), Guangdong (81), Hunan (51), Jiangxi (136), Anhui (13), and rural Chongqing (28) adoptees
[Image: jlDweOf.png]

Graph of MDLP K23b Austronesian vs Tungus_Altaic for the other provinces with fewer adoptees
[Image: heU8O7s.png]

Graph of MDLP K23b Austronesian vs Tungus_Altaic for Guangxi (6), Guangdong (81), Hunan (51), Jiangxi (136), Anhui (13), and rural Chongqing (28) adoptees- with means
[Image: SekE2tW.png]


Global PCA with samples sorted by region, compared with the Japanese, Korean, Vietnamese, and Filipino samples from my original GEDmatch dataset. This suggests that many Northern Han are further away from Koreans and Japanese than Yangtze Delta Han due to having more Central Asian admixture. (Northern Han actually seem to be halfway between Vietnamese and Japanese on the Global PCA lol)

This also puts the north-south variation among Han Chinese in the broader perspective of differences across East/Southeast Asian ethnic groups on the global PCA.
(zoomed in version)
[Image: erVJJ5f.png]

(zoomed out version)
[Image: Ff1UTgG.png]

It's remarkable to me how sharp the boundary between the most "NE Asian"-shifted Chinese samples and the Korean portion of the Korean-Japanese cluster is, even though they're fairly close together.
By contrast, the most "SE Asian"-shifted Chinese samples partially overlap with the most "NE Asian"-shifted Vietnamese samples^, even though the main South Chinese (black triangles) and Vietnamese (light blue circles) clusters are much further apart on the PCA.
^ I'm not counting the "Vietnamese" samples that fall within the South Chinese cluster- those are almost certainly Hoa (Chinese Vietnamese) of Teochew and/or Cantonese speaking heritage.

