Huang 2022 style ancestry calculator
#1
Huang et al., 2022 (Genomic Insights Into the Demographic History of the Southern Chinese) developed a K=10 ADMIXTURE model for ~1000 subjects drawn from different regions and language groups in China as well as East and Southeast Asia.  Unfortunately, the component allele frequencies were not made available with the paper, so it is not possible to use their model directly to estimate component weights for new data.

I have reverse engineered those component allele frequencies and written some quick and dirty R code to calculate component weights for 23andMe-style data from WGSExtract.  I discarded the Sub-Saharan component since it had negligible weights in their model.  It seems to recover very well the values they reported for various test subjects in the paper.  Given the training set, I think it is only suitable for data from subjects in East Asia and SE Asia who are unlikely to have ancestry components outside the model.
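The R code itself is not shown in this thread, so the following is only a rough Python sketch of one standard way to get component weights from fixed allele frequencies: an EM update of the weights under a binomial model, in the style of supervised ADMIXTURE/FRAPPE projection. The function name, argument layout and iteration count are assumptions made for illustration and are not taken from the actual calculator.

```python
def estimate_weights(genotypes, freqs, n_iter=200):
    """Estimate component weights for one individual with the frequencies held fixed.

    genotypes: list of 0/1/2 copies of the counted allele per SNP (None = no-call).
    freqs:     freqs[j][k] = frequency of that allele at SNP j in component k,
               assumed to lie strictly between 0 and 1.
    Returns a list of K weights that sums to 1.
    """
    k = len(freqs[0])
    q = [1.0 / k] * k  # start from equal weights
    for _ in range(n_iter):
        acc = [0.0] * k
        n_alleles = 0
        for g, p in zip(genotypes, freqs):
            if g is None:
                continue  # skip missing calls
            # Expected frequency of each allele under the current weights.
            f_counted = sum(qi * pi for qi, pi in zip(q, p))
            f_other = sum(qi * (1.0 - pi) for qi, pi in zip(q, p))
            for i in range(k):
                # Share of each observed allele copy attributed to component i.
                if g > 0:
                    acc[i] += g * q[i] * p[i] / f_counted
                if g < 2:
                    acc[i] += (2 - g) * q[i] * (1.0 - p[i]) / f_other
            n_alleles += 2
        if n_alleles == 0:
            raise ValueError("no usable genotypes")
        q = [a / n_alleles for a in acc]
    return q
```

Each SNP contributes exactly two allele copies to the accumulator, so the weights stay normalised without an explicit rescaling step. A real implementation would also need to match strands and counted alleles between the genotype file and the frequency table.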

Components are:-
  • Andaman-related
  • Northeast Asia
  • Hmong-Mien
  • West Eurasian
  • Kra-Dai
  • Austroasiatic
  • Sino-Tibetan
  • Papuan
  • Austronesian

I have run my own data resulting in:-

    K1.Andaman.related. K2..Northeast_Asia. K3..Hmong.Mien. K4..West_Eurasian. K5..Kra.Dai. K6..Austroasiatic. K7..Sino.Tibetan.  K8..Papuan. K9..Austronesian.
[1,]        0.003089515          0.01125185      0.03041038      6.115006e-07    0.610802          0.1074186          0.174036 6.115006e-07        0.06299041


That would make me similar to Han_Guangxi, which is what might be expected of a Cantonese person descended from ancestors in western Guangdong.

Another dataset from a SE Asian person of Hokkien descent from Quanzhou and Xiamen:-
    K1.Andaman.related. K2..Northeast_Asia. K3..Hmong.Mien. K4..West_Eurasian. K5..Kra.Dai. K6..Austroasiatic. K7..Sino.Tibetan. K8..Papuan. K9..Austronesian.
[1,]        5.232493e-07          0.0749497      0.02121188      5.232493e-07    0.522726        0.05012311        0.2684863 0.001340675        0.06116127


This seems more similar to Han_Taiwan than to Han_Fujian in Huang's dataset.  But aren't many Taiwanese of Fujian descent anyway?

Perhaps we could make a collection of data of this type like with the thread with GEDMatch averages?
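
If results were collected like this, one simple way to compare them would be the Euclidean distance between weight vectors. The Python sketch below uses the two results posted above; the variable names are made up for illustration, and Euclidean distance is just one possible choice of similarity measure.

```python
import math

# Weights in column order K1..K9, copied from the two outputs above.
cantonese_result = [0.003089515, 0.01125185, 0.03041038, 6.115006e-07, 0.610802,
                    0.1074186, 0.174036, 6.115006e-07, 0.06299041]
hokkien_result = [5.232493e-07, 0.0749497, 0.02121188, 5.232493e-07, 0.522726,
                  0.05012311, 0.2684863, 0.001340675, 0.06116127]

def weight_distance(a, b):
    """Euclidean distance between two weight vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return math.dist(a, b)

def closest(sample, references):
    """Return the label in `references` (a dict of label -> weights) nearest to `sample`."""
    return min(references, key=lambda label: weight_distance(sample, references[label]))
```

With a dictionary of population averages in hand, `closest(my_weights, averages)` would return the nearest label. No such averages are included here because they are not given in this thread.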

Download link is here.  If your data does not work with it and you are willing to send it to me, I should be able to modify the parser to handle it and provide you with a result.
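
For anyone checking their file before trying it, here is a minimal Python sketch of reading 23andMe-style raw data. It assumes the usual layout of `#` comment lines followed by tab-separated rsid, chromosome, position and genotype columns. The function names are hypothetical, this is not the parser in the download, and WGSExtract output may differ in details.

```python
def parse_23andme(lines):
    """Return a dict of rsid -> genotype string from 23andMe-style lines."""
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 4 or fields[0] == "rsid":
            continue  # skip malformed rows and an uncommented header row
        rsid, _chrom, _pos, genotype = fields[:4]
        genotypes[rsid] = genotype
    return genotypes

def allele_count(genotype, allele):
    """Copies (0, 1 or 2) of `allele` in a diploid call, or None if unusable."""
    if len(genotype) != 2 or any(base not in "ACGT" for base in genotype):
        return None  # no-calls such as "--", indels, or haploid calls
    return genotype.count(allele)
```

Typical use would be `parse_23andme(open(path))`, followed by `allele_count` on each SNP that also appears in the frequency table.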
#2
I made an unrooted NJ tree relating the different components as a sanity check:-
[Image: unrooted NJ tree of the components]
#3
(02-14-2024, 10:31 PM)ronin92 Wrote: This seems more similar to Han_Taiwan than to Han_Fujian in Huang's dataset.  But aren't many Taiwanese of Fujian descent anyway?

Perhaps we could make a collection of data of this type like with the thread with GEDMatch averages?

I'm not familiar with the Huang2022 dataset, but I'm guessing the provincial averages are from the G25 dataset, and that the Han_Fujian sample is mostly from around Fuzhou, the capital, which speaks a different topolect from the Xiamen area and seems noticeably more Northern Chinese- and NE Asian-shifted.

Nice to see you on here ronin92!
anti-racist on here for kicks and giggles

“If you want to grant your own wish, then you should clear your own path to it”
― Okabe Rintarou

“Never doubt that a small group of thoughtful, committed, citizens can change the world. Indeed, it is the only thing that ever has.”
― Margaret Mead
#4
Quote: run22Analyse23Me("<redacted>.txt")
K1.Andaman.related. K2..Northeast_Asia. K3..Hmong.Mien. K4..West_Eurasian. K5..Kra.Dai. K6..Austroasiatic. K7..Sino.Tibetan. K8..Papuan. K9..Austronesian.
[1,] 4.630026e-07 0.1229811 4.630026e-07 0.005590859 0.4625396 4.630026e-07 0.3616683 0.01743673 0.02978207

12.3% "NE Asian", 0.0% "Hmong-Mien", 0.6% "West Eurasian", 46.3% "Kra-Dai", 0.0% "Austroasiatic", 36.2% "Sino-Tibetan", 1.7% "Papuan", and 3.0% "Austronesian".
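
The conversion from raw weights to the percentages above is just multiplying by 100 and rounding to one decimal place. A small Python sketch follows; the labels come from the output column names and the function name is made up for illustration.

```python
LABELS = ["Andaman-related", "Northeast Asia", "Hmong-Mien", "West Eurasian",
          "Kra-Dai", "Austroasiatic", "Sino-Tibetan", "Papuan", "Austronesian"]

def as_percentages(weights, labels=LABELS):
    """Map each component label to its weight as a percentage, to one decimal place."""
    if len(weights) != len(labels):
        raise ValueError("expected one weight per component")
    return {label: round(100 * w, 1) for label, w in zip(labels, weights)}

# Weights in column order K1..K9, copied from the output above.
weights = [4.630026e-07, 0.1229811, 4.630026e-07, 0.005590859, 0.4625396,
           4.630026e-07, 0.3616683, 0.01743673, 0.02978207]
percentages = as_percentages(weights)
```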

[Image: LAjZ5ZP.png]
Assuming there's a one-to-one correspondence between the ancestry calculator components and the components shown in this chart, I seem to have Han_Shandong levels of "NE Asian" (orange) and Han_Zhejiang levels of "Kra-Dai" (green).
#5
I've fixed a missing file issue and reworked the software to be easier to install.

The link is here.