
crashdoc · 04-15-2024, 01:10 PM

Before answering Kale's thread https://genarchivist.com/showthread.php?tid=171 I wanted to have something concrete and then it became bigger than I had planned.

So I am posting this thread instead to share my results. There will be many posts to explain it all and many associated qpgraph results. This first post will be about generalities.

I have tried many options in admixtools1 and 2 and compared the results with the basic "use only snps common to all populations" approach.

-The best picture is of course obtained using only snps common to all populations, as long as there are enough of them (400k and above seems good; under 300k the picture gets distorted, although the distortion starts gradually somewhere in the 300k range).

-The second best is admixtools1 with "useallsnps: YES"

-And last is admixtools2 with "maxmiss = 1, afprod = TRUE" but you lose details (smaller affinities between populations).
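To make the "common snps to all" criterion concrete, here is a minimal sketch of how such a count could be checked before running anything. It assumes a genotype table held in memory as rows of per-sample calls with None for missing data; the function names and the data layout are illustrative only and are not part of admixtools. The 400k/300k thresholds are the rough ones given above.

```python
def count_common_snps(geno, pop_labels):
    """Count SNPs with at least one non-missing call in every population.

    geno: list of rows, one per SNP; each row holds one call per sample,
          with None marking a missing genotype (hypothetical layout).
    pop_labels: population name of each sample, in column order.
    """
    pops = set(pop_labels)
    n_common = 0
    for row in geno:
        # Populations that have at least one called sample at this SNP.
        covered = {pop_labels[i] for i, call in enumerate(row) if call is not None}
        if covered == pops:
            n_common += 1
    return n_common


def coverage_verdict(n_common):
    """Rule of thumb from the post: >= 400k good, < 300k distorted."""
    if n_common >= 400_000:
        return "good"
    if n_common >= 300_000:
        return "borderline"
    return "distorted"


# Toy example: 3 SNPs, two samples from pop A and one from pop B.
toy = [
    [0, None, 1],     # A and B both covered
    [None, None, 2],  # A entirely missing
    [1, 1, None],     # B missing
]
print(count_common_snps(toy, ["A", "A", "B"]))
print(coverage_verdict(350_000))
```

Only the first toy SNP is counted, since it is the only one where every population has data.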

I also tried imputed samples to get more common snps and see how that would behave. I was pleasantly surprised by how similar the results were to those obtained without them, as long as I use only good-coverage individuals that are not from Africa.

To get more common snps I also combined different pulldown strategies for the same individuals. At first I thought that would diminish the bias that we see with SG/DG vs Capture, but the SG/DG bias was kept almost as strong.

About that bias, from my tests it seems to affect residual f3 stats Z-scores from qpgraph by about 0.5. Other "noise" like chance common mutations or damage, etc. seems to be less than that. So I was able to be more precise than usual taking all of that into account (albeit with more admixture events).

When running qpgraph, whenever there was an unaccounted-for ancestry or a bad positioning in the tree, the problematic residual Z-scores were above |1|. Consequently, as a rule I did not keep graphs with f3 residual Z-scores above |1|. Of course 0 edges are also bad, as a 0 edge means that one of the downstream branches is more related to a branch above than to its companion under that 0-edge branch.

Because of the number of admixture events and populations, I used admixtools2, and in qpgraph2 a zero edge can be exactly 0 but can also be x e-19, e-20 etc. I have not seen e-9 to e-18, so I used e-8 as a cut-off; however, the lowest values that ended up in my final graphs are e-6 (which look like e-3 once the usual x1000 is applied and, as you'll see in my graphs, 0.01 once rounded).
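The two rejection rules above (no residual Z-score above |1|, no drift edge below the e-8 cut-off) can be written down as a small filter. This is only a sketch: it assumes the residual Z-scores and the unscaled drift edge lengths have already been exported from the fit into plain Python containers, and the function and argument names are made up for illustration.

```python
def graph_is_acceptable(residual_z, edge_lengths, z_max=1.0, zero_cutoff=1e-8):
    """Apply the two rejection rules to one fitted graph.

    residual_z: iterable of f3 residual Z-scores (hypothetical export).
    edge_lengths: dict mapping drift edge name -> unscaled length.
    Returns (accepted, worst_abs_z, list_of_zero_edges).
    """
    worst_z = max((abs(z) for z in residual_z), default=0.0)
    # Anything below the cut-off (0, e-19, e-20, ...) counts as a zero edge.
    zero_edges = [name for name, length in edge_lengths.items()
                  if length < zero_cutoff]
    accepted = worst_z <= z_max and not zero_edges
    return accepted, worst_z, zero_edges


# A graph with a small but real edge (2e-6) passes; one with 3e-19 does not.
print(graph_is_acceptable([0.4, -0.9], {"e1": 2e-6, "e2": 0.05}))
print(graph_is_acceptable([0.2], {"e1": 3e-19}))
```

Keeping the worst Z-score and the list of offending edges in the return value makes it easier to see why a candidate graph was dropped.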

To end this post, here is the relevant part of my .ind file:

Code:
NE20    AR33k

BB7-240    BachoKiro_IUP_udg

CC7-2289    BachoKiro_IUP_udg

CC7-335    BachoKiro_IUP_udg

BK-1653    BK1653

Tianyuan    Tianyuan

MN2001.A0101    EHG_OLD

MN2002.A0101    EHG_OLD

MN2003.A0101    EHG_OLD

MNN005.A0101    EHG_OLD

MNN006.A0101    EHG_OLD

MNN007.A0101    EHG_OLD

PES001    EHG_OLD

FRL006.A0101    Goyet_Fournol

GoyetQ116-1    Goyet_Fournol

Ostuni1_d    Gravettian_ITA

PA12    Gravettian_ITA

AH1.SG    Iran_N

AH2.SG    Iran_N

AH4.SG    Iran_N

I1290    Iran_N

I1944    Iran_N

I1945    Iran_N

I1947    Iran_N

I1949    Iran_N

I1951    Iran_N

I1954    Iran_N

I7527    Iran_N

WC1_noUDG.SG    Iran_N

AC16    Italy_WHG

I2158_v2    Italy_WHG

R11.SG    Italy_WHG

R15.SG    Italy_WHG

R7.SG    Italy_WHG

STO001.A0101    Italy_WHG

UZZ5054    Italy_WHG

JpFu1.SG    Jomon

JpKa6904.SG    Jomon

JpOd181.SG    Jomon

JpOd274.SG    Jomon

JpOd282.SG    Jomon

JpOd6.SG    Jomon

Kostenki14    Kostenki14

Kostenki14.SG    Kostenki14

NEO283    KotiasKide25K

NEO283_imp    KotiasKide25K

S2949    KotiasKide25K

MA1_imp    MA1

MA1.SG    MA1

B_Mbuti-4.DG    Mbuti

HGDP00449.DG    Mbuti

HGDP00462.DG    Mbuti

HGDP00463.DG    Mbuti

HGDP00467.DG    Mbuti

HGDP00474.DG    Mbuti

HGDP00476.DG    Mbuti

HGDP00478.DG    Mbuti

HGDP00982.DG    Mbuti

HGDP00984.DG    Mbuti

HGDP01081.DG    Mbuti

S_Mbuti-1.DG    Mbuti

S_Mbuti-2.DG    Mbuti

S_Mbuti-3.DG    Mbuti

I5950    Mota

I5950.DG    Mota

I5950.SG    Mota

PM1.trim    Muierii1

Andaman_noUDG.SG    Onge_Jarawa

mondal_JAR-27.SG    Onge_Jarawa

mondal_JAR-32.SG    Onge_Jarawa

mondal_JAR-54.SG    Onge_Jarawa

mondal_JAR-61.SG    Onge_Jarawa

mondal_ONG-1.SG    Onge_Jarawa

mondal_ONG-12.SG    Onge_Jarawa

mondal_ONG-14.SG    Onge_Jarawa

mondal_ONG-4.SG    Onge_Jarawa

mondal_ONG-8.SG    Onge_Jarawa

mondal_ONG-9.SG    Onge_Jarawa

B_Papuan-15.DG    Papuan

HGDP00540.DG    Papuan

HGDP00541.DG    Papuan

HGDP00543.DG    Papuan

HGDP00545.DG    Papuan

HGDP00546.DG    Papuan

HGDP00547.DG    Papuan

HGDP00548.DG    Papuan

HGDP00549.DG    Papuan

HGDP00550.DG    Papuan

HGDP00552.DG    Papuan

HGDP00553.DG    Papuan

HGDP00554.DG    Papuan

HGDP00555.DG    Papuan

HGDP00556.DG    Papuan

S_Papuan-1.DG    Papuan

S_Papuan-10.DG    Papuan

S_Papuan-11.DG    Papuan

S_Papuan-12.DG    Papuan

S_Papuan-13.DG    Papuan

S_Papuan-14.DG    Papuan

S_Papuan-2.DG    Papuan

S_Papuan-3.DG    Papuan

S_Papuan-4.DG    Papuan

S_Papuan-5.DG    Papuan

S_Papuan-6.DG    Papuan

S_Papuan-7.DG    Papuan

S_Papuan-8.DG    Papuan

IL2.SG    Peru_RioUncallane_1800BP

IL3.SG    Peru_RioUncallane_1800BP

IL5.SG    Peru_RioUncallane_1800BP

IL7.SG    Peru_RioUncallane_1800BP

ZBC_IPB001.B-C0101_Luk2-Pinarbasi    Pinarbasi

SATP_imp    SATP

SATP.SG    SATP

I10871    ShumLaka

I10871_noUDG.DG    ShumLaka

I10871_noUDG.SG    ShumLaka

I10873    ShumLaka

I10873.SG    ShumLaka

Sunghir1.SG    Sunghir

Sunghir2.SG    Sunghir

Sunghir3.SG    Sunghir

Sunghir4.SG    Sunghir

TAF009    Taforalt

TAF010    Taforalt

TAF011    Taforalt

TAF013    Taforalt

TAF014    Taforalt

TAF015    Taforalt

UstIshim_snpAD.DG    Ust_Ishim

DLV005.A0101    Vestonice

DLV006.A0101    Vestonice

Vestonice13_d    Vestonice

Vestonice16    Vestonice

Yana_old.SG    Yana

Yana_old2.SG    Yana

zku002    ZlatyKun

ZlatyKun    ZlatyKun
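For anyone who wants to check the grouping, a listing like the one above can be summarised with a few lines of Python. This is a sketch under two assumptions: the population label is the last whitespace-separated field on each line (which also holds for a full three-column .ind file), and the .SG / .DG suffixes in the sample IDs mark shotgun and diploid data, with everything else lumped as "other". The function names are illustrative.

```python
from collections import defaultdict


def group_by_population(ind_text):
    """Map population label -> list of sample IDs from .ind-style text."""
    groups = defaultdict(list)
    for line in ind_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sample_id, population = fields[0], fields[-1]
        groups[population].append(sample_id)
    return dict(groups)


def data_types(sample_ids):
    """Classify IDs by suffix: 'SG', 'DG', or 'other' (assumed convention)."""
    kinds = set()
    for sid in sample_ids:
        if sid.endswith(".SG"):
            kinds.add("SG")
        elif sid.endswith(".DG"):
            kinds.add("DG")
        else:
            kinds.add("other")
    return kinds


excerpt = """Kostenki14    Kostenki14

Kostenki14.SG    Kostenki14

I5950    Mota
I5950.DG    Mota
I5950.SG    Mota
"""
groups = group_by_population(excerpt)
for pop, ids in groups.items():
    # Populations mixing several data types are where the SG/DG bias can show up.
    print(pop, len(ids), sorted(data_types(ids)))
```

Labels that combine more than one data type are the ones where the combined pulldown strategies mentioned earlier come into play.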

crashdoc · (This post was last modified: 04-15-2024, 01:41 PM by crashdoc.)

BASE

With the following 10 included samples, the first branching after the OOA event is ZlatyKun, included in a "WAVE1" event.

Then, what I called preWestRussia (ancestor to the likes of Kostenki and Sunghir), followed by Ust-Ishim's main ancestry which probably followed the course of the Irtysh River.

The Coastal movement represented here by Onge_Jarawa (in later graphs Papuans will also be here) is the next to branch off, followed by Tianyuan, which I associate with the IUP of the Northern Route to Asia (from my results and archeology I believe the Northern vs Southern route debate is a wrong one, both dispersal routes were used, we can see it also in the blade industry in northern China vs the cobbles industry in southern China).

Next we have BachoKiro which is part of WAVE2 here (and in part absorbs the previous WAVE1), followed by Aurignacian (I will stay broad and will not mention proto-Aurignacian, as it is a simplification and there probably was more admixing involved). Aurignacians also absorb the previous waves.

Now for what I called the West Russia branch, it is customary to have a clean branching without admixture and that's what I did at first, but I ended up with Onge/Jarawa needing some Aurignacian. The other problem was that Kostenki14's culture has strong affinities to Aurignacian, which were not explained that way. So I added an admixture event from Aurignacian to WestRussia which resulted in the graph you see and it makes perfect sense. The downside at first was that an admixture from Aurignacian into WestRussia followed by an admixture from WestRussia into Aurignacian made the graph a bit unstable, but when adding more samples later, it stabilises.

Two last notes for now: Ust-Ishim needs some WestRussia, which makes sense geographically I guess, as WestRussia expands towards the Urals and, as we will see, also ends up in ANE. And last: the BLACKSEA and BALKANS names may not make sense right now, but when there are more populations added, you will understand.

[Image: ZGxQFDZ.png]

old europe · (This post was last modified: 04-15-2024, 01:58 PM by old europe.)

(04-15-2024, 01:23 PM)crashdoc Wrote: BASE

[url=https://imgur.com/a/nUHqK04][/url]

Interesting, but you should reconcile it with the archeological record: I would get rid of preWestRussia and start instead with the Aurignacian. After all, Kostenki 14 is an Aurignacian site in the map I attached linked to the Don valley samples.

I would go with proto-Aurignacian being mostly Goyet/Fournol-like and early Aurignacian, who followed more of a Balkan route, being more Kostenki/Muierii/Sungir-like.

old europe · 04-15-2024, 02:50 PM

We also have to take into consideration that the new Buran Kaya samples must now be added to the big picture; in the map above they overlap with the Crimea red spot, probably along with the Siuren site, also in Crimea. The Buran Kaya samples were, unsurprisingly, proto-Gravettian in terms of archeology but Aurignacian-like in their dna. Quote:

Populations genetically related to present-day Europeans first appeared in Europe at some point after 38-40,000 years ago, following a cold period of severe climatic disruption. These new migrants would eventually replace the pre-existing modern human ancestries in Europe, but initial interactions between these groups are unclear due to the lack of genomic evidence from the earliest periods of the migration. Here we describe the genomes of two 36-37,000-year-old individuals from Buran-Kaya III in Crimea as belonging to this newer migration. Both genomes share the highest similarity to Gravettian-associated individuals found several thousand years later in southwestern Europe. These genomes also revealed that the population turnover in Europe after 40,000 years ago was accompanied by admixture with pre-existing modern human populations. European ancestry prior to 40,000 years ago persisted not only at Buran-Kaya III, but is also found in later Gravettian-associated populations of western Europe and Mesolithic Caucasus populations.

Of course in the paper they did not say that the Gravettian population in southwestern Europe was... Aurignacian-like, and that the Aurignacian arrived in Europe well before 38-40,000 years ago...

crashdoc · 04-15-2024, 04:42 PM

(04-15-2024, 01:56 PM)old europe Wrote:
(04-15-2024, 01:23 PM)crashdoc Wrote: BASE

Interesting, but you should reconcile it with the archeological record: I would get rid of preWestRussia and start instead with the Aurignacian. After all, Kostenki 14 is an Aurignacian site in the map I attached linked to the Don valley samples.

I would go with proto-Aurignacian being mostly Goyet/Fournol-like and early Aurignacian, who followed more of a Balkan route, being more Kostenki/Muierii/Sungir-like.

My WestRussia tag is culturally Aurignacian and we can see in this graph that AURIGNACIAN contributes 20% dna to it (in later graphs, when it stabilizes, it's more like 27-29%). However, for the sake of clarity of the graph (and it will get bigger!) I needed a different tag that I could use, an archeologically neutral one. Also, as stated above, I don't want to get into proto-Aurignacian, Aurignacian and late Aurignacian tags, as to represent them in the graph I would need more admixture events and you'll see that I already have a LOT! As for the preWestRussia, it is another part of the dna of Kostenki and friends and it splits from the tree even before Aurignacian existed. I have my idea of the corresponding archeological scenario for all of that, but I want to get all my dna posts done before I start discussing archeology in more detail. Hint for now: read articles by Dinnis, Reynolds and Demidenko for WestRussia.

Kale · (This post was last modified: 04-15-2024, 05:03 PM by Kale.)

Excellent, I'm glad to no longer be alone exploring these sorts of things!

The best words of caution I can give as you proceed; guard against extreme drift edges, as they do not guard against the statistically impossible.
As an example from the graph...
Onge and Tianyuan share 8 units of drift to the exclusion of ZlatyKun.
F4: Mbuti.DG China_UP Onge.SG ZlatyKun.SG -0.0072 -12.13 1009925
So 1 unit of drift on qpgraph is roughly equal to 0.0009 F-value.
The Aurignacian3 node contains 93 units of drift after splitting from ZlatyKun, which would suggest if we replaced China_UP and Onge with 2 members of the Aurignacian3 population, the F-value would be -0.0837, which is on par with identical twins or 2 sequences obtained from the same individual.
Mbuti.DG I2484 I2483 ZlatyKun.SG -0.0836 -81.89 288945
Mbuti.DG Kostenki14 Kostenki14.SG ZlatyKun.SG -0.0762 -90.96 989826
Here is the most extreme case of bottlenecking yet found among two un-genealogically related samples.
Mbuti.DG Italy_SanTeodoro_HG Italy_OrienteC_HG ZlatyKun.SG -0.052 -49.51 257026
Which would be about 65 units of drift.
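The unit conversion used above is simple enough to spell out. The sketch below only redoes the arithmetic with the numbers quoted in this post; it assumes the mapping between qpgraph drift units and F4 values is linear and calibrated on the single Onge/Tianyuan example, which is a rough approximation rather than a property of qpgraph.

```python
# Calibration from the post: 8 shared drift units <-> |F4| of 0.0072.
F4_ONGE_TIANYUAN = 0.0072
SHARED_DRIFT_UNITS = 8

f_per_unit = F4_ONGE_TIANYUAN / SHARED_DRIFT_UNITS  # about 0.0009 per unit


def units_to_f(units):
    """Expected |F4| for a given number of drift units (linear assumption)."""
    return units * f_per_unit


def f_to_units(f_value):
    """Drift units implied by an observed F4 value (linear assumption)."""
    return abs(f_value) / f_per_unit


print(round(units_to_f(93), 4))    # Aurignacian3 node: 0.0837
print(round(f_to_units(-0.0836)))  # two libraries, same individual: 93
print(round(f_to_units(-0.052)))   # San Teodoro / Oriente C: 58
```

With the rounded 0.0009 conversion the San Teodoro / Oriente C pair comes out nearer 58 units; a figure of about 65 corresponds to roughly 0.0008 per unit. Either way it stays well below the 93 units on the Aurignacian3 node.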

crashdoc · 04-15-2024, 06:07 PM

(04-15-2024, 04:58 PM)Kale Wrote: Excellent, I'm glad to no longer be alone exploring these sorts of things!
The best words of caution I can give as you proceed; guard against extreme drift edges, as they do not guard against the statistically impossible.
As an example from the graph...
Onge and Tianyuan share 8 units of drift to the exclusion of ZlatyKun.
F4: Mbuti.DG China_UP Onge.SG ZlatyKun.SG -0.0072 -12.13 1009925
So 1 unit of drift on qpgraph is roughly equal to 0.0009 F-value.
The Aurignacian3 node contains 93 units of drift after splitting from ZlatyKun, which would suggest if we replaced China_UP and Onge with 2 members of the Aurignacian3 population, the F-value would be -0.0837, which is on par with identical twins or 2 sequences obtained from the same individual.
Mbuti.DG I2484 I2483 ZlatyKun.SG -0.0836 -81.89 288945
Mbuti.DG Kostenki14 Kostenki14.SG ZlatyKun.SG -0.0762 -90.96 989826
Here is the most extreme case of bottlenecking yet found among two un-genealogically related samples.
Mbuti.DG Italy_SanTeodoro_HG Italy_OrienteC_HG ZlatyKun.SG -0.052 -49.51 257026
Which would be about 65 units of drift.

You're right about being cautious, but don't worry, I've been running qpgraphs non-stop for months and you know better than anyone that once you get to a lot of populations in the same graph, you can't fool the graph! You'll see in the next diagrams that things stabilize.

However I'm not sure that we can take the drift on qpgraph as direct relations with F4 stats the way you do, I remember reading somewhere that it would work if we have only diploid samples. Also, in the next graph you'll see that Onge is not 100% pure of external influence (ie. EastAsia). And one more thing, doesn't your China_UP include AR33k? I use Tianyuan by himself and in a later graph you'll see that AR33k not only has China IUP dna but also coastal dna, and it makes sense because of its geographic position and age; after all, we see it in Amerinds, which are already split from the other North Asian samples around 25kya.

Kale · 04-15-2024, 06:15 PM

(04-15-2024, 06:07 PM)crashdoc Wrote: However I'm not sure that we can take the drift on qpgraph as direct relations with F4 stats the way you do, I remember reading somewhere that it would work if we have only diploid samples.

I know what you are referring to, which is terminal drift edges leading to non-diploid samples. (ex. IUP > Tianyuan edge)
Those are uninformative. You can see they are typically gigantic edges. Compare that to Ust-Ishim, the only diploid singleton, whose edge is just 2 units.

crashdoc · (This post was last modified: 04-15-2024, 08:32 PM by crashdoc.)

GRAPH2: adding Taforalt, Yana, BK1653, Vestonice, KotiasKide25k

First thing to note is the addition of a small percentage of EastAsia contribution to Onge_Jarawa. It is necessary, so it must mean that they were not always isolated. It could have entered their dna before India-related groups migrated to the Andaman islands, probably during the LGM, or later with maritime contact.

Second thing, when adding Vestonice to the graph, Sunghir gets a terminal 0-edge. It doesn't share that edge with anyone so it is not so bad, but it must mean that they don't have much drift since their common ancestry with Vestonice. Indeed Sunghir is ~35kya and a Sunghir-like ancestry seems to spread just around that time. At least according to mtdna TMRCA that is later shared with Western Europe, Italy, Central Europe and also Kostenki12 which is quite Sunghir-like. The Sunghir component is also present in Gravettians who are already spread by 33.5kya, see https://hal.science/hal-02376155/document (https://doi.org/10.1016/j.jasrep.2019.101958)

Third, the WAVE1a ancestry (ZlatyKun-like) is not only present in Europe; the 1st wave also seems to have spread somewhere near the Altai for instance (present in Yana and we'll see that in MA1 too). As a matter of fact, Tianyuan seems to want some too, but not enough for it to be worth adding to the graph. (The worst f3 of this graph and the previous one is actually Tianyuan, which wants to be closer to ZlatyKun even if they have different pulldown strategies.) Of course if WAVE1 spread east and west it is not implausible that some of it was present somewhere in the Mid-East, as we see here contributing 1% to SouthCaucasus (ie KotiasKide25k). It must be said also that the WAVE1 contribution to WAVE2 splits before WAVE1's east-west split, so WAVE2 must have absorbed it on the way, before entering Europe. I won't say more on WAVE1 since we have only 1 sample (ZlatyKun) and maybe I have already speculated too much with so little information!

Now for the new additions:

-WAVE0: something that branches off before the other Eurasians split, but shares most of the OOA drift with them. Some would call it Basal, but I will not use that term, it is too loaded with other significations that might or might not be true. The term WAVE0 could also be misleading as it might not be a real WAVE of dispersion that preceded WAVE1, but could be instead a group that was left behind when the main group went to the MID-EAST for instance, however I couldn't think of any better tag and it fits with the branching order.

-Yana: a mix of West and East Eurasia. The East Eurasian component also includes something in common with Onge_Jarawa (confirmed by Y-haplogroup P that is also found in South-East Asia), which probably made its way North to the Altai (you'll see it in MA1 too). Once there, that component probably also went West and ended up in small amounts in the MID-EAST. It is noteworthy that Yana also needs a WAVE1 component, as mentioned above.

-Vestonice & BK1653: Vestonice is a mix of preVESTONICE1(Sunghir-like) and 2 components in common with BK1653 (BachoKiro_IUP-related & WestRussia-BALKANS, which they have in common with Muierii), but in different proportions.

BK1653 also has another WestRussia component in common with Muierii and a more mysterious component that branches off before WAVE0. Does it mean something that really left the "hub" before WAVE0 or is it some strange artifact? I cannot answer that question and I will not dwell too long on that, you can ignore it if you want, I will not base any conclusion on it.

-KotiasKide & Taforalt: Except what's already been mentioned about WAVE0, WAVE1 and Onge-like components, KotiasKide25k and Yana also have a backflow from Europe, confirmed for instance by mtdna haplogroup U6, and the Levant Aurignacian. Taforalt, aside from the obvious African component, also has something that splits at the root of Aurignacian, which I have tagged AHMARIAN in accordance with archeology.

It might seem strange to have Levant Aurignacian connected to what I called WestRussia (here WR-BALKANS) but remember that WestRussia is actually culturally Aurignacian and has a non-negligible Aurignacian genetic component.

All of that may seem fishy to some (especially the WAVE1 spreading or the India-related one), but I simply followed where the residual stats and the 0-edges were leading me, and those movements are plausible.

[Image: MTm2x4Y.png]

Norfern-Ostrobothnian · (This post was last modified: 04-15-2024, 10:55 PM by Norfern-Ostrobothnian.)

Try making Vestonice a mixture of BK1653 and Sunghir; maybe that helps with the 0 edge? I think that is the accepted mixture these days.

Kale · (This post was last modified: 04-16-2024, 02:57 AM by Kale.)

Yes the zero edge leading to Sunghir is a red flag.
Going back to the terminal edge point... Diploid samples will have accurate terminal drift edges. Single sample pseudohaploids have crazy high terminal drift edges (because being pseudohaploid, they are 100% homozygous). Multi-sample pseudohaploid populations don't have as high terminal drift edges, but they are still heavily inflated. Here's an example of kind of what is going on via F3 stats, which qpgraph works with to build its structure.

Mbuti.DG Sunghir1.SG Sunghir2.SG 0.3401
Mbuti.DG Sunghir1.SG Sunghir3.SG 0.3397
Mbuti.DG Sunghir1.SG Sunghir4.SG 0.3296
Mbuti.DG Sunghir2.SG Sunghir3.SG 0.3580
Mbuti.DG Sunghir2.SG Sunghir4.SG 0.3286
Mbuti.DG Sunghir3.SG Sunghir4.SG 0.3383
There is a bit of spread due to varying genealogical relationships, but eyeballing it you'd probably say the Sunghir population has a ~0.34 internal drift as a whole.
So what happens if we group the Sunghir together and run F3: Mbuti Sunghir Sunghir?
Mbuti.DG Sunghir.SG Sunghir.SG 0.4786
Why so high? Because each sample is not compared just to the others, but also to itself, incurring the pseudohaploid wrath.

Mbuti.DG Sunghir1.SG Sunghir1.SG 0.8328
Mbuti.DG Sunghir2.SG Sunghir2.SG 0.8360
Mbuti.DG Sunghir3.SG Sunghir3.SG 0.8351
Mbuti.DG Sunghir4.SG Sunghir4.SG 0.8342
In the graph they are suffering the pseudohaploid terminal drift inflation, but still getting a 0 terminal edge. This is a problem that needs to be resolved.
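The inflation can be checked roughly from the numbers above. The sketch below assumes that the grouped statistic behaves like an equal-weight average over the full 4 x 4 table of sample pairs (each distinct pair counted twice, each self-comparison once). That weighting is a simplifying assumption; the real computation depends on per-SNP coverage, so only approximate agreement should be expected.

```python
from statistics import mean

# F3(Mbuti; Sunghir_i, Sunghir_j) values quoted in the post.
between = [0.3401, 0.3397, 0.3296, 0.3580, 0.3286, 0.3383]  # distinct pairs
within = [0.8328, 0.8360, 0.8351, 0.8342]                   # sample vs itself
observed_grouped = 0.4786                                   # Sunghir vs Sunghir

n = len(within)          # 4 samples
n_cells = n * n          # 16 ordered (i, j) cells in the pair table

# Each distinct pair fills two cells, (i, j) and (j, i); each self pair one.
naive_grouped = (2 * sum(between) + sum(within)) / n_cells

print(round(mean(between), 3))   # about 0.339, the "true" internal drift
print(round(naive_grouped, 3))   # about 0.463, pulled up by the 4 self cells
print(observed_grouped)
```

The equal-weight average lands near 0.46, close to the 0.4786 actually observed and far above the ~0.34 seen between distinct individuals, which is consistent with the self-comparisons driving the inflation.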

crashdoc · 04-16-2024, 03:06 PM

(04-15-2024, 10:53 PM)Norfern-Ostrobothnian Wrote: Try making Vestonice a mixture of BK1653 and Sunghir; maybe that helps with the 0 edge? I think that is the accepted mixture these days.

I tried and it is approximately right, but it doesn't work with the kind of precision I'm going for, because BK1653 doesn't exactly have the right admixture proportions for that. It might also work if we merge Vestonice with Italian Gravettians, but you'll see in a later graph that Italian Gravettians have interesting additional components.

crashdoc · 04-16-2024, 03:07 PM

(04-16-2024, 02:49 AM)Kale Wrote: Yes the zero edge leading to Sunghir is a red flag.
Going back to the terminal edge point... Diploid samples will have accurate terminal drift edges. Single sample pseudohaploids have crazy high terminal drift edges (because being pseudohaploid, they are 100% homozygous). Multi-sample pseudohaploid populations don't have as high terminal drift edges, but they are still heavily inflated. Here's an example of kind of what is going on via F3 stats, which qpgraph works with to build its structure.

Mbuti.DG Sunghir1.SG Sunghir2.SG 0.3401
Mbuti.DG Sunghir1.SG Sunghir3.SG 0.3397
Mbuti.DG Sunghir1.SG Sunghir4.SG 0.3296
Mbuti.DG Sunghir2.SG Sunghir3.SG 0.3580
Mbuti.DG Sunghir2.SG Sunghir4.SG 0.3286
Mbuti.DG Sunghir3.SG Sunghir4.SG 0.3383
There is a bit of spread due to varying genealogical relationships, but eyeballing it you'd probably say the Sunghir population has a ~0.34 internal drift as a whole.
So what happens if we group the Sunghir together and run F3: Mbuti Sunghir Sunghir?
Mbuti.DG Sunghir.SG Sunghir.SG 0.4786
Why so high? Because each sample is not compared just to the others, but also to themselves, incurring the pseudohaploid wrath
Mbuti.DG Sunghir1.SG Sunghir1.SG 0.8328
Mbuti.DG Sunghir2.SG Sunghir2.SG 0.8360
Mbuti.DG Sunghir3.SG Sunghir3.SG 0.8351
Mbuti.DG Sunghir4.SG Sunghir4.SG 0.8342
In the graph they are suffering the pseudohaploid terminal drift inflation, but still getting a 0 terminal edge. This is a problem that needs to be resolved.

You're right that it's a red flag and whenever I've seen it with other samples, there were residual f3 to confirm the problem. But not so this time. I've tried different ways to eliminate it, but it stays there, so I tried to explain it instead. I'm sharing my results even though they might not be 100% perfect (I won't put as much time into it as I did, since my wife and I agree that I've already spent far too much, well, she agrees more than I do!) and the feedback is necessary for me and for the others who will read this thread, so thank you!

I remain unconvinced by the non-diploid samples affecting only the terminal edge drift. When common drift between samples is not directly constrained by direct edges but by admixture edges, the terminal drift of the most direct sample (or highest admixture %) seems to be distributed along the upper common edges. For instance a branching of only 1% (see WAVE1_EUR upwards from ZlatyKun) gets the drift cleanly split in two just like if there was no branching and only a redundant edge, a branching of 2% (see EastAsia1 to preOnge_Jarawa upwards from Tianyuan) affects only minimally the distribution of the terminal drift and so on.

Kale · 04-16-2024, 03:35 PM

(04-16-2024, 03:07 PM)crashdoc Wrote: I remain unconvinced by the non-diploid samples affecting only the terminal edge drift. When common drift between samples is not directly constrained by direct edges but by admixture edges, the terminal drift of the most direct sample (or highest admixture %) seems to be distributed along the upper common edges. For instance a branching of only 1% (see WAVE1_EUR upwards from ZlatyKun) gets the drift cleanly split in two just like if there was no branching and only a redundant edge, a branching of 2% (see EastAsia1 to preOnge_Jarawa upwards from Tianyuan) affects only minimally the distribution of the terminal drift and so on.

It's not that pseudohaploids only affect the terminal drift edge; it is that the terminal drift edge cannot be used to measure the bottleneck of a population (because the sample is artificially maximally bottlenecked because of being 100% homozygous). It can be useful to look at though to see how much drift is being absorbed by the graph's branching events, because all the single-sample pseudohaploids will have roughly the same number of total drift units if you trace through the graph, as they all have the same start and end points (Root and 100% homozygous).

I'm not sure why in those particular cases it is choosing to split the branches evenly. My guess would be that the standard errors for the admixture % and drift edges are very large, and it's splitting the branches evenly out of 'convenience' (for lack of a better term). Check that out and let me know, I'm interested to confirm or deny that hypothesis.

crashdoc · 04-16-2024, 04:41 PM

GRAPH3: adding MA1, WHG, EHG, SATP (Satsurblia=CHG), Pinarbasi, Iran_N

A word of warning: yes there are a lot of admixture events, because I tried to be as precise as possible in order to squeeze as much information as I can from the genetics. A more conservative approach has been taken by Kale and he already captured all the main admixture events, so I don't need to do it again. We must also take into account that the samples here are later than those in the first graphs and so human groups continued to expand and admix in the meantime.

-MA1: same components as Yana, but Yana has additional EastAsia, while MA1 has a new component: a Sunghir-related one, that probably expanded by 35kya, like I said above.

-WHG: has Vestonice-like components + some MID-EAST components (IND_toMID_EAST, WAVE0, WAVE1_MID) + a small amount of ANE

-SATP: For CHG I used exclusively Satsurblia, because the other 2 are later and more admixed. Apart from a bit of ANE and the usual Mid-East components, among which a good amount of SouthCaucasus (KotiasKide25k), it has 3 interesting additional components (which it shares with Iran_N): a Sunghir-related one that probably expanded along the same route by which it reached MA1; an African component (which we also find in Natufian, even though it is not represented in my graphs since Natufian is not of good enough quality); and the last one, which I named NorthCaucasus, a mix of Kostenki14-like and WR_BLACKSEA. I named it NorthCaucasus because BuranKaya3C (which I ran with a much simpler graph in admixtools1 with allsnps = yes) gets about the same amount of the same components.

-Pinarbasi: notable components apart from a bit of ANE and the usual Mid-eastern components are from the Balkans: WR_BALKANS, WHG, and BACHOKIRIAN! I tried various combinations, but the direction is always WHG->Pinarbasi and not the reverse. The BachoKirian component is surprising, but considering the geographical location, not impossible. At that time it was probably not in unadmixed form though, but we know little about Anatolia (or even the Southern Balkans) before Pinarbasi.

-EHG: what I called preEHG5 is likely very AG3-like and, compared with earlier ANE like MA1, it has 3 additional components: SouthCaucasus, NorthCaucasus (the two probably mixed together by that time) and a PARA-AMERICA component, which is connected here to EastAsia (as the main missing America sub-component of EHG) but, as you'll see later if we include American samples, it gets connected to them instead. To get from "AG3" to EHG, there is of course an additional WHG component, but also a Pinarbasi-related component that was probably present in the Balkans where that particular WHG source originated.

-IRAN_N: I was expecting some deep component(s) in Iran, but as it turns out, everything that makes its genetic composition is already present around: apart from SATP-like components, it has additional "preEHG5-AG3" and SouthAsian input.

That's a lot to digest, but I believe it answers many questions. For me, the main takeaways are:

-The Balkans' links with the Mid-East are manifold and the WHG input into Pinarbasi is also real.

-As for the "basal" dna and its significance for the amount of Neandertal, it is neither pure Basal (WAVE0) nor pure Africa that affects it; it is a mix of both.

-Iran is not an ancient dna refugium, at least in the Mesolithic-Neolithic transition.

The most complicated stuff is now done. For the remaining graphs I will show some additional specifics: Gravettians from Italy; Coastal dispersion with Papuan & Jomon, AR33k, Amerinds; and Africa with Mota & Shum Laka (with a word about the others). I did not include all of the mentioned samples in the same graph because it gets too big and takes too long to run.

.zip

graph3.zip (Size: 34.37 KB / Downloads: 6)

[Image: WU35s4O.png]
