
crashdoc · 04-16-2024, 04:46 PM

(04-16-2024, 03:35 PM)Kale Wrote:
(04-16-2024, 03:07 PM)crashdoc Wrote: I remain unconvinced by the non-diploid samples affecting only the terminal edge drift. When common drift between samples is not directly constrained by direct edges but by admixture edges, the terminal drift of the most direct sample (or highest admixture %) seems to be distributed along the upper common edges. For instance a branching of only 1% (see WAVE1_EUR upwards from ZlatyKun) gets the drift cleanly split in two just like if there was no branching and only a redundant edge, a branching of 2% (see EastAsia1 to preOnge_Jarawa upwards from Tianyuan) affects only minimally the distribution of the terminal drift and so on.

It's not that pseudohaploids only affect the terminal drift edge; it is that the terminal drift edge cannot be used to measure the bottleneck of a population (because the sample is artificially maximally bottlenecked by being 100% homozygous). It can still be useful to look at, though, to see how much drift is being absorbed by the graph's branching events, because all the single-sample pseudohaploids will have roughly the same number of total drift units if you trace through the graph, as they all have the same start and end points (Root and 100% homozygous).
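To make the "100% homozygous" point concrete, here is a minimal base-R sketch of pseudohaploid calling. Everything in it is simulated: the number of sites and the allele frequency are arbitrary assumptions, not values from any sample in this thread. One allele is drawn at random per site and recorded as a homozygous call, so no heterozygous genotypes survive, however diverse the source population was.

```r
set.seed(1)
n_sites <- 10000
p <- 0.3  # hypothetical alt-allele frequency, identical at every site

# Diploid genotypes as alt-allele counts (0, 1, 2) under Hardy-Weinberg
diploid <- rbinom(n_sites, size = 2, prob = p)

# Pseudohaploid call: draw one of the two alleles at random per site,
# then record it as a homozygous genotype (0 or 2)
picked_alt <- rbinom(n_sites, size = 1, prob = diploid / 2)
pseudohaploid <- 2 * picked_alt

# Fraction of heterozygous calls before and after
het_diploid <- mean(diploid == 1)
het_pseudohaploid <- mean(pseudohaploid == 1)

cat("heterozygosity, diploid:      ", het_diploid, "\n")
cat("heterozygosity, pseudohaploid:", het_pseudohaploid, "\n")
```

The pseudohaploid heterozygosity is zero by construction, which is why the terminal edge of such a sample says nothing about the real bottleneck of its population.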

I'm not sure why in those particular cases it is choosing to split the branches evenly. My guess would be that the standard errors for the admixture % and drift edges are very large, and it's splitting the branches evenly out of 'convenience' (for lack of a better term). Check that out and let me know, I'm interested to confirm or deny that hypothesis.

Thanks for the explanation, that makes sense. As for the standard errors, I have never seen them in my output for the drift/admixture edges in qpgraph2, could you tell me how I can find them?

Kale · (This post was last modified: 04-16-2024, 05:00 PM by Kale.)

I assume you are running something like this (example from the documentation)...
out = qpgraph(example_f2_blocks, example_graph)
plot_graph(out$edges)

Instead just run
qpgraph(example_f2_blocks, example_graph)

You'll get an output with all the edge weights, admixture weights, std. errs. and fitted f3 statistics.

old europe · 04-16-2024, 06:14 PM

(04-16-2024, 04:41 PM)crashdoc Wrote: GRAPH3: adding MA1, WHG, EHG, SATP (Satsurblia=CHG), Pinarbasi, Iran_N

A word of warning: yes, there are a lot of admixture events, because I tried to be as precise as possible in order to squeeze as much information as I can from the genetics. A more conservative approach has been taken by Kale, and he already captured all the main admixture events, so I don't need to do it again. We must also take into account that the samples here are later than those in the first graphs, so human groups continued to expand and admix in the meantime.

-MA1: same components as Yana, but Yana has additional EastAsia, while MA1 has a new component: a Sunghir-related one, that probably expanded by 35kya, like I said above.

-WHG: has Vestonice-like components + some MID-EAST components (IND_toMID_EAST, WAVE0, WAVE1_MID) + a small amount of ANE

-SATP: For CHG I used exclusively Satsurblia, because the other 2 are later and more admixed. Apart from a bit of ANE and the usual Mid-East components, among which a good amount of SouthCaucasus (KotiasKide25k), it has 3 interesting additional components (which it shares with Iran_N): a Sunghir-related one that probably expanded along the same route by which it reached MA1; an African component (which we also find in Natufian, even though it is not represented in my graphs since Natufian is not of good enough quality); and the last one, which I named NorthCaucasus, a mix of Kostenki14-like and WR_BLACKSEA. I named it NorthCaucasus because BuranKaya3C (which I ran with a much simpler graph in admixtools1 with allsnps = yes) gets about the same amount of the same components.

-Pinarbasi: notable components apart from a bit of ANE and the usual Mid-eastern components are from the Balkans: WR_BALKANS, WHG, and BACHOKIRIAN! I tried various combinations, but the direction is always WHG->Pinarbasi and not the reverse. The BachoKirian component is surprising, but considering the geographical location, not impossible. At that time it was probably not in unadmixed form though, but we know little from Anatolia (or even the Southern Balkans) before Pinarbasi.

-EHG: what I called preEHG5 is likely very AG3-like and, compared with earlier ANE like MA1, it has 3 additional components: SouthCaucasus, NorthCaucasus (the two probably mixed together by that time) and a PARA-AMERICA component, which is connected here to EastAsia (as the main missing America sub-component of EHG) but, as you'll see later if we include American samples, it gets connected to them instead. To get from "AG3" to EHG, there is of course an additional WHG component, but also a Pinarbasi-related component that was probably present in the Balkans where that particular WHG source originated.

-IRAN_N: I was expecting some deep component(s) in Iran, but as it turns out, everything that makes up its genetic composition is already present around it: apart from SATP-like components, it has additional "preEHG5-AG3" and SouthAsian input.

That's a lot to digest, but I believe it answers many questions. For me, the main takeaways are:

-The links between the Balkans and the Mid-East are manifold, and the WHG input into Pinarbasi is also real.

-As for the "basal" dna and its significance for the amount of Neandertal, it is neither pure Basal (WAVE0) nor pure Africa that affects it; it is a mix of both.

-Iran is not an ancient dna refugium, at least in the Mesolithic-Neolithic transition.

The most complicated stuff is now done. For the remaining graphs I will show some additional specifics: Gravettians from Italy; Coastal dispersion with Papuan & Jomon, AR33k, Amerinds; and Africa with Mota & Shum Laka (with a word about the others). I did not include all of the mentioned samples in the same graph because it gets too big and takes too long to run.

These kinds of graphs seem so different from the ones you find in professional/scientific papers. I wonder what you have that they don't about genetic clusters?

Kale · 04-16-2024, 06:31 PM

Published papers are typically hyper-conservative. They include very few populations and use the minimum number of admixtures possible to clear the statistical threshold (worst F4 residual Z < 3 typically).

Chad · 04-16-2024, 10:01 PM

The problem is, you can admix anything to a passing model. There's too much going on here to say any of it is real.

crashdoc · 04-16-2024, 11:30 PM

(04-16-2024, 04:59 PM)Kale Wrote: I assume you are running something like this (example from the documentation)...
out = qpgraph(example_f2_blocks, example_graph)
plot_graph(out$edges)

Instead just run
qpgraph(example_f2_blocks, example_graph)

You'll get an output with all the edge weights, admixture weights, std. errs. and fitted f3 statistics.

I tried, but I get exactly the same results as when assigning it to a variable like "out": no std. errs. except on the f3 stats. Note that I'm running my graphs with lsqmode = TRUE, but even with the default there are no standard errors.

crashdoc · 04-16-2024, 11:44 PM

(04-16-2024, 10:01 PM)Chad Wrote: The problem is, you can admix anything to a passing model. There's too much going on here to say any of it is real.

I understand your position and I would probably have said the same. So many admixture events is almost "heretical" (however, still a vast simplification of the real history!). But I assure you that with all the runs I did, it is not so easy to get a passing model with all those samples. In fact Kale's models in the other thread are not passing models (no offense meant, I still believe they are valid, just less precise and definitely with worst f4 |Z| > 3). You can try it yourself if you want and see whether you can get the worst f4 stats under |Z| = 3, and you have the advantage of being able to build on mine.

If you want, you can look at only the first 2 graphs in this thread (after all, they do not have as many admix events), which have really low worst f4 stats, and f3 as well.

However I'm not out to convince anyone, as everyone has his preconceptions of what the end result should prove or not (me included). That said, I had to change my views a couple of times because of the results I got.

My main goal with this thread is to share my results with others, if that can be of use to anyone, but my personal goal was to answer my own questions and I'm satisfied.

crashdoc · 04-17-2024, 12:47 AM

PAPUAN AND JOMON

The Coastal Route branches off into COASTAL_INDIA and the groups that continued which I tagged COASTAL_PACIFIC. Those last ones went into Sahul and also up the Pacific Coast to contribute to many coastal groups and ultimately, Amerinds (not shown here).

-Papuans, as expected, have a small percentage of archaic admixture, which I tagged FAKEDENISOVIAN because there is no real Denisovan dna in my graph. They also have a small amount of EastAsia that they probably got from the Coastal Papuans, who got it in turn from the Austronesians.

-Jomon are a mix of EastAsia, Coastal and inland Onge-like dna. They also have ANE and EastAsiaPARA_AMERICA which they share with EHG (and come ultimately from the ancestors of the Amerinds)

A word about the inland Onge-like dna, from which Jomon branches off before the part that contributed to ANE. Jomon and Andaman are linked by the Y-dna D haplogroup (D1a2-Z3660 TMRCA 45200), which also gets dispersed into China. Remember that South China archeology is different from that of North China. The oldest Hoabinhian site is also in South China. Hoabinhian (incidentally 1 of the 2 we have is D1*) is genetically quite close to the same source as Onge (barring a small amount of contamination, which is why I don't show them in the graph: according to residual f3 they want some European dna!)

[Image: bV9g3U7.png]

PapuanJomon.zip (Size: 38.16 KB / Downloads: 6)

Kale · 04-17-2024, 03:54 AM

(04-16-2024, 11:44 PM)crashdoc Wrote: I assure you that with all the runs I did, it is not so easy to get a passing model with all those samples. In fact Kale's models in the other thread are not passing models (no offense meant, I still believe they are valid, just less precise and definitely with worst f4 |Z| > 3).

It is extremely difficult if not impossible without severe overfitting. The F4 residuals seem to compound deviations which are quite minor.
Here's an example of f3 and f4 residual from graph 41124b on my thread
F3: Mbuti MA1 Muierii1: Z = -0.40
F3: Mbuti MA1 Iran_N: Z = +1.16
F3: Mbuti MA1 CHG: Z = -0.89
F3: Mbuti Muierii1 Iran_N: Z = -0.80
F3: Mbuti Muierii1 CHG: Z = -0.41
F3: Mbuti Iran_N CHG: Z = -0.56
F4: Iran_N CHG MA1 Muierii1: Z = +3.03
I can't say I understand how the worst F3 residual there has a p-value of 0.2461, yet the F4 has a p-value of 0.0025?
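For what it's worth, the two p-values quoted above are simply the two-sided normal tail probabilities of the Z-scores. As for the compounding: with an outgroup O, f4(A,B;C,D) = f3(O;A,C) + f3(O;B,D) - f3(O;A,D) - f3(O;B,C), so an f4 residual is a signed sum of four f3 residuals. Below is a minimal base-R sketch using only the Z-scores listed in the post. One loud caveat, repeated in the comments: Z-scores are not additive, because each statistic has its own standard error and the statistics are correlated, so the last step only illustrates the sign pattern and is not a way to compute the f4 Z-score.

```r
# Two-sided p-value for a Z-score under a standard normal
z_to_p <- function(z) 2 * pnorm(-abs(z))

p_f3 <- z_to_p(1.16)  # worst f3 residual in the list
p_f4 <- z_to_p(3.03)  # f4 residual
cat("p(f3, Z = 1.16):", signif(p_f3, 4), "\n")
cat("p(f4, Z = 3.03):", signif(p_f4, 4), "\n")

# f3 residual Z-scores (outgroup Mbuti) that enter this f4, from the post
z_f3 <- c(MA1_IranN = 1.16, MA1_CHG = -0.89,
          Mui_IranN = -0.80, Mui_CHG = -0.41)

# Signs with which each f3 enters f4(Iran_N, CHG; MA1, Muierii1):
#   + f3(O; Iran_N, MA1)       + f3(O; CHG, Muierii1)
#   - f3(O; Iran_N, Muierii1)  - f3(O; CHG, MA1)
signs <- c(MA1_IranN = 1, MA1_CHG = -1, Mui_IranN = -1, Mui_CHG = 1)

# Illustration only: Z-scores are NOT additive. This just shows how many
# of the four terms push the f4 residual in the same direction.
contrib <- signs * z_f3
print(contrib)
```

Three of the four signed terms point the same way, so deviations that are individually unremarkable can line up in the f4 combination; how large the resulting Z-score is depends on the standard error of the f4 itself.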

kolompar · 04-17-2024, 11:50 AM

(04-16-2024, 11:44 PM)crashdoc Wrote:
(04-16-2024, 10:01 PM)Chad Wrote: The problem is, you can admix anything to a passing model. There's too much going on here to say any of it is real.

I understand your position and I would probably have said the same. So many admixture events is almost "heretical" (however, still a vast simplification of the real history!). But I assure you that with all the runs I did, it is not so easy to get a passing model with all those samples. In fact Kale's models in the other thread are not passing models (no offense meant, I still believe they are valid, just less precise and definitely with worst f4 |Z| > 3). You can try it yourself if you want and see whether you can get the worst f4 stats under |Z| = 3, and you have the advantage of being able to build on mine.

If you want, you can look at only the first 2 graphs in this thread (after all, they do not have as many admix events), which have really low worst f4 stats, and f3 as well.

However I'm not out to convince anyone, as everyone has his preconceptions of what the end result should prove or not (me included). That said, I had to change my views a couple of times because of the results I got.

My main goal with this thread is to share my results with others, if that can be of use to anyone, but my personal goal was to answer my own questions and I'm satisfied.

The problem with adding too many edges is they won't be identifiable by qpgraph and that's why it will pass. Run plot_graph with highlight_unidentifiable = TRUE to see if you've overdone it. I guess it's nice to add some extra edges to make it look more realistic and convey your idea better, but I'm not sure if that's what I feel when I look at your graphs.

crashdoc · (This post was last modified: 04-17-2024, 11:52 AM by crashdoc.)

GravIta.zip (Size: 36.92 KB / Downloads: 3)

ITALIAN GRAVETTIANS

Warning: the best qpgraphs are behind us. For instance, this one has only 350k snps and it starts to show.

Italian Gravettians are a bit different from Vestonice. Apart from Vestonice-like dna, they have some NW_Aurignacian dna (confirmed by Ostuni M mtdna) and also some southern dna (among which some wave0 and Ind_Mid-East, just like WHG but less). So it means that those components were already in southern Europe ~28kya and not only 25kya with the ancestors of WHG.

WHG also absorbed a small amount of Italian Gravettians dna. It might seem small, but remember that they already had Vestonice-like components. It is plausible that they got the Italian Gravettians dna when they dispersed from the Balkans to Italy. Note that their contribution to EHG and Pinarbasi happens before that.

[Image: TqOvzZv.png]

crashdoc · (This post was last modified: 04-17-2024, 12:49 PM by crashdoc.)

I know those graphs look crazy! But what is critical with qpgraph is not the amount of admixture; it is the number of populations and the f3 residuals between them.

With a small number of populations you can do what you want. I can make (and have made) a graph with a worst f4 Z-score of 0.9 !!! with the populations in the 1st graph (minus Muierii and Ust-Ishim) with only about 3-4 admixture events and making Onge descend from Goyet_Fournol! I can do the same with a still nice Z-score and only 2 admixture events.

The literature is full of such graphs, and some even tolerate many 0-edges as root nodes.

From my experience, when you get enough populations in a graph you cannot make it say what you want, no matter the number of admixture events (you can't say that I don't know about many admixture events!), because you get 0% admix proportions and also 0-edges, not to mention too-high f3 residuals.

Like I said, I don't think I will convince many, but if you have one takeaway from this thread, let it be the first graph, which still looks reasonable to most and has quite low residual Z-scores.

crashdoc · (This post was last modified: 04-17-2024, 05:23 PM by crashdoc.)

AFRICA

Of course not much can be inferred about internal structure in Africa with qpgraph, as we don't have enough ancient samples. After all, Africa has a longer AMH history than Eurasia!

What can be seen here is that Mota has a small amount of backflow from Eurasia, while Shum Laka doesn't (or not detectably so with qpgraph). Shum Laka also has Central African dna contrary to Mota, which should come as no surprise.

I tried adding Yoruba, but they are much too mixed (Central Africa, East Africa, North Africa) and we don't know the exact internal african structure before the Bantu Expansion. I can get a nice score putting all 3 sources plus IBM (they get about 5-6%), but they still miss some additional Eurasian dna (probably Neolithic Levant by a North or East Africa route).

As for South Africa 2000BP (not shown on the graph), they too are mixed, but with no Eurasian dna, nor North African. However, they probably have some Central Africa and a lot of East Africa through the rift valley route, which seems to have been used regularly in a North-South direction since quite long ago.

Additionally, Mbuti, which is used as an outgroup, is not pure ancestral Central Africa if we look at uniparentals, so any african structure and admix percentages must be taken with a grain (or many!) of salt.

We can infer a bit more with uniparentals:

The 130kya interglacial (MIS5) is associated with the first main divergence among surviving sapiens lineages that spread all over africa at this time.

-A00 & A0 & A1a are from central Africa. L1 split from L2'3'4'5'6'7 about 130kya and is from central Africa

-A1a split from A1b ~133kya, BT from A1b1 ~130kya, A1b1a from A1b1b ~125kya

-A1b1a->South Africa, A1b1b->East Africa, BT->Somewhere in the Northern half of Africa

-A1 is associated with mtdna L1'2'3'4'5'6'7 and the TMRCA of both is about 130kya; that's when BT & L2'3'4'5'6'7 split from them.

-L0d (South Africa) split from L0a'b'f'g'k (East Africa) about 125kya

-L0k (also South Africa) split from L0a'b'f'g (East Africa) about 115kya

-BT is associated with L2'3'4'6 and their TMRCA is about 90kya, that's when CT and L3'4'6 split from them.

In summary:

-Both mtdna and Y-dna show that around 130kya Central Africa split from the rest (or the rest split from Central Africa!)

-Shortly thereafter BT and L2'3'4'6 split from the non-CentralAfrica group and probably went northward somewhere

-The last main split of that period was between East and South Africa. However movements from East to South Africa continued.

-Around 90kya Eurasian ancestors (not speculating about their whereabouts) split from B/L2

-Some of them stayed or went back to Africa (depending on their whereabouts) at some point (mtdna L6, L4, L3, Y-dna E).

-And over the next tens of thousands of years there were other events that caused admixing in Africa and that we don't know much about.

-Finally, to single out the most obvious, we start to see back movements from Eurasia to Africa in the Neolithic and a later Bantu expansion.

Of course there were also other surviving lineages in Central Africa and non-surviving ones in the Aterian of North Africa.

I didn't mention West Africa, because I don't have much information about them and what I read about archeology usually bundles them with Central Africa. All I can say is that their oldest lineages indeed seem to have a link with Central Africa, but later we also see links with North and East Africa.

[Image: zP4e8vx.png]

MotaAndShum.zip (Size: 38.37 KB / Downloads: 2)

crashdoc · (This post was last modified: 04-18-2024, 03:33 AM by crashdoc.)

Amur River 33kya

This graph is a bit distorted, maybe because of the quality of the sample or of less effort from the graph maker, but I only wanted to show what I meant earlier by AR33k being different from Tianyuan. AR33k has a good chunk of coastal dna. On this graph, however, I would take the exact percentage with a grain of salt, but it is obviously present. I cannot detect it with single f3 or f4 stats, maybe because Onge and Papuan are not ancient and are quite isolated, and so quite bottlenecked. Moreover, they are shotgun diploid, while Tianyuan and AR33k are capture pseudo-haploid.

EDIT: qpadm sees it

Code:
> right = c('Mbuti','Kostenki14','Sunghir','Taforalt','Goyet_Fournol','ZlatyKun','BachoKiro_IUP_udg','BK1653','Ust_Ishim','Muierii1','Yana','KotiasKide25K','Vestonice','MA1','Italy_WHG','Pinarbasi','SATP','Iran_N','EHG_OLD','Peru_RioUncallane_1800BP','Mota','Jomon','Onge_Jarawa')
> left = c('Tianyuan', 'Papuan')
> target = 'AR33k'
> results = qpadm(f2_blocks, left, right, target)
$popdrop
# A tibble: 3 × 13
  pat      wt   dof chisq        p f4rank Tianyuan Papuan feasible best  dofdiff
  <chr> <dbl> <dbl> <dbl>    <dbl>  <dbl>    <dbl>  <dbl> <lgl>    <lgl>   <dbl>
1 00        0    21  26.1 2.03e- 1      1    0.795  0.205 TRUE     NA         NA
2 01        1    22  42.1 6.06e- 3      0    1     NA     TRUE     TRUE        0
3 10        1    22 122.  8.59e-16      0   NA      1     TRUE     TRUE       NA

There is one more thing to note: Papuan in this setup also needs some SouthChina/SouthEastAsia Onge-like dna, probably, like I said earlier, because they got it from Coastal Papuans, who are more mixed.
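As a sanity check on the $popdrop table above, the p column can be reproduced from the chisq and dof columns, and the two-source model can be compared with the Tianyuan-only model using a chi-square difference test. This is a minimal base-R sketch that uses only the rounded numbers printed in the table; the difference test is the standard nested-model comparison and is offered here as one way to read the table, not as a description of qpadm's exact internal procedure.

```r
# chisq and dof as printed in $popdrop (rounded values)
full     <- list(chisq = 26.1, dof = 21)  # pat 00: Tianyuan + Papuan
tianyuan <- list(chisq = 42.1, dof = 22)  # pat 01: Papuan dropped

# Upper-tail probability of each model's chi-square statistic
p_full     <- pchisq(full$chisq, df = full$dof, lower.tail = FALSE)
p_tianyuan <- pchisq(tianyuan$chisq, df = tianyuan$dof, lower.tail = FALSE)

# Nested comparison: does adding Papuan improve the fit significantly?
d_chisq  <- tianyuan$chisq - full$chisq
d_dof    <- tianyuan$dof - full$dof
p_nested <- pchisq(d_chisq, df = d_dof, lower.tail = FALSE)

cat("p, Tianyuan + Papuan:", signif(p_full, 3), "\n")
cat("p, Tianyuan only:    ", signif(p_tianyuan, 3), "\n")
cat("p, nested difference:", signif(p_nested, 3), "\n")
```

The chi-square drops by about 16 for a single extra degree of freedom, which is why the Papuan-related source looks required here even though the Tianyuan-only model is not catastrophically bad on its own.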

[Image: BmjCeZc.png]

AR33k.zip (Size: 37.82 KB / Downloads: 2)

TanTin · 04-18-2024, 05:39 AM

I make similar kinds of graphs or tree-like structures, and I have the same problem.
With a small number of groups the structure is clear.
By adding more we expect to get a better picture. But it is the opposite: the more groups are involved, the more we see a total mess. In addition, the more groups are involved in the tree, the more time I have to spend each time to remember, to re-discover, to understand what we have here and why.
Good luck! Wish you luck.
