TanTin · 03-26-2024, 07:26 AM

I think I have noticed a serious bug in Admixtools 2 (from library("admixtools")) in the group statistics.

Running the same test for the same groups, I got different results on each run.
That doesn't make much sense, unless some randomized algorithm is used to simplify the calculations.

Test population:

TEST
[1] "Turkey_N" "Iran_GanjDareh_N" "Georgia_Kotias.SG" "Russia_Sidelkino_HG.SG"
[5] "Tajikistan_BA_DashtiKozy" "Greece_Minoan_Lassithi" "Greece_Minoan_Kephala_Petras.SG" "Greece_Minoan_Odigitria"
[9] "Italy_South_HG_Ostuni2" "Russia_Ust_Ishim.DG" "Denisova.DG" "Vindija_Neanderthal.DG"
[13] "Papuan.SDG" "Greece_BA_Mycenaean_Pylos" "Greece_BA_Mycenaean" "Micronesia_Pohnpei_400BP"
[17] "Chimp.REF" "JPT.SG" "Mongolia_Salkhit_UP.SG" "Japan_HG_Jomon"
>

This is the test:

f4(prefix, TEST, "Russia_Ust_Ishim.DG", "Denisova.DG", "Vindija_Neanderthal.DG")

Here prefix is the path to the v52.2_1240K dataset, which in my case is in another subfolder:
"../v52.2/geno/v52.2_1240K_public"

To run the test, just execute the command above in R.
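
To put a number on "different results for each run", the same call can be repeated and the outputs compared column by column. The following is only a sketch in base R: `repeat_and_compare` and `fake_f4` are hypothetical names, the numbers are made up, and the output column names `est`, `se` and `z` are assumed from the result tables discussed in this thread. It also assumes every run returns its rows in the same order.

```r
# Hypothetical helper: run the same computation several times and report,
# for each numeric column, the largest absolute difference from the first run.
# Assumes every run returns the rows in the same order.
repeat_and_compare <- function(run_fn, n_runs = 3, cols = c("est", "se", "z")) {
  runs <- lapply(seq_len(n_runs), function(i) as.data.frame(run_fn()))
  first <- runs[[1]]
  sapply(cols, function(col) {
    max(sapply(runs, function(r) max(abs(r[[col]] - first[[col]]))))
  })
}

# Deterministic stand-in so the sketch runs without a dataset (made-up numbers).
fake_f4 <- function() {
  est <- c(0.0012, -0.0004, 0.0031)
  se  <- c(0.0003,  0.0002, 0.0005)
  data.frame(pop1 = c("A", "B", "C"), est = est, se = se, z = est / se)
}

diffs <- repeat_and_compare(fake_f4)
print(diffs)  # a deterministic function gives zero difference in every column

# With admixtools loaded and `prefix` and `TEST` defined as in the post:
# repeat_and_compare(function() f4(prefix, TEST, "Russia_Ust_Ishim.DG",
#                                  "Denisova.DG", "Vindija_Neanderthal.DG"))
```

Any non-zero entry in the result means the runs really differ in that column, which is easier to report than two screenshots.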

This is the activity log in real time:

i Getting population combinations...
i 20 population combinations found
i Computing from f4 from genotype data...
i Reading metadata...
i Computing block lengths for 1150639 SNPs...
i Computing 20 f4-statistics for block 713 out of 713...
i Summarize across blocks...

Next, the results:

[Image: F4-TEST-a.png]

TanTin · 03-26-2024, 07:35 AM

However, the problem is that each time I run the same script for the same test populations, I get different results.

[Image: F4-TEST-b.png]

next run:

TanTin · 03-26-2024, 07:45 AM

Now I am adding the option allsnps = T, which seems to be active by default.

So the script is:
f4(prefix, TEST, "Russia_Ust_Ishim.DG", "Denisova.DG", "Vindija_Neanderthal.DG", allsnps = T)

[Image: F4-TEST-all-snp-T.png]

TanTin · 03-26-2024, 08:05 AM

I just restarted RStudio and reran the previous test:

[Image: Rst-F4-TEST-all-snp-T.png]

This time the SNP counts (the last column) are stable, with the same numbers in both runs. However, est, se and Z are still very different. Z = est/se.

Some numbers are the same in both runs, but for some groups the difference is very large. I don't see any explanation other than a bug.
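
Since Z = est/se, one way to judge how bad a discrepancy is would be to express the shift in est between two runs in units of the standard error. A minimal sketch with made-up numbers (not taken from the screenshots); the column names and the cutoff of 1 standard error are assumptions chosen for illustration:

```r
# Made-up numbers for two runs of the same three statistics.
run1 <- data.frame(pop1 = c("A", "B", "C"),
                   est = c(0.0012, -0.0004, 0.0031),
                   se  = c(0.0003,  0.0002, 0.0005))
run2 <- data.frame(pop1 = c("A", "B", "C"),
                   est = c(0.0012, -0.0010, 0.0030),
                   se  = c(0.0003,  0.0002, 0.0005))

# Z-score as defined in the post: estimate divided by its standard error.
run1$z <- run1$est / run1$se
run2$z <- run2$est / run2$se

# Pair up the rows and measure how far est moved, in units of se.
cmp <- merge(run1, run2, by = "pop1", suffixes = c("_1", "_2"))
cmp$shift_in_se <- abs(cmp$est_1 - cmp$est_2) / pmax(cmp$se_1, cmp$se_2)
cmp$unstable <- cmp$shift_in_se > 1  # hypothetical cutoff: more than 1 se
print(cmp[, c("pop1", "z_1", "z_2", "shift_in_se", "unstable")])
```

In this toy table only population B is flagged, because its estimate moved by about three standard errors between the two runs.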

TanTin · 03-26-2024, 08:13 AM

[Image: RUN3-4-Rst-F4-TEST-all-snp-T.png]

another version

TanTin · 03-26-2024, 09:10 AM

I can assure you that the F4 functions work perfectly if you run them on individuals (instead of population groups).
When running the F4 functions for individuals only, we get the same reliable results on each run, with none of the random discrepancies that I showed above for the groups.
Here is an example:

[Image: F4-individuals.png]

TanTin · 03-26-2024, 09:17 AM

Here are the F4 results when we run the test for individuals (not groups).

[Image: Neanderthal-Denisova.png]

The F4 stats for the individuals match perfectly what we already know. There is one special case: Salkhit. She is on the Neanderthal side, so she is more Neanderthal than Ust-Ishim.

***AimSmall*** · 03-26-2024, 09:29 PM

I don't know the cause of your random results. I'd be curious whether the original version has the same behavior.

https://uqrmaie1.github.io/admixtools/ar...nd-qpdstat
The "f4mode = FALSE" seems interesting.

Search for the word "Random":
https://uqrmaie1.github.io/admixtools/ar...stats.html

Your source code is here...
https://github.com/uqrmaie1/admixtools/b.../qpdstat.R

TanTin · 03-27-2024, 01:01 AM

(03-26-2024, 09:29 PM)AimSmall Wrote: I don't know the cause of your random results. I'd be curious whether the original version has the same behavior.

I am using ‘admixtools’ version 2.0.0.

In general I don't need and don't use F4 for populations. It is used in many official publications, but I don't use it. The only reason to test something with groups is to validate results that I already have from PCA or other tests. I prefer to run F4 stats for individuals.

Running F4 for individuals is a little bit tricky, because I have to modify the .ind or .fam file and replace the group ID with the individual ID.
Running F4 for groups is the easy and lazy way of using Admixtools.
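
The relabelling step can be scripted rather than done by hand. Below is a minimal base R sketch for the EIGENSTRAT .ind case, where each line holds sample ID, sex and population label. The function name `ind_to_individual_labels` and the sample rows are hypothetical; it writes a modified copy and leaves the original file untouched.

```r
# Hypothetical helper: copy an EIGENSTRAT .ind file, replacing the population
# label (third column) with the sample ID (first column).
ind_to_individual_labels <- function(ind_in, ind_out) {
  ind <- read.table(ind_in, header = FALSE, stringsAsFactors = FALSE,
                    col.names = c("id", "sex", "pop"),
                    colClasses = "character",  # keeps sex "F" from becoming FALSE
                    comment.char = "", quote = "")
  ind$pop <- ind$id
  write.table(ind, ind_out, quote = FALSE, sep = "\t",
              row.names = FALSE, col.names = FALSE)
  invisible(ind)
}

# Demo on a tiny made-up file.
src <- tempfile(fileext = ".ind")
dst <- tempfile(fileext = ".ind")
writeLines(c("I0001 M Turkey_N",
             "I0002 F Turkey_N",
             "I0003 U Iran_GanjDareh_N"), src)
ind_to_individual_labels(src, dst)
print(readLines(dst))
```

Because the f4 call above takes a file prefix rather than individual file names, the modified .ind presumably has to sit next to matching .geno and .snp files under that prefix. That is an assumption about the file layout, so check it against your own setup.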
However, given the latest tests that I provided above, it makes no sense to run such group tests, as the results are not reliable at all. Of course, someone may play with the settings.
For example, using the allsnps = F option may fix the random results and give stable results, but it will restrict the number of SNPs. In the example that I provided, with allsnps = F only 94 SNPs are included in the results.
For qpWave and qpAdm, I know they use algorithms that may produce different results even if we apply the same criteria. I did not expect that the same could be true for the group F4 stats.
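
A toy illustration of why allsnps = F can leave so few SNPs. As far as I understand the documentation, with allsnps = F a SNP is used only if it has data in every population in the set, while with allsnps = T the SNPs are selected separately for each quadruple. The sketch below does not call admixtools. It only counts SNPs under the two rules on a made-up missingness matrix with hypothetical population names.

```r
# Toy missingness matrix: rows are SNPs, columns are populations.
# TRUE means the population has an allele frequency estimate at that SNP.
# Made-up data, not derived from v52.2.
pops <- c("P1", "P2", "P3", "P4", "LowCov")
n_snps <- 1000
has_data <- matrix(TRUE, nrow = n_snps, ncol = length(pops),
                   dimnames = list(NULL, pops))
# One sparse population covered at only every 20th SNP.
has_data[, "LowCov"] <- seq_len(n_snps) %% 20 == 0

# Strict rule: a SNP must be present in every population of the whole set.
n_strict <- sum(rowSums(has_data) == ncol(has_data))

# Per-statistic rule: a SNP must be present only in the four populations used.
quad <- c("P1", "P2", "P3", "P4")
n_quad <- sum(rowSums(has_data[, quad]) == length(quad))

print(c(strict = n_strict, per_quadruple = n_quad))
```

Under the strict rule a single sparse population pulls the count down for every statistic, including those that do not involve it.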

***AimSmall*** · 03-27-2024, 01:35 AM

Everything I posted was Admixtools 2.0.

kolompar · 03-27-2024, 09:25 PM

Is this with a merged/converted dataset? That's where I've had this before, but don't know what it is that goes wrong.

TanTin · 03-27-2024, 09:52 PM

(03-27-2024, 09:25 PM)kolompar Wrote: Is this with a merged/converted dataset? That's where I've had this before, but don't know what it is that goes wrong.

I guess this is a bug in Admixtools 2. It doesn't matter what dataset we use. We should get the same or nearly the same results if we specify the same criteria.
But as we see, running the same script with the same options, the results are a disaster.
This is only for group statistics.
When I run F4 on individuals I get perfect and stable results.
The bug is definitely in Admixtools.
Thanks for confirming that you have already seen that.

Арсен · 04-04-2024, 01:21 PM

dude, teach me how to use this program. Question: do I have to download heavy BAM files from ENA or other data library sites for this? And do you need a serious, powerful processor to calculate mixtures?

TanTin · 04-04-2024, 07:00 PM

(04-04-2024, 01:21 PM)Арсен Wrote:
dude, teach me how to use this program. Question: do I have to download heavy BAM files from ENA or other data library sites for this? And do you need a serious, powerful processor to calculate mixtures?

There is already a topic on this at Eupedia. Start from there.

https://www.eupedia.com/forum/threads/ad...ows.42684/

Converting BAM to PLINK format is another topic and a different process. For now, try to get your software working with the already available data (in PLINK bed, bim, fam format).

Арсен · 04-04-2024, 07:16 PM

nothing is clear but very interesting xD
