Admixtools 2 - TanTin - 03-26-2024
I think I have noticed a serious bug in Admixtools 2 (from library("admixtools")) that affects group statistics.
Running the same test on the same groups, I get different results on each run.
That doesn't make much sense, unless the package uses some randomized algorithm to simplify the calculations.
Test population:
TEST
[1] "Turkey_N" "Iran_GanjDareh_N" "Georgia_Kotias.SG" "Russia_Sidelkino_HG.SG"
[5] "Tajikistan_BA_DashtiKozy" "Greece_Minoan_Lassithi" "Greece_Minoan_Kephala_Petras.SG" "Greece_Minoan_Odigitria"
[9] "Italy_South_HG_Ostuni2" "Russia_Ust_Ishim.DG" "Denisova.DG" "Vindija_Neanderthal.DG"
[13] "Papuan.SDG" "Greece_BA_Mycenaean_Pylos" "Greece_BA_Mycenaean" "Micronesia_Pohnpei_400BP"
[17] "Chimp.REF" "JPT.SG" "Mongolia_Salkhit_UP.SG" "Japan_HG_Jomon"
>
This is the test:
f4( prefix, TEST , "Russia_Ust_Ishim.DG", "Denisova.DG" , "Vindija_Neanderthal.DG" )
Here prefix is the path to the v52.2_1240K dataset, which in my case sits in another subfolder:
"../v52.2/geno/v52.2_1240K_public"
To run the test, just execute the command above in R.
This is the activity log in real time:
i Getting population combinations...
i 20 population combinations found
i Computing from f4 from genotype data...
i Reading metadata...
i Computing block lengths for 1150639 SNPs...
i Computing 20 f4-statistics for block 713 out of 713...
i Summarize across blocks...
Next, the results:
# A tibble: 20 x 9
pop1 pop2 pop3 pop4 est se z p n
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Turkey_N Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG -0.00157 0.000346 -4.53 5.97e- 6 1005076
2 Iran_GanjDareh_N Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG 0.000298 0.000218 1.37 1.71e- 1 961681
3 Georgia_Kotias.SG Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG 0.000219 0.000247 0.886 3.75e- 1 1010220
4 Russia_Sidelkino_HG.SG Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG -0.000152 0.000271 -0.563 5.74e- 1 873308
5 Tajikistan_BA_DashtiKozy Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG -0.000132 0.000202 -0.651 5.15e- 1 913838
6 Greece_Minoan_Lassithi Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG 0.000140 0.000212 0.663 5.07e- 1 894038
7 Greece_Minoan_Kephala_Petras.SG Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG 0.0000310 0.000234 0.132 8.95e- 1 1000258
8 Greece_Minoan_Odigitria Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG -0.000182 0.000376 -0.484 6.28e- 1 274463
9 Italy_South_HG_Ostuni2 Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG -0.000853 0.00199 -0.429 6.68e- 1 8797
10 Russia_Ust_Ishim.DG Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG 0 0 NaN NaN 1012212
11 Denisova.DG Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG 0.0373 0.00123 30.5 7.57e-204 1012212
12 Vindija_Neanderthal.DG Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG -0.0465 0.00205 -22.7 6.61e-114 1012212
13 Papuan.SDG Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG -0.00623 0.00132 -4.74 2.16e- 6 1008381
14 Greece_BA_Mycenaean_Pylos Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG 0.000674 0.000585 1.15 2.49e- 1 159773
15 Greece_BA_Mycenaean Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG 0.000551 0.000453 1.22 2.24e- 1 264072
16 Micronesia_Pohnpei_400BP Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG -0.00757 0.00141 -5.38 7.34e- 8 931290
17 Chimp.REF Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG -0.0165 0.00451 -3.66 2.51e- 4 907022
18 JPT.SG Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG -0.00283 0.000559 -5.06 4.09e- 7 1012211
19 Mongolia_Salkhit_UP.SG Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG -0.00113 0.000904 -1.25 2.11e- 1 78932
20 Japan_HG_Jomon Russia_Ust_Ishim.DG Denisova.DG Vindija_Neanderthal.DG -0.0000677 0.000213 -0.318 7.50e- 1 894525
RE: Admixtools 2 - TanTin - 03-26-2024
However, the problem is that each time I run the same script on the same test populations, I get different results.
Next run:
RE: Admixtools 2 - TanTin - 03-26-2024
Now I am adding the option allsnps = T, although it seems to be active by default.
So the script is:
f4( prefix, TEST , "Russia_Ust_Ishim.DG", "Denisova.DG" , "Vindija_Neanderthal.DG" , allsnps = T )
RE: Admixtools 2 - TanTin - 03-26-2024
I just restarted RStudio and reran the previous test.
This time the SNP counts (the last column, n) are stable: the same numbers in every run. However, est, se and z still differ a lot between runs (z = est/se).
Some values are the same in both runs, but for some groups the difference is very large. I don't see any explanation other than a bug.
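The relation z = est/se mentioned above can be checked by hand against any row of the output. A minimal sketch in base R, using the Turkey_N row from the first table; the two-sided normal p-value formula is an assumption about how the p column is derived, and since the printed est and se are rounded, the recomputed values will only approximately match the table.

```r
# Values copied from row 1 (Turkey_N) of the first f4 table in this thread.
est <- -0.00157
se  <- 0.000346

# z is the estimate divided by its standard error.
z <- est / se

# Two-sided p-value under a standard normal distribution
# (assumed here to be how the p column is obtained).
p <- 2 * pnorm(-abs(z))

print(c(z = z, p = p))
```

If est and se change between runs, z and p change with them, so comparing est and se directly is enough to detect an unstable run.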
RE: Admixtools 2 - TanTin - 03-26-2024
another version
RE: Admixtools 2 - TanTin - 03-26-2024
I can assure you that the F4 functions work perfectly if you run them on individuals (instead of group populations).
When running the F4 functions on individuals only, we get the same reliable results on each run. There is no random discrepancy like the one I showed above for the groups.
Here is an example:
> cluster_data[1:20,]
# A tibble: 20 x 9
pop1 pop2 pop3 pop4 est se z p n
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 tai002_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.000915 0.000701 1.31 1.92e- 1 77860
2 Gorilla.REF Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.0169 0.000454 37.1 1.34e-301 520864
3 Gorilla Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.0167 0.000459 36.3 8.54e-289 502768
4 Chimp_HO Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.0182 0.000451 40.5 0 560121
5 Chimp.REF Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.0182 0.000446 40.8 0 548547
6 AltaiNeanderthal.DG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.0461 0.000575 -80.2 0 560059
7 AltaiNeanderthal_snpAD.DG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.0462 0.000576 -80.2 0 560059
8 Chagyrskaya_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.0500 0.000577 -86.5 0 560072
9 Denisova11_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.00595 0.000474 12.6 3.65e- 36 426334
10 Goyet_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.0510 0.000596 -85.6 0 435769
11 Les_Cottes_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.0494 0.000579 -85.3 0 445990
12 Mezmaiskaya2_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.0483 0.000600 -80.5 0 406983
13 Spy_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.0487 0.000642 -75.8 0 256947
14 Vindija_snpAD.DG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.0559 0.000582 -96.0 0 560121
15 VindijaG1_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.0521 0.000597 -87.3 0 338821
16 HGDP00545 Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.00212 0.000367 5.78 7.26e- 9 559695
17 I3921_noUDG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.00178 0.000379 4.69 2.77e- 6 512697
18 VK551_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.000499 0.000350 1.43 1.54e- 1 474692
19 Loschbour.DG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.000498 0.000334 1.49 1.36e- 1 554841
20 Ust_Ishim.DG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0 0 NaN NaN 560121
> cluster_data[21: length(myData$famInd ),]
# A tibble: 14 x 9
pop1 pop2 pop3 pop4 est se z p n
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 salkhit1_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.000685 0.000747 -0.917 0.359 76629
2 Denisova3.DG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.0652 0.000607 107. 0 560121
3 Tianyuan Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.000317 0.000382 0.831 0.406 489100
4 BK-1653_noUDG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.00000239 0.000380 0.00629 0.995 522874
5 BB7-240_noUDG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.000693 0.000439 -1.58 0.114 447438
6 F6-620_noUDG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.000447 0.000405 -1.10 0.270 538749
7 Yana_old_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.000424 0.000338 1.25 0.210 559996
8 Kostenki14 Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.000571 0.000360 1.59 0.113 552582
9 Sunghir3_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.000660 0.000354 1.86 0.0625 559548
10 Oase1_d Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG -0.00198 0.000633 -3.13 0.00173 115573
11 ZKU002_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.000300 0.000370 0.810 0.418 545032
12 KK1_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.000671 0.000350 1.92 0.0550 558972
13 SATP_noUDG.SG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.000761 0.000375 2.03 0.0428 333318
14 S_Iranian-2.DG Ust_Ishim.DG Denisova3.DG Vindija_snpAD.DG 0.000514 0.000308 1.67 0.0953 554555
RE: Admixtools 2 - TanTin - 03-26-2024
Here are the F4 results when we run the test on individuals (not groups).
The F4 stats for the individuals match perfectly what we already know. There is one special case: Salkhit. It falls on the Neanderthal side (negative est), so she is more Neanderthal than Ust-Ishim.
RE: Admixtools 2 - AimSmall - 03-26-2024
I don't know the cause of your random results; I'd be curious whether the original version shows the same behavior.
https://uqrmaie1.github.io/admixtools/articles/admixtools.html#f4-and-qpdstat
The "f4mode = FALSE" option seems interesting.
Search for the word "Random" here:
https://uqrmaie1.github.io/admixtools/articles/fstats.html
Your source code is here...
https://github.com/uqrmaie1/admixtools/blob/master/R/qpdstat.R
RE: Admixtools 2 - TanTin - 03-27-2024
(03-26-2024, 09:29 PM)AimSmall Wrote: I don't know the cause of your random results; I'd be curious whether the original version shows the same behavior.
https://uqrmaie1.github.io/admixtools/articles/admixtools.html#f4-and-qpdstat
The "f4mode = FALSE" option seems interesting.
Search for the word "Random" here:
https://uqrmaie1.github.io/admixtools/articles/fstats.html
Your source code is here...
https://github.com/uqrmaie1/admixtools/blob/master/R/qpdstat.R
I am using ‘admixtools’ version 2.0.0.
In general I don't need F4 for populations and I don't use it. It appears in many official publications, but the only reason I test anything with groups is to validate results that I already have from PCA or other tests. I prefer to run F4 stats on individuals.
Running F4 on individuals is a little bit tricky, because I have to modify the .ind or .fam file and replace the group ID with the individual ID.
Running F4 on the groups is the easy and lazy way of using Admixtools.
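The .ind edit described above can be sketched in base R. This is only an illustration under stated assumptions: the file is taken to be EIGENSTRAT-style with three whitespace-separated columns (sample ID, sex, group label), the sample IDs are borrowed from the individual table earlier in the thread, and the sex codes, group labels and temporary file paths are placeholders. For PLINK data the analogous edit would presumably target the family ID column of the .fam file, which is worth checking against the package documentation.

```r
# Toy .ind content: sample ID, sex, group label.
# Sex codes ("U") and group labels ("GroupA", "GroupB") are placeholders.
ind_lines <- c(
  "Loschbour.DG U GroupA",
  "Kostenki14 U GroupB",
  "Tianyuan U GroupB"
)
ind_in  <- tempfile(fileext = ".ind")   # stands in for the original file
ind_out <- tempfile(fileext = ".ind")   # modified copy; the original is untouched
writeLines(ind_lines, ind_in)

# Read every column as text so labels such as "F" or "T" are not
# converted to logical values.
ind <- read.table(ind_in, col.names = c("id", "sex", "group"),
                  colClasses = "character")

# Replace the group label with the individual ID, so that every
# sample becomes its own one-member "population".
ind$group <- ind$id

write.table(ind, ind_out, quote = FALSE, row.names = FALSE, col.names = FALSE)
print(readLines(ind_out))
```

Writing to a copy rather than overwriting keeps the grouped file available for the group-level runs.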
However, given the latest tests that I posted above, it makes no sense to run such group tests, as the results are not reliable at all. Of course, someone may play with the settings.
For example, setting allsnps = F may fix the random results and give stable output, but it restricts the number of SNPs. In the example I provided, with allsnps = F only 94 SNPs are included in the results.
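Why the SNP count can collapse like this is easy to see on a toy example. The sketch below is not the package's implementation; it assumes that allsnps = F keeps only SNPs with data in every listed population, while the per-row n values in the earlier output come from SNPs shared by the four populations of each test. The presence matrix and the population names A, B, U, D, V are invented for illustration.

```r
# Toy presence matrix: rows are SNPs, columns are populations.
# TRUE means the population has data at that SNP. Invented values.
present <- matrix(
  c(TRUE,  TRUE,  TRUE, TRUE, TRUE,
    TRUE,  FALSE, TRUE, TRUE, TRUE,
    TRUE,  FALSE, TRUE, TRUE, TRUE,
    FALSE, TRUE,  TRUE, TRUE, TRUE,
    TRUE,  FALSE, TRUE, TRUE, TRUE,
    TRUE,  TRUE,  TRUE, TRUE, TRUE),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "U", "D", "V"))
)

fixed <- c("U", "D", "V")  # the three populations held fixed in every test

# SNPs usable per test: data present in all four populations of that test.
per_quadruple <- sapply(c("A", "B"), function(p) {
  sum(rowSums(present[, c(p, fixed)]) == 4)
})

# SNPs usable when data is required in every population at once.
shared <- sum(rowSums(present) == ncol(present))

print(per_quadruple)
print(shared)
```

With twenty populations, one of which (Italy_South_HG_Ostuni2) has only 8797 SNPs in its own row, a very small intersection is at least plausible under this assumption.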
For qpWave and qpAdm, I know they use algorithms that may produce different results even when we apply the same criteria. I did not expect the same to be true for the group F4 stats.
RE: Admixtools 2 - AimSmall - 03-27-2024
Everything I posted was Admixtools 2.0.
RE: Admixtools 2 - kolompar - 03-27-2024
Is this with a merged/converted dataset? That's where I've seen this before, but I don't know what goes wrong.
RE: Admixtools 2 - TanTin - 03-27-2024
(03-27-2024, 09:25 PM)kolompar Wrote: Is this with a merged/converted dataset? That's where I've seen this before, but I don't know what goes wrong.
I guess this is a bug in Admixtools 2. It doesn't matter what dataset we use: we should get the same, or nearly the same, results if we specify the same criteria.
But as we see, running the same script with the same options gives disastrous results.
This happens only for group statistics.
When I run F4 on individuals I get perfect and stable results.
The bug is definitely in Admixtools.
Thanks for confirming that you have already seen this.
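One way to make "the same or nearly the same results" concrete is to compare two result tables row by row. A minimal sketch in base R: compare_runs is a hypothetical helper, not part of admixtools, and the inputs are toy data frames. The est values are copied from the first table in this thread, the "rerun" is just a copy, and the perturbed table is synthetic, not real output.

```r
# Hypothetical helper: compare the est column of two f4 result tables.
compare_runs <- function(run1, run2, tol = 1e-12) {
  # Both tables must list the same tests in the same order.
  stopifnot(identical(run1$pop1, run2$pop1))
  abs_diff <- abs(run1$est - run2$est)
  data.frame(pop1 = run1$pop1, est1 = run1$est, est2 = run2$est,
             abs_diff = abs_diff, same = abs_diff <= tol)
}

# Toy input: est values for two rows of the first table above.
run_a <- data.frame(pop1 = c("Turkey_N", "Iran_GanjDareh_N"),
                    est  = c(-0.00157, 0.000298))
run_b <- run_a                        # identical copy: a reproducible rerun
run_c <- run_a
run_c$est[2] <- run_c$est[2] + 1e-4   # synthetic perturbation, NOT real output

print(compare_runs(run_a, run_b))
print(compare_runs(run_a, run_c))
```

In the setting of this thread, one would store the output of two identical f4(...) calls and pass them to the helper; any row with same = FALSE pinpoints a population whose estimate changed between runs.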
RE: Admixtools 2 - Арсен - 04-04-2024
(03-27-2024, 09:52 PM)TanTin Wrote: I guess this is a bug in Admixtools 2. It doesn't matter what dataset we use: we should get the same, or nearly the same, results if we specify the same criteria.
But as we see, running the same script with the same options gives disastrous results.
This happens only for group statistics.
When I run F4 on individuals I get perfect and stable results.
The bug is definitely in Admixtools.
Thanks for confirming that you have already seen this.
Dude, teach me how to use this program. Question: do I have to download heavy BAM files from ENA or similar data library sites for this? And do you need a serious, powerful processor to calculate mixtures?
RE: Admixtools 2 - TanTin - 04-04-2024
(04-04-2024, 01:21 PM)Арсен Wrote: Dude, teach me how to use this program. Question: do I have to download heavy BAM files from ENA or similar data library sites for this? And do you need a serious, powerful processor to calculate mixtures?
There is already a topic on this at Eupedia. Start from there.
https://www.eupedia.com/forum/threads/admixtools2-tutorial-for-windows.42684/
Converting BAM to PLINK format is another topic; it is a different process. For now, try to get your software working with the data that is already available (in PLINK bed, bim, fam format).
RE: Admixtools 2 - Арсен - 04-04-2024
(04-04-2024, 07:00 PM)TanTin Wrote: There is already a topic on this at Eupedia. Start from there.
https://www.eupedia.com/forum/threads/admixtools2-tutorial-for-windows.42684/
Converting BAM to PLINK format is another topic; it is a different process. For now, try to get your software working with the data that is already available (in PLINK bed, bim, fam format).
nothing is clear but very interesting xD