Posts: 323
Threads: 8
Joined: Oct 2023
(03-01-2024, 11:56 PM)Tomenable Wrote: (03-01-2024, 11:48 PM)TanTin Wrote: (03-01-2024, 11:37 PM)Tomenable Wrote: I think that paper has data from Qatar, but not from Bahrain?
Yes, mostly Qatar, but there are many others as well. There are more than 800 in total.
Which of these samples are from Bahrain?
Not from Bahrain exactly, but from the region in general. The populations there are not very different from each other.
Posts: 323
Threads: 8
Joined: Oct 2023
Regarding these 4 samples: we have the fastq data: (4 files)
https://www.ebi.ac.uk/ena/browser/view/PRJEB71330
Posts: 323
Threads: 8
Joined: Oct 2023
Converted one of the Bahrain individuals:
https://ufile.io/7yaq3rrq
( Four genomes from Tylos-period Bahrain: SAMEA115108455 ERX11887579 ERR12512058 9606 )
Posts: 104
Threads: 5
Joined: Oct 2023
(03-01-2024, 11:14 PM)Tomenable Wrote: Teepean converted them. I sent you in a PM.
For some reason I couldn't align them with bwa aln but had to use bwa mem.
Posts: 323
Threads: 8
Joined: Oct 2023
(03-02-2024, 01:28 PM)teepean Wrote: (03-01-2024, 11:14 PM)Tomenable Wrote: Teepean converted them. I sent you in a PM.
For some reason I couldn't align them with bwa aln but had to use bwa mem.
I also did the alignment. Here are the 4 files:
https://ufile.io/8s5jtzvn
Please note: there are multiple variants with 3+ alleles present
# 3 more multiple-position warnings: see log file.
# Error: 98093 variants with 3+ alleles present.
So for one individual that I checked (ERR12512058), I had to exclude about 98,000 SNPs because a third allele was present. Excluding them is not fatal: about 170k other SNPs remain.
The other 3 samples have a lot more SNPs, but I didn't have time to check them further.
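A minimal sketch of how the variants behind a "3+ alleles" error could be listed before excluding them, assuming the error comes from combining two .bim files whose allele columns disagree. The function names and the T allele in the second list are made up for illustration, and strand flips are not handled.

```python
from collections import defaultdict

MISSING = {".", "0"}  # allele codes treated as "no allele recorded"

def parse_bim_line(line):
    """Split one .bim row: chrom, variant ID, cM, bp position, allele 1, allele 2."""
    chrom, vid, _cm, pos, a1, a2 = line.split()[:6]
    return chrom, vid, int(pos), a1, a2

def multiallelic_ids(*bim_line_sets):
    """Return IDs whose combined allele set across all inputs has 3+ distinct alleles."""
    alleles = defaultdict(set)
    for lines in bim_line_sets:
        for line in lines:
            _, vid, _, a1, a2 = parse_bim_line(line)
            alleles[vid].update(a for a in (a1, a2) if a not in MISSING)
    return sorted(vid for vid, seen in alleles.items() if len(seen) >= 3)

# First list: rows as posted in this thread. Second list: hypothetical sample rows.
reference = ["23 rs4829294 0 33611081 A C", "23 rs5972902 0 33619246 G A"]
sample = ["23 rs4829294 0 33611081 A T", "23 rs5972902 0 33619246 G A"]

print(multiallelic_ids(reference, sample))
```

The resulting ID list could then be written to a text file and passed to whatever exclusion step the conversion tool offers.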
Posts: 323
Threads: 8
Joined: Oct 2023
Another version for Tylos: I removed the variants with 3+ alleles present.
> missing$F_MISS
[1] 0.8138 0.3887 0.4102 0.4722
> missing$IID
[1] "B_ERR12512058" "ERR12512055.bam" "ERR12512056.bam" "ERR12512057.bam"
https://ufile.io/cwgdu23l
The quality of this data is still very good: more than 700k variants.
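For reading the F_MISS values above: F_MISS is the fraction of variants with no call for a sample, so the call rate is simply 1 - F_MISS. A small sketch using the four values posted here (the function name is only illustrative):

```python
# Sample IDs and F_MISS values copied from the R output above.
iids = ["B_ERR12512058", "ERR12512055.bam", "ERR12512056.bam", "ERR12512057.bam"]
f_miss = [0.8138, 0.3887, 0.4102, 0.4722]

def call_rates(ids, missing_fractions):
    """Map each sample ID to its call rate (1 - fraction of missing genotypes)."""
    return {iid: 1.0 - f for iid, f in zip(ids, missing_fractions)}

# Print samples from best to worst coverage of the variant set.
rates = call_rates(iids, f_miss)
for iid, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{iid}\t{rate:.4f}")
```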
Posts: 323
Threads: 8
Joined: Oct 2023
03-02-2024, 10:14 PM
When I do a projection for the new Bahrain data, I get different positions on the PCA.
I am not sure which position is correct. When I did the alignment, I skipped the filtering step. I also noticed a difference in my .bim file, which contains entries like these:
23 rs4829294 0 33611081 A C
23 rs12014055 0 33618765 . C <--- dots for the missing alleles
23 rs5972902 0 33619246 G A
23 snp_23_33623357 0 33623357 . G
23 rs1878889 0 33625822 A C
In teepean's .bim file:
23 rs12013178 0.981526 53292827 C A <--- all the alleles are listed properly.
23 rs2315863 0.981748 53302195 G C
23 rs1409117 0.981761 53302495 A G
23 rs6638377 0.981833 53304115 G T
23 rs2094145 0.982004 53307984 G C
23 rs6529669 0.982055 53308578 C T
23 rs188067797 0.982304 53310072 G C
This could be a result of the conversion from VCF to bed/bim/fam.
I did the alignment on the galaxy.org site, following the steps described in the other topic (except that I filtered only on ID==".").
As you can see, my data on the PCA plots closer to Yemen / Eritrea / Morocco. The data converted by teepean plots closer to Iran, the Caucasus and Europe. The official publication also seems to place these samples closer to Europe.
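To check how many rows of a .bim file are affected, a short sketch like the following could flag every variant whose allele columns contain a missing code. It assumes the standard six-column .bim layout and treats both "." and "0" as missing; the function name is hypothetical.

```python
def rows_with_missing_allele(bim_lines, missing_codes=(".", "0")):
    """Return variant IDs whose allele 1 or allele 2 column holds a missing code."""
    flagged = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        vid, a1, a2 = fields[1], fields[4], fields[5]
        if a1 in missing_codes or a2 in missing_codes:
            flagged.append(vid)
    return flagged

# The five rows quoted in the post above.
bim = [
    "23 rs4829294 0 33611081 A C",
    "23 rs12014055 0 33618765 . C",
    "23 rs5972902 0 33619246 G A",
    "23 snp_23_33623357 0 33623357 . G",
    "23 rs1878889 0 33625822 A C",
]

print(rows_with_missing_allele(bim))
```

One plausible cause, stated as an assumption: if the VCF had no alternate allele at a site, the converter has nothing to put in one allele column and writes a placeholder instead.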
Posts: 323
Threads: 8
Joined: Oct 2023
This is the official PCA from the publication:
https://ars.els-cdn.com/content/image/1-...4X-gr1.jpg
As you may notice, the ancient Bahrain individuals fall almost midway between Europe and the Caucasus.
Posts: 104
Threads: 5
Joined: Oct 2023
(03-02-2024, 10:14 PM)TanTin Wrote: When I do a projection for the new Bahrain data, I have different positions on PCA.
I used the following process: remove adapters -> align with bwa aln -> remove duplicates -> generate the dataset with pileupCaller.
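The last step of that process is pseudo-haploid calling: at each panel SNP, one read is picked at random to represent the individual. A rough sketch of the idea, not pileupCaller's actual code, assuming calls are restricted to the two alleles listed in the SNP panel (the function name and the example bases are made up):

```python
import random

def random_haploid_call(bases, ref, alt, rng):
    """Pick one read base at random among those matching ref or alt.

    Returns None (a missing genotype) when no read shows either panel allele.
    """
    usable = [b for b in bases if b in (ref, alt)]
    if not usable:
        return None
    return rng.choice(usable)

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Four reads cover the site; the T matches neither panel allele and is ignored.
print(random_haploid_call(list("AACT"), "A", "C", rng))
# No read matches a panel allele, so the site is reported as missing.
print(random_haploid_call(list("TT"), "A", "C", rng))
```

Under that assumption a third allele can never reach the output, which would be consistent with this route producing only two alleles per site in the .bim file.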
Posts: 78
Threads: 3
Joined: Sep 2023
Gender: Undisclosed
Ethnicity: Levantine
03-03-2024, 10:51 AM
(This post was last modified: 03-03-2024, 10:51 AM by Qrts.)
Here are the coords:
Quote:AS_EMT2,0.084229,0.126941,-0.068636,-0.064277,-0.03139,-0.030678,-0.005405,-0.013846,-0.015339,-0.006378,0.002273,-0.002698,0.010406,-0.007982,0.001357,0.02254,0.005476,-0.000127,0.005405,-0.006503,-0.012353,-0.008656,-0.00456,0.000723,0.007664
MH2_LT2,0.084229,0.118817,-0.066373,-0.046512,-0.033237,-0.011992,0.00329,-0.001385,-0.01493,-0.007107,0.002436,-0.008842,0.020069,0.005092,0.002307,0.00411,-0.020861,0.002154,0.002891,-0.005878,-0.006613,0.000618,-0.005546,0.001325,0.007784
MH1_LT2,0.086506,0.128972,-0.067127,-0.064277,-0.03139,-0.012829,-0.001175,-0.006692,-0.000614,-0.005103,0.003897,-0.014987,0.023637,-0.003441,0.001493,0.001856,-0.019558,-0.002787,-0.000754,-0.001501,0.007112,-0.001978,0.001232,-0.00241,0.00467
MH3_LT2,0.087644,0.132019,-0.071276,-0.075905,-0.034468,-0.017849,-0.006345,-0.008077,0.004909,0.000364,0.008119,-0.016335,0.024083,0.003716,-0.000136,0.017634,-0.002086,0.011529,0.007668,0.001376,0.001996,0.003586,-0.002711,0.00012,0.003832
Posts: 78
Threads: 3
Joined: Sep 2023
Gender: Undisclosed
Ethnicity: Levantine
03-03-2024, 11:20 AM
(This post was last modified: 03-03-2024, 11:23 AM by Qrts.)
They're significantly Iranian- and Arabian-admixed, with some South-Central Asian in the mix. MH3 doesn't seem to be 'Levantine admixed' as they said, though; it's only 'less Iranian'.
The sources:
Code: Mesopotamia_BA:Iran_DinkhaTepe_MLBA,0.0878714,0.141768,-0.0678816,-0.0858534,-0.0256046,-0.0293392,0.000329,-0.0077536,-0.0067084,0.00277,0.0073724,-0.0039562,0.0107928,0.0030552,-0.0021172,0.0091222,-0.0037552,0.0009376,0.003947,0.0007254,0.003968,0.0035118,-0.005472,-0.0021448,0.0058436
Mesopotamia_BA:Turkey_Mesopotamia_Sirnak_EBA,0.09418875,0.14318925,-0.06665625,-0.0799425,-0.019388,-0.03005025,-0.00188,-0.00738425,-0.014981,0.00154875,0.00499325,-0.00273525,0.00364225,0.00681225,-0.0075325,-0.00089475,-0.01258225,0.003357,0.00402225,-0.001657,0.00698775,0.00398775,-0.005053,-0.010905,0.000988
Syria_TellQarassa_Umayyad,0.0682935,0.155376,-0.0550595,-0.127909,-0.0004615,-0.056615,-0.0186835,-0.0125765,0.0651405,0.0072895,0.0185935,-0.028924,0.069053,0.004473,0.01079,0.0230045,-0.035269,-0.003864,0.00088,0.0345165,0.0207135,0.0072335,-0.0070865,0.0030125,-0.01443
Iranian_Zoroastrian,0.091627545,0.10778459,-0.063219091,-0.022521909,-0.045365,0.0040438636,0.0018586364,-0.005318,-0.028373045,-0.016840318,-0.00025090909,0.000361,0.0043652273,-0.0049418182,0.0071253182,0.013554227,-0.0035262273,0.002885,0.0017312727,-0.0087711818,-0.0032215909,-0.0037656818,0.00067772727,-0.003489,0.0053668636
Armenia_Lchashen_LBA,0.10974442,0.12727967,-0.038403583,-0.015315583,-0.034365333,0.0029515833,0.0066193333,-0.0068074167,-0.04678475,-0.02426775,0.002422,0.0033346667,-0.010492917,-0.002913,0.0070575833,-0.0026739167,-0.00140175,0.00029566667,-0.00116275,0.0016153333,0.0038474167,0.00021641667,-0.00126325,-0.00215875,-0.0011575
Kalash,0.083556455,0.024972818,-0.084166409,0.066391182,-0.071649591,0.039982773,0.0029695455,0.0020559545,-0.030743636,-0.025256318,-0.0056910909,-0.00052440909,-0.0027299545,-0.011003545,0.016724455,0.0086965909,-0.013951136,0.0020442727,0.00063981818,-0.0128755,-0.0038171818,-0.0053057273,0.0030475909,-0.0034286364,0.0034945
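For anyone who wants a quick sanity check on coordinates like these, a minimal sketch that ranks sources by plain Euclidean distance to a target. This is only a nearest-neighbour comparison, not the mixture modelling behind the percentages discussed above. To keep it short, the rows below are cut to the first 3 of the 25 values posted; real use needs the full rows.

```python
import math

def parse_row(line):
    """Split 'Name,v1,v2,...' into (name, list of floats)."""
    name, *values = line.strip().split(",")
    return name, [float(v) for v in values]

def distance(a, b):
    """Euclidean distance between two equal-length coordinate lists."""
    if len(a) != len(b):
        raise ValueError("coordinate lists differ in length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_sources(target_line, source_lines):
    """Return (source name, distance) pairs sorted from nearest to farthest."""
    _, target = parse_row(target_line)
    pairs = [(name, distance(target, coords))
             for name, coords in map(parse_row, source_lines)]
    return sorted(pairs, key=lambda p: p[1])

# Truncated to 3 dimensions for illustration only.
target = "AS_EMT2,0.084229,0.126941,-0.068636"
sources = [
    "Iranian_Zoroastrian,0.091627545,0.10778459,-0.063219091",
    "Kalash,0.083556455,0.024972818,-0.084166409",
]

for name, d in rank_sources(target, sources):
    print(f"{name}\t{d:.4f}")
```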
Posts: 32
Threads: 0
Joined: Oct 2023
When are these dated to?
Posts: 104
Threads: 5
Joined: Oct 2023
The paper says the raw data has been published at accession PRJEB31781. Does anyone know how to match the names from PRJEB71330 to this one?
Posts: 104
Threads: 5
Joined: Oct 2023
(03-03-2024, 11:35 AM)pegasus Wrote: When are these dated to?
Quote:Due to poor collagen preservation, we could only obtain radiocarbon dates for two out of the four sequenced individuals, placing them in the Late Tylos/Sasanian period (LT, ∼300–622 CE), with MH1 being older (432–561 cal. CE) than MH3 (577–647 cal. CE) (Figure S1B). Sample MH2 was not directly dated, but its archaeological context places it in the Late Tylos period. The Abu Saiba sample was excavated from a cemetery with known occupation between 200 BCE and 300 CE,7,27 and therefore it dates confidently within the boundaries of the Early/Middle Tylos period (EMT), more precisely during the times of Seleucid and Characene influence in Bahrain, which preceded the emergence of the Sasanian Empire.