
Late Antiquity-Early Middle Ages cemetery in the Eastern Italian Alps
#46
My hard disk is almost full; the BAM files take up a lot of space and my PC is old. There are 53 BAMs at the link. If you tell me which ones you want, I can download a couple.
Or, if you want, I can explain how to use WGSExtract and you can do it yourself. I think that's better: teach a man to fish and you feed him for a lifetime.
#47
There are seemingly many studies whose raw data were never converted to G25 coordinates. Does anyone have a list of all of them?
ChrisR likes this post
#48
(12-05-2023, 05:34 PM)billh Wrote: [Image: PYCL80e.png]
These are the results I got, with modern Italians for comparison.


Model:
Code:
Italo-Etruscan:ITA_Etruscan_Tarquinia,0.1255089,0.1563238,0.0373098,-0.0154825,0.0468804,-0.0084969,-0.0004387,-0.0011229,0.0230294,0.0454496,-0.0003139,0.0102111,-0.0207133,-0.0058994,-0.0028231,-0.0061081,0.0007301,0.0023056,0.003662,-0.0076035,-0.0007654,0.0046244,-0.0039932,-0.0057276,-0.0007105
Italo-Etruscan:ITA_Ardea_Latini_IA,0.133173,0.156392,0.0445,-0.00969,0.044008,-0.004462,0.00846,0.003,0.024543,0.044101,-0.002598,0.012289,-0.022448,-0.009634,-0.005429,-0.005834,0.00352,0.003167,0.006034,-0.007379,-0.008859,0.005317,-0.000863,-0.006989,-0.001796
Germanic:POL_Weklice,0.1325406,0.1295364,0.0744184,0.0634874,0.0426062,0.0238607,0.0064236,0.0109483,0.0049312,-0.0093142,-0.0056656,0.005262,-0.0080772,-0.0048472,0.0188952,0.0100621,0.0004054,0.0001832,0.00169,0.0044188,0.0092338,0.0032837,-0.0003151,0.0134421,-0.0010646
Germanic:DNK_Jutland_IA,0.136588,0.13405,0.070522,0.069445,0.036007,0.022869,0.009635,0.013615,0.012885,-0.004374,0.006333,0.019633,-0.014271,-0.021744,0.015472,0.010607,0.001304,0.007095,0.00176,0.02101,0.014974,-0.000247,0.000986,0.025666,-0.00467
Celtic:FRA_Occitanie_IA2,0.1300432,0.135827,0.0581708,0.021964,0.0506245,0.0122012,-0.002174,-0.003173,0.0177935,0.0249665,0.001827,0.0092542,-0.0208498,-0.0103905,0.0163885,0.0046072,0.0031617,0.0065245,-0.0043993,-0.0080038,0.0099512,0.0025348,-0.0097983,-0.0059348,3e-07
Celtic:CZE_IA_La_Tene_Hallstatt,0.127482,0.142174,0.052797,0.026809,0.038469,0.01004,0.00188,-0.001385,0.013499,0.020046,-0.001624,0.005245,-0.007582,-0.012937,0.001221,0.013524,0.022948,0.005448,0.004022,0.006628,0.009483,0.010758,-0.006286,-0.004458,0.000958
Magna_Graecia:ITA_Sicily_Himera_480BCE_1,0.1187011,0.1620494,-0.00598,-0.0614623,0.0252794,-0.0257376,-0.0023837,-0.0028021,0.0066326,0.0438149,0.004083,0.0099769,-0.0185401,-0.0035783,-0.0185743,-0.0092623,0.0119953,-0.000398,0.0077751,-0.008647,-0.0102497,0.005317,-0.0009156,0.0060247,-0.0047044
Levant:Levant_Beirut_IAIII,0.08679,0.1505522,-0.0472345,-0.0877349,-0.009771,-0.0319679,-0.0048469,-0.0089995,0.0089224,0.011663,0.0068408,-0.0044585,0.0097372,0.000602,-0.0083978,0.0025359,0.0007988,-0.0003166,0.003441,0,0.0016845,0.0050852,-0.0039592,0.0009639,-0.0024549
Anatolia_BA:TUR_Kaman-Kalehoyuk_MLBA,0.1050018,0.1515678,-0.042332,-0.082365,-0.0040775,-0.0274705,-0.0024088,-0.0077882,-0.011402,0.028429,0.0097435,0.007006,-0.0120788,0.0030965,-0.0138435,-0.004475,0.0116693,-0.0021538,0.0087988,-0.00741,-0.0031817,0.0061828,-0.0048065,0.0030725,-0.001407
North_African:MAR_LN,0.021626,0.148267,0.003394,-0.095285,0.047393,-0.054384,-0.027731,0.008769,0.083855,0.054124,0.020136,0.001798,0.002973,-0.028901,0.004343,0.009944,0.032726,-0.014062,-0.033938,-5e-04,-0.018343,-0.02201,0.011709,-0.009881,-0.004191
Sicani:ITA_Sicily_Himera_East_Necropolis,0.118376,0.164516,0.015085,-0.068153,0.052933,-0.022869,-0.00799,-0.003231,0.031088,0.060867,0.00065,0.01079,-0.022299,0.002064,-0.024022,-0.023866,0.001825,-0.001267,0.000503,-0.016508,0.000499,-0.000124,-0.005546,-0.010122,-0.001796


In my opinion, though, G25 doesn't seem to model Mediterranean groups correctly. Germanic always seems to act as an absorbing force, and the problem is much worse when you use Southern Italians. Celtic seems underrepresented. If we had more ancient DNA from Northern Italy (Gauls, Ligurians, Rhaetians), this would be easier.
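One rough way to check whether two sources might be competing for the same ancestry is to look at how close the reference rows are to each other in G25 space. The sketch below is only an illustration, not how any particular G25 tool fits a model: it assumes each row has the form `Label,v1,...,v25` as in the Code block above, and the function names are made up for this example.

```python
import math


def parse_g25(line):
    """Split one 'Label,v1,...,v25' row into (label, list of 25 floats)."""
    label, *values = line.strip().split(",")
    coords = [float(v) for v in values]
    if len(coords) != 25:
        raise ValueError(f"{label}: expected 25 coordinates, got {len(coords)}")
    return label, coords


def distance(a, b):
    """Plain Euclidean distance between two coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(lines):
    """Return (label_a, label_b, distance) for every pair of rows, closest first."""
    rows = [parse_g25(line) for line in lines if line.strip()]
    pairs = [
        (rows[i][0], rows[j][0], distance(rows[i][1], rows[j][1]))
        for i in range(len(rows))
        for j in range(i + 1, len(rows))
    ]
    return sorted(pairs, key=lambda pair: pair[2])
```

Paste the model rows into a list of strings and call `pairwise_distances(rows)`; pairs near the top of the result are the ones a fit would probably find hardest to tell apart.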

I should be between Bergamo and Tuscany yet my figures are:

9.8% Italo-Etruscan; 33.0% Germanic; 11.7% Levant; 28.6% Sicani (!!!); 11.1% Anatolian_BA; 5.8% Celtic
Y-DNA R-Z36 (A7967), mtDNA U6A7A1
#49
(03-14-2024, 04:31 PM)ilabv Wrote: There are seemingly many studies whose raw data were never converted to G25 coordinates... does anyone have a list of all of them?

It is not difficult to do, but it does take time and resources. With this paper, for example, the BAMs first have to be sorted and merged, and papers without BAMs need aligning, which takes even longer. Personally, I do this on my own PC and do not want to dedicate all of its resources to aDNA, as I want to use it for other things as well. A dedicated machine is expensive, and I think most of us do this with our own money.
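For anyone wondering what "sorted and merged" means in practice, here is a minimal sketch, assuming samtools is installed and on the PATH and that each sample comes as several per-library BAMs. The file names and the helper names are hypothetical; only the `samtools sort`, `samtools merge` and `samtools index` subcommands are real.

```python
import subprocess


def sort_and_merge_commands(parts, merged_bam):
    """Build the samtools commands to coordinate-sort each part and merge them.

    `parts` is a list of input BAM paths; `merged_bam` is the output path.
    Nothing is executed here, the commands are only returned as lists.
    """
    sorted_parts = [
        (p[:-4] if p.endswith(".bam") else p) + ".sorted.bam" for p in parts
    ]
    commands = [
        ["samtools", "sort", "-o", out, src]
        for src, out in zip(parts, sorted_parts)
    ]
    # Positional output name first, then the sorted inputs.
    commands.append(["samtools", "merge", merged_bam, *sorted_parts])
    commands.append(["samtools", "index", merged_bam])
    return commands


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them makes it easy to print and check them before committing hours of CPU time on an old PC.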
Capsian20, ChrisR, Qrts And 1 others like this post
#50
(03-14-2024, 08:59 PM)teepean Wrote:
(03-14-2024, 04:31 PM)ilabv Wrote: There are seemingly many studies whose raw data were never converted to G25 coordinates... does anyone have a list of all of them?

It is not difficult to do but it does take time and resources and for example with this paper the bams have to be first sorted and merged. And the papers without bams need aligning which takes even longer. Personally I do this with my personal PC and do not want to assign all of the resources just to aDNA as I want to use it for other things as well. A dedicated machine is expensive and I think most of us do this with our own money.

I have a dedicated machine available, so if you or anyone else wants to teach me, please do (in return for compensation, of course).
ChrisR and Stefano like this post
#51
(03-14-2024, 08:59 PM)teepean Wrote: It is not difficult to do but it does take time and resources and for example with this paper the bams have to be first sorted and merged.

I tried to download the ERR12074641_2424-US105c BAM from ftp://ftp.sra.ebi.ac.uk/vol1/err/ERR120/...074641.bam
It seems to be compressed. Both the original and the decompressed file, when opened in WGSExtract, give:
Error processing the BAM File Header
So I am not sure whether the download is corrupted (I got the same error for 3 downloaded samples) or whether some other preprocessing is necessary.
The original file header begins like this:
Code:
‹     ÿ BC 2BŽ͏亲'–³Ë7Ð␠Úˆ„0mبîªî»jr8ÚÀ³÷ Ãð΋YÚ€
xáÝ!Á7^ûϵ#‚”RY•Y•]G÷>ö9™ú®RԏñAgÿùß=‡

The decompressed file header begins like this:
Code:
BAM0  @HD VN:1.6 SO:queryname
@RG ID:A SM:ERS15930731
    3  ÿÿÿÿÿÿÿÿ+    M —  ÿÿÿÿÿÿÿÿ    ST-E00181:870:HF7J2CCX2:8:1101:10003:10767 D„AAHHˆ!‚B$„""B$$H$B$"H"D$A‚
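The two dumps above look consistent with an ordinary BAM rather than a damaged one: BAM files are normally stored BGZF-compressed (a gzip variant whose header carries a "BC" extra field), and the decompressed payload starts with the bytes `BAM\x01`. A minimal sketch for checking this without samtools follows. The function name and the returned labels are made up for this example, and it assumes the "BC" subfield comes first in the gzip extra field, which is how htslib writes it.

```python
import gzip

BGZF_MAGIC = b"\x1f\x8b\x08\x04"  # gzip, deflate, FEXTRA flag set
BAM_MAGIC = b"BAM\x01"


def inspect_bam_start(path):
    """Classify a file by its first bytes.

    Returns one of: 'bgzf-bam', 'bgzf-other', 'raw-bam', 'unknown'.
    """
    with open(path, "rb") as fh:
        head = fh.read(16)

    if head[:4] == BAM_MAGIC:
        # Already decompressed by hand; tools expect the BGZF wrapper.
        return "raw-bam"

    # Bytes 12-13 hold the first extra subfield ID, 'BC' for BGZF.
    if head[:4] != BGZF_MAGIC or head[12:14] != b"BC":
        return "unknown"

    # BGZF is valid multi-member gzip, so the gzip module can read it.
    with gzip.open(path, "rb") as fh:
        payload = fh.read(4)
    return "bgzf-bam" if payload == BAM_MAGIC else "bgzf-other"
```

A result of `bgzf-bam` for the original download would suggest the file is intact and that the error comes from somewhere else, while `raw-bam` for the manually decompressed copy would explain why that copy is rejected as well.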
#52
(03-17-2024, 01:11 PM)ChrisR Wrote:
(03-14-2024, 08:59 PM)teepean Wrote: It is not difficult to do but it does take time and resources and for example with this paper the bams have to be first sorted and merged.

I tried to download the ERR12074641_2424-US105c BAM from ftp://ftp.sra.ebi.ac.uk/vol1/err/ERR120/...074641.bam
It seems to be compressed. Either the original and decompressed file when opened in WGSExtract gives
Error processing the BAM File Header
So not sure if the download is corrupted (had the same error for 3 samples downloaded) or some other preprocessing is necessary.

Surely a corrupted file. I wanted to download this BAM to my usegalaxy account, although there's no link to Galaxy yet. However, I downloaded both FASTQ files to usegalaxy.eu, and they are now running with BWA-MEM against T2T CHM13 v2.0.
#53
(03-17-2024, 01:11 PM)ChrisR Wrote:
(03-14-2024, 08:59 PM)teepean Wrote: It is not difficult to do but it does take time and resources and for example with this paper the bams have to be first sorted and merged.

I tried to download the ERR12074641_2424-US105c BAM from ftp://ftp.sra.ebi.ac.uk/vol1/err/ERR120/...074641.bam
It seems to be compressed. Either the original and decompressed file when opened in WGSExtract gives
Error processing the BAM File Header
So not sure if the download is corrupted (had the same error for 3 samples downloaded) or some other preprocessing is necessary.

The BAMs at that location are unmapped BAMs so they have to be aligned first.

This comment "first sorted and merged" was not about this paper, sorry.
ChrisR likes this post
#54
(03-17-2024, 01:11 PM)ChrisR Wrote:
(03-14-2024, 08:59 PM)teepean Wrote: It is not difficult to do but it does take time and resources and for example with this paper the bams have to be first sorted and merged.

I tried to download the ERR12074641_2424-US105c BAM from ftp://ftp.sra.ebi.ac.uk/vol1/err/ERR120/...074641.bam
It seems to be compressed. Either the original and decompressed file when opened in WGSExtract gives
Error processing the BAM File Header
So not sure if the download is corrupted (had the same error for 3 samples downloaded) or some other preprocessing is necessary.
Does it happen for all the files or just a few? It may be the same problem I encountered: the antivirus removed an .exe file needed for header checking. What does the program say in the black console screen?
#55
(03-17-2024, 02:21 PM)Stefano Wrote:
(03-17-2024, 01:11 PM)ChrisR Wrote:
(03-14-2024, 08:59 PM)teepean Wrote: It is not difficult to do but it does take time and resources and for example with this paper the bams have to be first sorted and merged.

I tried to download the ERR12074641_2424-US105c BAM from ftp://ftp.sra.ebi.ac.uk/vol1/err/ERR120/...074641.bam
It seems to be compressed. Either the original and decompressed file when opened in WGSExtract gives
Error processing the BAM File Header
So not sure if the download is corrupted (had the same error for 3 samples downloaded) or some other preprocessing is necessary.
Does it happen for all the files or just few? it may be the same problem i encountered, the antivirus removed an .exe file needed for header checking. What does the program say in the black console screen?

Usually the BAMs are aligned so this is rare.

Code:
$ samtools view -H ERR12074641.bam
@HD     VN:1.6  SO:queryname
@RG     ID:A    SM:ERS15930731
@PG     ID:samtools     PN:samtools     VN:1.19.2       CL:C:\msys64\ucrt64\bin\samtools.exe view -H ERR12074641.bam
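The header above has @HD, @RG and @PG lines but no @SQ lines, which is the usual hint that a BAM carries no reference sequences and so has not been aligned. A small sketch of that check follows, assuming the header text comes from `samtools view -H`. The function names and the dictionary keys are invented for this example, and a missing @SQ is a strong hint rather than proof.

```python
import subprocess


def read_header(path):
    """Return the text header of a BAM via `samtools view -H` (needs samtools on PATH)."""
    result = subprocess.run(
        ["samtools", "view", "-H", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_header(header_text):
    """Count @SQ and @RG lines and pick up the sort order from @HD."""
    n_sq = 0
    n_rg = 0
    sort_order = None
    for line in header_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "@SQ":
            n_sq += 1
        elif fields[0] == "@RG":
            n_rg += 1
        elif fields[0] == "@HD":
            for field in fields[1:]:
                if field.startswith("SO:"):
                    sort_order = field[3:]
    return {
        "n_sq": n_sq,
        "n_rg": n_rg,
        "sort_order": sort_order,
        "looks_unmapped": n_sq == 0,  # no reference sequences declared
    }
```

Calling `summarize_header(read_header("ERR12074641.bam"))` on the file discussed here should report zero @SQ lines and a `queryname` sort order, matching the output pasted above.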
#56
(03-17-2024, 02:21 PM)Stefano Wrote: Does it happen for all the files or just few? it may be the same problem i encountered, the antivirus removed an .exe file needed for header checking. What does the program say in the black console screen?

I tried 3 files and it seems to happen for all of them. This is the console log text:
Code:
--- Exec: GetBAMHeader.sh, started @ Sun Mar 17 20:37:04 2024
+ C:/WGSExtract/cygwin64/usr/local/bin/samtools.exe view -H --no-PG 'C:/AncientBAM/ERR12074641.bam'
--- SUCCESS:   0 seconds to run: GetBAMHeader.sh (finished @ Sun Mar 17 20:37:04 2024
***ERROR: BAM / CRAM file error:
C:/AncientBAM/ERR12074641.bam
Error processing the BAM File Header
#57
(03-17-2024, 07:40 PM)ChrisR Wrote:
(03-17-2024, 02:21 PM)Stefano Wrote: Does it happen for all the files or just few? it may be the same problem i encountered, the antivirus removed an .exe file needed for header checking. What does the program say in the black console screen?

I tried 3 files and it seems to happen for all. This the console log text:
Code:
--- Exec: GetBAMHeader.sh, started @ Sun Mar 17 20:37:04 2024
+ C:/WGSExtract/cygwin64/usr/local/bin/samtools.exe view -H --no-PG 'C:/AncientBAM/ERR12074641.bam'
--- SUCCESS:   0 seconds to run: GetBAMHeader.sh (finished @ Sun Mar 17 20:37:04 2024
***ERROR: BAM / CRAM file error:
C:/AncientBAM/ERR12074641.bam
Error processing the BAM File Header

Check whether your antivirus removed or quarantined samtools.exe or other components; if not, it is a BAM problem.
#58
Sorry, I deleted the stats because the FASTQ files are dirty. I am running Cutadapt to trim the adapters.
Capsian20 likes this post
#59
(03-17-2024, 07:50 PM)Stefano Wrote: Check if your antivirus removed/quarantined samtools.exe or other components, if not it is a BAM problem. samt samtools.exe ls.exe

We are getting OT here ;-) But no: samtools.exe is still there, and I could not find anything in the AV quarantine or any warning messages. So unfortunately there seems to be no problem on my system.
#60
(03-17-2024, 07:40 PM)ChrisR Wrote:
(03-17-2024, 02:21 PM)Stefano Wrote: Does it happen for all the files or just few? it may be the same problem i encountered, the antivirus removed an .exe file needed for header checking. What does the program say in the black console screen?

I tried 3 files and it seems to happen for all. This the console log text:
Code:
--- Exec: GetBAMHeader.sh, started @ Sun Mar 17 20:37:04 2024
+ C:/WGSExtract/cygwin64/usr/local/bin/samtools.exe view -H --no-PG 'C:/AncientBAM/ERR12074641.bam'
--- SUCCESS:   0 seconds to run: GetBAMHeader.sh (finished @ Sun Mar 17 20:37:04 2024
***ERROR: BAM / CRAM file error:
C:/AncientBAM/ERR12074641.bam
Error processing the BAM File Header

Like I said earlier: the BAMs are not aligned. You cannot process them with WGSExtract.
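For reference, aligning an unmapped BAM could look roughly like the sketch below. It assumes samtools and bwa are installed, that the reference FASTA has already been indexed with `bwa index`, and that BWA-MEM is an acceptable aligner for the data (it is the one mentioned earlier in the thread, but other choices exist for short ancient reads). The function name, the paths and the thread count are placeholders.

```python
import subprocess


def align_unmapped_bam(ubam, reference_fasta, out_bam, threads=4, run=False):
    """Build (and optionally run) the pipeline: samtools fastq | bwa mem | samtools sort.

    Returns the three command lists. With run=False nothing is executed.
    """
    to_fastq = ["samtools", "fastq", ubam]  # reads to stdout as FASTQ
    # -p treats stdin as interleaved pairs where adjacent names match.
    bwa = ["bwa", "mem", "-p", "-t", str(threads), reference_fasta, "-"]
    sort = ["samtools", "sort", "-o", out_bam, "-"]
    stages = [to_fastq, bwa, sort]

    if run:
        p1 = subprocess.Popen(to_fastq, stdout=subprocess.PIPE)
        p2 = subprocess.Popen(bwa, stdin=p1.stdout, stdout=subprocess.PIPE)
        p1.stdout.close()  # let p1 receive SIGPIPE if p2 exits early
        p3 = subprocess.Popen(sort, stdin=p2.stdout)
        p2.stdout.close()
        for proc, cmd in ((p3, sort), (p2, bwa), (p1, to_fastq)):
            if proc.wait() != 0:
                raise RuntimeError("command failed: " + " ".join(cmd))
    return stages
```

The sorted output can then be indexed with `samtools index`, at which point the header should contain @SQ lines and tools that expect an aligned BAM have something to work with.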
ChrisR likes this post