Posts: 31
Threads: 2
Joined: Nov 2023
Gender: Male
Y-DNA (P): R1b-U152
mtDNA (M): H1
My hard disk is almost full; the BAM files take a lot of space and my PC is old. There are 53 BAMs at the link. If you tell me which ones, I can download a couple.
Or, if you want, I will explain how to use WGSExtract and you can do it yourself. I think that's better. Teach a man to fish and you feed him for a lifetime.
Posts: 149
Threads: 1
Joined: Oct 2023
There are seemingly many studies whose raw data were never converted to G25 coordinates... Does anyone have a list of all of them?
Posts: 22
Threads: 0
Joined: Dec 2023
(12-05-2023, 05:34 PM)billh Wrote:
These are the results I got, with modern Italians for comparison.
Model:
Code: Italo-Etruscan:ITA_Etruscan_Tarquinia,0.1255089,0.1563238,0.0373098,-0.0154825,0.0468804,-0.0084969,-0.0004387,-0.0011229,0.0230294,0.0454496,-0.0003139,0.0102111,-0.0207133,-0.0058994,-0.0028231,-0.0061081,0.0007301,0.0023056,0.003662,-0.0076035,-0.0007654,0.0046244,-0.0039932,-0.0057276,-0.0007105
Italo-Etruscan:ITA_Ardea_Latini_IA,0.133173,0.156392,0.0445,-0.00969,0.044008,-0.004462,0.00846,0.003,0.024543,0.044101,-0.002598,0.012289,-0.022448,-0.009634,-0.005429,-0.005834,0.00352,0.003167,0.006034,-0.007379,-0.008859,0.005317,-0.000863,-0.006989,-0.001796
Germanic:POL_Weklice,0.1325406,0.1295364,0.0744184,0.0634874,0.0426062,0.0238607,0.0064236,0.0109483,0.0049312,-0.0093142,-0.0056656,0.005262,-0.0080772,-0.0048472,0.0188952,0.0100621,0.0004054,0.0001832,0.00169,0.0044188,0.0092338,0.0032837,-0.0003151,0.0134421,-0.0010646
Germanic:DNK_Jutland_IA,0.136588,0.13405,0.070522,0.069445,0.036007,0.022869,0.009635,0.013615,0.012885,-0.004374,0.006333,0.019633,-0.014271,-0.021744,0.015472,0.010607,0.001304,0.007095,0.00176,0.02101,0.014974,-0.000247,0.000986,0.025666,-0.00467
Celtic:FRA_Occitanie_IA2,0.1300432,0.135827,0.0581708,0.021964,0.0506245,0.0122012,-0.002174,-0.003173,0.0177935,0.0249665,0.001827,0.0092542,-0.0208498,-0.0103905,0.0163885,0.0046072,0.0031617,0.0065245,-0.0043993,-0.0080038,0.0099512,0.0025348,-0.0097983,-0.0059348,3e-07
Celtic:CZE_IA_La_Tene_Hallstatt,0.127482,0.142174,0.052797,0.026809,0.038469,0.01004,0.00188,-0.001385,0.013499,0.020046,-0.001624,0.005245,-0.007582,-0.012937,0.001221,0.013524,0.022948,0.005448,0.004022,0.006628,0.009483,0.010758,-0.006286,-0.004458,0.000958
Magna_Graecia:ITA_Sicily_Himera_480BCE_1,0.1187011,0.1620494,-0.00598,-0.0614623,0.0252794,-0.0257376,-0.0023837,-0.0028021,0.0066326,0.0438149,0.004083,0.0099769,-0.0185401,-0.0035783,-0.0185743,-0.0092623,0.0119953,-0.000398,0.0077751,-0.008647,-0.0102497,0.005317,-0.0009156,0.0060247,-0.0047044
Levant:Levant_Beirut_IAIII,0.08679,0.1505522,-0.0472345,-0.0877349,-0.009771,-0.0319679,-0.0048469,-0.0089995,0.0089224,0.011663,0.0068408,-0.0044585,0.0097372,0.000602,-0.0083978,0.0025359,0.0007988,-0.0003166,0.003441,0,0.0016845,0.0050852,-0.0039592,0.0009639,-0.0024549
Anatolia_BA:TUR_Kaman-Kalehoyuk_MLBA,0.1050018,0.1515678,-0.042332,-0.082365,-0.0040775,-0.0274705,-0.0024088,-0.0077882,-0.011402,0.028429,0.0097435,0.007006,-0.0120788,0.0030965,-0.0138435,-0.004475,0.0116693,-0.0021538,0.0087988,-0.00741,-0.0031817,0.0061828,-0.0048065,0.0030725,-0.001407
North_African:MAR_LN,0.021626,0.148267,0.003394,-0.095285,0.047393,-0.054384,-0.027731,0.008769,0.083855,0.054124,0.020136,0.001798,0.002973,-0.028901,0.004343,0.009944,0.032726,-0.014062,-0.033938,-5e-04,-0.018343,-0.02201,0.011709,-0.009881,-0.004191
Sicani:ITA_Sicily_Himera_East_Necropolis,0.118376,0.164516,0.015085,-0.068153,0.052933,-0.022869,-0.00799,-0.003231,0.031088,0.060867,0.00065,0.01079,-0.022299,0.002064,-0.024022,-0.023866,0.001825,-0.001267,0.000503,-0.016508,0.000499,-0.000124,-0.005546,-0.010122,-0.001796
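Each row in the model above is plain comma-separated text: a label of the form `Group:Population` followed by 25 coordinates. A minimal Python sketch, assuming exactly that layout (and that the forum's `Code:` prefix has been removed), for reading such rows and measuring a plain Euclidean distance between two populations; the function names are hypothetical, not part of any G25 tool:

```python
import math

N_DIMS = 25  # G25 rows carry 25 coordinates after the label

def parse_g25_line(line: str):
    """Return (group, population, coords) for one 'Group:Population,PC1,...,PC25' row."""
    label, *fields = line.strip().split(",")
    group, _, population = label.partition(":")
    coords = [float(x) for x in fields]  # float() also accepts "3e-07", "-5e-04", "0"
    if len(coords) != N_DIMS:
        raise ValueError(f"{label}: expected {N_DIMS} values, got {len(coords)}")
    return group, population, coords

def euclidean(a, b) -> float:
    """Plain Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Smaller values mean the two populations sit closer together on these coordinates.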
In my opinion, though, G25 doesn't seem to model Mediterranean groups correctly. Germanic always seems to act as an absorbing force, and the problem is much worse when you use Southern Italians. Celtic seems underrepresented. If we had more ancient DNA from Northern Italy (Gauls, Ligurians, Rhaetians), this would be easier.
I should fall between Bergamo and Tuscany, yet my figures are:
9.8% Italo-Etruscan; 33.0% Germanic; 11.7% Levant; 28.6% Sicani (!!!); 11.1% Anatolia_BA; 5.8% Celtic
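Percentages like these describe the target as a mixture of the source rows with non-negative weights summing to 100%, which are then added up per group (the part of the label before the colon). Tools such as Vahaduo or nMonte use their own optimisers; the sketch below is *not* their algorithm, only a plain least-squares fit by projected gradient descent under those two constraints, with hypothetical function names:

```python
import math

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) == 1} (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold of the largest index that qualifies
    return [max(x - theta, 0.0) for x in v]

def fit_mixture(target, sources, steps=5000):
    """Minimise ||sum_i w_i * sources[i] - target||^2 with w >= 0, sum(w) == 1.

    Returns (weights, distance between the fitted mixture and the target).
    """
    k, d = len(sources), len(target)
    w = [1.0 / k] * k
    # 2 * sum of squared source norms upper-bounds the gradient's Lipschitz constant,
    # so its inverse is a safe fixed step size.
    lr = 1.0 / (2.0 * sum(x * x for s in sources for x in s))
    for _ in range(steps):
        mix = [sum(w[i] * sources[i][j] for i in range(k)) for j in range(d)]
        resid = [mix[j] - target[j] for j in range(d)]
        grad = [2.0 * sum(sources[i][j] * resid[j] for j in range(d)) for i in range(k)]
        w = project_to_simplex([w[i] - lr * grad[i] for i in range(k)])
    mix = [sum(w[i] * sources[i][j] for i in range(k)) for j in range(d)]
    return w, math.sqrt(sum((m - t) ** 2 for m, t in zip(mix, target)))

def group_totals(labels, weights):
    """Sum weights by the part of each label before ':' (e.g. 'Germanic')."""
    totals = {}
    for label, w in zip(labels, weights):
        group = label.partition(":")[0]
        totals[group] = totals.get(group, 0.0) + w
    return totals
```

`steps` may need raising when sources are very similar, since plain gradient descent converges slowly on near-collinear inputs.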
Y-DNA R-Z36 (A7967) mtDNA U6A7A1
Posts: 104
Threads: 5
Joined: Oct 2023
(03-14-2024, 04:31 PM)ilabv Wrote: There are seemingly many studies whose raw data were never converted to G25 coordinates... does anyone have a list of all of them?
It is not difficult to do, but it does take time and resources. For example, with this paper the BAMs first have to be sorted and merged, and papers without BAMs need aligning, which takes even longer. Personally, I do this on my own PC and don't want to dedicate all of its resources to aDNA, as I want to use it for other things as well. A dedicated machine is expensive, and I think most of us do this with our own money.
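The sort-then-merge step can be scripted around samtools. A sketch in Python, assuming several BAMs of the same individual (for example, separate libraries) that should end up as one coordinate-sorted, indexed file, and `samtools` on the PATH; the helper and output names are hypothetical:

```python
import subprocess
from pathlib import Path

def sort_and_merge_commands(inputs, merged="merged.bam", threads=4):
    """samtools command lines: coordinate-sort each input BAM, merge them, index the result."""
    cmds, sorted_paths = [], []
    for bam in inputs:
        p = Path(bam)
        out = str(p.with_name(p.stem + ".sorted.bam"))
        sorted_paths.append(out)
        cmds.append(["samtools", "sort", "-@", str(threads), "-o", out, bam])
    # merge expects all inputs sorted the same way (here: by coordinate)
    cmds.append(["samtools", "merge", "-@", str(threads), merged, *sorted_paths])
    cmds.append(["samtools", "index", merged])
    return cmds

def run(cmds):
    """Run the commands in order, stopping at the first one that fails."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Usage: `run(sort_and_merge_commands(["lib1.bam", "lib2.bam"]))`. `samtools merge` refuses to overwrite an existing output file unless `-f` is added.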
Posts: 149
Threads: 1
Joined: Oct 2023
(03-14-2024, 08:59 PM)teepean Wrote: (03-14-2024, 04:31 PM)ilabv Wrote: There are seemingly many studies whose raw data were never converted to G25 coordinates... does anyone have a list of all of them?
I have a dedicated machine available, so if you or anyone else wants to teach me, please do (in return for compensation, of course).
Posts: 141
Threads: 13
Joined: Oct 2023
(03-14-2024, 08:59 PM)teepean Wrote: It is not difficult to do but it does take time and resources and for example with this paper the bams have to be first sorted and merged.
I tried to download the ERR12074641_2424-US105c BAM from ftp://ftp.sra.ebi.ac.uk/vol1/err/ERR120/...074641.bam
It seems to be compressed. Both the original and the decompressed file give this error when opened in WGSExtract:
Error processing the BAM File Header
So I'm not sure whether the download is corrupted (I had the same error for all 3 samples I downloaded) or whether some other preprocessing is necessary.
The original file header begins like this:
Code: ‹ ÿ BC 2BŽÍ亲'–³Ë7Ðâ Úˆ„0mبîªî»jr8ÚÀ³÷Ãð΋YÚ€
xáÝ!Á7^ûϵ#‚”RY•Y•]G÷>ö9™ú®RÔñAgÿùß=‡
The decompressed file header begins like this:
Code: BAM0 @HD VN:1.6 SO:queryname
@RG ID:A SM:ERS15930731
3 ÿÿÿÿÿÿÿÿ+ M — ÿÿÿÿÿÿÿÿ ST-E00181:870:HF7J2CCX2:8:1101:10003:10767 D„AAHHˆ!‚B$„""B$$H$B$"H"D$A‚
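On the "it seems to be compressed" point: BAM files are always BGZF-compressed, i.e. a series of gzip blocks whose header carries a `BC` extra subfield, so a gzip-compressed download is expected, and a decompressed stream starting with `BAM` plus the byte `0x01` is what a valid BAM contains. The `‹` (byte 0x8B), `ÿ` (0xFF) and `BC` visible in the dump above are consistent with such a block header. A small Python check, assuming the layout in the SAM/BAM specification; the function names are hypothetical:

```python
import gzip
import struct

def looks_like_bgzf(head: bytes) -> bool:
    """True if `head` (the first 18+ bytes of a file) is a BGZF block header."""
    # gzip magic 1f 8b, CM = 8 (deflate), FLG = 4 (extra field present)
    if len(head) < 18 or head[:4] != b"\x1f\x8b\x08\x04":
        return False
    (xlen,) = struct.unpack_from("<H", head, 10)  # total length of the extra field
    extra = head[12:12 + xlen]
    pos = 0
    while pos + 4 <= len(extra):  # walk the extra subfields
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack_from("<H", extra, pos + 2)
        if (si1, si2) == (66, 67) and slen == 2:  # 'B', 'C' subfield holding BSIZE
            return True
        pos += 4 + slen
    return False

def check_download(path: str) -> dict:
    """Check the compression layer and the decompressed magic of a BAM file."""
    with open(path, "rb") as fh:
        bgzf = looks_like_bgzf(fh.read(18))
    magic_ok = False
    if bgzf:
        with gzip.open(path, "rb") as fh:  # Python's gzip reads BGZF blocks too
            magic_ok = fh.read(4) == b"BAM\x01"
    return {"bgzf": bgzf, "bam_magic": magic_ok}
```

If both checks pass, the file is probably not a corrupted download, and the problem lies in what the BAM contains rather than in its compression.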
Posts: 41
Threads: 0
Joined: Feb 2024
(03-17-2024, 01:11 PM)ChrisR Wrote: (03-14-2024, 08:59 PM)teepean Wrote: It is not difficult to do but it does take time and resources and for example with this paper the bams have to be first sorted and merged.
Surely a corrupted file. I wanted to download this BAM to my usegalaxy account, although there's no link to Galaxy yet. However, I downloaded both FASTQ files to usegalaxy.eu and they are now running with BWA-MEM against T2T CHM13 v2.0.
Posts: 104
Threads: 5
Joined: Oct 2023
03-17-2024, 02:13 PM
(This post was last modified: 03-17-2024, 02:14 PM by teepean.)
(03-17-2024, 01:11 PM)ChrisR Wrote: (03-14-2024, 08:59 PM)teepean Wrote: It is not difficult to do but it does take time and resources and for example with this paper the bams have to be first sorted and merged.
The BAMs at that location are unmapped BAMs, so they have to be aligned first.
My comment about "first sorted and merged" was not about this paper, sorry.
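Aligning an unmapped BAM means turning it back into reads and mapping them against a reference. A sketch of that pipeline driven from Python, assuming single-end (or already collapsed) reads, `samtools` and `bwa` on the PATH, and a reference already prepared with `bwa index`; file names such as `ref.fa` are placeholders. It uses `bwa mem` as in the Galaxy run above, although many aDNA pipelines prefer `bwa aln` for very short reads, and paired reads would instead need `samtools fastq -1 ... -2 ...`:

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Step:
    argv: List[str]
    stdout_path: Optional[str] = None  # if set, the command's stdout goes to this file

def realign_steps(ubam: str, ref_fa: str, out_bam: str, threads: int = 4) -> List[Step]:
    """Unmapped BAM -> FASTQ -> bwa mem -> coordinate-sorted, indexed BAM."""
    fq = out_bam + ".fq"
    sam = out_bam + ".sam"
    return [
        Step(["samtools", "fastq", ubam], stdout_path=fq),
        Step(["bwa", "mem", "-t", str(threads), ref_fa, fq], stdout_path=sam),
        Step(["samtools", "sort", "-@", str(threads), "-o", out_bam, sam]),
        Step(["samtools", "index", out_bam]),
    ]

def run_steps(steps: List[Step]) -> None:
    """Run each step in order; raise on the first non-zero exit status."""
    for step in steps:
        if step.stdout_path is None:
            subprocess.run(step.argv, check=True)
        else:
            with open(step.stdout_path, "wb") as out:
                subprocess.run(step.argv, stdout=out, check=True)
```

Usage: `run_steps(realign_steps("ERR12074641.bam", "ref.fa", "ERR12074641.aligned.bam"))`.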
Posts: 31
Threads: 2
Joined: Nov 2023
Gender: Male
Y-DNA (P): R1b-U152
mtDNA (M): H1
(03-17-2024, 01:11 PM)ChrisR Wrote: (03-14-2024, 08:59 PM)teepean Wrote: It is not difficult to do but it does take time and resources and for example with this paper the bams have to be first sorted and merged.
Does it happen for all the files or just a few? It may be the same problem I encountered: the antivirus removed an .exe file needed for header checking. What does the program say in the black console window?
Posts: 104
Threads: 5
Joined: Oct 2023
(03-17-2024, 02:21 PM)Stefano Wrote: (03-17-2024, 01:11 PM)ChrisR Wrote: (03-14-2024, 08:59 PM)teepean Wrote: It is not difficult to do but it does take time and resources and for example with this paper the bams have to be first sorted and merged.
Usually the BAMs are aligned, so this is rare. This header has only @HD, @RG and @PG lines and no @SQ (reference sequence) lines:
Code: $ samtools view -H ERR12074641.bam
@HD VN:1.6 SO:queryname
@RG ID:A SM:ERS15930731
@PG ID:samtools PN:samtools VN:1.19.2 CL:C:\msys64\ucrt64\bin\samtools.exe view -H ERR12074641.bam
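The same test can be done without samtools: in the BAM format the decompressed data starts with `BAM\1`, the header text, and then the list of reference sequences. An aligned BAM lists them (`@SQ` lines, `n_ref` > 0); the header above has none, so no read in it can carry a mapped position, which is presumably what WGSExtract's header check trips over. A Python sketch following the layout in the SAM/BAM specification (the helper names are hypothetical):

```python
import gzip
import io
import struct

def _read_header(fh):
    """Parse a decompressed BAM stream: returns (header_text, [(ref_name, ref_length), ...])."""
    if fh.read(4) != b"BAM\x01":
        raise ValueError("bad magic: not a decompressed BAM stream")
    (l_text,) = struct.unpack("<i", fh.read(4))
    text = fh.read(l_text).rstrip(b"\x00").decode("ascii", errors="replace")
    (n_ref,) = struct.unpack("<i", fh.read(4))
    refs = []
    for _ in range(n_ref):
        (l_name,) = struct.unpack("<i", fh.read(4))
        name = fh.read(l_name).rstrip(b"\x00").decode("ascii")  # NUL-terminated name
        (l_ref,) = struct.unpack("<i", fh.read(4))
        refs.append((name, l_ref))
    return text, refs

def read_bam_header(path):
    """Read the header of a BGZF-compressed BAM file (Python's gzip reads BGZF)."""
    with gzip.open(path, "rb") as fh:
        return _read_header(fh)

def parse_bam_header_bytes(raw: bytes):
    """Same as read_bam_header, for already-decompressed bytes."""
    return _read_header(io.BytesIO(raw))

def has_reference_sequences(text, refs) -> bool:
    """Aligned BAMs list their references (@SQ lines / n_ref > 0); unmapped ones may not."""
    return bool(refs) or any(line.startswith("@SQ") for line in text.splitlines())
```

Usage: `has_reference_sequences(*read_bam_header("ERR12074641.bam"))` returns `False` for a header like the one above.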
Posts: 141
Threads: 13
Joined: Oct 2023
(03-17-2024, 02:21 PM)Stefano Wrote: Does it happen for all the files or just few? it may be the same problem i encountered, the antivirus removed an .exe file needed for header checking. What does the program say in the black console screen?
I tried 3 files and it seems to happen for all of them. This is the console log:
Code: --- Exec: GetBAMHeader.sh, started @ Sun Mar 17 20:37:04 2024
+ C:/WGSExtract/cygwin64/usr/local/bin/samtools.exe view -H --no-PG 'C:/AncientBAM/ERR12074641.bam'
--- SUCCESS: 0 seconds to run: GetBAMHeader.sh (finished @ Sun Mar 17 20:37:04 2024
***ERROR: BAM / CRAM file error:
C:/AncientBAM/ERR12074641.bam
Error processing the BAM File Header
Posts: 31
Threads: 2
Joined: Nov 2023
Gender: Male
Y-DNA (P): R1b-U152
mtDNA (M): H1
(03-17-2024, 07:40 PM)ChrisR Wrote: (03-17-2024, 02:21 PM)Stefano Wrote: Does it happen for all the files or just few? it may be the same problem i encountered, the antivirus removed an .exe file needed for header checking. What does the program say in the black console screen?
Check whether your antivirus removed or quarantined samtools.exe or other components such as ls.exe; if not, it is a BAM problem.
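Whether the antivirus has removed or blocked a tool can be checked directly. A minimal Python sketch that, for each expected executable in a directory, reports whether it is missing or fails to start with `--version`; the directory and file names are assumptions to adapt, and the function name is hypothetical:

```python
import subprocess
from pathlib import Path

def check_tools(bin_dir, names=("samtools.exe",)):
    """Map each expected executable to 'ok', 'missing ...' or 'present but failed ...'."""
    report = {}
    for name in names:
        exe = Path(bin_dir) / name
        if not exe.is_file():
            report[name] = "missing (deleted or quarantined?)"
            continue
        try:
            # A blocked or damaged binary usually fails here with OSError or a non-zero exit.
            subprocess.run([str(exe), "--version"], capture_output=True, timeout=30, check=True)
            report[name] = "ok"
        except (OSError, subprocess.SubprocessError) as err:
            report[name] = f"present but failed to run: {err}"
    return report
```

Usage: `check_tools("C:/WGSExtract/cygwin64/usr/local/bin")` checks `samtools.exe` in the directory shown in the WGSExtract log above; the directory holding `ls.exe` may differ.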
Posts: 41
Threads: 0
Joined: Feb 2024
03-17-2024, 08:37 PM
(This post was last modified: 03-17-2024, 09:18 PM by miquirumba. Edit Reason: dirty FASTQs)
Sorry, I deleted the stats because the FASTQ files are dirty. I am running CUTADAPT to trim the adapters.
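Ancient DNA fragments are often shorter than the sequencing read, so reads run on into the adapter, which is what cutadapt removes. To show the idea only, a Python sketch of 3' adapter trimming with exact matching (cutadapt itself also tolerates mismatches and indels and has many more options); the adapter `AGATCGGAAGAGC`, the common start of Illumina TruSeq adapters, is an assumption about these libraries, and the function names are hypothetical:

```python
def trim_3prime_adapter(read: str, adapter: str = "AGATCGGAAGAGC", min_overlap: int = 3) -> str:
    """Cut the adapter, or a leading part of it, off the 3' end of a read (exact match)."""
    # Case 1: the whole adapter occurs in the read; drop it and everything after it.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with the start of the adapter; try the longest overlap first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read

def trim_fastq_records(lines, **kwargs):
    """Yield trimmed FASTQ lines, assuming complete 4-line records; qualities are cut to match."""
    it = iter(lines)
    for header in it:
        seq, plus, qual = next(it), next(it), next(it)
        trimmed = trim_3prime_adapter(seq.rstrip("\n"), **kwargs)
        yield header.rstrip("\n")
        yield trimmed
        yield plus.rstrip("\n")
        yield qual.rstrip("\n")[:len(trimmed)]
```

Requiring a minimum overlap avoids trimming reads that merely end in one or two bases that happen to match the adapter.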
Posts: 141
Threads: 13
Joined: Oct 2023
(03-17-2024, 07:50 PM)Stefano Wrote: Check if your antivirus removed/quarantined samtools.exe or other components, if not it is a BAM problem. samt samtools.exe ls.exe
We are getting off-topic here ;-) But no: samtools.exe is still there, and I could not find anything in the antivirus quarantine or any other warning messages. So unfortunately it does not seem to be a problem on my system.
Posts: 104
Threads: 5
Joined: Oct 2023
(03-17-2024, 07:40 PM)ChrisR Wrote: (03-17-2024, 02:21 PM)Stefano Wrote: Does it happen for all the files or just few? it may be the same problem i encountered, the antivirus removed an .exe file needed for header checking. What does the program say in the black console screen?
Like I said earlier: the BAMs are not aligned. You cannot process them with WGSExtract.