09-16-2024, 08:47 PM
Hell Yes! New AADR v62.0
09-16-2024, 09:19 PM
09-16-2024, 09:34 PM
Not a fan of the new hyper-verbose header records in the anno files.
So they actually reduced the number of present-day samples compared to the previous version?
09-16-2024, 10:37 PM
There seem to be too many duplicates of the same individuals.
UNTA58_147.AG F Germany_Lech_EBA_possible.contam.AG
UNTA58_147_d.AG U Germany_Lech_EBA_lc.AG
WEHR_1415adult.AG F Germany_Lech_EBA_mother.WEHR_1415child_possible.contam.AG
WEHR_1415adult_d.AG F Germany_Lech_EBA_mother.WEHR_1415child_lc.AG
POST_99.AG F Germany_Lech_EBA_possible.contam.AG
POST_99_d.AG F Germany_Lech_EBA.AG
YCH028.IM M Mexico_ChichenItza_MayaLowlands_LClassic.IM
YCH028_d.IM U Mexico_ChichenItza_MayaLowlands_LClassic.IM
YPN021_d.SG U Thailand_YappaNhae_2_LogCoffin_IA.SG
YPN021.SG F Thailand_YappaNhae_2_LogCoffin_IA.SG
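For anyone who wants to list these pairs themselves, here is a rough Python sketch. It assumes a duplicate shares its base ID once the data-type suffix (".AG", ".SG", ".IM") and an optional "_d" are stripped, which matches the rows above but may not hold for every entry in the .ind file. The function names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: duplicates differ only by an optional "_d" and the
# data-type suffix after the final dot (".AG", ".SG", ".IM", ...).
SUFFIX = re.compile(r"(_d)?\.[A-Za-z0-9]+$")

def base_id(sample_id):
    """Strip the optional "_d" and the data-type suffix from an ID."""
    return SUFFIX.sub("", sample_id)

def find_duplicates(ind_lines):
    """Group .ind rows (ID, sex, group label) by base ID; keep groups > 1."""
    groups = defaultdict(list)
    for line in ind_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        groups[base_id(fields[0])].append(fields[0])
    return {base: ids for base, ids in groups.items() if len(ids) > 1}

sample_lines = [
    "UNTA58_147.AG F Germany_Lech_EBA_possible.contam.AG",
    "UNTA58_147_d.AG U Germany_Lech_EBA_lc.AG",
    "YPN021_d.SG U Thailand_YappaNhae_2_LogCoffin_IA.SG",
    "YPN021.SG F Thailand_YappaNhae_2_LogCoffin_IA.SG",
    "POST_99.AG F Germany_Lech_EBA_possible.contam.AG",
]
duplicates = find_duplicates(sample_lines)
print(duplicates)
```

To run it on the real file, pass the open .ind file handle instead of `sample_lines`.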
Checking to make sure there isn't anything in my dataset that isn't in v62. Here's what I see missing from v62.
I0726 - Mentese_N
I2110 - Verteba cave Trypillia
I3619 & I3621 - Taiwan_Hanben_IA
EHU002 - Spain_CA_o_steppe
ble007 & ble008 - SWE_meso
Ancient Australia - "Ancient nuclear genomes enable repatriation of Indigenous human remains"
I12438 - England_N_Megalithic
VEK007 & VEK009 - Russia_Caucasus_KuraAraxes
I0585 - LaBrana1
CLL007 - Spain_SE_Iberia_CA_NO
I21035 - Sudan_EarlyChristian
I7334 - RUS_Ekven_OldBeringSea
A few samples from "Three Reagents for in-Solution Enrichment of Ancient Human DNA at More than a Million..." (the TWIST publication)
I2818 I2949 (Dzudzuana) - It's in there as I11857, but ~1/3 snp count.
I20703 I20720 I20721 I21299
Some of the "The genomic origins of the Bronze Age Tarim Basin mummies" dataset.
MUR009 - Volosovo
"Dual ancestries and ecologies of the Late Glacial Palaeolithic in Britain" (GoughsCave Magdalenian, Kendricks WHG)
DCP1
It looks like UKY001 is finally the actual UKY001 instead of a KPT002 duplicate.
AG3 and I2158 (Oriente) are also ~1/3 the snp count of the previous version. Honestly I think 2% contamination or whatever is worth 3x the snps in cases like this.
ElMiron is labeled contaminated now? But it also passes assessment, idk.
EDIT: Could someone upload a plink version (bed/bim/fam) to ufile.io or some other file hosting site? My computer doesn't have the RAM necessary to convert these large datasets.
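A check like this (what is in an older dataset but not in v62) boils down to a set difference over the first column of two .ind files. The sketch below is one way to do it in Python, not necessarily how the list above was produced. It assumes IDs should be compared after dropping anything from the first dot onward, since v62-style IDs in this thread carry suffixes such as ".AG" and ".SG". The sample IDs are placeholders, not real AADR entries.

```python
def read_ids(ind_lines):
    """Return the set of sample IDs (first column) from .ind-style rows."""
    return {line.split()[0] for line in ind_lines if line.strip()}

def strip_suffix(sample_id):
    # Assumption: everything from the first "." onward is a data-type suffix.
    return sample_id.split(".", 1)[0]

def missing_from_new(old_lines, new_lines):
    """IDs present in the old .ind rows whose base ID is absent from the new."""
    new_bases = {strip_suffix(i) for i in read_ids(new_lines)}
    return sorted(i for i in read_ids(old_lines)
                  if strip_suffix(i) not in new_bases)

# Placeholder rows (hypothetical IDs and labels, for illustration only).
old_rows = [
    "SAMPLE_A U Example_Group",
    "SAMPLE_B U Example_Group",
    "SAMPLE_C U Example_Group",
]
new_rows = [
    "SAMPLE_B.AG U Example_Group",
]
missing = missing_from_new(old_rows, new_rows)
print(missing)
```

This only catches IDs that vanished outright. Renamed individuals (like the I2949 / I11857 case above) still need checking by hand.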
Yesterday, 03:27 AM
FYI: there is a name in the .ind file with a special character: Ertebølle.
RStudio may not like it. I usually replace these special characters with regular ones.
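A small Python sketch of that replacement, in case it is useful. One catch: "ø" has no Unicode decomposition, so plain normalization does not turn it into "o" and it needs an explicit mapping. The mapping table here is illustrative and not exhaustive, and the function name is made up.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit mapping.
# Illustrative table only; extend it for other characters as needed.
EXTRA = str.maketrans({
    "ø": "o", "Ø": "O",
    "æ": "ae", "Æ": "AE",
    "ß": "ss", "đ": "d", "ł": "l",
})

def to_ascii(text):
    """Replace accented/special letters with plain ASCII equivalents."""
    text = text.translate(EXTRA)
    # NFKD splits e.g. "ã" into "a" plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and anything else still outside ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("Ertebølle"))
```

Applying `to_ascii` to each line of the .ind file before loading it should sidestep the encoding problem, at the cost of the label no longer matching the anno file exactly.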
Yesterday, 04:20 AM
Amazing that this was released today. After Reich's recent interview, I've been rewatching and relistening to it. Not uncommon for me to binge this stuff; I learn something every time. Bet I've listened to Spencer Wells and Razib Khan's Insight podcasts 50 times each. Love John Hawks as well.
Anyway, I was just about to write to the Reich Lab asking why there was no new AADR and why they weren't publishing datasets like they had in the past. I even had a draft I'd been working on. Today, as I randomly do, I checked again and wow... new data. Question: why did it take so long? 1.5 to almost 2.0 years??? The pace of updates and papers has definitely changed. Very curious about what changed in the release of data and papers. Listening to Reich recently, I got the impression there are probably 10K samples yet to be released.
(Yesterday, 02:38 AM)Kale Wrote: EDIT: Could someone upload a plink version (bed/bim/fam) to ufile.io or some other file hosting site? My computer doesn't have the RAM necessary to convert these large datasets.
DCP1 VEK007+VEK009 ROT+BOO Mura
https://mega.nz/folder/JvdwUZhI#mxbiTgsAR8KMeERUt6vgLg
Yesterday, 12:21 PM
More good news here: we have the WHG samples from SimoesPNAS2024.
hoe001.SG - tev001 and others are included in the v62 dataset. These are high coverage (6351-6069 calBCE). Excellent quality.
Yesterday, 12:37 PM
(Yesterday, 04:38 AM)Light Wrote: (Yesterday, 02:38 AM)Kale Wrote: EDIT: Could someone upload a plink version (bed/bim/fam) to ufile.io or some other file hosting site? My computer doesn't have the RAM necessary to convert these large datasets.
Kofiko in the Refugium posted the 1240K: https://www.mediafire.com/file/wywscie9d...K.zip/file
Yesterday, 01:29 PM
v62: we have Morocco_OUB_EpiP.SG
oub002.SG: this one is high coverage; missingness is only 0.0006273. It is from the TAF cluster, but the previous TAF samples were low coverage. This will allow us to see more detail for North Africans.
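For reference, per-sample missingness is just the fraction of SNPs with no genotype call. The post doesn't say how the figure above was computed, so the Python sketch below is only one way to get such a number. It assumes an unpacked, text EIGENSTRAT .geno file (one line per SNP, one character per individual, "9" meaning missing); a packed file would need converting first. The toy genotype rows are invented.

```python
def missingness(geno_lines):
    """Per-individual fraction of missing calls in text EIGENSTRAT rows.

    Assumption: each line is one SNP, each character one individual,
    and the character "9" marks a missing genotype.
    """
    missing = None
    n_snps = 0
    for line in geno_lines:
        row = line.strip()
        if not row:
            continue  # ignore blank lines
        if missing is None:
            missing = [0] * len(row)  # one counter per individual
        for i, call in enumerate(row):
            if call == "9":
                missing[i] += 1
        n_snps += 1
    return [m / n_snps for m in missing] if n_snps else []

# Toy data: 4 SNPs (rows) x 4 individuals (columns), invented values.
toy_geno = ["0129", "2099", "1109", "0219"]
rates = missingness(toy_geno)
print(rates)
```

The position of a sample in the .ind file gives its column index in the result.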
11 hours ago
(Yesterday, 01:29 PM)TanTin Wrote: v62: we have Morocco_OUB_EpiP.SG
I'll start merging data for my sons and wife into this today.
4 hours ago
So, merged my family with v62. Goal was to do an ADMIXTURE run, figured I'd start with 16 components. Nothing I haven't done numerous times.
Holy crap. The new file format of the anno file is a PITA. Not only are the headers crazy verbose, the fields are no longer in the same order. Been busy redoing scripts to accommodate the new layout. Ugggh.
Extracting all of the DG samples, including my family, for a 16-component run. Then I can use those to project onto the rest of the dataset. Have server, will smoke it. LOL
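One way to make scripts survive this kind of reshuffle is to look columns up by header text instead of by position. A minimal Python sketch, assuming the anno file is tab-delimited with a single header row. The header strings and sample values below are placeholders, not the real v62 headers, and `find_column` / `read_anno` are made-up names.

```python
import csv
import io

def find_column(fieldnames, keyword):
    """Return the first header containing keyword (case-insensitive)."""
    matches = [name for name in fieldnames if keyword.lower() in name.lower()]
    if not matches:
        raise KeyError(f"no column header contains {keyword!r}")
    return matches[0]

def read_anno(handle, wanted):
    """Read tab-delimited rows, keeping only columns matched by keyword.

    wanted maps a short output key to a keyword expected in the header.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    columns = {key: find_column(reader.fieldnames, kw)
               for key, kw in wanted.items()}
    return [{key: row[col] for key, col in columns.items()} for row in reader]

# Placeholder file contents: verbose headers in an arbitrary order.
toy_anno = io.StringIO(
    "Group ID (long placeholder description)\t"
    "Genetic ID (long placeholder description)\n"
    "Example_Group\tSAMPLE_A.AG\n"
)
rows = read_anno(toy_anno, {"id": "genetic id", "group": "group id"})
print(rows)
```

Keyword matching is deliberately loose, so pick keywords that appear in only one header, or the first match wins silently.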