Hell Yes! New AADR v62.0
#1
It's about time!

The Allen Ancient DNA Resource (AADR): A curated compendium of ancient human genomes - David Reich Lab Dataverse (harvard.edu)

https://dataverse.harvard.edu/file.xhtml...d=10537416&version=9.0
#2
   
#3
Not a fan of the new hyper verbose header records in the anno files.
#4
So they actually reduced the number of present-day samples compared to the previous version?
#5
There seem to be too many duplicates of the same individuals.

UNTA58_147.AG F Germany_Lech_EBA_possible.contam.AG
UNTA58_147_d.AG U Germany_Lech_EBA_lc.AG

WEHR_1415adult.AG F Germany_Lech_EBA_mother.WEHR_1415child_possible.contam.AG
WEHR_1415adult_d.AG F Germany_Lech_EBA_mother.WEHR_1415child_lc.AG

POST_99.AG F Germany_Lech_EBA_possible.contam.AG
POST_99_d.AG F Germany_Lech_EBA.AG


YCH028.IM M Mexico_ChichenItza_MayaLowlands_LClassic.IM
YCH028_d.IM U Mexico_ChichenItza_MayaLowlands_LClassic.IM

YPN021_d.SG U Thailand_YappaNhae_2_LogCoffin_IA.SG
YPN021.SG F Thailand_YappaNhae_2_LogCoffin_IA.SG
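
A rough way to list pairs like these automatically, as a minimal sketch: it assumes the `_d` suffix marks a second entry for the same individual (inferred only from the pairs above, not from any AADR documentation) and that the sample ID is the first whitespace-separated field of each .ind line. `base_id` and `find_duplicates` are made-up helper names.

```python
from collections import defaultdict

def base_id(sample_id):
    """Strip the data-type suffix (.AG, .SG, ...) and a trailing _d."""
    stem, _, _suffix = sample_id.partition(".")
    if stem.endswith("_d"):  # assumption: _d marks a duplicate entry
        stem = stem[:-2]
    return stem

def find_duplicates(ind_lines):
    """Group sample IDs by base ID and keep groups with more than one entry."""
    groups = defaultdict(list)
    for line in ind_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        groups[base_id(fields[0])].append(fields[0])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

example = [
    "POST_99.AG F Germany_Lech_EBA_possible.contam.AG",
    "POST_99_d.AG F Germany_Lech_EBA.AG",
    "YPN021_d.SG U Thailand_YappaNhae_2_LogCoffin_IA.SG",
    "YPN021.SG F Thailand_YappaNhae_2_LogCoffin_IA.SG",
]
for key, ids in find_duplicates(example).items():
    print(key, ids)
```

Because the data-type suffix is dropped, the same individual listed as both .AG and .SG would also be grouped together, which may or may not be what you want.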
#6
Checking to make sure there isn't anything in my dataset that isn't in v62. Here's what I see missing from v62.
I0726 - Mentese_N
I2110 - Verteba cave Trypillia
I3619 & I3621 - Taiwan_Hanben_IA
EHU002 - Spain_CA_o_steppe
ble007 & ble008 - SWE_meso
Ancient Australia - "Ancient nuclear genomes enable repatriation of Indigenous human remains"
I12438 - England_N_Megalithic
VEK007 & VEK009 - Russia_Caucasus_KuraAraxes
I0585 - LaBrana1
CLL007 - Spain_SE_Iberia_CA_NO
I21035 - Sudan_EarlyChristian
I7334 - RUS_Ekven_OldBeringSea

A few samples from "Three Reagents for in-Solution Enrichment of Ancient Human DNA at More than a Million..." (The TWIST publication)
I2818
I2949 (Dzudzuana) - It's in there as I11857, but ~1/3 snp count.
I20703
I20720
I20721
I21299

Some of the "The genomic origins of the Bronze Age Tarim Basin mummies" dataset.

MUR009 - Volosovo
Dual ancestries and ecologies of the Late Glacial Palaeolithic in Britain (GoughsCave Magdalenian, Kendricks WHG)
DCP1

It appears like UKY001 is finally the actual UKY001 instead of a KPT002 duplicate.
AG3 and I2158 (Oriente) are also ~1/3 snp count of previous version. Honestly I think 2% contamination or whatever is worth 3x the snps in cases like this.
ElMiron is labeled contaminated now? But it also passes assessment, idk.

EDIT: Could someone upload a plink version (bed/bim/fam) to ufile.io or some other file hosting site. My computer doesn't have the RAM necessary to convert these large datasets.
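
For anyone who wants to repeat the "what's in my dataset but not in v62" check, here is a minimal sketch. It assumes both datasets have an EIGENSTRAT-style .ind file with the sample ID as the first field; `read_ids` and `missing_from_new` are hypothetical helper names. It compares exact IDs, so a sample that was renumbered (like I2949 showing up as I11857) will still be reported as missing.

```python
def read_ids(ind_lines):
    """Collect the first field (sample ID) of every non-blank .ind line."""
    return {line.split()[0] for line in ind_lines if line.strip()}

def missing_from_new(old_lines, new_lines):
    """Return the IDs present in the old dataset but absent from the new one."""
    return sorted(read_ids(old_lines) - read_ids(new_lines))

# Tiny made-up example; with real files, pass open(path) for each argument.
old = ["I0726 M Mentese_N", "I0585 M LaBrana1", "AG3 M Russia_AfontovaGora3"]
new = ["AG3 M Russia_AfontovaGora3"]
print(missing_from_new(old, new))
```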
#7
FYI: there is a name in the .ind file with a special character: Ertebølle.
RStudio may not like it. I usually replace these special characters with regular ones.
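
One way to do that replacement, as a minimal sketch. Note that ø has no Unicode decomposition, so plain NFKD normalization would silently drop it; the small mapping table below handles it explicitly. The table is my own choice and is not exhaustive, and `to_ascii` is a made-up name.

```python
import unicodedata

# Letters that do not decompose into base letter + accent (not exhaustive).
SPECIAL = str.maketrans({
    "ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE",
    "ß": "ss", "đ": "d", "Đ": "D", "ł": "l", "Ł": "L",
})

def to_ascii(text):
    """Replace accented and special letters with plain ASCII equivalents."""
    text = text.translate(SPECIAL)
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop anything still outside ASCII (the combining marks).
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("Ertebølle"))
```

Run it over each line of the .ind file and write the result to a new file rather than overwriting the original.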
#8
Amazing that this was released today. Since Reich's recent interview, I've been rewatching and relistening to it. It's not uncommon for me to binge this stuff, and I always learn something each time. I bet I've listened to Spencer Wells' and Razib Khan's Insight podcasts 50 times each. Love John Hawks as well.

Anyway, I was just about to write to the Reich Lab asking why there was no new AADR and why they weren't publishing datasets like they had in the past. I even had a draft I'd been working on.

Today, as I randomly do, I checked again and wow... new data.

Question: Why did it take so long? 1.5 to almost 2.0 years??? The pace of updates and papers has definitely changed.

Very curious why the change in release of data and papers. Listening to Reich recently, he implies there are probably 10K samples yet to be released.
#9
(Yesterday, 02:38 AM)Kale Wrote: EDIT: Could someone upload a plink version (bed/bim/fam) to ufile.io or some other file hosting site. My computer doesn't have the RAM necessary to convert these large datasets.

DCP1
VEK007+VEK009
ROT+BOO
Mura

https://mega.nz/folder/JvdwUZhI#mxbiTgsAR8KMeERUt6vgLg
#10
More good news here: we have the WHG samples from SimoesPNAS2024.
hoe001.SG - tev001 and others are included in the v62 dataset.
These are high coverage (6351-6069 calBCE). Excellent quality.
#11
(Yesterday, 04:38 AM)Light Wrote:
(Yesterday, 02:38 AM)Kale Wrote: EDIT: Could someone upload a plink version (bed/bim/fam) to ufile.io or some other file hosting site. My computer doesn't have the RAM necessary to convert these large datasets.

DCP1
VEK007+VEK009
ROT+BOO
Mura

https://mega.nz/folder/JvdwUZhI#mxbiTgsAR8KMeERUt6vgLg

Kofiko in the Refugium posted the 1240K
https://www.mediafire.com/file/wywscie9d...K.zip/file
#12
v62: we have Morocco_OUB_EpiP.SG
oub002.SG
This one is high coverage;
missingness is only 0.0006273.
This one is from the TAF cluster, whereas the previous TAF samples were low coverage.
This will allow us to see more details for North Africans.
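
For reference, missingness here is just the fraction of SNPs with no genotype call for the sample. A minimal sketch of the calculation, assuming an unpacked text EIGENSTRAT .geno file (one line per SNP, one character per individual, 9 meaning missing); the AADR download is in a packed binary format, so it would need converting first. `missingness` is a made-up function name and `column` is the sample's 0-based position in the .ind file.

```python
def missingness(geno_lines, column):
    """Fraction of SNPs where the individual at `column` has genotype 9 (missing)."""
    total = 0
    missing = 0
    for line in geno_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        total += 1
        if line[column] == "9":
            missing += 1
    return missing / total if total else float("nan")

# Made-up toy data: 4 SNPs, 3 individuals.
toy = ["019", "209", "111", "009"]
print(missingness(toy, 2))
```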
#13
(Yesterday, 01:29 PM)TanTin Wrote: v62: we have Morocco_OUB_EpiP.SG
oub002.SG
This one is high coverage;
missingness is only 0.0006273.
This one is from the TAF cluster, whereas the previous TAF samples were low coverage.
This will allow us to see more details for North Africans.

I’ll start merging my sons and wife’s data to this today
#14
So, merged my family with v62.  Goal was to do an ADMIXTURE run, figured I'd start with 16 components.  Nothing I haven't done numerous times.

Holy crap.  The new file format of the anno file is a PITA.  Not only are the headers crazy verbose, the fields are no longer in the same order.  Been busy redoing scripts to accommodate the new layout.  Ugggh.
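
One way to stop the column order from breaking scripts again is to look fields up by a keyword in the header instead of by position. A minimal sketch, assuming the anno file is tab-separated with a single header row; the header strings in the example are invented for illustration and are not the real v62 headers, and `find_column` and `read_anno` are made-up names.

```python
import csv
import io

def find_column(fieldnames, keyword):
    """Return the first header that contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    for name in fieldnames:
        if keyword in name.lower():
            return name
    raise KeyError(f"no header contains {keyword!r}")

def read_anno(handle, wanted):
    """Read selected columns; `wanted` maps a short key to a header keyword."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    columns = {key: find_column(reader.fieldnames, kw) for key, kw in wanted.items()}
    return [{key: row[col] for key, col in columns.items()} for row in reader]

# Invented headers, deliberately verbose and in an arbitrary order.
sample = (
    "Group ID (long explanation here)\tGenetic ID (long explanation here)\n"
    "Germany_Lech_EBA.AG\tPOST_99_d.AG\n"
)
rows = read_anno(io.StringIO(sample), {"id": "genetic id", "group": "group id"})
print(rows)
```

With a real file, pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO` object, and pick keywords that are distinctive enough to match only one header.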

Extracting all of the DG including my family for a 16 components run.  Then I can use those to project onto the rest of the dataset.

Have server, will smoke it.  LOL
   