Over the years I have had questions about creating datasets, so here I present a new program called aDNA to dataset (AKA make-myself-redundant).
There are both Linux and Windows versions available. The instructions are very simple, and I would like to get feedback from anyone interested in creating their own datasets from BAMs.
Notice: the Windows version includes all the files necessary to run except the references. On Linux you need to have samtools and pileupCaller in your PATH.
pileupCaller can be downloaded from the link below; for samtools I assume you know how to use apt, pacman, yum etc.
https://github.com/stschiff/sequenceTools
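For Linux users who want to confirm that both tools are visible before running anything, here is a minimal sketch of a PATH check. It is not part of the program; the function name `missing_tools` is just something I made up for illustration, and it only checks that the executables can be found, not that they are working versions.

```python
import shutil

# Tools the Linux version expects to find on PATH, as stated in the post.
REQUIRED_TOOLS = ("samtools", "pileupCaller")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("samtools and pileupCaller are both on PATH.")
```

If something is reported missing, install it or add its directory to PATH and run the check again.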
Main page:
https://github.com/teepean/adna_to_dataset
Download:
https://github.com/teepean/adna_to_datas.../v.0.2.zip
pileupCaller runs with its default settings; if you want to change them you have to edit the .bat (Windows) or .sh (Linux) script.
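To give an idea of what kind of settings are involved, here is a rough sketch of the general samtools mpileup to pileupCaller pipeline as described in the sequenceTools documentation. This is an assumption about the general shape only: the actual command inside the .bat or .sh may well differ, the quality thresholds (`-q30 -Q30`) are example values, and every file name and the helper `build_pipeline` are hypothetical.

```python
import shlex


def build_pipeline(bam, reference, positions, snp_file, out_prefix,
                   sample_name, calling_mode="--randomHaploid"):
    """Build a shell pipeline string: samtools mpileup piped into pileupCaller.

    Only the string is built here; nothing is executed.
    """
    # -R ignores read groups, -B disables BAQ, -q/-Q set the minimum
    # mapping and base quality (30 is an example value, not the tool's).
    mpileup = ["samtools", "mpileup", "-R", "-B", "-q30", "-Q30",
               "-l", positions, "-f", reference, bam]
    # -f is the EIGENSTRAT .snp file, -e the output prefix.
    caller = ["pileupCaller", calling_mode, "--sampleNames", sample_name,
              "-f", snp_file, "-e", out_prefix]
    return " | ".join(shlex.join(part) for part in (mpileup, caller))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_pipeline("sample.bam", "hs37d5.fa", "positions.txt",
                         "dataset.snp", "out", "Sample1"))
```

Switching `calling_mode` to `--majorityCall`, for example, is the sort of change you would make by hand in the script.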
EDIT: the program supports only hs37d5 and hg19 as references, as those are the most commonly used in aDNA papers. Support for hg38/T2T can be added if AADR starts supporting those references.