Here's what I had posted on the other forum:
I'm sure there are more efficient methods, but I'll give you two: one on usegalaxy.org and the other locally on a Linux computer.
I assume you already have plink and know how to merge datasets.
1) SETTING UP
-GALAXY: create an account on usegalaxy.org and upload your data (fastq or BAM file). You can use the upload data button on the right side and then choose a local file from your computer, or use the paste/fetch data button to paste a URL. Alternatively, you can use the download tools under the Get Data header of the tools panel on the left of the main window.
-LOCAL: a) install needed tools: bwa, bcftools, samtools, tabix
b) get the hg19 reference genome (not needed on Galaxy since it's built in)
wget -O hg19.tar.gz http://hgdownload.cse.ucsc.edu/goldenPat...mFa.tar.gz
gunzip hg19.tar.gz
tar -xf hg19.tar
cat chr*.fa > hg19.fa
rm chr*.fa
bwa index -a bwtsw hg19.fa
2) MAKING THE BAM FROM THE FASTQ
-GALAXY: use the tool "Map with BWA-MEM" on your fastq file by
a) selecting the reference genome in the second drop-down (hg19)
b) selecting the single or paired-end option in the 3rd drop-down (usually single with ancient DNA)
c) selecting your fastq file in the next drop-down(s)
-LOCAL:
bwa mem -t 1 hg19.fa YOURFILE.fastq > YOURFILE.sam (the -t option is the number of concurrent threads to use on your computer; it will multiply RAM usage as well)
samtools view -@ 2 -bS YOURFILE.sam | samtools sort -@ 2 -m 2G -o YOURFILE.bam (the -@ option is the number of threads to use and -m the amount of RAM per sorting thread)
samtools index -@ 2 YOURFILE.bam
3) MAKING THE BCF FILE FROM THE BAM
-GALAXY: a) use the "bcftools mpileup" tool on the output file of the preceding step and select the reference genome (same as before = hg19)
b) use the "bcftools call" tool on the output file of the preceding step and select "1 - Treat all samples as haploid" in "Select predefined ploidy" under "file format options"
-LOCAL: a) bcftools mpileup -O b -o YOURFILE_mpiled.bcf -f hg19.fa YOURFILE.bam
b) bcftools call --ploidy 1 -m -O b -o YOURFILE.bcf YOURFILE_mpiled.bcf
4.1) PREPARING A FILE TO ANNOTATE WITH SNP IDS
start with the BIM file of a 1240k set
we need to keep only columns 1, 2 & 4 and invert 2 & 4. We also need to replace the chromosome numbers in the first column with chr1,chr2,chr3...chr22,chrX,chrY
-GALAXY: a) upload the BIM file
b) use the "cut" tool to get the 1st and 4th columns
c) use the "cut" tool again to get the 2nd column
d) use the "paste" tool to aggregate the 2 files together
e) use the "replace" tool to replace the chromosome numbers of the first column, after having uploaded a file like this:
1 chr1
2 chr2
3 chr3
4 chr4
5 chr5
6 chr6
7 chr7
8 chr8
9 chr9
10 chr10
11 chr11
12 chr12
13 chr13
14 chr14
15 chr15
16 chr16
17 chr17
18 chr18
19 chr19
20 chr20
21 chr21
22 chr22
23 chrX
24 chrY
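Instead of typing the 24-line file by hand, it can be generated with a small shell loop; this is just a sketch, and the output filename chrom_map.txt is my choice, not something Galaxy requires:

```shell
# Generate the chromosome mapping file (1..22 plus X and Y),
# tab-separated, one pair per line.
{
  for i in $(seq 1 22); do printf '%s\tchr%s\n' "$i" "$i"; done
  printf '23\tchrX\n24\tchrY\n'
} > chrom_map.txt
```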
*** In the above steps you might lose the column numbering at some point: in the right panel where you see the resulting files, the blue line with the column numbers and the file type "tabular" may disappear. There are two ways of correcting that: one is editing the file type (pencil icon and then second tab), the other is to use the "convert" tool.
-LOCAL: use the tools of your choice to achieve the same result: from the BIM file keep the first column but change the chromosome names, put the fourth column next and then the 2nd one, and erase the rest.
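For the local route, one possible sketch is an awk one-liner; the input filename 1240k.bim is an assumption, and the standard BIM columns are: 1=chromosome, 2=SNP ID, 3=cM position, 4=bp position, 5/6=alleles:

```shell
# Keep chromosome (renamed chrN/chrX/chrY), position, then SNP ID,
# tab-separated, dropping the other columns.
awk 'BEGIN{OFS="\t"}
     {c=$1; if (c==23) c="X"; else if (c==24) c="Y";
      print "chr"c, $4, $2}' 1240k.bim > chr1240k
```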
4.2) ANNOTATING WITH SNP IDS
-GALAXY: use "bcftools annotate" a) Under Add annotations in column, type: CHROM,POS,ID
b) For Annotations File, select 'From a BED or tab-delimited file' and select your annotation file that we prepared in the preceding step
c) as usual click on run tool at the bottom
-LOCAL: let's say the prepared annotation file from the BIM in the step above is called chr1240k
a) bgzip chr1240k
b) tabix -s1 -b2 -e2 chr1240k.gz
c) bcftools annotate -O b -o YOURFILE_annotated.bcf -a chr1240k.gz -c CHROM,POS,ID YOURFILE.bcf
5) FILTERING (note that it is possible to do it in one step by excluding the expression ID=="." || QUAL<25 || QUAL==".")
-GALAXY: a) keep only positions with the 1240k SNP IDs: "bcftools filter", under Restrict to, in the exclude box type: ID=="."
b) filter for quality: "bcftools filter", under Restrict to, in the exclude box type: QUAL<25
c) clean up empty QUAL fields in case there are some: "bcftools filter", under Restrict to, in the exclude box type: QUAL=="."
* at step c) select the uncompressed VCF output type just above the run tool button
-LOCAL: a) bcftools filter -O b -o YOURFILE_clean1.bcf -e 'ID=="."' YOURFILE_annotated.bcf
b) bcftools filter -O b -o YOURFILE_clean2.bcf -e 'QUAL<25' YOURFILE_clean1.bcf
c) bcftools filter -o YOURFILE_clean.vcf -e 'QUAL=="."' YOURFILE_clean2.bcf
**IN ONE STEP: bcftools filter -o YOURFILE_clean.vcf -e 'ID=="." || QUAL<25 || QUAL=="."' YOURFILE_annotated.bcf
6) CONVERT TO PLINK
-GALAXY: you have to download the last resulting file (disk icon that appears at the bottom when you click on your file in the history panel) and do it LOCALLY after having optionally renamed it YOURFILE_clean.vcf
-LOCAL: plink --const-fid 0 --vcf YOURFILE_clean.vcf --make-bed --out YOURFILE
7) LAST CLEANUP AND MERGE
a) try a merge with a dataset you already have: plink --bfile DATASETNAME --bmerge YOURFILE.bed YOURFILE.bim YOURFILE.fam --indiv-sort 0 --allow-no-sex --make-bed --out NEWDATASETNAME
b) If it gives you an error message of the kind "Error: xx variants with 3+ alleles present", you will have to remove those variants because plink only supports two alleles at each variant/SNP. Fortunately, when we get this error plink creates the necessary file (NEWDATASETNAME-merge.missnp) and we just have to run:
plink --bfile YOURFILE --exclude NEWDATASETNAME-merge.missnp --make-bed --out YOURFILE2
and attempt to merge again:
plink --bfile DATASETNAME --bmerge YOURFILE2.bed YOURFILE2.bim YOURFILE2.fam --indiv-sort 0 --allow-no-sex --make-bed --out NEWDATASETNAME
8) You're done, except that you might want to edit the .fam file as usual.
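On that last point, here is one possible sketch of a .fam edit with awk; the population label "MyPop" and the sex code are illustrative assumptions, not values from this guide:

```shell
# Set the family ID (1st column) to a population label and the sex
# field (5th column) to 1 (male) for every sample.
# .fam columns: FID, IID, father, mother, sex (1=male, 2=female, 0=unknown), phenotype
awk 'BEGIN{OFS=" "} {$1="MyPop"; $5=1; print}' YOURFILE.fam > YOURFILE.fam.tmp \
  && mv YOURFILE.fam.tmp YOURFILE.fam
```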
Again, I'm sure there are more efficient ways, especially with multiple samples, but I went with what I know, hoping others will share their knowledge as well.