Dataset merging help
#1
Hi, could I get some help with this error, please? It doesn't seem to be merging my file with the dataset.

parameter file: merge_param.par
geno1: myson.geno
snp1: myson.snp
ind1: myson.ind
geno2: v54.1.p1_1240K_public.geno
snp2: v54.1.p1_1240K_public.snp
ind2: v54.1.p1_1240K_public.ind
outputformat: EIGENSTRAT
genotypeoutname: son_output.geno
snpoutname: son_output.snp
indivoutname: son_output.ind
allele funny: rs320061 T C A C
allele funny: rs4668878 C A A G
allele funny: rs7692855 C A C T
allele funny: rs10053269 G A G T
allele funny: rs10253843 T G C T
allele funny: rs10488002 T G G A
allele funny: rs4734497 T G T C
allele funny: rs1829605 T C G T
allele funny: rs1247096 G A G T
allele funny: rs2002129 T G G A
allele funny: rs2902299 T C C A
allele funny: rs2618512 C A A G
allele funny: rs2110167 C A A G
allele funny: rs969863 T C A C
allele funny: rs11065634 T G G A
allele funny: rs971394 T G A G
allele funny: rs2305307 T G T C
allele funny: rs2326253 C A C T
allele funny: rs10401155 G A G T
allele funny: rs6097797 T G T C
allele funny: rs6062840 G A G T
read 1073741824 bytes
read 2147483648 bytes
read 3221225472 bytes
read 4294967296 bytes
read 5052887274 bytes
packed geno read OK
end of inpack
numsnps input: 287522 1233013
*** warning output snpname NULL
snpname: (null) 287522
indname:  (null) 16390
gname: (null)
eigenstrat output
numsnps output: 0  numindivs: 16390


Histogram of checkmatch return codes
kode:   -2    223  A/T or C/G and strandcheck
kode:    0     21  Allele mismatch
kode:    1  97691  SNP OK (no flip)
kode:    2  97198  SNP OK (flip)
total:         195133
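
One thing stands out in the log above: `*** warning output snpname NULL` followed by `numsnps output: 0`. As far as I recall from the EIGENSOFT README, mergeit names its output keys `genooutfilename`, `snpoutfilename` and `indoutfilename`, whereas `genotypeoutname`, `snpoutname` and `indivoutname` are the keys convertf uses. If that is the cause here, a parameter file along these lines may behave better. This is only a sketch that reuses the file names from the post; please check the key names against the README shipped with your EIGENSOFT version.

```shell
# Sketch: rewrite merge_param.par with the output keys that mergeit documents.
# Assumption: input/output file names are copied from the post; paths are relative.
cat > merge_param.par <<'EOF'
geno1: myson.geno
snp1: myson.snp
ind1: myson.ind
geno2: v54.1.p1_1240K_public.geno
snp2: v54.1.p1_1240K_public.snp
ind2: v54.1.p1_1240K_public.ind
outputformat: EIGENSTRAT
genooutfilename: son_output.geno
snpoutfilename: son_output.snp
indoutfilename: son_output.ind
EOF
# then run the merge as before: mergeit -p merge_param.par
```

The 21 "allele funny" lines match the 21 SNPs counted under "Allele mismatch" in the histogram, so those look like a separate, much smaller issue.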

#2
You have strand inconsistencies or flipped SNPs... I've never tried to merge EIGENSTRAT files, just PLINK. Your mileage may vary.

Here is how the person I call "Obi Wan" (because he was the master), aka Nganasankan aka Henjin, taught me.

Note this was on a Windows system at the time. The same technique works elsewhere, just adjust your file paths accordingly.


Basically a four-step process:
Code:
D:\DataAnalysis\DataSets\v52>plink --allow-no-sex --bfile v52.2_HO_public --bmerge AimSmall_genome --make-bed --out v52_HO_AimSmall

Need to flip for strand inconsistency

henjin — Today at 4:37 PM
This is the procedure to deal with strand inconsistency errors:
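# a = prefix of the reference dataset, b = prefix of your own dataset (PLINK .bed/.bim/.fam)
alias pli='plink --allow-no-sex'
# 1. First merge attempt; on conflicts PLINK writes merge-merge.missnp
pli --bfile "$a" --bmerge "$b" --make-bed --out merge
# 2. Flip the conflicting SNPs in b, then retry the merge
pli --bfile "$b" --flip merge-merge.missnp --make-bed --out "$b-flip"
pli --bfile "$a" --bmerge "$b-flip" --make-bed --out merge
# 3. Exclude the SNPs that still conflict after flipping, then do the final merge
pli --bfile "$b-flip" --exclude merge-merge.missnp --make-bed --out "$b-filtered"
pli --bfile "$a" --bmerge "$b-filtered" --make-bed --out merged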


You can use a function for it too:
plib()(plink --allow-no-sex --bfile "$@")
merg()(
  plib "$1" --bmerge "$2" --make-bed --out merge
  plib "$2" --flip merge-merge.missnp --make-bed --out "$2-flip"
  plib "$1" --bmerge "$2-flip" --make-bed --out merge
  plib "$2-flip" --exclude merge-merge.missnp --make-bed --out "$2-filtered"
  plib "$1" --bmerge "$2-filtered" --make-bed --out "${3-merged}"
)
merg v52.2_HO_public AimSmall_genome


Manual method example

D:\DataAnalysis\DataSets\v52>plink --allow-no-sex --bfile AimSmall_genome --flip v52_HO_AimSmall-merge.missnp --make-bed --out AimSmall_genome_flip

D:\DataAnalysis\DataSets\v52>plink --allow-no-sex --bfile v52.2_HO_public --bmerge AimSmall_genome_flip --make-bed --out v52_HO_merged

D:\DataAnalysis\DataSets\v52>plink --allow-no-sex --bfile AimSmall_genome_flip --exclude v52_HO_merged-merge.missnp --make-bed --out AimSmall_genome_flip_filtered

D:\DataAnalysis\DataSets\v52>plink --allow-no-sex --bfile v52.2_HO_public --bmerge AimSmall_genome_flip_filtered --make-bed --out v52_HO_AimSmall2
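
A caveat on the flip/exclude steps above: A/T and C/G SNPs read the same on both strands, so a strand flip on them cannot be detected from the alleles alone (the mergeit log in post #1 counts 223 of them under "A/T or C/G and strandcheck"). A minimal sketch for listing them from a PLINK .bim file, assuming the standard six-column .bim layout with the SNP ID in column 2 and the alleles in columns 5 and 6; the function name is made up for illustration:

```shell
# ambiguous_snps FILE.bim
# Print the IDs of strand-ambiguous (A/T or C/G) SNPs, one per line.
ambiguous_snps() {
  awk '{
    a = toupper($5); b = toupper($6)
    if ((a == "A" && b == "T") || (a == "T" && b == "A") ||
        (a == "C" && b == "G") || (a == "G" && b == "C"))
      print $2
  }' "$1"
}

# Example (file names taken from the post, output name is hypothetical):
#   ambiguous_snps AimSmall_genome.bim > ambiguous.txt
```

The resulting list can be passed to `plink --exclude` in the same way as the .missnp file, if you prefer to drop those SNPs before merging.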
#3
Suggest you change the title to something like "Dataset Merging". qpAdm isn't relevant in this context.
#4
(05-05-2024, 06:12 PM)AimSmall Wrote: You have strand inconsistencies or flipped SNPs...  I've never tried to merge Eigenstrat files, just PLINK.  Your mileage may vary.

Can I also get the command to initially convert raw data into .bed, .bim and .fam files on Windows, please? Thanks.

#5
plink --23file 23andme_raw_data.txt --make-bed --out mydata
#6
(05-05-2024, 06:38 PM)AimSmall Wrote: plink --23file 23andme_raw_data.txt --make-bed --out mydata

That seemed to work. If I remember what you taught me, I have to convert the dataset into the same format, merge it, and then change it back after resolving any issues with your commands?
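
For the format round trip described here, EIGENSOFT's convertf is the usual tool. A rough sketch of a parameter file that converts the public dataset to PLINK binary format, assuming the input file names from post #1; the `to_plink.par` name and the `v54_plink` output prefix are hypothetical:

```shell
# Sketch: convertf parameter file, EIGENSTRAT-style input to PLINK binary output.
# Input names come from the thread; the v54_plink output prefix is made up.
cat > to_plink.par <<'EOF'
genotypename: v54.1.p1_1240K_public.geno
snpname: v54.1.p1_1240K_public.snp
indivname: v54.1.p1_1240K_public.ind
outputformat: PACKEDPED
genotypeoutname: v54_plink.bed
snpoutname: v54_plink.bim
indivoutname: v54_plink.fam
EOF
# then: convertf -p to_plink.par
```

Going back after the merge should be the mirror image: point `genotypename`, `snpname` and `indivname` at the merged .bed/.bim/.fam and set `outputformat: EIGENSTRAT`.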

#7
(05-05-2024, 06:38 PM)AimSmall Wrote: plink --23file 23andme_raw_data.txt --make-bed --out mydata

I got to the bit of merging the two files, but I’m getting this now:

Error: Line 1 of v54.1 fam has fewer tokens than expected

#8
Post the head of the file
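
A quick way to look at the file and spot the problem yourself: PLINK expects six whitespace-separated columns on every .fam line (family ID, individual ID, father, mother, sex, phenotype). The small check below is a sketch; the function name is made up for illustration:

```shell
# check_fam FILE.fam
# Report every line that does not have exactly 6 whitespace-separated fields.
# Exit status is non-zero if any such line is found (blank lines count too).
check_fam() {
  awk 'NF != 6 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
       END { exit bad + 0 }' "$1"
}

# Example (replace with your own file name):
#   head -n 5 yourfile.fam
#   check_fam yourfile.fam
```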
#9
(05-05-2024, 11:34 PM)AimSmall Wrote: Post the head of the file

I seem to have fixed that issue, but now my problem is that when I convert the file back after merging, the names don’t show up; it’s still showing numbers. The IDs are fine, it’s just the sample names that are still numbers.

#10
(05-05-2024, 11:34 PM)AimSmall Wrote: Post the head of the file

I don’t know what I’m doing wrong, but when I convert the file back after merging, it just has "Control" next to the IDs.

#11
Genetics189291, I'm seemingly on your ignore list. I can't therefore reply to your PM. I guess you want to convert from PLINK to EIGENSTRAT? If this is correct, the word “CONTROL” appears in the individual codes column. This is not a problem, you can replace these mentions with population names of your choice. If not, clarify your question.
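
The replacement does not have to be done by hand. A minimal sketch, assuming the sample IDs in column 1 of the converted .ind file are identical to those in the original .ind file (if convertf prefixed them with a family ID, they will not match and the lookup needs adjusting); the function name and the file names in the example are hypothetical:

```shell
# relabel_ind LOOKUP.ind MERGED.ind
# Print MERGED.ind with column 3 (population label) replaced by the label that
# LOOKUP.ind gives for the same sample ID. Unmatched samples keep their label.
relabel_ind() {
  awk 'NR == FNR { pop[$1] = $3; next }
       ($1 in pop) { $3 = pop[$1] }
       { print $1, $2, $3 }' "$1" "$2"
}

# Example:
#   relabel_ind v54.1.p1_1240K_public.ind merged.ind > merged_named.ind
```

Samples that are not in the lookup file (your own, for instance) are left as they are, so those can still be renamed manually.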
#12
(05-07-2024, 02:20 PM)Anglesqueville Wrote: Genetics189291, I'm seemingly on your ignore list. I can't therefore reply to your PM. I guess you want to convert from PLINK to EIGENSTRAT? If this is correct, the word “CONTROL” appears in the individual codes column. This is not a problem, you can replace these mentions with population names of your choice. If not, clarify your question.

Sorry, I don’t know why that happened. There was a way I used to do it where it converted back with the sample names, unless I’m wrong; it’s been a while. I might look for a smaller dataset if I have to rename it all.

#13
It’s okay, I seem to have found a way to merge it directly, which is much easier than converting it back and forth and takes a fraction of the time. I’ll go through the steps when I get back home.
