Gordius · 10-07-2023, 03:03 PM (last modified: 10-07-2023, 03:03 PM by Gordius)

I created this thread for questions about PLINK.

My first question: has anyone used IBS clustering in PLINK? https://zzz.bwh.harvard.edu/plink/strat.shtml#cluster
What data format is used there? Can it provide any useful information about population structure?

***AimSmall*** · 10-07-2023, 03:29 PM

I've played with it in the past. I don't recall any great revelations. I might have to give it some more attention.

These are my notes from that session.

Gordius · 10-07-2023, 03:36 PM

Is it possible to do this on Windows? What would the command look like if the genomic file is in PLINK format (bed/bim/fam)?

***AimSmall*** · 10-07-2023, 03:40 PM

Yes. I run plink on Windows as well. My example was with plink format files. No need to specify the file extension.

Gordius · 10-07-2023, 03:56 PM

Error: failed to open file

***AimSmall*** · 10-07-2023, 04:14 PM

What version of PLINK? Are PLINK and the three files in the same directory, or is PLINK at least in your PATH?
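A quick way to rule out a missing file is to check that all three members of the fileset exist under the prefix passed to --bfile. A minimal Python sketch; the helper name `missing_plink_files` is only illustrative and is not part of PLINK:

```python
import os

def missing_plink_files(prefix):
    """Return the .bed/.bim/.fam paths that do not exist for this prefix."""
    # --bfile takes the shared prefix and expects these three extensions.
    expected = [prefix + ext for ext in (".bed", ".bim", ".fam")]
    return [path for path in expected if not os.path.isfile(path)]
```

Calling `missing_plink_files("v50.0_HO_public")` from the directory where plink is run should return an empty list if the fileset is complete.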

Did you create the genome file first?

I have this running right now. Note that it's on Windows.

Code:
D:\DataAnalysis\DataSets\v54_Family>plink --bfile family_v54_HO --genome

PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --bfile family_v54_HO
  --genome

32717 MB RAM detected; reserving 16358 MB for main workspace.
1905494 variants loaded from .bim file.
20588 people (11927 males, 8408 females, 253 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
20588 phenotype values loaded from .fam.
Warning: Ignoring phenotypes of missing-sex samples.  If you don't want those
phenotypes to be ignored, use the --allow-no-sex flag.
Using up to 8 threads (change this with --threads).
Before main variant filters, 20587 founders and 1 nonfounder present.
Calculating allele frequencies... done.
Warning: 675 het. haploid genotypes present (see plink.hh ); many commands
treat these as missing.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
Total genotyping rate is 0.21655.
1905494 variants and 20588 people pass filters and QC.
Among remaining phenotypes, 0 are cases and 20335 are controls.  (253
phenotypes are missing.)
Excluding 52080 variants on non-autosomes from IBD calculation.
52992 markers complete.

Gordius · 10-07-2023, 04:20 PM

Here is an example. Small genome files have the same problem.

[Image: PLINK.jpg]

***AimSmall*** · 10-07-2023, 04:27 PM

Did you create the genome file first? Mine is still running.

--genome v50.0_HO_public.genome

Gordius · 10-07-2023, 04:36 PM

What would the full command look like? Let's say I have a fileset v50.0_HO_public, and plink and the files are in the same directory.

***AimSmall*** · 10-07-2023, 05:15 PM

Once you have the genome file, look at my initial screenshot for the syntax. I'll be driving for the next few hours and away from my systems.
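For reference, the two-step workflow from the PLINK documentation (compute pairwise IBS/IBD with --genome, then feed that file to --cluster through --read-genome) can be sketched in Python as below. This is an assumption based on the documented flags, not the syntax from the screenshot, which is not reproduced in this thread. The function names are illustrative, and the prefix is the one from the question.

```python
import subprocess

def ibs_cluster_commands(prefix, plink="plink"):
    """Build the two PLINK 1.9 invocations: pairwise IBS/IBD, then clustering."""
    # Step 1: writes <prefix>.genome (can take a long time on large datasets).
    genome_cmd = [plink, "--bfile", prefix, "--genome", "--out", prefix]
    # Step 2: reuses the precomputed .genome file for IBS clustering.
    cluster_cmd = [plink, "--bfile", prefix,
                   "--read-genome", prefix + ".genome",
                   "--cluster", "--out", prefix]
    return [genome_cmd, cluster_cmd]

def run_ibs_clustering(prefix, plink="plink"):
    """Run both steps in order; raises CalledProcessError if PLINK fails."""
    for cmd in ibs_cluster_commands(prefix, plink):
        subprocess.run(cmd, check=True)
```

Typed by hand, the equivalent would be `plink --bfile v50.0_HO_public --genome --out v50.0_HO_public` followed by `plink --bfile v50.0_HO_public --read-genome v50.0_HO_public.genome --cluster --out v50.0_HO_public`.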

Gordius · 10-07-2023, 05:34 PM

Thank you very much, it seems to have worked. Now I will deal with the clustering results.

***AimSmall*** · 10-07-2023, 05:37 PM

Be curious what you learn.

Gordius · 10-07-2023, 09:05 PM

You mean what I'm trying to cluster? (I have some difficulties with the translation)

***AimSmall*** · 10-07-2023, 11:42 PM

Anything you learn would be interesting. One reason many of us are here is to learn, be it analysis results or techniques.

Gordius · 10-08-2023, 08:38 PM

I am currently looking for new methods that could help in analyzing populations or individuals, inspired in particular by this thread: https://genarchivist.com/showthread.php?tid=110 . I used PLINK to cluster some Ukrainian samples that I had in PLINK format and compared the results with G25. At the moment it is still difficult for me to interpret the results and find patterns. During clustering, samples that were previously considered outliers (EG600048 and EG600093) were filtered out, which is good. Also, samples EG600037 and EG600038, which come from the same village, ended up right next to each other, even though they are very far apart by G25 distances. Beyond that, I have not yet found any pattern in the division into clusters. I also tried the IBS similarity matrix function ( https://zzz.bwh.harvard.edu/plink/strat.shtml#matrix ) and looked at the distances between populations, but the populations that are closest to each other in the distance matrix are not necessarily on the same branch in the clustering.

By the way, do you know a method for directly comparing two samples with each other? Maybe the IBS similarity matrix is meant for exactly that? I mean some analogue of calculator distances, but comparing the genotypes directly, for example the allele differences between two individuals. I tried f2 and fst in admixtools 2, but first, these functions are non-linear; second, their results vary greatly depending on the number of samples compared at the same time; and third, I'm not sure that I'm doing everything correctly. They gave only faint results, and only over long distances, for example when comparing modern samples with some ancient hunters.
Are there any analogues of f2 statistics in PLINK itself? For example, I have the distances from a sample to other samples on the G25 calculator, and I would like to see how correct these distances are by directly comparing the genotypes of these samples, for example by allele frequency. How can this be done? Is the IBS similarity matrix suitable for this?
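One direct genotype comparison is the pairwise IBS proportion, which the PLINK documentation gives for the DST column of the .genome output as (IBS2 + 0.5*IBS1) / (number of SNP pairs). A minimal Python sketch of that formula, assuming each sample is a list of per-SNP allele counts (0, 1 or 2) with None for a missing call; the function name and the coding are illustrative and this is not PLINK's own implementation:

```python
def ibs_similarity(g1, g2):
    """Proportion of alleles shared identical-by-state between two samples.

    g1, g2: per-SNP counts of one allele (0, 1 or 2); None marks a missing call.
    """
    ibs2 = ibs1 = n = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs missing in either sample
        n += 1
        diff = abs(a - b)
        if diff == 0:
            ibs2 += 1  # same genotype: both alleles shared
        elif diff == 1:
            ibs1 += 1  # one allele shared
        # diff == 2: opposite homozygotes, nothing shared (IBS0)
    if n == 0:
        raise ValueError("no SNPs called in both samples")
    return (ibs2 + 0.5 * ibs1) / n
```

A value of 1.0 means identical genotypes at every SNP called in both samples. Because only SNPs called in both are counted, pairs of low-coverage samples are compared on fewer markers, which is worth keeping in mind when setting these values against G25 distances.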
