
Y chromosome recurrent SNPs
#16
Here is another one:

CTS9894.1 G PF3036.1; M3574.1 rs559968594 19124322 17012442 A->T
PF3036.1 G M3574.1; CTS9894.1 rs559968594 19124322 17012442 A->T


famInd X Y_ch
1 HG02231.SG T IBS.SG R1b1a1b1a1a2
2 HG02690.SG T PJL.SG J2b2a2b2


9 Iceman_noUDG.SG T Italy_North_MN_Iceman_contam.SG NA ( I just confirmed Iceman - Ötzi is G, so no questions here )
#17
Next examples:

Show Content
#18
Show Content
#19
And there is one more special case: PF3112

Show Content

I am also providing a PCA plot for this one:

[Image: Hg-G-special.png]
#20
While I think the mechanism you propose might exist, making a list of "examples" is not the way to prove it.
If this mechanism is frequent enough to be seen among ancient samples, then you should also see it in "confirmed" clades from the Y-tree (either FTDNA or YFULL).

I propose a statistical test:
--> Restrict yourself to SNPs used for age estimation (to avoid regions with too high a mutation rate)
--> Take a few macro-haplogroups
--> Select all subclades with TF < 5000 ybp
--> Classify them by likely geographical place of living since ~5000 ybp.
--> Compute the covariance matrix (== count the number of redundant mutations per "yr.clade") as a function of location and macro-haplogroup (PS: be careful to correct for unknown private SNPs).

If the effect you mention is significant, you should see a higher covariance for separate macro-haplogroups that spent time at the same location than for macro-haplogroups that weren't around the same place.

For example, you would expect many R-P312 SNP injections into minor western European macro-haplogroup subclades, whereas you wouldn't expect such injections into the subclades of the same macro-haplogroups located in Asia.
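
A rough sketch of the counting step, in Python. This is only an illustration of the idea, not a finished test: the record layout (`macro`, `region`, `snps`) and the toy values are assumptions made up for the example, and it skips both the per-"yr.clade" normalisation and the correction for unknown private SNPs mentioned above.

```python
from collections import defaultdict
from itertools import combinations


def shared_rate_by_colocation(clades):
    """Mean number of shared SNP positions per pair of clades that belong to
    different macro-haplogroups, split by whether the pair shares a region.

    clades: list of dicts with hypothetical keys
        'macro'  -> macro-haplogroup label
        'region' -> likely place of living since ~5000 ybp
        'snps'   -> set of SNP positions defining the clade
    Returns {True: rate for same-region pairs, False: rate for other pairs}.
    """
    shared = defaultdict(int)
    pairs = defaultdict(int)
    for a, b in combinations(clades, 2):
        if a["macro"] == b["macro"]:
            continue  # only mutations shared ACROSS macro-haplogroups count
        same_region = a["region"] == b["region"]
        shared[same_region] += len(a["snps"] & b["snps"])
        pairs[same_region] += 1
    return {key: shared[key] / pairs[key] for key in pairs}


# Toy input, invented purely to show the mechanics (not real data).
toy_clades = [
    {"macro": "R", "region": "West Europe", "snps": {101, 102, 103}},
    {"macro": "G", "region": "West Europe", "snps": {102, 201}},
    {"macro": "G", "region": "Asia", "snps": {202, 203}},
    {"macro": "J", "region": "Asia", "snps": {301, 203}},
]
rates = shared_rate_by_colocation(toy_clades)
print(rates)
```

If the proposed effect exists, the same-region rate should come out clearly higher than the different-region rate on real data, once the normalisations left out here are added.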

I think that if you manage to show that such a statistical signal exists, you would have a strong case that would be hard to explain by "lucky" convergent mutations (and if done with care, such a detection would definitely deserve a publication).
But showing a few examples will never prove that the effect you speak of exists and is spatially coherent (which is a requirement, considering the hypothesis you propose to explain it).
#21
This is a good one :
Show Content
#22
I don't quite understand your theory. It has long been known that some mutations occur in different haplogroups. In earlier times, when only a few SNPs were known, this caused problems with individual SNP testing. Nowadays, with NGS, the haplogroup is easily identified in the context of all mutations. YFULL probably avoids including such SNPs.
For example, in the past L277.1 defined a major branch under R1b-Z2103. The same mutation, named L277.2, was also found under R1a, and even under O2 and N1b as S334. It is possible that such positions are unstable and should not be used in building the major Y tree.
https://www.genetichomeland.com/dna-mark...ome-Y/L277
#23
YFULL lists all positive SNPs, including known ones that belong to haplogroups other than our own.
They are in "Known SNPs".
I have some among my private variants, but also at the level of my final markers.
Example:
I am positive for PF2595 (haplogroup F), Z2051 (haplogroup G), CTS12727 (haplogroup J), FGC48518 (haplogroup G), M10956 (haplogroup A1b), etc.

So what is certain is that this is very common. YFULL lists them for information but does not take them into account, because they are only rated with one star.
On the other hand, I looked at the first 3 markers given by TanTin, and they are rated 4 stars on YFULL. This means that YFULL still considers them quite reliable (the maximum is 5 stars).

I don't know how this happens.
#24
(01-22-2024, 09:15 AM)eastara Wrote: I don't quite understand your theory. It has long been known that some mutations occur in different haplogroups. In earlier times, when only a few SNPs were known, this caused problems with individual SNP testing. Nowadays, with NGS, the haplogroup is easily identified in the context of all mutations. YFULL probably avoids including such SNPs.
For example, in the past L277.1 defined a major branch under R1b-Z2103. The same mutation, named L277.2, was also found under R1a, and even under O2 and N1b as S334. It is possible that such positions are unstable and should not be used in building the major Y tree.
https://www.genetichomeland.com/dna-mark...ome-Y/L277

Indeed, it is known and expected that some Y-haplogroups will share "lucky" mutations at the same SNP.
But the question TanTin is raising is whether these mutations are really due only to "luck" and random processes, or whether some of them are passed from chrX to chrY via some mechanism.
The latter would imply spatial coherence in the so-called shared "lucky" mutations ...

Another way (in addition to the one I proposed above) to detect the existence of such an effect would be to place the shared SNPs on a map for a given "source" population (i.e., make a map of mutations shared with R-P312 but found inside haplogroups of non-R-P312 descendants, with a weight of 1 for each mutation, spread among all its carriers to avoid effects coming from later founder effects post-dating the mutation).
Then you normalise this map by the density of inhabitants not descending from R-P312 (technically, a normalisation by the integrated lifetime of the non-R-P312 clades since their TF would be more relevant ... but it is also harder to build accurately).

If the effect proposed in this topic exists, then the mutation map won't be "white noise" (a random spatial distribution) and should show some signal in the angular power spectrum of the map.
To enhance the signal in such an attempt, it is better to look at the cross-correlation angular power spectrum of the shared-mutation map vs the R-P312 density map (because you expect more mutation sharing where R-P312 carriers live) ... which should be positive if the effect proposed in this topic exists and has a significant impact on the Y-tree.

Of course, such an analysis is a bit trickier to perform, and would only speak to people with a strong background in data analysis (== people with academic research-level experience). To first order, a simple "covariance matrix approach" would be easier for most people to understand.
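
As a much simplified stand-in for the map idea, here is a Python sketch. It does not compute an angular power spectrum: it swaps in a plain Pearson correlation across a handful of map cells, which only checks the sign of the expected cross-correlation. The cell names, densities, and carrier lists are all hypothetical placeholders.

```python
from collections import defaultdict
from math import sqrt


def shared_mutation_map(carriers_by_snp):
    """Give every shared SNP a total weight of 1, spread evenly over its
    carriers, and sum the weights per map cell.

    carriers_by_snp: {snp_name: [cell of each carrier]}
    """
    weights = defaultdict(float)
    for cells in carriers_by_snp.values():
        for cell in cells:
            weights[cell] += 1.0 / len(cells)
    return dict(weights)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy inputs, invented for illustration only (not real data).
carriers = {
    "snpA": ["cell_a", "cell_a", "cell_b"],
    "snpB": ["cell_a"],
    "snpC": ["cell_c", "cell_b"],
}
non_source_density = {"cell_a": 10.0, "cell_b": 10.0, "cell_c": 10.0}
source_density = {"cell_a": 0.6, "cell_b": 0.4, "cell_c": 0.05}

weights = shared_mutation_map(carriers)
cells = sorted(weights)
# Normalise by the density of people NOT descending from the source clade.
normalised = [weights[c] / non_source_density[c] for c in cells]
r = pearson(normalised, [source_density[c] for c in cells])
print(weights, r)
```

A positive `r` on real data would point in the direction described above, but it says nothing about spatial scale, which is what the power spectrum would add.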
#25
Hmmm, well here's a test that might suggest something going on...
1) Checked all 107 SNPs that define R-M269 in YFULL to see if any other lineage is defined by those SNPs
2) Compared the number of matches to the number of SNPs in each major haplogroup (click the ytree statistics button)
3) Plotted the matches against the total SNPs for each haplogroup
   

Clearly I is an outlier, and if there is a haplogroup you'd expect to have most contact with M269 it would surely be I.
It would be good to do a 'control' test with some haplogroup that's well restricted geographically, like S or M, instead of M269, to make sure everything behaves and this isn't an artifact.
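
One possible way to put a number on "outlier", sketched in Python. The leave-one-out baseline and the threshold of 3 are arbitrary choices made for this example, not something from the plot, and the counts are placeholders rather than the real YFULL figures.

```python
def flag_outliers(counts, ratio_threshold=3.0):
    """Flag haplogroups with far more matches than their size predicts.

    counts: {haplogroup: (matches, total_snps)}
    For each haplogroup, the expected number of matches is its SNP total
    times the match rate pooled over all OTHER haplogroups (leave-one-out),
    so a strong outlier does not inflate its own baseline.
    """
    flagged = []
    for hg, (matches, total) in counts.items():
        other_matches = sum(m for h, (m, _) in counts.items() if h != hg)
        other_total = sum(t for h, (_, t) in counts.items() if h != hg)
        expected = total * other_matches / other_total
        if expected > 0 and matches / expected > ratio_threshold:
            flagged.append(hg)
    return flagged


# Placeholder counts, invented for illustration only.
toy_counts = {
    "hg1": (2, 20000),
    "hg2": (3, 30000),
    "hg3": (1, 10000),
    "hg4": (12, 25000),
}
flagged = flag_outliers(toy_counts)
print(flagged)
```

Running the same function with a geographically restricted haplogroup as the reference, as suggested above, would serve as the control.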
#26
Btw, does anyone have a database (in .csv, .txt or any other easily usable format) of all SNPs defining haplogroups (if possible with HG38 positions)?
Last time I tried to run a massive query on the YFULL pages, I got blocked after retrieving a few pages (too many connections in a short time).

And for some reason, if I do it manually, I can get the web pages containing the web addresses with the HG38 positions of the SNPs, but when retrieving the page with WGET I don't get the link, and thus I only get the SNP name and not the position.
And I'm way too lazy to do that manually for all haplogroups (~35k for YFULL in total).
#27
(01-22-2024, 09:03 PM)GHurier Wrote: Btw, does anyone have a database (in .csv, .txt or any other easily usable format) of all SNPs defining haplogroups (if possible with HG38 positions)?
Last time I tried to run a massive query on the YFULL pages, I got blocked after retrieving a few pages (too many connections in a short time).

And for some reason, if I do it manually, I can get the web pages containing the web addresses with the HG38 positions of the SNPs, but when retrieving the page with WGET I don't get the link, and thus I only get the SNP name and not the position.
And I'm way too lazy to do that manually for all haplogroups (~35k for YFULL in total).

The Y tree is officially supported by the ISOGG, the scientific studies are based on it. This is their SNP list:

https://docs.google.com/spreadsheets/d/1...1934392066

https://isogg.org/tree/
#28
(01-22-2024, 10:50 PM)eastara Wrote: The Y tree is officially supported by the ISOGG, the scientific studies are based on it. This is their SNP list:

https://docs.google.com/spreadsheets/d/1...1934392066

https://isogg.org/tree/

Where do you find which SNP is rated 4* or 5*?
#29
Thanks. For people looking for it like me, I also found this version with a mapping to YFULL nodes:
https://ybrowse.org/gbrowse2/gff/snps_hg38.gff3
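
For anyone wanting to scan such a file for candidate recurrent SNPs, here is a minimal Python sketch. It assumes only the generic GFF3 layout (nine tab-separated columns, start position in column 4, `key=value` attributes in column 9) and that the SNP name sits in a `Name` attribute; I have not checked the exact attribute keys used in the ybrowse file, so treat that as an assumption. The sample lines are hand-made for the example, and the "sketch" source field and `EXAMPLE1` name are hypothetical.

```python
from collections import defaultdict


def names_by_position(lines):
    """Group SNP names by (sequence, start position) from GFF3 lines."""
    by_pos = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        name = attrs.get("Name")  # assumed attribute key
        if name:
            by_pos[(cols[0], int(cols[3]))].add(name)
    return by_pos


def recurrent_candidates(by_pos):
    """Keep positions where some name carries a numeric suffix such as
    '.1' or '.2', the naming convention seen in this thread (L277.1, L277.2)."""
    def has_suffix(name):
        return "." in name and name.rsplit(".", 1)[-1].isdigit()
    return {
        pos: sorted(names)
        for pos, names in by_pos.items()
        if any(has_suffix(n) for n in names)
    }


# Hand-made sample lines; the position is one of the two coordinates quoted
# in post #16 (which genome build it belongs to is not stated there).
sample = [
    "##gff-version 3",
    "chrY\tsketch\tsnp\t17012442\t17012442\t.\t+\t.\tName=CTS9894.1;ID=CTS9894.1",
    "chrY\tsketch\tsnp\t17012442\t17012442\t.\t+\t.\tName=PF3036.1",
    "chrY\tsketch\tsnp\t100\t100\t.\t+\t.\tName=EXAMPLE1",
]
candidates = recurrent_candidates(names_by_position(sample))
print(candidates)
```

To run it on the real file, pass an open file handle instead of `sample`, after downloading the file once rather than querying the site repeatedly.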
#30
https://www.biorxiv.org/content/10.1101/...658v2.full

Assembly of 43 diverse human Y chromosomes reveals extensive complexity and variation

Quote: Abstract

The prevalence of highly repetitive sequences within the human Y chromosome has led to its incomplete assembly and systematic omission from genomic analyses. Here, we present long-read de novo assemblies of 43 diverse Y chromosomes spanning 180,000 years of human evolution, including two from deep-rooted African Y lineages, and report remarkable complexity and diversity in chromosome size and structure, in contrast with its low level of base substitution variation. The size of the Y chromosome assemblies varies extensively from 45.2 to 84.9 Mbp and include, on average, 81 kbp of novel sequence per Y chromosome. Half of the male-specific euchromatic region is subject to large inversions with a >2-fold higher recurrence rate compared to inversions in the rest of the human genome. Ampliconic sequences associated with these inversions further show differing mutation rates that are sequence context-dependent and some ampliconic genes show evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, the Yq12, is composed of alternating arrays of DYZ1 and DYZ2 repeat units that show extensive variation in the number, size and distribution of these arrays, but retain a 1:1 copy number ratio of the monomer repeats, consistent with the notion that functional or evolutionary forces are acting on this chromosomal region. Finally, our data suggests that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kbp distal to the currently established boundary. The availability of sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of specific traits with Y-chromosomal variants and garnering novel insights into the evolution and function of complex regions of the human genome.