Y chromosome recurrent SNPs
#31
A couple of general questions on Y mutations overall:

1. Are there certain regions of the Y chromosome that mutations are limited to or can mutations occur anywhere?

2. Are some regions more prone to mutations than others?

3. How sure are we really on what the Y mutation rate truly is and if it's really generalizable across all haplogroups? I'm thinking overall that if some regions of the Y are more prone to mutations than other regions, how does that impact fixing a consistent mutation rate across all haplos through space and time? And when we go back deeper in time, when all the main haplos diversified tens of thousands of years ago, how are we sure that each haplo more or less retained the same fixed/consistent rate - could we be overestimating haplo ages in certain instances by applying a single standard/fixed mutation rate, based on high numbers of mutations, when maybe there could be selection effects that increased the mutation rate at a certain stretch of time?
#32
(02-02-2024, 11:24 PM)Horatio McCallister Wrote: A couple of general questions on Y mutations overall:

1. Are there certain regions of the Y chromosome that mutations are limited to or can mutations occur anywhere?

2. Are some regions more prone to mutations than others?

3. How sure are we really on what the Y mutation rate truly is and if it's really generalizable across all haplogroups? I'm thinking overall that if some regions of the Y are more prone to mutations than other regions, how does that impact fixing a consistent mutation rate across all haplos through space and time? And when we go back deeper in time, when all the main haplos diversified tens of thousands of years ago, how are we sure that each haplo more or less retained the same fixed/consistent rate - could we be overestimating haplo ages in certain instances by applying a single standard/fixed mutation rate, based on high numbers of mutations, when maybe there could be selection effects that increased the mutation rate at a certain stretch of time?

1. Technically, mutations can occur anywhere, but many are likely not viable and therefore will never be observed among living individuals. Also, some mutations are "short-lived" in the sense that if a mutation leaves the carrier unable to have children, it lasts at most one generation (making it unlikely to be observed).

2. Some regions have a much higher mutation rate than others. This is why only a subset of SNPs is used for TMRCA (time to the most recent common ancestor) estimation. The selected SNPs are chosen for their relatively stable mutation rate. For example, YFULL uses a region they call "COMB-BED" (the boundaries of the region are public), for which they assume a rate of 1 mutation every ~144.41 years.

3. Whether the mutation rate is stable across all haplogroups is an open question. Even if the mutation rate itself is stable, variation in the fixation rate, depending on living conditions, could significantly affect the "apparent" mutation rate after many generations.
It appears that the Y-DNA mutation rate is much less affected by variation in the fixation rate than mt-DNA (mt-DNA apparently got a burst of fixed mutations during OoA, the Out of Africa expansion).
For Y-DNA, TMRCA estimates correspond fairly well with significant demographic events. For example, the European Neolithic expansion seen in the archaeological record matches the TMRCAs of G-haplogroup subclades well (this also works for many other population movements/expansions).
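
To make the rate quoted in point 2 concrete, here is a minimal sketch of how a per-SNP rate turns into a TMRCA estimate. It is not YFULL's actual pipeline: the function name `tmrca_years` and the example SNP counts are hypothetical, and it ignores coverage scaling and any correction for the age of the samples.

```python
# Rate quoted in the post for YFULL's COMB-BED region: ~1 SNP every 144.41 years.
YEARS_PER_SNP = 144.41


def tmrca_years(snp_counts, years_per_snp=YEARS_PER_SNP):
    """Rough TMRCA: mean SNP count since the common ancestor times years per SNP.

    snp_counts holds, for each sampled lineage, the number of SNPs in the
    selected region accumulated since the common ancestor (hypothetical input).
    """
    if not snp_counts:
        raise ValueError("need at least one lineage")
    mean_snps = sum(snp_counts) / len(snp_counts)
    return mean_snps * years_per_snp


# Hypothetical counts for three lineages under the same node.
example_counts = [30, 34, 32]
print(f"TMRCA estimate: {tmrca_years(example_counts):.0f} years")
```

Averaging over lineages is the simplest choice here; a lineage with an unusually high or low count pulls the estimate directly, which is exactly why the rate stability question in point 3 matters.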

In fact, there is a simple way to test the mutation rate.
Take the full Y-DNA tree and check whether each branch carries the same number of mutations.
For each sample, count the number of mutations since F.
It is important to take into account the "correlated mutations" (because below the absolute root of the tree some samples share a more recent MRCA, e.g., R1b, I2, J2b, H, O, ...).
Then you can test whether the mutation counts fit a single mutation rate (in that case the counts should follow a Poisson distribution*).
A variable mutation rate would produce a more dispersed distribution of mutation counts since the MRCA (a mixture of several Poisson distributions with different means).

*Each sample individually follows a Poisson distribution, but when stacking them it is important to take into account that many samples are heavily correlated, because their Y-lineages often separated much later than the root of the Y-tree.
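
The dispersion check described above could be sketched roughly as follows. This is only an illustration under strong assumptions: the counts are made up, the function names are hypothetical, and the samples are treated as independent, which the footnote above warns is not true for real Y-tree data unless correlated lineages are thinned out first.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 if counts follow a single Poisson rate."""
    return variance(counts) / mean(counts)


def dispersion_statistic(counts):
    """(n - 1) * variance / mean.

    Under a single Poisson rate with independent samples this is approximately
    chi-square distributed with n - 1 degrees of freedom.
    """
    return (len(counts) - 1) * dispersion_index(counts)


# Hypothetical mutation counts since F, one per (assumed independent) lineage.
counts = [80, 120, 100, 90, 110]
print("index of dispersion:", dispersion_index(counts))
print("test statistic:", dispersion_statistic(counts), "on", len(counts) - 1, "df")
```

An index well above 1 would point toward a mixture of rates; comparing the statistic against a chi-square table with n - 1 degrees of freedom gives a rough significance level.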

Then you can test whether the mutation rate is correlated with different variables: latitude, climate, specific subclades, ...
I don't know if such work has been done already, but with YFULL data it should be relatively easy.
The main difficulty would be automatically retrieving the number of private SNPs for each YFULL sample. I know too well that YFULL blocks IP addresses for a few hours when queried heavily.
#33
Quote:We observed significant variation in the Y-chromosome somatic mutation rate among haplogroups using the 1KG data set (fig. 3A, P = 5.30 × 10−10, Kruskal–Wallis rank-sum test). Strikingly, the interhaplogroup variation in somatic mutation rate and phylogenetic branch length were positively correlated (Spearman ρ = 0.54, P = 1.54 × 10−2, fig. 3B), suggesting that similar variation in mutation rate is likely also present in the germline. For example, haplogroups E and R had the shortest phylogenetic branch lengths and the lowest somatic mutation rates.

Mutation Rate Variability across Human Y-Chromosome Haplogroups | Molecular Biology and Evolution | Oxford Academic

I found the paper above, which addresses mutation rate variability across haplogroups and, incidentally, gives special attention to E and R. The paper looks specifically at differences in somatic mutation, but there apparently must be some related process that acts on the germline as well.
#34
(01-22-2024, 05:36 PM)Kale Wrote: Hmmm, well here's a test that might suggest something going on...
1) Checked all 107 snps that define R-M269 in yfull to see if any other lineage is defined by those snps
2) Compared the number of matches to the number of snps in each major haplogroup (click the ytree statistics button)
3) Plot the matches against the total snps for the haplogroup


Clearly I is an outlier, and if there is a haplogroup you'd expect to have most contact with M269 it would surely be I.
It would be good to do a 'control' test with some haplogroup that's well restricted geographically, like S or M, instead of M269, to make sure everything behaves and this isn't an artifact.

Following up on this... 
R-M269 definitely has something suspicious as far as number of matches on yfull goes. It is defined by 107 snps, and has 310 matches in other macrohaplogroups.
I tried M-Z31022, defined by a whopping 374 snps, yet only 75 matches to other macrohaplogroups.
Hmmm, is it contamination? Most of the tests I'd assume are processed in Europe and/or North America, where R-M269 would be infinitely more common than M-Z31022.... 
I1 - defined by 297 snps, 96 matches to other macrohaplogroups. Not worlds apart from M-Z31022. 
What is going on with snps like L777 where there are loads of matches?
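
For what it's worth, the three comparisons above can be put on the same scale by dividing matches by the number of defining SNPs. A minimal sketch using only the figures quoted in this post (the variable and function names are just for illustration):

```python
# (defining SNPs, matches in other macrohaplogroups), as quoted in the post.
observations = {
    "R-M269": (107, 310),
    "M-Z31022": (374, 75),
    "I1": (297, 96),
}


def matches_per_snp(defining_snps, matches):
    """Average number of matches elsewhere in the tree per defining SNP."""
    return matches / defining_snps


ratios = {name: matches_per_snp(*pair) for name, pair in observations.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ratio:.2f} matches per defining SNP")
```

On these numbers R-M269 comes out at roughly 2.9 matches per defining SNP, against about 0.3 for I1 and 0.2 for M-Z31022, so the gap is around an order of magnitude.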
#35
(01-22-2024, 05:16 AM)Kale Wrote: How do you propose to reconcile...
A) "Quick tally of yfull showed 0 shared snps between those that define haplogroup G and those that define J, J1, L, or T."
B) That there is no scarcity of ancient examples you've listed in this thread.

I would say, if pattern A is repeatable with other haplogroups with known or strongly suspected extended contact, then the conclusion for B must be false-positives/dna-damage/contamination/etc.
Many times they are false positives, indeed. Single reads, bad mapping, deamination events (C->T, G->A), etc. all contribute.
There are over ten non-R-U106 ancient samples that get U106+, but so far there appears to be only one recurrence of it in modern men (downstream of a different macro-hg). Certain positive reads may therefore be ignored, unless they're well contextualized. A given categorization (of an ancient sample), or the deepest assignment, usually comes from a set of results that must make sense together, and not from isolated results. Aggressiveness compromises confidence.
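
As a rough illustration of that kind of caution, the sketch below flags ancient-DNA calls that rest on too few reads or that look like deamination damage. The function names, the labels, and the two-read threshold are all assumptions made for the example, not any lab's or tree provider's actual rule.

```python
# Substitutions typical of post-mortem deamination, as listed above.
DEAMINATION_LIKE = {("C", "T"), ("G", "A")}


def is_deamination_like(ref, alt):
    """True if the change from ref to alt is C->T or G->A."""
    return (ref.upper(), alt.upper()) in DEAMINATION_LIKE


def classify_call(ref, alt, supporting_reads, min_reads=2):
    """Label a derived call from an ancient sample (min_reads is an assumed cutoff)."""
    if supporting_reads < min_reads:
        return "weak: too few reads"
    if is_deamination_like(ref, alt):
        return "caution: possible deamination"
    return "usable"


# Hypothetical calls: (reference allele, observed allele, supporting reads).
for call in [("C", "T", 1), ("G", "A", 3), ("A", "C", 4)]:
    print(call, "->", classify_call(*call))
```

A flagged call is not necessarily wrong; it just should not carry an assignment on its own without other consistent results.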
#36
Here's a fun one from TheYtree. This is on their DF19 tree.
The first ancient listed here is not DF19, he is actually DF27, but the SNP is a wobbler:
https://www.theytree.com/tree/R-DF19

He is listed as R-Z39393.
If you plug R-Z39393 into FTDNA's Discover you get a funny message:
https://discover.familytreedna.com/y-dna...93/ancient
Quote:Search returned multiple results.
Select from the following:
R-Z39380 [this path is DF19]
R-BY94701 [this path is DF27]

TheYtree summary of the sample when you click through:
Quote:
Sample name: BAS023
AU number: AU67366
Sample Type: Ancient
Reference assembly: hg19 / GRCh37
Y-haplogroup: R-Z39393
MT-haplogroup: U5b2b3
Gender: Male
Quality: Coverage: 16.7%, Average Depth: 1
Scientific institution:
Native place: la Bastida, Totana, Murcia, Spain
Ethnicgroup: Argar Culture Bronze Age
Area: SE Spain
Genetic family: R1b-P312-DF27-Z195
Ancient period: Argar 2nd phase (2000 - 1750 cal BCE)
Data contributor: Miquel Roman
Data source: Genomic transformation and social organization during the Copper Age-Bronze Age transition in southern Iberia

Kudos to TheYtree for "showing their work" and cross-referencing things.
R1b>M269>L23>L51>L11>P312>DF19>DF88>FGC11833 >S4281>S4268>Z17112>FT354149

Ancestors: Francis Cooke (M223/I2a2a) b1583; Hester Mahieu (Cooke) (J1c2 mtDNA) b.1584; Richard Warren (E-M35) b1578; Elizabeth Walker (Warren) (H1j mtDNA) b1583; John Mead (I2a1/P37.2) b1634; Rev. Joseph Hull (I1, L1301+ L1302-) b1595; Benjamin Harrington (M223/I2a2a-Y5729) b1618; Joshua Griffith (L21>DF13) b1593; John Wing (U106) b1584; Thomas Gunn (DF19) b1605; Hermann Wilhelm (DF19) b1635