(02-02-2024, 11:24 PM)Horatio McCallister Wrote: A couple of general questions on Y mutations overall:
1. Are there certain regions of the Y chromosome that mutations are limited to or can mutations occur anywhere?
2. Are some regions more prone to mutations than others?
3. How sure are we really on what the Y mutation rate truly is and if it's really generalizable across all haplogroups? I'm thinking overall that if some regions of the Y are more prone to mutations than other regions, how does that impact fixing a consistent mutation rate across all haplos through space and time? And when we go back deeper in time, when all the main haplos diversified tens of thousands of years ago, how are we sure that each haplo more or less retained the same fixed/consistent rate - could we be overestimating haplo ages in certain instances by applying a single standard/fixed mutation rate, based on high numbers of mutations, when maybe there could be selection effects that increased the mutation rate at a certain stretch of time?
1. Technically, mutations can occur anywhere, but many are likely not viable and therefore will never be observed among living individuals. Others may be "short-lived": if a mutation leaves its carrier unable to have children, it lasts at most one generation, which makes it unlikely to be observed.
2. Some regions have a much higher mutation rate than others. This is why only a subset of SNPs is used for TMRCA (time to the most recent common ancestor) estimates: the selected SNPs are chosen for their relatively stable mutation rate. For example, YFULL uses a region they call "COMB-BED" (the boundaries of the region are public), for which they assume a rate of one mutation every ~144.41 years.
3. Whether the mutation rate is stable across all haplogroups is an open question. Even if the underlying mutation rate is stable, variation in the fixation rate driven by living conditions can significantly change the "apparent" mutation rate after many generations.
It appears that the Y-DNA mutation rate is much less affected by variation in the fixation rate than the mtDNA rate (mtDNA apparently went through a burst of fixed mutations during the Out-of-Africa migration, OoA).
For Y-DNA, TMRCA estimates correspond fairly well to significant demographic events. For example, the European Neolithic expansion seen in the archaeological record matches the TMRCAs of haplogroup G subclades well (this also works for many other population movements and expansions).
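To make the arithmetic behind such an estimate concrete, here is a minimal Python sketch. It only multiplies a SNP count by the ~144.41 years-per-mutation figure quoted above. The function name and the example count are made up for illustration, and this is the bare arithmetic only, not YFULL's actual procedure.

```python
YEARS_PER_SNP = 144.41  # rate quoted above for the COMB-BED region


def rough_tmrca_years(snp_count, years_per_snp=YEARS_PER_SNP):
    """Naive age estimate: number of SNPs times years per SNP."""
    if snp_count < 0:
        raise ValueError("snp_count must be non-negative")
    return snp_count * years_per_snp


# Hypothetical example: 100 SNPs counted in the region since the common ancestor.
print(round(rough_tmrca_years(100)))  # 14441
```

With a rate like this, each additional SNP shifts the estimate by roughly a century and a half, so small counts give very coarse ages.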
In fact, there is a simple way to test the mutation rate.
Let's take the full Y-DNA tree. You can check whether each branch carries the same number of mutations.
For each sample, you can count the number of mutations since F.
It is important to take "correlated mutations" into account (because below the absolute root of the tree some samples share a more recent MRCA, e.g., within R1b, I2, J2b, H, O, ...).
Then you can test whether the mutation counts fit a single mutation rate (in which case they should follow a Poisson distribution*).
A variable mutation rate would produce a more dispersed distribution of mutation counts since the MRCA (a mixture of several Poisson distributions with different means).
*Each sample individually follows a Poisson distribution, but when pooling them it is important to take into account that many samples are heavily correlated, because their Y-lineages often separated much later than the root of the Y-tree.
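The dispersion idea can be sketched in a few lines of Python. This is only an illustration under a strong assumption: the counts are treated as independent, so it ignores the correlation between samples mentioned in the footnote (in practice one would have to pick lineages that split close to the root, or count mutations per branch segment). The function name and the counts are made up.

```python
from statistics import mean, variance


def dispersion_check(counts):
    """Return (mean, index of dispersion, chi-square-like statistic).

    Under a single Poisson rate the index (variance / mean) should be
    close to 1, and the statistic is approximately chi-square distributed
    with len(counts) - 1 degrees of freedom. ASSUMES independent counts.
    """
    if len(counts) < 2:
        raise ValueError("need at least two counts")
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero")
    index = variance(counts) / m            # sample variance (n - 1 denominator)
    statistic = (len(counts) - 1) * index   # equals sum((x - m)^2) / m
    return m, index, statistic


# Made-up mutation counts since F, one per (assumed independent) lineage.
counts = [310, 298, 305, 322, 290, 315]
m, index, stat = dispersion_check(counts)
print(f"mean={m:.1f} index={index:.2f} statistic={stat:.2f}")
```

An index well above 1 would point to extra dispersion, which is what a mixture of several rates would produce. An index near 1 is compatible with a single rate.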
Then you can test whether the mutation rate is correlated with different variables: latitude, climate, specific subclades, ...
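For a numeric variable such as latitude, a first look could be a plain Pearson correlation between the per-sample mutation count and that variable. The sketch below is only that: the helper name, the latitudes and the counts are all made up, and it again treats the samples as independent, which real Y-tree samples are not.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("one of the sequences is constant")
    return cov / (sx * sy)


# Made-up data: latitude of each sample and its mutation count since F.
latitudes = [35.0, 41.5, 48.2, 52.0, 59.3, 64.1]
counts = [310, 298, 305, 322, 290, 315]
print(f"r = {pearson_r(latitudes, counts):.2f}")
```

A categorical variable such as subclade would need a different comparison, for example comparing mean counts between groups.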
I don't know if such work has been done already, but with YFULL data it should be relatively easy.
The main difficulty would be automatically retrieving the number of private SNPs for each YFULL sample. I know all too well that YFULL blocks IP addresses for a few hours when it is queried heavily.