Posts: 323
Threads: 8
Joined: Oct 2023
01-22-2024, 06:37 AM
(This post was last modified: 01-22-2024, 06:59 AM by TanTin.)
Here is another one:
CTS9894.1 G PF3036.1; M3574.1 rs559968594 19124322 17012442 A->T
PF3036.1 G M3574.1; CTS9894.1 rs559968594 19124322 17012442 A->T
famInd X Y_ch
1 HG02231.SG T IBS.SG R1b1a1b1a1a2
2 HG02690.SG T PJL.SG J2b2a2b2
9 Iceman_noUDG.SG T Italy_North_MN_Iceman_contam.SG NA (I just confirmed that the Iceman, Ötzi, is G, so no questions here.)
Posts: 323
Threads: 8
Joined: Oct 2023
Next examples:
snp_24_23768744
F3536 G2 M3626; PF3120 rs540990933 23768744 21606858 C->T
Botocudo15_noUDG.SG T Brazil_Botocudo.SG NA
2 N5a.SG T Russia_LenaRiver_MiddleN.SG n/a (female)
3 VK18_noUDG.SG T Russia_Viking_o.SG NA
4 VK18.SG_3.2Mall T Russia_Viking_o.SG NA
snp_24_7930724
M3474 G PF2869 rs567272681 7930724 8062683 C->A
PF2869 G M3474 rs567272681 7930724 8062683 C->A
I15027 A France_Medieval_o NA
2 HG02231.SG A IBS.SG R1b1a1b1a1a2
3 I4068 A Netherlands_BellBeaker R1b1a1b1a1a2
4 RISE548_noUDG.SG A Russia_Kalmykia_EBA_Yamnaya.SG NA
rs2713254
L382 G M3523; PF2951 rs2713254 14469411 12348680 C->A
M3523 G L382; PF2951 rs2713254 14469411 12348680 C->A
PF2951 G L382; M3523 rs2713254 14469411 12348680 C->A
I8549 A Dominican_Andres_Ceramic Q1b1a1a
2 I12344 A Dominican_ElSoco_Ceramic Q1b1a1a
3 I7043 A Hungary_EBA_Protonagyrev R1b1a1b1a1a
Posts: 323
Threads: 8
Joined: Oct 2023
01-22-2024, 06:51 AM
(This post was last modified: 01-22-2024, 06:52 AM by TanTin.)
snp_24_14273557 = rs537301281
CTS2357 G M3517; PF2943 rs537301281 14273557 12152851 C->T
M3517 G CTS2357; PF2943 rs537301281 14273557 12152851 C->T
PF2943 G CTS2357; M3517 rs537301281 14273557 12152851 C->T
VK498_noUDG.SG T Estonia_EarlyViking.SG NA
2 I20799 T Hungary_Transtisza_LAvar NA
3 NA18967.SG T JPT.SG D1a2a1b2b1a1
4 vik_84001_noUDG.SG T Sweden_Viking.SG NA
5 I4424_noUDG T Vanuatu_150BP NA
6 I4106_noUDG T Vanuatu_150BP_noUDG NA
7 CNE1_noUDG T Nepal_Mustang_Chokhopani_IA NA
snp_24_8482393 = rs538642017
F1239 G2 M3483; PF2886 rs538642017 8482393 8614352 C->T
M3483 G2 F1239; PF2886 rs538642017 8482393 8614352 C->T
PF2886 G2 F1239; M3483 rs538642017 8482393 8614352 C->T
1 HG02307.SG T ACB.SG E1b1a1a1a1c1a1a3d4~
2 VK168_noUDG.SG T England_Viking.SG NA
3 VK175_noUDG.SG T England_Viking.SG NA
4 prs016_noUDG.SG T Ireland_Megalithic.SG NA
5 TAQ002 T Italy_Lazio_Viterbo_Etruscan NA
6 RISE98_noUDG.SG T Sweden_LN.SG NA
7 VK168.SG_3.2Mall T England_Viking.SG NA
Posts: 323
Threads: 8
Joined: Oct 2023
And there is one more special one: PF3112.
P15 G2a PF3112 rs370167410 23244026 21082140 C->T
PF3112 G2a P15 rs370167410 23244026 21082140 C->T
1 I2520 T Bulgaria_EBA H
2 I10872_alt T Cameroon_SMA NA
3 I7187 T Czech_C_Baalberge I2a1b1b
4 KO1008 T Czech_EBA_Unetice R1b1a1b1a1a2b1
5 MIS002 T Czech_EBA_Unetice R1b1a1b1a1a2b1
6 MIB027 T Czech_EBA_Unetice_contam R1b1a1b1a1a2b1
7 I15667 T Dominican_LaCaleta_Ceramic Q1b1a1a
8 I15978 T Dominican_LaCaleta_Ceramic Q1b1a1a
9 I16173 T Dominican_LaCaleta_Ceramic Q1b1a1a
10 I16174 T Dominican_LaCaleta_Ceramic Q1b1a1a
11 I5379 T England_BellBeaker_highEEF R1b1a1b1a1a2c1
12 I6747 T England_N I2a1b1a1a1
13 VK552_noUDG.SG T Estonia_EarlyViking.SG NA
14 I4996 T Hungary_IA_LaTene NA
15 SZ7 T Hungary_Langobard I2a1b1a2b1a2a2~
16 I1496 T Hungary_MN_LBK C1a2
17 I2062 T Israel_MLBA R1b1a1b
18 R1221.SG T Italy_Medieval_EarlyModern_oCentralEuropean.SG J2a1a1b2a1b1b2c~
19 CL146 T Italy_North_EarlyMedieval_Langobards_1 R1b1a1b1a1a
20 I8918 T Kenya_PastoralN E1b1b1b2b2a1a~
21 DA100_noUDG.SG T Kyrgyzstan_TianShan_Hun.SG NA
22 SI-45_noUDG.SG T Lebanon_Medieval.SG NA
23 YAG001 T Mongolia_EBA_Chemurchek_2_dup.I12957 R1b1a1b1b3
24 I6365 T Mongolia_EIA_SlabGrave_1 N1a1a1a1a4a~
25 I13223 T Pakistan_Loebanr_IA L1a2a
26 I2244 T Peru_MH_LIP_Lambayeque Q1b1a1a
27 I2549 T Peru_Palpa_LIP_550BP Q1b1a1a
28 RISE495_noUDG.SG T Russia_Karasuk_oRISE.SG NA
29 I5288 T Slovakia_IA_Vekerzug NA
30 ALM064 T Spain_Almoloya_Argar_Late NA
31 I7602 T Spain_MLN I2a1b1b
32 LAV006 T StLucia_Lavoutte_Ceramic Q1b1a1a
33 ART017 T Turkey_Arslantepe_LateC J2a1a1a2b2a2b2~
34 ART042 T Turkey_Arslantepe_LateC H
35 I4085 T Turkmenistan_C_TepeAnau R2
36 I4105_noUDG T Vanuatu_150BP_noUDG NA
37 TAN001 T Vanuatu_200BP O2a2b
38 I17896 T Venezuela_LasLocas_Ceramic Q1b1a1a
39 C1192 T China_Xinjiang_Jierzankale(Jirzankal)_IA NA
40 GUY001_3.2Mall T Cuba_GuayaboBlanco_Archaic NA
41 LAV006_3.2Mall T StLucia_Lavoutte_Ceramic NA
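To make the "special" pattern of PF3112 easier to see, here is a minimal Python sketch that tallies carriers by the first letter of their reported Y haplogroup. It uses only a subset of the rows copied from the listing above and assumes the whitespace-separated layout these listings follow (index, sample ID, derived allele, population label, Y haplogroup). Samples without a Y call (NA) are skipped. This is an illustration of the listing, not a test of any mechanism.

```python
from collections import Counter

# A subset of carrier rows copied from the PF3112 listing above. Assumed layout:
# index, sample ID, derived allele, population label, Y haplogroup ("NA" = no call).
ROWS = """\
1 I2520 T Bulgaria_EBA H
3 I7187 T Czech_C_Baalberge I2a1b1b
4 KO1008 T Czech_EBA_Unetice R1b1a1b1a1a2b1
7 I15667 T Dominican_LaCaleta_Ceramic Q1b1a1a
16 I1496 T Hungary_MN_LBK C1a2
20 I8918 T Kenya_PastoralN E1b1b1b2b2a1a~
21 DA100_noUDG.SG T Kyrgyzstan_TianShan_Hun.SG NA
25 I13223 T Pakistan_Loebanr_IA L1a2a
35 I4085 T Turkmenistan_C_TepeAnau R2
37 TAN001 T Vanuatu_200BP O2a2b
"""

MISSING = {"NA", "N/A", ".."}  # placeholders used in these listings for "no Y call"

def tally_top_level(rows: str) -> Counter:
    """Count carriers per top-level haplogroup letter, skipping samples without a Y call."""
    counts: Counter = Counter()
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a carrier row
        hg = fields[-1].rstrip("~")  # drop the trailing "~" some calls carry
        if not hg or hg.upper() in MISSING:
            continue
        counts[hg[0]] += 1
    return counts

if __name__ == "__main__":
    for letter, n in sorted(tally_top_level(ROWS).items()):
        print(letter, n)
```

Counting only the top-level letter deliberately ignores deeper subclades, so a carrier in R1b and one in R2 both count as "R".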
I am also providing a PCA plot for this one:
Posts: 51
Threads: 1
Joined: Dec 2023
While I think the mechanism you propose might exist, making a list of "examples" is not the way to prove that it exists.
If this mechanism is frequent enough to be seen among ancient samples, then you should also see it in "confirmed" clades of the Y-tree (either FTDNA or YFULL).
I propose the following statistical test:
--> Restrict yourself to SNPs used for age estimation (to avoid regions with too high a mutation rate)
--> Take a few macro-haplogroups
--> Select all subclades with TF < 5000 ybp
--> Classify them by their likely geographical location since ~5000 ybp
--> Compute the covariance matrix (== counting the number of redundant mutations per "yr.clade") as a function of location and macro-haplogroup (PS: be careful to correct for unknown private SNPs).
If the effect you mention is significant, you should see a higher covariance between separate macro-haplogroups that spent time in the same location than between macro-haplogroups that were not around the same place.
For example, you would expect many R-P312 SNP "injections" into the subclades of minor Western European macro-haplogroups, whereas you would not expect such injections into subclades of the same macro-haplogroups located in Asia.
I think that if you manage to show that such a statistical signal exists, you would have a strong case that would be hard to explain by "lucky" convergent mutations (and, if done with care, such a detection would definitely deserve a publication).
But showing some examples will never prove that the effect you describe exists and is spatially coherent (which is a requirement given the hypothesis you propose to explain it).
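The counting step of the test proposed above could be sketched in Python roughly as follows. This is a simplified, count-based stand-in for the covariance matrix, not a literal covariance. Only pairs of clades from different macro-haplogroups are compared, and their shared SNP positions are grouped by the pair of regions the clades lived in. The `Clade` record, the region labels and the normalisation by the product of SNP counts are all hypothetical choices. The per-"yr.clade" time normalisation and the correction for unknown private SNPs mentioned in the post are not implemented.

```python
from collections import defaultdict
from itertools import combinations
from typing import FrozenSet, NamedTuple

class Clade(NamedTuple):
    name: str             # hypothetical clade label
    macro: str            # macro-haplogroup, e.g. "R", "I", "G"
    region: str           # assumed region of residence since ~5000 ybp
    snps: FrozenSet[int]  # positions of the clade's age-estimation SNPs

def sharing_matrix(clades):
    """Mean normalised count of shared SNP positions between clades of *different*
    macro-haplogroups, keyed by the (sorted) pair of regions they lived in."""
    total = defaultdict(float)
    pairs = defaultdict(int)
    for a, b in combinations(clades, 2):
        if a.macro == b.macro or not a.snps or not b.snps:
            continue  # sharing inside one macro-haplogroup is expected by descent
        key = tuple(sorted((a.region, b.region)))
        # Assumption: chance overlaps scale with |A|*|B|, so divide by it to keep
        # clades with many SNPs from dominating the average.
        total[key] += len(a.snps & b.snps) / (len(a.snps) * len(b.snps))
        pairs[key] += 1
    return {k: total[k] / pairs[k] for k in total}

# Made-up toy input. If the proposed effect were real, same-region entries
# should be systematically larger than cross-region ones on real data.
TOY = [
    Clade("R-x", "R", "WEurope", frozenset({1, 2, 3, 4})),
    Clade("I-y", "I", "WEurope", frozenset({3, 5, 6, 7})),
    Clade("G-z", "G", "WEurope", frozenset({4, 8, 9, 10})),
    Clade("O-w", "O", "EAsia", frozenset({11, 12, 13, 14})),
]

if __name__ == "__main__":
    for regions, value in sorted(sharing_matrix(TOY).items()):
        print(regions, round(value, 4))
```

On real data, the same-region and cross-region values would then be compared against a null, for example by shuffling region labels among clades.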
Posts: 323
Threads: 8
Joined: Oct 2023
This is a good one:
snp_24_17610571
CTS7674 G M3562; PF3015 rs554374161 17610571 15498691 G->A
M3562 G CTS7674; PF3015 rs554374161 17610571 15498691 G->A
PF3015 G CTS7674; M3562 rs554374161 17610571 15498691 G->A
1 KO1003 A Czech_BellBeaker R1b1a1b1a1a2b1
2 I7208 A Czech_CordedWare R1a1a1a~
3 I14800 A England_MIA NA
4 VK491_noUDG.SG A Estonia_EarlyViking.SG NA
5 VK506_noUDG.SG A Estonia_EarlyViking.SG NA
6 HUGO_180Sk1 A Germany_Lech_BellBeaker_contam ..
7 PB1327.SG A Ireland_MN.SG I2a1b1a1a1b
8 I2201 A Israel_IA_o T1a1a1b2b2b1a1a2
9 I3966 A Israel_MLBA E1b1b1b2a1a1a1a1f~
10 TAQ013 A Italy_Lazio_Viterbo_Etruscan NA
11 BUR003 A Mongolia_Arkhangai_XiongnuEarlyMedieval_1 R1a1a1b2a2b2b~
12 TAF013_noUDG A Morocco_Iberomaurusian NA
13 S10_noUDG.SG A Nepal_Samdzong_1500BP.SG NA
14 I2960 A Pakistan_Medieval_Udegram_Ghaznavid_father.or.son.I2959 R1a1a1
15 TV3831_noUDG.SG A Portugal_MBA.SG NA
16 MK5004 A Russia_Caucasus_LateMaikop L
17 MK5001 A Russia_Caucasus_LateMaikop_rel.MK5004 L
18 I7335 A Russia_UstBelaya_Angara Q1b1a3
19 RISE94_noUDG.SG A Sweden_BattleAxe.SG NA
20 VK506.SG_3.2Mall A Estonia_EarlyViking.SG NA
21 CUC002_3.2Mall A Cuba_CuevaCalero_Archaic NA
Posts: 12
Threads: 0
Joined: Oct 2023
Gender: Female
Ethnicity: Bulgarian
Y-DNA (P): I2-Y16664
Y-DNA (M): E-V13
mtDNA (M): T2
mtDNA (P): H1b
I don't quite understand your theory. It has long been known that some mutations occur in different haplogroups. In the past, when only a few SNPs were known, this caused problems with individual SNP testing. Nowadays, with NGS, the haplogroup is easily identified in the context of all mutations. Probably YFULL avoids including such SNPs.
For example, L277.1 used to define a major branch under R1b-Z2103. The same mutation, named L277.2, was also found under R1a, and even under O2 and N1b (as S334). It is possible that such positions are unstable and should not be used in building the main Y tree.
https://www.genetichomeland.com/dna-mark...ome-Y/L277
Posts: 32
Threads: 1
Joined: Nov 2023
Gender: Male
Ethnicity: Armorican
Y-DNA (P): R-FT162677 (R-L21)
mtDNA (M): U4c1
mtDNA (P): K1b2b
YFull lists all positive SNPs, including known SNPs that belong to haplogroups other than ours.
They appear under "Known SNPs".
I have some among my private variants, but also at the level of my final markers.
Example:
I am positive for PF2595 (haplogroup F), Z2051 (haplogroup G), CTS12727 (haplogroup J), FGC48518 (haplogroup G), M10956 (haplogroup A1b), etc.
So what is certain is that this is very common. YFull lists them for information but does not take them into account because they are rated only one star.
On the other hand, I looked at the first 3 markers given by TanTin, and they are rated 4 stars on YFull. This means that YFull still considers them quite reliable (the maximum is 5 stars).
I have no idea how this happens.
Posts: 51
Threads: 1
Joined: Dec 2023
01-22-2024, 10:32 AM
(This post was last modified: 01-22-2024, 10:36 AM by GHurier.)
(01-22-2024, 09:15 AM)eastara Wrote: I don't quite understand your theory. It was long known that some mutations happen in different haplogroups. In older times, when only a few SNPs were known this caused problems after individual SNP testing. Nowadays with the NGS the haplogroup is easily identified in the context of all mutations. Probably YFULL avoids including such SNPs.
For example in the past L277.1 was defining a major branch under R1b-Z2103. The same mutation, named L277.2 was found also under R1a and even O2 and N1b as S334 . It is possible such positions are unstable and should not be used in major Y tree building.
https://www.genetichomeland.com/dna-mark...ome-Y/L277
Indeed, it is known, and expected, that some Y-haplogroups will share "lucky" mutations at the same SNP.
But the question TanTin is raising is whether these mutations are really due only to "luck" and random processes, or whether some of them are passed down from chrX to chrY via some mechanism.
The latter would imply spatial coherence in the so-called shared "lucky" mutations...
Another way (in addition to the one I proposed above) to detect such an effect would be to place the shared SNPs on a map for a given "source" population. That is, make a map of mutations shared with R-P312 but found inside haplogroups not descending from R-P312, giving each mutation a total weight of 1 and spreading that weight among all of its carriers, to avoid effects coming from later founder effects that post-date the mutation.
You would then normalise this map by the density of inhabitants not descending from R-P312 (technically, normalising by the integrated lifetime of the non-R-P312 clades since their TF would be more relevant, but that is also harder to build accurately).
If the effect proposed in this topic exists, then the mutation map will not be "white noise" (a random spatial distribution) and should show some signal in its angular power spectrum.
To enhance the signal, it is better to look at the angular cross-power spectrum between the shared-mutation map and the R-P312 density map (because you expect more mutation sharing where R-P312 carriers live). It should be positive if the effect proposed in this topic exists and has a significant impact on the Y-tree.
Of course, such an analysis is trickier to perform and would only speak to people with a strong background in data analysis (== people with academic research experience). As a first step, a simple "covariance matrix approach" would be easier for most people to understand.
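A heavily simplified Python sketch of the map-based approach above follows. It swaps the spherical angular cross-power spectrum for a zero-lag Pearson correlation over a flat list of grid cells. That is a much cruder statistic, chosen only to show the weighting and normalisation steps. Each shared mutation gets a total weight of 1, split among its carriers as the post proposes. The cell labels and density dictionaries are hypothetical inputs the reader would have to build.

```python
from collections import defaultdict
from math import sqrt

def sharing_map(mutations):
    """Each shared mutation (a list of its carriers' grid cells) gets a total weight
    of 1, split evenly among its carriers, so later founder effects cannot inflate it."""
    m = defaultdict(float)
    for carrier_cells in mutations:
        w = 1.0 / len(carrier_cells)
        for cell in carrier_cells:
            m[cell] += w
    return dict(m)

def normalise(m, non_p312_density):
    """Divide by the density of non-R-P312 inhabitants; cells with zero density are dropped."""
    return {c: v / non_p312_density[c] for c, v in m.items() if non_p312_density.get(c, 0) > 0}

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant map")
    return cov / sqrt(vx * vy)

def cross_correlation(shared_map, p312_density, cells):
    """Zero-lag correlation over a fixed list of cells: a crude flat-grid stand-in
    for the angular cross-power spectrum suggested in the post."""
    xs = [shared_map.get(c, 0.0) for c in cells]
    ys = [p312_density.get(c, 0.0) for c in cells]
    return pearson(xs, ys)
```

A positive correlation on real data would only be suggestive. Scale-dependent structure, which the angular spectrum captures, is lost here, and significance would still need a permutation test.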
Posts: 296
Threads: 2
Joined: Sep 2023
01-22-2024, 05:36 PM
(This post was last modified: 01-22-2024, 05:54 PM by Kale.)
Hmmm, well here's a test that might suggest something going on...
1) Checked all 107 SNPs that define R-M269 in YFull to see whether any other lineage is defined by those SNPs
2) Compared the number of matches to the number of SNPs in each major haplogroup (click the ytree statistics button)
3) Plotted the matches against the total number of SNPs for each haplogroup
Clearly, I is an outlier, and if there is one haplogroup you'd expect to have had the most contact with M269, it would surely be I.
It would be good to do a 'control' test with some haplogroup that's well restricted geographically, like S or M, instead of M269, to make sure everything behaves and this isn't an artifact.
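The comparison in steps 1 to 3 could be sketched in Python roughly as below. For each haplogroup, the sketch counts how many of its defining SNP positions are also in the R-M269-defining set. It then compares that count to what you would expect if hits were simply proportional to the haplogroup's total SNP count, so an outlier shows up as a ratio well above 1. The input sets are made-up placeholders, and R itself is left out because M269 lies inside it. The suggested control is the same call with an S- or M-defining set in place of `M269`.

```python
def overlap_vs_size(m269_snps, haplogroup_snps):
    """For each haplogroup: (SNPs shared with the R-M269-defining set, total SNPs,
    observed/expected ratio), where 'expected' assumes hits are proportional to size."""
    hits = {hg: len(snps & m269_snps) for hg, snps in haplogroup_snps.items()}
    total_hits = sum(hits.values())
    total_snps = sum(len(s) for s in haplogroup_snps.values())
    if total_snps == 0:
        return {}
    result = {}
    for hg, snps in haplogroup_snps.items():
        expected = total_hits * len(snps) / total_snps
        ratio = hits[hg] / expected if expected else float("nan")
        result[hg] = (hits[hg], len(snps), ratio)
    return result

# Made-up toy sets of positions (not real YFull data); R is excluded on purpose.
M269 = {1, 2, 3, 4}
TOY = {"I": {1, 2, 10, 11}, "J": {3, 20, 21, 22}, "Q": {30, 31, 32, 33}}

if __name__ == "__main__":
    for hg, (hits, size, ratio) in overlap_vs_size(M269, TOY).items():
        print(hg, hits, size, round(ratio, 2))
```

The proportional-to-size expectation is a simple baseline. With real counts, a Poisson or binomial interval around it would help judge whether I is a genuine outlier.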
Posts: 51
Threads: 1
Joined: Dec 2023
Btw, does anyone have a database (in .csv, .txt or any easily usable format) of all SNPs defining haplogroups (if possible with HG38 positions)?
The last time I tried a massive query on YFULL pages, I got blocked after retrieving a few pages (too many connections in a short time).
And for some reason, if I do it manually, I get web pages containing links with the HG38 positions of the SNPs, but when I retrieve the page with WGET I don't get the links, so I only get the SNP names and not the positions.
And I'm way too lazy to do that manually for all haplogroups (nearly 35k for YFULL in total).
Posts: 12
Threads: 0
Joined: Oct 2023
Gender: Female
Ethnicity: Bulgarian
Y-DNA (P): I2-Y16664
Y-DNA (M): E-V13
mtDNA (M): T2
mtDNA (P): H1b
(01-22-2024, 09:03 PM)GHurier Wrote: Btw, did anyone have a database (in .csv, .txt formats or anything easily usable) of all SNPs defining haplogroups (if possible with HG38 positions) ?
Last time I tried to make a massive query on YFULL pages, I got blocked after retrieving few pages (too many connections in a short time).
And for some reasons, if I do it manually, I can get the the web-pages containing web-adresses with the HG38 position of the SNPs, but when retrieving the page with WGET I don't get the link and thus I only get the SNP name and not the position.
And I'm way too lazy to do that manually for all haplogroups (nearly ~35k for YFULL in total).
The Y tree is officially supported by ISOGG, and scientific studies are based on it. This is their SNP list:
https://docs.google.com/spreadsheets/d/1...1934392066
https://isogg.org/tree/
Posts: 323
Threads: 8
Joined: Oct 2023
(01-22-2024, 10:50 PM)eastara Wrote: The Y tree is officially supported by the ISOGG, the scientific studies are based on it. This is their SNP list:
https://docs.google.com/spreadsheets/d/1...1934392066
https://isogg.org/tree/
Where do you find which SNP is rated 4* or 5*?
Posts: 51
Threads: 1
Joined: Dec 2023
Thanks. For people looking for this like me, I also found this version with a mapping to YFULL nodes:
https://ybrowse.org/gbrowse2/gff/snps_hg38.gff3
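Since the linked file is GFF3, a minimal Python parser like the one below may be enough to pull SNP names and positions out of it without scraping YFull. GFF3 is a tab-separated format with nine columns. Start and end are 1-based, and the attribute column holds `key=value` pairs separated by `;`, with percent-encoded values. I have not checked which attribute keys the ybrowse file actually uses, for example for the YFull node. The sketch therefore keeps every key, and `Name` below is only the standard GFF3 name attribute.

```python
from urllib.parse import unquote

def parse_gff3(lines):
    """Yield (seqid, start, end, attributes) for each feature line of a GFF3 file."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip comments and "##" directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        seqid, _source, _type, start, end, _score, _strand, _phase, attr = cols
        attrs = {}
        for item in attr.split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                attrs[unquote(key)] = unquote(value)  # values are percent-encoded
        yield seqid, int(start), int(end), attrs
```

Usage, assuming the file has been downloaded locally once (one request instead of thousands of page loads):

```python
with open("snps_hg38.gff3") as fh:
    positions = {a.get("Name"): start for _, start, _, a in parse_gff3(fh)}
```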
Posts: 323
Threads: 8
Joined: Oct 2023
https://www.biorxiv.org/content/10.1101/...658v2.full
Assembly of 43 diverse human Y chromosomes reveals extensive complexity and variation
Quote: Abstract
The prevalence of highly repetitive sequences within the human Y chromosome has led to its incomplete assembly and systematic omission from genomic analyses. Here, we present long-read de novo assemblies of 43 diverse Y chromosomes spanning 180,000 years of human evolution, including two from deep-rooted African Y lineages, and report remarkable complexity and diversity in chromosome size and structure, in contrast with its low level of base substitution variation. The size of the Y chromosome assemblies varies extensively from 45.2 to 84.9 Mbp and include, on average, 81 kbp of novel sequence per Y chromosome. Half of the male-specific euchromatic region is subject to large inversions with a >2-fold higher recurrence rate compared to inversions in the rest of the human genome. Ampliconic sequences associated with these inversions further show differing mutation rates that are sequence context-dependent and some ampliconic genes show evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, the Yq12, is composed of alternating arrays of DYZ1 and DYZ2 repeat units that show extensive variation in the number, size and distribution of these arrays, but retain a 1:1 copy number ratio of the monomer repeats, consistent with the notion that functional or evolutionary forces are acting on this chromosomal region. Finally, our data suggests that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kbp distal to the currently established boundary. The availability of sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of specific traits with Y-chromosomal variants and garnering novel insights into the evolution and function of complex regions of the human genome.