Genetic Genealogy & Ancient DNA (TITLES/ABSTRACTS)
Posted May 16, 2024.

Simulating pedigrees ascertained on the basis of observed IBD sharing

Ethan M. Jewett, the 23andMe Research Team


Abstract

In large genotyping datasets, individuals often have thousands of distant cousins with whom they share detectable segments of DNA identically by descent (IBD). The ability to simulate these distant relationships is important for developing and testing methods, carrying out power analyses, and performing population genetic analyses. Because distant relatives are unlikely to share detectable IBD segments by chance, many simulation replicates are needed to sample IBD between any given pair of distant relatives. Exponentially more samples are needed to simulate observable segments of IBD simultaneously among multiple pairs of distant relatives in a single pedigree. Using existing pedigree simulation methods that do not condition on the event that IBD is observed among certain pairs of relatives, the chances of sampling shared IBD patterns that reflect those observed in real data ascertained from large genotyping datasets are vanishingly small, even for pedigrees of modest size. Here, we show how to sample recombination breakpoints on a fixed pedigree while conditioning on the event that specified pairs of individuals share at least one observed segment of IBD. The resulting simulator makes it possible to sample genotypes and IBD segments on pedigrees that reflect those ascertained from biobank scale data.
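To make the "exponentially more samples" point concrete, here is a minimal back-of-the-envelope sketch. It is not the paper's simulator. It assumes, purely for illustration, that each specified pair shares detectable IBD independently and with the same probability `p_share`; the function name and parameters are hypothetical. Under that assumption, an unconditioned simulator that simply discards replicates until every pair shares IBD needs on average 1 / p_share^n_pairs replicates.

```python
def expected_replicates(p_share: float, n_pairs: int) -> float:
    """Expected number of unconditioned replicates until all pairs share IBD.

    Illustrative assumption (not from the paper): the n_pairs pairs share
    detectable IBD independently, each with probability p_share, so the
    number of replicates until success is geometric with success
    probability p_share ** n_pairs.
    """
    if not 0.0 < p_share <= 1.0:
        raise ValueError("p_share must be in (0, 1]")
    if n_pairs < 0:
        raise ValueError("n_pairs must be non-negative")
    return 1.0 / (p_share ** n_pairs)


if __name__ == "__main__":
    # The cost of rejection sampling grows geometrically with the number
    # of pairs that must all share a detectable segment.
    for k in (1, 2, 5, 10):
        print(k, expected_replicates(0.01, k))
```

Real pairs within one pedigree are not independent, so this only conveys the order of magnitude. It does show why sampling breakpoints directly under the conditioning event, as the abstract describes, avoids the wasted replicates.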

Competing Interest Statement

The author is an employee and shareholder of 23andMe, Inc.


https://www.biorxiv.org/content/10.1101/...XKgZaSXVIM


&



Posted May 16, 2024.

Correcting model misspecification in relationship estimates

Ethan M. Jewett, the 23andMe Research Team


Abstract

The datasets of large genotyping biobanks and direct-to-consumer genetic testing companies contain many related individuals. Until now, it has been widely accepted that the most distant relationships that can be detected are around fifteen degrees (approximately 8th cousins) and that practical relationship estimates have a ceiling around ten degrees (approximately 5th cousins). However, we show that these assumptions are incorrect and that they are due to a misapplication of relationship estimators. In particular, relationship estimators are applied almost exclusively to putative relatives who have been identified because they share detectable tracts of DNA identically by descent (IBD). However, no existing relationship estimator conditions on the event that two individuals share at least one detectable segment of IBD anywhere in the genome. As a result, the relationship estimates obtained using existing estimators are dramatically biased for distant relationships, inferring all sufficiently distant relationships to be around ten degrees regardless of the depth of the true relationship. Moreover, existing relationship estimators are derived under a model that assumes that each pair of related individuals shares a single common ancestor (or mating pair of ancestors). This model breaks down for relationships beyond 10 generations in the past because individuals share many thousands of cryptic common ancestors due to pedigree collapse. We first derive a corrected likelihood that conditions on the event that at least one segment is observed between a pair of putative relatives and we demonstrate that the corrected likelihood largely eliminates the bias in estimates of pairwise relationships and provides a more accurate characterization of the uncertainty in these estimates. We then reformulate the relationship inference problem to account for the fact that individuals share many common ancestors, not just one. 
We demonstrate that the most distant relationship that can be inferred may be forty degrees or more, rather than ten, extending the time-to-common ancestor from approximately 200 years in the past to approximately 600 years in the past or more. This dramatic increase in the range of relationship estimators makes it possible to infer relationships whose common ancestors lived before historical events such as European settlement of the Americas and the Transatlantic Slave Trade, and possibly much earlier.
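The idea of conditioning on "at least one detectable segment" can be illustrated with a toy model. This is not the likelihood derived in the paper. The sketch assumes, hypothetically, that the number of detectable IBD segments a pair shares is Poisson with mean `lam`, where a smaller `lam` stands in for a more distant relationship. Conditioning on observing at least one segment divides the Poisson probability by 1 - exp(-lam), giving a zero-truncated Poisson. All names here are illustrative.

```python
import math


def poisson_pmf(n: int, lam: float) -> float:
    """Unconditioned probability of n segments under a Poisson(lam) toy model."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def conditioned_pmf(n: int, lam: float) -> float:
    """Probability of n segments given that at least one segment is observed.

    Zero-truncated Poisson: P(N = n) / P(N >= 1), with
    P(N >= 1) = 1 - exp(-lam), computed as -expm1(-lam) for accuracy
    when lam is small.
    """
    if n < 1:
        return 0.0
    return poisson_pmf(n, lam) / (-math.expm1(-lam))


if __name__ == "__main__":
    # Hypothetical grid of mean segment counts; smaller means more distant.
    grid = [0.01, 0.1, 0.5, 1.0, 2.0]
    n_observed = 1  # the pair was found because it shares a single segment

    best_unconditioned = max(grid, key=lambda lam: poisson_pmf(n_observed, lam))
    best_conditioned = max(grid, key=lambda lam: conditioned_pmf(n_observed, lam))
    print("unconditioned estimate of lam:", best_unconditioned)
    print("conditioned estimate of lam:  ", best_conditioned)
```

In this toy setting the unconditioned likelihood for a single observed segment peaks at an intermediate `lam` no matter how distant the true relationship is. The conditioned likelihood does not penalize small `lam` for the low chance of sharing anything at all, so it can favor much more distant relationships. This is qualitatively the bias the abstract describes.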

Competing Interest Statement

The author is an employee and shareholder of 23andMe.


https://www.biorxiv.org/content/10.1101/...pzdtt-pX-Q


Messages In This Thread
RE: Genetic Genealogy & Ancient DNA (TITLES/ABSTRACTS) - by JMcB - 05-21-2024, 04:07 PM
