We use our DNA results to find relatives who match our DNA. Here we provide an understanding of how such DNA matching can work to help us identify a common ancestor (CA) with someone who shares parts of our DNA.
Autosomal DNA (atDNA or auDNA) is another term for recombinant DNA, the DNA that is inherited as 22 pairs of autosomes (recombining chromosomes), each pair consisting of one chromosome from each parent.
A human germ cell (gamete) contains a single set of 23 chromosomes, each randomly chosen from the corresponding chromosome pair, so each chromosome comes from one of the parents. Yet both parents are represented in each of these 23 chromosomes. How can one chromosome represent both parents?
Each of a gamete’s chromosomes contains a mixture of DNA segments inherited from the parents (a resulting child’s grandparents). This mixing of DNA segments is called crossover. It occurs in an early phase of the complex, multi-stage meiosis process, during gamete production. The details are not germane here; see [https://www.ncbi.nlm.nih.gov/books/NBK26840/ ‘Fat Alberts’], or other online discussions.
When a gamete joins an oppositely-sexed gamete during fertilization, the resulting offspring’s atDNA consists of paired chromosomes, each consisting of mixed segments representing both sets of grandparents. No ancestor gets left out in nearby generations, but by the 5th generation and later, the probability of having all ancestors represented in one’s DNA becomes vanishingly small:
- 96% chance of representing all 16 g-g-grandparents
- 54% chance of representing all 32 g-g-g-grandparents
- 0.01% chance of representing all 64 g-g-g-g-grandparents
Two or three segments are usually pseudo-randomly mixed on each chromosome during meiosis crossover. There are hotspots and coldspots on each chromosome where such splicing is more often or less often performed.
Combined with the 2^23 (about 8.4 million) different configurations of human gamete that can arise during meiosis (a father-or-mother choice for each of the 23 chromosomes), it is clear that meiosis is a source of great DNA shuffling, which explains why non-twin children of the same parents differ so much from one another.
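As a quick arithmetic check of that count, two choices for each of 23 chromosomes multiply out to 2 to the 23rd power. The short Python sketch below only counts whole-chromosome choices; crossover adds far more variety on top of it.

```python
# One chromosome is picked from each of the 23 pairs: 2 choices, 23 times.
PAIRS = 23
configurations = 2 ** PAIRS

print(f"{configurations:,}")  # 8,388,608
```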
Segments Over The Generations – Finding A Common Ancestor (CA)
Each chromosome in a human gamete is inherited from one of the parents. As a result of crossover, at least one, and on average 1.6, segments of DNA on a gamete chromosome are sourced from the chromosome provided by the other parent. The resulting alternating-parent DNA segments each consist of many millions of DNA base pairs (BP) and measure several tens of centiMorgans (cM) in length.
A cM is not a fixed measure of DNA length but a probabilistic one, which accounts for the hot and cold spots mentioned previously. 1 cM is the stretch of DNA, at a given location, that has a 1% probability of containing a crossover point in a single generation. The average chromosome length is ~160 cM. Since 100 cM corresponds to an average of one crossover per generation, an average-length chromosome can be expected to host at least one crossover in most cases; this is why grandparents are so rarely left out. (Other measures of chromosome length are BPs and SNPs; 1 cM contains on average about 1 million BPs, but this count varies widely between chromosomes and between male and female gametes.)
If another person shares an entire chromosome after crossover, it is likely they will be a sibling; the parent will be the CA. If one shares a significant part of such a segment, it is likely they will be a first cousin, and a grandparent will be the CA. The further removed in ancestry one is from the genome being matched, the smaller the segment that will be shared.
When a shared segment is found in common between two relatives, the end points of each relative’s chromosome segment will likely be different. What is actually shared is the overlap between the two chromosome segments. Further, this overlap area may itself be shared by other relatives with the same or different CA. Triangulation is needed to define which relatives belong to which CA, a process called triangulation grouping.
In segmentology used for CA determinations, one usually is looking at segment lengths between 25 and 125 cM, or 0.4% to 2% of shared DNA. The ancestral relations corresponding to these lengths are:
- 1.6% shared DNA or 110 cM – second cousins once removed, half second cousins, first cousin three times removed, half first cousin twice removed
- 0.8% shared DNA or 55 cM – third cousins, second cousins twice removed
- 0.4% shared DNA or 27 cM – third cousins once removed
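The percentages in this list can be reproduced from the cM values with a one-line conversion. The sketch below assumes a total of about 6800 cM counted over both copies of the 22 autosomes; that figure is not stated in this article and is used here only because it is a commonly quoted round number that agrees with the list.

```python
# Assumption: ~6800 cM total autosomal length over both chromosome copies.
# This value is not given in the article; change it if your testing
# company reports percentages against a different total.
TOTAL_CM = 6800.0


def percent_shared(shared_cm: float, total_cm: float = TOTAL_CM) -> float:
    """Convert a shared length in cM to a percentage of total atDNA."""
    return 100.0 * shared_cm / total_cm


for cm in (110, 55, 27):
    print(cm, "cM ->", round(percent_shared(cm), 1), "%")
```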
There are two qualities of match when locating shared segments, Identity by Descent (IBD), and Identity by State (IBS). Above we exemplify IBD matching, where the matching genomes have at least one near-term CA. This is nirvana, finding an ancestor by segment matching.
It is possible to match a segment without the two parties sharing a recent CA. This is IBS; it can occur when a segment is pieced together from multiple sources of different ancestry and matches the comparison genome by chance. Triangulation is used to weed out such IBS false positives, which falsely suggest a recent CA.
Heuristically, for segments larger than ~10cM (usually indicating a 4th-5th cousin or closer relationship), there is scant evidence of false positives through IBS. It is generally safe to assume IBD and to pursue identification of the CA. But at ~7cM of matching DNA, evidence suggests there is a 50-50 chance the match is IBS. For such shorter segments, extra information is needed to guarantee there is a CA. Triangulation is the source of such information, through finding other relatives who also share the matching segment.
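These rules of thumb can be written as a small helper for tagging spreadsheet rows. This is only a sketch: the function name and labels are made up for illustration, and the cut-offs are the approximate ~10 cM and ~7 cM figures quoted above, not hard boundaries.

```python
def ibd_outlook(segment_cm: float) -> str:
    """Rough label for a single shared segment, by length in cM.

    Thresholds are the approximate heuristics from the text.
    """
    if segment_cm >= 10.0:
        return "likely IBD"
    if segment_cm >= 7.0:
        return "uncertain, triangulate"
    return "likely IBS"


for cm in (14.2, 8.0, 5.5):
    print(cm, ibd_outlook(cm))
```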
GEDmatch provides Tier 1 (paid) tools to assist segment triangulation. There are likely other tools available also. But a spreadsheet and the standard (free) tools of 23andMe (where I tested DNA) and GEDmatch suffice for my purposes.
GEDmatch and 23andMe provided a combined list of about 150 DNA relatives with whom I shared at least one DNA segment >10cM and a total shared segment length of >25cM; these comprise my target population. Many 23andMe customers have uploaded their DNA to GEDmatch, so one only needs to fill in the remainder from 23andMe directly.
From the main page after logging into GEDmatch, request the One-to-many report from the Data Analysis panel, with segment length threshold set to 10 cM (default is 7 cM). This produces a table of people matching your DNA at some level. Copy the data columns of this report to your spreadsheet.
The table has a select column, and one must check it for each row entry for which further processing is required, which should be all people above a certain threshold of total segment length of matching DNA. I chose all people with total segment matching length of >23cM. That was close to 200 people, a tedious selection process. Save this page with selections made, so you don’t have to redo the selections if you want to re-run the analysis.
When all selections are made, click the Submit button near the top of the report page. The next page offers a choice of 2-D or 3-D chromosome browser. Select 2-D, then on the next page click on the word HERE. The result is a report of all shared segments in chromosome order.
Go through this report and enter the segment data (segment start position, end position, and length) into new columns in the spreadsheet. For easier readability, convert all segment locations to mbp units by dividing the displayed locations in base pairs by 1 million. Some on the list are duplicates or siblings with the same DNA. Remove these for a cleaner result.
A person with multiple chromosome segments will need a separate row for each segment; repeat rows as required to hold the data for each new shared segment. Remove any rows corresponding to segments less than 10 cM in length. (I allowed segments >8.5 cM if the same person had an adjacent segment >10cM, gaining a few more data points). Most segments smaller than 7 cM seem to be IBS matches, and they will confuse the process going forward; they must be culled.
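For anyone who prefers a script to hand-editing the spreadsheet, the unit conversion and culling rules can be sketched in Python as below. The `Segment` fields, the kit IDs, and the reading of "adjacent" as "same person, same chromosome" are all assumptions made for illustration; the 10 cM and 8.5 cM thresholds are the ones described above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kit: str          # hypothetical kit ID of the matching person
    chrom: int        # chromosome number, 1-22
    start_mbp: float  # segment start, in millions of base pairs
    end_mbp: float    # segment end, in millions of base pairs
    cm: float         # segment length in cM


def to_mbp(position_bp: int) -> float:
    """Convert a base-pair position to mbp units."""
    return position_bp / 1_000_000


def keep_segments(segments, main_cm=10.0, relaxed_cm=8.5):
    """Keep segments >= main_cm, plus segments > relaxed_cm when the same
    person has a segment >= main_cm on the same chromosome (assumed
    meaning of "adjacent")."""
    anchors = {(s.kit, s.chrom) for s in segments if s.cm >= main_cm}
    return [
        s for s in segments
        if s.cm >= main_cm
        or (s.cm > relaxed_cm and (s.kit, s.chrom) in anchors)
    ]


rows = [
    Segment("A1", 3, to_mbp(12_500_000), to_mbp(30_000_000), 14.2),
    Segment("A1", 3, 31.0, 38.0, 8.9),  # kept: same person has 14.2 cM nearby
    Segment("B2", 7, 5.0, 9.0, 8.9),    # dropped: no segment >= 10 cM
    Segment("C3", 1, 1.0, 4.0, 6.0),    # dropped: too short
]
kept = keep_segments(rows)
print([(s.kit, s.cm) for s in kept])
```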
Return to 23andMe (or whichever DNA site you used) to pick up the data for those relatives who have not uploaded their DNA to GEDmatch. On 23andMe, I compared each person’s DNA individually with mine using the DNA tab, then copied the detailed matching segment data to the spreadsheet.
The spreadsheet has now expanded, with a row for each shared segment for each person. I ended up with 115 distinct relatives sharing with me 150 DNA segments spread over all 22 autosomes.
The next step is to identify and name the triangulated groups (TGs). TGs are assigned by rolling up overlapping segments on a chromosome and noting the least start location and greatest end location for the nested segments. Some judgement will be required regarding whether adjacent areas should be included, or excluded into their own TG.
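The roll-up described here is essentially interval merging, and a minimal sketch of it follows. The function name and tuple layout are illustrative assumptions. The code merges any segments that overlap on the same chromosome and records the least start and greatest end; it does not make the judgement calls about adjacent areas, which still have to be reviewed by hand.

```python
def roll_up(segments):
    """Merge overlapping segments per chromosome.

    segments: iterable of (chrom, start_mbp, end_mbp) tuples.
    Returns a list of (chrom, least_start, greatest_end) spans,
    one per candidate TG.
    """
    groups = []
    for chrom, start, end in sorted(segments):
        if groups and groups[-1][0] == chrom and start <= groups[-1][2]:
            # Overlaps the current span: extend its end if needed.
            last = groups[-1]
            groups[-1] = (chrom, last[1], max(last[2], end))
        else:
            # New chromosome or a gap: start a new span.
            groups.append((chrom, start, end))
    return groups


spans = roll_up([(1, 25, 40), (1, 10, 30), (1, 60, 75), (2, 10, 30)])
print(spans)  # [(1, 10, 40), (1, 60, 75), (2, 10, 30)]
```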
I chose a numeric TG name format xxyyyzzz, where xx is the chromosome#, yyy and zzz are the segment start and end locations (mbp). This numeric format allows sorting rows by TG names, enabling our end game – organizing the table into mutually exclusive family groups, each TG corresponding to a common ancestor (CA), a set of g-, g-g- or g-g-g-grandparents whose segments are inherited.
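A name in this xxyyyzzz format can be built with zero-padded formatting, as in the sketch below. Rounding the start and end to whole mbp is an assumption, since the text does not say how fractional positions are handled, and the function name is illustrative.

```python
def tg_name(chrom: int, start_mbp: float, end_mbp: float) -> str:
    """Build an xxyyyzzz TG name: 2-digit chromosome, then 3-digit
    start and end locations in whole mbp (rounding is an assumption)."""
    return f"{chrom:02d}{round(start_mbp):03d}{round(end_mbp):03d}"


names = [tg_name(21, 9.7, 45.2), tg_name(3, 12.4, 38.0)]
print(sorted(names))  # ['03012038', '21010045']
```

Because every field is zero-padded to a fixed width, sorting the names as text gives the same order as sorting by chromosome and then by start location.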
Finally, one needs to collect all related TGs into a CA group. Find all relatives that share a TG and check which other TGs they share with anyone else. Then bring all people who share any of those segments into the CA group and repeat, until there are no more connections to follow. My 115 triangulated TGs comprised 10 identified, mutually-exclusive CA groupings, with a few uncategorized TGs left over. Each CA grouping is independent of the others (i.e. comparing people in different CA groupings revealed no IBD ancestry links).
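The "follow the connections until none remain" step is a connected-components search over people and TGs. The sketch below is one way to do it, under the assumption that the spreadsheet has been reduced to a mapping from each person to the TG names they appear in; the people and TG names shown are invented for illustration.

```python
from collections import defaultdict


def ca_groups(person_tgs):
    """Group people who are linked, directly or indirectly, by shared TGs.

    person_tgs: dict mapping person -> set of TG names.
    Returns a list of (people, tgs) pairs, one per candidate CA group.
    """
    tg_people = defaultdict(set)
    for person, tgs in person_tgs.items():
        for tg in tgs:
            tg_people[tg].add(person)

    seen = set()
    groups = []
    for person in person_tgs:
        if person in seen:
            continue
        people, tgs, stack = set(), set(), [person]
        seen.add(person)
        while stack:
            current = stack.pop()
            people.add(current)
            for tg in person_tgs[current]:
                tgs.add(tg)
                # Pull in everyone else who shares this TG.
                for other in tg_people[tg]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        groups.append((people, tgs))
    return groups


example = {
    "Ann": {"01010040"},
    "Bob": {"01010040", "05020060"},
    "Cy": {"05020060"},
    "Dee": {"09100130"},
}
groups = ca_groups(example)
for people, tgs in groups:
    print(sorted(people), sorted(tgs))
```

Here Ann and Cy share no TG directly, but both are linked through Bob, so all three land in one group while Dee forms a group of her own.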
My spreadsheets, after sorting and colorizing by CA groupings, appear as follows:
where the column headers are:
Kit#, Sex, Generations, TG, Start, End, cM, Name
TG, Start, End, and Length (cM), are derived columns added to the original columns of the GEDmatch One-to-many report. A given TG will span all the start/end segment boundaries nested within it.
In my case, I had determined three CAs by direct match prior to triangulation. Two are common to the first large buff-colored CA group from my father’s side, one with my grandparents as CA, the other with my g-g-grandparents in the same lineage as CA. The remaining relative became associated with the sixth crimson-colored CA group, with g-grandparents as the CA from my mother’s side. Now I know with reasonable certainty that all the other people listed in those TGs are associated with these same or related CAs.
For the other eight CA groups whose CAs are not yet identified, the plan is to contact each person, tell them what I know, point them to my extensive tree with most all ancestors identified back to g-g-g-grandparents, and see if any of these relatives have a paper tree sufficient in depth to identify our common ancestor.
A cell with non-paired chromosomes, such as a gamete, is called haploid. Cells with paired chromosomes are called diploid. The two chromosomes of each pair are homologs (from homologous), identical in structure but different in content.
The 23rd pair of human chromosomes are the allosomes, aka the X and Y sex chromosomes, which largely do not recombine during meiosis. Also, the cell’s mitochondria contain non-recombinant DNA.
An entire branch of genetic genealogy deals with such non-recombinant segments of our genomes, our Y-DNA and mt-DNA, where Y is the formal name of the male allosome, and ‘mt’ stands for mitochondria, which contains non-recombinant DNA present in all cells, and passed from a female to all her children via her ova.