Using DNA to Explore Ancestry – One Person’s Experiences

Family genealogists research their ancestors’ paper trail as far back as records can be found. We then turn to DNA, to add evidence to historical records, and to further extend the reach of historical records. DNA comes in two types, recombinant (aka autosomal), and non-recombinant (Y-chromosome and mitochondria).

The non-recombinant DNA is inherited essentially unchanged, the Y-DNA from father to son, the mtDNA from mother to children (but only daughters can pass it on). But non-recombinant DNA only identifies a tiny slice of our genetic heritage, so at best could identify only a very small percent of our current and past relatives. More on that later.
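To put a rough number on that tiny slice, here is a small sketch of the arithmetic. It assumes a male tester (so both a Y-DNA line and an mtDNA line exist) and counts ancestor slots in a pedigree chart rather than distinct people, since the same person can fill several slots far enough back. The function name is my own invention for illustration.

```python
def direct_line_share(generations_back: int) -> float:
    """Fraction of ancestor slots in one generation that are covered by
    the Y-DNA line plus the mtDNA line (one slot each)."""
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    slots = 2 ** generations_back  # slots double with every generation
    return 2 / slots


for g in (1, 5, 10):
    print(g, 2 ** g, f"{direct_line_share(g):.4%}")
```

At one generation back the two lines cover both parents; ten generations back they cover 2 of 1024 slots.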

Autosomal DNA is mixed and matched in every generation, some genes from the mother and some genes from the father. It shuffles the deck, making us diverse, and through diversity, healthier and with greater promise for future beneficial traits. In principle, every ancestral line can leave a trace in our autosomal DNA, although the share inherited from any one ancestor shrinks with each generation.

Recombinant DNA provides identification of personal family relations, right up to the current day. Companies like 23andMe provide this service. They can determine how closely two people are related by comparing their autosomal DNA. For each test client, the company provides a list of DNA relatives who have also tested DNA with them. These lists extend from distant (about 6th cousins) down to siblings.
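To give a feel for why these lists thin out around 6th cousins, here is a sketch of the expected average autosomal sharing between full cousins. This is textbook pedigree arithmetic, not the method any testing company uses (they compare actual shared segments), and real pairs vary widely around the average; quite distant cousins often share no detectable segment at all. The function name is hypothetical.

```python
def expected_shared_fraction(cousin_degree: int) -> float:
    """Average fraction of autosomal DNA shared by full n-th cousins.

    First cousins share 1/8 on average, and each further degree of
    cousinship divides the expectation by 4.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be at least 1")
    return 0.5 ** (2 * cousin_degree + 1)


for n in range(1, 7):
    print(f"{n}. cousins: {expected_shared_fraction(n):.4%}")
```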

With adequate family histories and ancestral paper research, two identified close relations can quickly see how they are connected. This is called triangulation: two present-day people identifying a shared common ancestor. The two paths to the common ancestor are thus genetically validated as actual ‘blood lines’, documented in the DNA.

When a match is found for either type of DNA, perhaps the other party will have had more success in historical research, which then can be mated to our own. I have shared my research with others who have matched my DNA; so far, my research has been the more advanced. But early on, I have also received much paper genealogy from others who have been researching for generations. We all can make sizeable leaps by borrowing from others’ analyses. Research should be about sharing ideas and data. That’s what makes it fun.

I have used both types of DNA comparison to advance my genealogy. Using autosomal DNA triangulation at 23andMe, both my mother’s and my father’s blood lines were validated back to a g-g-grandparent, and I met online some cousins of whom I was formerly unaware.

By comparing my Y-DNA STR markers with others registered in the Ysearch database, I found an exact match with someone sharing my surname. After I contacted the match through a DNA website created specifically to study this surname via DNA, an exchange of paper trails revealed that both lines passed through Virginia in Revolutionary War times, providing twice the clues to assist us in breaking through our mutual roadblocks.

This concludes the discussion of the impact of DNA on the author’s genealogy studies in historical times. For those who share the author’s interest in deeper pre-historic genetic studies, please read on. But the discussion will get more detailed.

While autosomal DNA does support general population studies in distant times, specific deep genetic ancestry is more definitely traced through non-recombinant DNA. Beyond direct links to historical ancestors, non-recombinant DNA can indicate a broad history of one’s biological clans going back over millennia. Researchers study all public genetics databases containing such DNA, to draw conclusions about where we come from and how we got here from there.

We use DNA in personal ancestral research to identify the smallest clan with which we are unequivocally related. When we identify our ancestral clan, the population-wide research will have suggested where and when the clan originated. Thus, both personal and population-wide goals are advanced by testing non-recombinant DNA. Here, DNA stands in for a surname, since patronymic surnames likely were not yet used when these clans originated.

The basic idea is that by studying the distribution of current peoples’ DNA, we can learn something about our distribution long ago. The resulting hypotheses can then be tested as technology improves for deriving DNA from ancient remains. It is this European human pre-history read from DNA that is the further subject of our quest. Specifically, I can hypothesize how my male DNA (Y-DNA) type might have come to England. For background on the terminology used below, refer to the prior essay Genetics: Our Genetic Clanship.

Non-recombinant DNA divides us into clans based on paternity and maternity. Our paternal and maternal clans’ ancestral wanderings through prehistory can be mapped from deep in the upper paleolithic era, down to recent times. This is why we record our non-recombinant genetic test results (a standard list of one’s DNA markers) in public databases. This permits us to look for others that match us. It further allows researchers to see where people of given DNA types cluster now, in order to extrapolate backward in time to where these types originated. The more people who register their results publicly, the greater the accuracy of such backward projections.

There are two types of DNA marker from which one can create an identifying signature. My Y-DNA testing has focused on defining my signature as a haplotype, the signature of a characteristic grouping of DNA STR markers. DNA testing also provides a different type of signature called a haplogroup, based on one’s most downstream SNP marker. Historically, SNP haplogroup testing has been expensive and inefficient. Most people first do STR testing, which supports haplogroup inference without requiring an explicit SNP test. But ultimately it is the haplogroup that biologically asserts our genetic relatedness; the haplotype is only a rough proxy for this genetic fingerprint.

I joined the National Geographic DNA program early after its inception and got a 12 marker result that clarified my male ancestry in broad strokes, Haplogroup I-M223. Wanting more specificity, I asked the FTDNA lab to extend my testing with its 25-STR test (fee-based). I subsequently sent a DNA sample to SMGF’s Relative Genetics lab for access to 16 additional STRs (no fee, but 5-generation paternal history documentation provided).

Currently, 111 STR markers are available for testing, but my 41 are sufficient to place me in the finest resolution haplotype currently visible to us. Aside: Unfortunately, SMGF sold my DNA and my family history details to a for-profit firm in 2012; bad SMGF! I now make sure these details are also available online with free access, so others will not need to pay for their use unless they so choose.

This is the scourge of most investigations that rely on public data, often created by our government through our taxes. The data begins freely available to citizen-researchers, but then some for-profit outfit comes in and buys up the rights to access all the data, and then charges us to look at it. Of course, this is the way of almighty capitalism. The capitalist antidote is not to patronize the outfits that seek to establish private revenue streams from our public data usage. So far, LDS FamilySearch and Wikitree have resisted selling out their databases. Thank you profoundly. And I have the last laugh on the outfits that have stolen my personal ancestral research and placed it behind a paywall. The paywalled copy has substantial errors that I have since corrected. So you can pay to get the bad data, or get the latest and greatest for free at Wikitree.

Let’s talk haplogroups (SNPs). The testing above has shown that I, and all my fathers before me for over 30k years, are haplogroup I. Further, we are on the M223 branch, named I-M223, and dated to the re-expansion of peoples into northern Europe after the Last Glacial Maximum (LGM).

My current nested hierarchy of distinct inferred SNPs has reached 22 branching nodes, spanning over 97% of our history outside Africa. My most recent (farthest downstream) clade likely dates to ~1000BCE, leaving a thousand year gap between the date of this SNP and the start of historical times in northern Europe. Although further new and unique downstream SNPs are becoming less likely, hopefully one will appear some day to close the gap. There is hope, for other branches have found quite recent terminal SNPs well into historical times.

Many more than 22 SNPs have been recorded for my ancestral line, but only the current 22 identify unique clade bifurcations. The bulk of known SNPs on my line are each currently phylo-equivalent to one of these 22, meaning everyone in the current population tests the same for such equivalent SNPs.

As the size of the tested population grows, it is likely that some people will be found that test differently for some currently equivalent SNPs, and new clades will be defined to give my branch of descent additional fine structure. Of course it is possible to imagine that everyone living gets tested and some phylo-equivalent SNPs will remain equivalent. This would indicate that there are no living descendants of the people who first expressed the equivalent SNPs; those lines all went extinct.
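The idea of phylo-equivalence can be sketched in a few lines: two SNPs are equivalent, for now, if every tested person gets the same result for both. The sketch below groups SNPs by their pattern of results across people. The SNP names, person labels and results are all made up for illustration, and the function name is my own; real tree-building is considerably more involved.

```python
from collections import defaultdict


def group_equivalent_snps(results):
    """Group SNPs that test identically across every person.

    results maps person -> {snp: True (derived) / False (ancestral)}.
    A SNP a person has not tested shows up as None in its pattern.
    """
    people = sorted(results)
    snps = sorted({snp for calls in results.values() for snp in calls})
    by_pattern = defaultdict(list)
    for snp in snps:
        pattern = tuple(results[p].get(snp) for p in people)
        by_pattern[pattern].append(snp)
    return sorted(by_pattern.values())


# Hypothetical test results for three people.
tested = {
    "p1": {"SNP_A": True, "SNP_B": True, "SNP_C": True},
    "p2": {"SNP_A": True, "SNP_B": True, "SNP_C": False},
    "p3": {"SNP_A": False, "SNP_B": False, "SNP_C": False},
}
print(group_equivalent_snps(tested))  # SNP_A and SNP_B are equivalent

# A new tester who differs on SNP_A and SNP_B splits them apart.
tested["p4"] = {"SNP_A": True, "SNP_B": False, "SNP_C": False}
print(group_equivalent_snps(tested))
```

The second call shows the point made above: one new tester with a different result is enough to turn an equivalent pair into a new branch point.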

For most of the past decade, the M223 SNP has defined my downstream node, characterizing a portion of Germanic European Y-DNA, peaking in north Germany at about 7%. Early on it was learned, from studying the STR databases, that the haplotypes corresponding to known M223 carriers possess several modalities, identified as peaks in the STR distribution. This suggests that the M223 node in the biological tree likely has considerable finer structure (sub-clades of M223). In earlier research, these modal peaks were named, creating sub-population nicknames. Consistent with this observation, a long list of recent SNPs, downstream along my branch from M223, has since been discovered, providing a more rigorous cladistic overlay of these closely related STR sub-populations within M223.

SNP inference from STR haplotype is made possible by grouping peoples with similar haplotype, then comparing the SNPs they have explicitly tested. Online haplogroup calculators are supported by such correlated data. By observing a commonality of certain tested SNPs within a haplotype subgroup, one then can infer a common SNP defining that group as a biological subclade.

For example, my subgroup, defined by a modal haplotype, was nicknamed M223-Continental1, and was distinguished from its nearest neighbors by an H4 STR repeat count of 9. H4=9 is inferred to be a marker for a mesolithic Frisian population, by a Y-DNA haplogroup I researcher, based on ~100 examples in the SMGF database.

We can now see from the public DNA databases and related research websites that H4=9 is also strongly associated with SNP Z78. It can thus be inferred with near certainty that I possess the Z78 mutation (am within the Z78 clade), and thus possess all the prior SNPs defining that limb of the genetic tree. Further, L1198, an additional SNP downstream from Z78, is inferred for people who have tested DYS533=11; its ancestral value is typically 12.
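The two associations just described can be written down as a toy rule set. This is only a sketch of the reasoning, not how any online haplogroup calculator works: real predictors weigh many markers statistically, while this one applies two hard rules. Nesting the DYS533 rule inside the Z78 rule is my assumption, based on L1198 being downstream of Z78; the function name is hypothetical.

```python
def infer_snps(haplotype):
    """Infer SNPs from STR repeat counts using the two associations
    described above. haplotype maps marker name -> repeat count."""
    inferred = []
    if haplotype.get("H4") == 9:
        inferred.append("Z78")
        # Assumption: only look for L1198 inside the Z78 clade,
        # since L1198 sits downstream of Z78.
        if haplotype.get("DYS533") == 11:
            inferred.append("L1198")
    return inferred


print(infer_snps({"H4": 9}))                # ['Z78']
print(infer_snps({"H4": 9, "DYS533": 11}))  # ['Z78', 'L1198']
print(infer_snps({"H4": 10, "DYS533": 11})) # []
```

A haplotype that has not tested DYS533 simply stops at Z78, because there is no value to apply the second rule to.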

Given that I am Z78+, I would next want to know which further downstream SNPs might also be inferred. Looking at the STR haplotypes of those who have explicitly tested for L1198, Z190 and Z79, I am likely ancestral to Z190 and Z79; my modal haplotype is slightly different from theirs, particularly my DYS393=15. And although my DYS452=31 seems out of place in the current identified L1198+ population, that group’s modal haplotype is overall most similar to mine. My haplogroup inference thus becomes I2-L1198* with acceptable (to me) likelihood. Here the * indicates a paragroup that is derived for SNP L1198, but ancestral for any further known downstream SNPs (e.g. Z190, Z79). I might now test explicitly to confirm if I am L1198+. Or perhaps for currently less investment, I could add DYS533 to my list of STRs and see if I am 11 or 12.

There is a certain amount I would be willing to invest to achieve more clarity, so I wait and see if further research and data will sort out these few remaining SNPs and lower the cost of testing. That time arrived when an inexpensive test for SNP Z166 (phylo-equivalent to L1198) was offered. Curiosity prevailed and I tested. The result was Z166+, so my clade is now L1198* with certainty. My father’s father’s …. father from ~60 generations ago likely lived in Frisia. And I know this with just this single SNP test – all prior SNP knowledge was inferred from my original 41 STR markers.

The latest installment on this saga began with the availability of a $20 test for SNP Y17535. It is one of two SNPs identified downstream from L1198 whose clan members mostly match my STR haplotype, and the only one to date for which an individual affordable test is available. I requested this test even though I gave myself less than a 50-50 chance of being derived for SNP Y17535. And I was proven correct, testing ancestral to Y17535. I may yet ask the YSEQ organization to create a test for FGC3634 for another $21.

Failing that, one waits for further analysis and testing of more people to characterize the place and time associated with this sub-clan. It would likely tell where my male ancestors were at the beginning of the current era.

Most people tracing genetic lineage to Continental Europe before the current era are now widely dispersed around the world; not many can trace roots directly back. My own search of the SMGF database, looking for 85% matches to my modal haplotype, finds only a handful in which specific European paternity has been documented. These are spread from Netherlands/Friesland through coastal Germany and into Denmark. There are a couple of outliers in the state of Hessen in Western Germany and another couple along the Baltic coast of Poland. These latter are likely descendants of the Mennonite (Anabaptist) pilgrims who migrated from Netherlands/Friesland to the Baltic coast during the 16th-century Reformation; the Mennonites are named after the Frisian Menno Simons.

In another database, I find two surnames that report L1198+ and deep Frisian ancestry. One is in the NW German region of East Frisia, the other in the Netherlands province of Frisia. The two towns are less than 100km apart. My 41 marker signature is the same distance from each Frisian signature, differing at 4 markers, each one step different. While as-yet scant evidence, these findings suggest a likely candidate for the nexus of my specific M223/L1198 ‘tribal’ homeland, at the mouth of the Ems River separating Germany and Netherlands, location of the current German city of Emden.
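The "distance" between two signatures used above can be sketched as a simple step count: for every marker both people have tested, add up how many repeats apart they are. This is a minimal stepwise version under my own assumptions; real comparison tools treat multi-copy and fast-mutating markers with more care. The three values under `mine` are ones quoted in this essay, while the `other` haplotype is invented purely for the example.

```python
def differing_markers(a, b):
    """Markers tested in both haplotypes whose repeat counts differ."""
    return sorted(m for m in a.keys() & b.keys() if a[m] != b[m])


def genetic_distance(a, b):
    """Sum of absolute repeat-count differences over shared markers."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("the two haplotypes have no markers in common")
    return sum(abs(a[m] - b[m]) for m in shared)


mine = {"DYS393": 15, "DYS452": 31, "H4": 9}
other = {"DYS393": 14, "DYS452": 31, "H4": 10}  # hypothetical values

print(differing_markers(mine, other))  # ['DYS393', 'H4']
print(genetic_distance(mine, other))   # 2
```

In these terms, each of the two Frisian signatures sits at distance 4 from mine: four differing markers, one step each.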

Using an online TMRCA calculator with a presumptive STR mutation rate of 0.002 per marker per generation, I received an estimate, with 95% probability, that I share a common ancestor with each of these people in the last 1500 years. Since my male line seems to have been from England, based on early family naming, these Frisian ancestors probably went to England sometime after 500CE. And we know that this was the beginning of the invasion of England by the continental Anglo-Saxon Federation.
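I do not know the exact algorithm behind the online calculator, but a back-of-envelope version of this kind of estimate can be sketched as follows. The assumptions are all mine: mismatches accumulate as a Poisson process along both lines of descent (so the expected count is 2 x rate x markers x generations), every marker mutates at the same rate, back mutations are ignored, and the prior on the number of generations is flat. Under those assumptions the posterior is a gamma distribution with integer shape, whose cumulative probability can be computed with the standard library alone.

```python
import math


def erlang_cdf(x, shape):
    """P(X <= x) for a gamma variable with integer shape and unit rate."""
    tail = sum(x ** k / math.factorial(k) for k in range(shape))
    return 1.0 - math.exp(-x) * tail


def tmrca_generations(distance, markers, mu, prob=0.95):
    """Return (most likely, upper bound) generations to the common
    ancestor under the simplified Poisson model described above."""
    rate = 2 * mu * markers          # expected mismatches per generation
    most_likely = distance / rate
    lo, hi = 0.0, 1000.0             # bisection bracket, in units of rate*t
    for _ in range(200):
        mid = (lo + hi) / 2
        if erlang_cdf(mid, distance + 1) < prob:
            lo = mid
        else:
            hi = mid
    return most_likely, hi / rate


best, upper = tmrca_generations(distance=4, markers=41, mu=0.002)
print(f"most likely: {best:.1f} generations, 95% bound: {upper:.1f}")
```

With my numbers (41 markers, 4 one-step differences, rate 0.002) the sketch puts the most likely figure near 24 generations and the 95% bound near 56. How that converts to years depends on the generation length one assumes, which is a further assumption on top of the model.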

Of course, where men go, women go hand in hand. My maternal clan is mtDNA haplogroup U5b1c2. Like Y-DNA haplogroup I2, mtDNA haplogroup U5 appears to be a very old signature dating deep into Europe of the Upper Paleolithic. It seems fortuitous that both mtDNA and Y-DNA trace potentially to the earliest human presence in Europe ~40ka. One can perhaps imagine these distant M-F ancestors being neighbors in a small encampment above the Danube 37ka. This is pure fantasy. But being able to track movements of the M-F clans through prehistory in detail may eventually be possible, when more powerful genetic technologies are applied to sufficiently-preserved paleolithic European human remains.
