23andMe offered to test nearly 600K SNPs on their V4 chip for around $90/person. When I researched them, they were offering both genealogy and health analysis, with emphasis on the latter. In this respect, they were in the vanguard of private direct-to-consumer (DTC) genetic testing and analytics companies involved in health research. I perceived them to be reputable and to provide a product with reasonable cost/benefit. This was just before the FDA ordered them to cease marketing/producing their health analytics product (see below).
23andMe obtains and associates genotype and phenotype data from its customers. It gets a DNA sample from the client and, via its web site, asks the client to identify personal traits through answers to a long list of questions. A single genotype/phenotype pairing provides no useful information. Only by analyzing thousands of such pairs can they begin to identify meaningful association patterns.
Ordering, Waiting, First Blush Results
After we ordered the testing and sent in our samples, the company advised that the FDA was shutting down the 23andMe health analytics product, leaving only their genealogy results in the final deliverable. That was disappointing, apparently a new chapter in the same old story. The AMA (major lobbyist to FDA) wants to keep health analytics on a prescription-only basis. Their reasons are touted as consumer safety concerns, but one suspects that money and related turf protection are yet again the root of such overreach.
We were offered our money back, but declined. I discovered that 23andMe would still give us our raw data as a download, and that there were third party packages that could provide health analytics on this data. Our game was still on. Take that, FDA and AMA.
It took five weeks to get the raw testing results back for Debby, and nearly another two weeks to get mine (since they were mailed on the same day, it seems possible mine required some re-processing as part of QA).
23andMe is not positioning themselves as a major player in ancient (prehistoric) ancestry research. Their non-recombinant genealogy tests, upon which such research depends, are very basic (15 years behind by today’s standard). In my paternal line, they provide no historical resolution, nor any data relevant to the last 15K years. Further investment in genetic genealogy likely will not happen at 23andMe, for human ancestry analytics likely does not represent a core business interest, but rather a marketing tool. Thus, the FDA decision may deliver a bigger blow to the near term 23andMe business plan than one might suspect.
Since I already have much more deep ancestry resolution than their Y-chromosome and mtDNA genealogy testing provides, that portion of the testing has only vague corroborative utility for me. I had never before been explicitly tested for Y-DNA SNPs, but have inferred a detailed SNP lineage via another type of DNA marker. Debby had never been tested, however, so she would learn some basic facts about her maternal ancestry (V7a).
In the broader picture, the real and unique value of ancestry testing at 23andMe is the autosomal (recombinant) DNA matching service, which puts clients in touch with one another based on identified shared DNA segments. This is invaluable for extending one’s research beyond paper trails, and for validating paper trails that have been established. It takes us orders of magnitude beyond the discoveries possible from matching others in the non-recombinant DNA databases.
How do autosomal (recombinant) and non-recombinant genealogy differ? Functionally, they address different time frames. Autosomal genealogy can address about the same period as does paper (historical) genealogy, perhaps the last 500 years. Non-recombinant DNA covers earliest history and all pre-history.
That this is so can be deduced from the differing DNA processes. Non-recombinant DNA can be recognized by its unique type over many millennia. It is unchanged over generations except for a few random mutations. Autosomal DNA gets sliced and diced at every generation, so that after a few generations, there no longer remain distinct segments long enough to be identifiable as any specific type. Understanding that the bulk of human DNA is basically the same for everyone, one needs a critical mass of unique information in a segment to characterize it as belonging to some known type.
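The dilution argument above can be made concrete with a little arithmetic. This is a minimal sketch, assuming the idealized rule that each ancestor n generations back contributes on average 1/2^n of one’s autosomal DNA, and assuming a generation length of 30 years. Both are simplifications for illustration; real inheritance is lumpier because recombination is random.

```python
YEARS_PER_GENERATION = 30  # assumed value, for illustration only


def expected_share(generations_back: int) -> float:
    """Average fraction of autosomal DNA inherited from ONE ancestor
    who lived `generations_back` generations ago (idealized model)."""
    return 0.5 ** generations_back


def ancestors_at(generations_back: int) -> int:
    """Number of ancestor slots in the pedigree at that generation."""
    return 2 ** generations_back


for g in (1, 2, 4, 7, 10, 17):
    print(f"{g:2d} generations (~{g * YEARS_PER_GENERATION} years): "
          f"{ancestors_at(g):6d} ancestors, "
          f"{expected_share(g):.4%} expected from each")
```

Under these assumptions, the expected contribution of any single ancestor from about 500 years ago is a vanishingly small fraction of the genome, which is consistent with autosomal matching losing its power at roughly the same horizon as paper genealogy.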
23andMe’s ancestry categories, identified by autosomal testing, are expressed as geographical regions containing a significant percentage of matching DNA profiles. For Europe, the designations are Europe-wide, then North, South, and East regional distinctions, then specific sub-regions like the British Isles, Germany and France, Scandinavia, and a non-specific catch-all. Three levels of confidence can be requested: conservative, standard, and speculative. The difference is in how hard each level tries to make sense of any ambiguous segments.
Through the speculative filter, I am 99.9% European and 94.5% Northern European. Debby was similarly 99.7% European, but her sub-classification was 96.7% Ashkenazi. When the report says we are X% of this category or that, it means that X% of our ancestors were likely living in that place in the year 1500 CE (before global travel was readily accessible to the commoner).
Ambiguity will likely arise in some areas. Two tiny fragments of my DNA resisted easy classification, but speculation resolved one to Native American and the other to North African, together contributing less than one thousandth of my DNA. There may have been some problems reading/interpreting these small DNA areas that account for some of the confusion.
In addition to geographical origins, the user is shown a list of relatives, other people who tested as having some measurable identifiable shared DNA. In the first day after my results were made available to me, I sent messages to my 25 closest identified genetic matches from among 23andMe customers.
I received a reply the next day from a 3rd cousin I had not identified before. I was able to provide him much detailed information regarding our common G-G-grandparents that he did not have. I made another contact to my closest match, a second cousin on my mother’s side. These two contacts genetically validated the paper trail for a significant part of my near ancestry, my father’s lineage and my mother’s father’s lineage, both back to nearly 1800. I now know my mother’s father’s haplogroup. I am who I think I am for this portion of my ancestry. This is genealogy nirvana.
I have since added a few more contacts to my original 25, relatives who have listed a surname matching one of my ancestral surnames. Only three contacts to date have seemed interested in discussing ancestry, a 10% success rate. This is sad, but not unexpected, since most of the early adopters at 23andMe were seeking health data; genealogy was not their passion, and in the minds of many it seems to invite unnecessary privacy invasions.
There are practical limits to the efficacy of the relative finder facility. The period shortly after 1800 becomes problematic for most of us tracking ancestors in the USA, because the great westward expansion had begun, invariably disrupting any corroborating paper trail. Yet the current relative finder process loses utility at this time. The relative-finder algorithm doesn’t identify shared DNA segments much prior to 1800, since the autosomal evidence for such distant shared ancestry is too weak. Also, virtually none of the customers whose primary interest is health data have researched their ancestor surnames back that far, since it takes a real genealogy motivation to do so. Without names, learning how two relatives are related is an impossibility.
Shared DNA identification seems such a useful tool that 23andMe should promote it more, educating customers about its potential. Perhaps they could offer rewards to people who provide surnames for their known ancestry, as well as geographical details. Some of us have done years of research, and it’s all available to others if they will just talk to us when we contact them. Advocacy and new contributing methods may be needed to realize the potential of this service.
Since I have spent some time with the site, I have more expectations of genealogy successes here. There is a lot of data, and some useful ways for accessing it. For example, under:
- Family and Friends : DNA Relatives :
  - be notified of relatives and their expected relationship
  - correspond with relatives and share genome data
  - search DNA relatives by ancestry surnames or haplogroup
- My Results : Ancestry Overview : Ancestry Tools :
  - Countries of Ancestry – shows how percentages of one’s ancestry are geographically distributed. Under Advanced Controls, a slider allows one to select the size of the DNA window to compare, in centiMorgans (cM), where 1 cM ~ 10^6 base pairs on average. By moving from small to large DNA segment selections, one in effect shows the ancestral location changing from distant time to near time. For example, for DNA window > 10 cM, my top three countries are UK, United States, Ireland. Choosing the minimum DNA window of 5 cM, my top three countries are UK, Germany, Ireland. This is expected, because my Y-DNA haplogroup was centered in Germany around 50 generations ago. My historical ancestry was divided between UK and USA 5 generations ago. I interpret this as a migration from Germany through England to the US over the last 50 generations.
  - Family Inheritance: Advanced – allows one to compare one’s DNA with up to five other friends or relatives with whom one has a genome-sharing relationship. A chart shows all the shared segments and their lengths in cM. One can view a chart of the detailed segment locations, and also download the comparison data as a CSV file.
Global Relative-Finder and Phased Genomes
23andMe was a pioneer in ancestry genomic analysis, but now there are several competitors. Staying within a single vendor’s database will be limiting going forward.
Enter GEDmatch, a site for pulling all relatives together. One just uploads one’s raw data to the GEDmatch database. Then one can query genomes from all the various genetic genealogy vendors. Further, GEDmatch offers state-of-the-art tools for analyzing genetic relations based on shared genetic segments. And by uploading GEDCOM data, complete lists of ancestors can accompany one’s genome, enabling identification by name of hitherto unknown cousins who share segments of our genome.
Finding real relatives by genomic segment matching is a difficult proposition if just one’s own sample is the comparison base. To the rescue, GEDmatch supports phased kits as the basis of comparison. If one’s parents’ genomes are also available, then it can be known which ancestral genetic segments one did not inherit. A phased kit thus has more ancestral data to work with, narrowing choices during analysis. (Full understanding of phased kits is still beyond my pay grade. If you know of a good description of the process, please leave a comment. Thanks.)
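The basic idea of phasing against parents can be illustrated at the level of a single SNP. The sketch below is only the textbook trio-phasing rule under simple Mendelian inheritance, not a description of how GEDmatch builds its phased kits; the function name `phase_site` and the two-letter genotype strings are assumptions made for the example.

```python
def phase_site(child: str, mother: str, father: str):
    """Return (maternal_allele, paternal_allele) for one SNP, or None.

    Genotypes are unordered two-letter strings such as "AG".
    None means the site cannot be phased from this trio alone:
    the assignment is ambiguous, the data are inconsistent with
    Mendelian inheritance, or a genotype is missing.
    """
    if len(child) != 2 or len(mother) != 2 or len(father) != 2:
        return None
    first, second = child[0], child[1]
    candidates = set()
    # Try both ways of splitting the child's alleles between the parents.
    for maternal, paternal in ((first, second), (second, first)):
        if maternal in mother and paternal in father:
            candidates.add((maternal, paternal))
    if len(candidates) == 1:
        return candidates.pop()
    return None


print(phase_site("AG", "AA", "AG"))  # mother can only have supplied the A
print(phase_site("AG", "AG", "AG"))  # everyone heterozygous: ambiguous
```

Once a site is phased, the allele each parent did not pass on is known as well, which is the extra ancestral information a phased kit brings to segment matching.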
Having no reason to dwell on the now vacuous health portion of the 23andMe site, I grabbed our raw data files and headed to the Internet. This is now a DIY project.
I first decided to try my hand at analyzing Debby’s DNA for specific genes, namely BRCA1 and BRCA2. I found on the Internet a description of SNPs within these genes that were implicated in cancer. I discovered that 23andMe had tested about 70% of these, and for these tested SNPs, Debby possessed normal alleles. That was an encouraging result. But clearly such a manual process was not going to get me far.
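The manual look-up described above is easy to automate. A minimal sketch follows, assuming the raw data download is a tab-separated text file with columns rsid, chromosome, position, and genotype, and with comment lines starting with '#'. The SNP identifiers and risk alleles below are invented placeholders, not real BRCA variants, and the function names are mine.

```python
import csv
import io


def load_genotypes(lines):
    """Parse 23andMe-style raw data lines into a dict of rsid -> genotype."""
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    return {row[0]: row[3] for row in rows if len(row) >= 4}


def check_snps(genotypes, risk_alleles):
    """Classify each SNP of interest against the customer's genotype."""
    report = {}
    for rsid, risk in risk_alleles.items():
        genotype = genotypes.get(rsid)
        if genotype is None:
            report[rsid] = "not tested"
        elif "-" in genotype:
            report[rsid] = "no call"
        elif risk in genotype:
            report[rsid] = "carries risk allele"
        else:
            report[rsid] = "normal"
    return report


# Invented sample data standing in for a real raw data file.
SAMPLE = """# rsid\tchromosome\tposition\tgenotype
rs0000001\t17\t1000\tAA
rs0000002\t13\t2000\tCT
rs0000003\t13\t3000\t--
"""
# Invented risk alleles; real ones would come from the literature.
RISK = {"rs0000001": "G", "rs0000002": "T", "rs0000003": "A", "rs0000004": "C"}

report = check_snps(load_genotypes(io.StringIO(SAMPLE)), RISK)
for rsid, status in report.items():
    print(rsid, status)
tested = sum(status != "not tested" for status in report.values())
print(f"{tested}/{len(report)} SNPs of interest were on the chip")
```

With a real file, `load_genotypes(open(path, newline=""))` would replace the `io.StringIO` stand-in. The comparison is only valid if the risk alleles are expressed on the same strand as the raw data, a point taken up in the Technical Aside.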
Promethease apparently goes through the user-supplied raw data file of tested SNP alleles and looks up each SNP in SNPedia (keying by rsid), processing the information found there for that allele, sorting it by positive/negative/neutral impact, assigning to it the SNPedia-determined importance factor, and noting how many publications reference it. A link to the SNPedia entry is returned to facilitate user access to the relevant literature.
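The look-up-and-sort process described above can be sketched in a few lines. This is my guess at the general shape of such a report builder, not Promethease’s actual code. The `ANNOTATIONS` table is a hypothetical stand-in for SNPedia, and every entry in it is invented.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    impact: str        # "negative", "positive", or "neutral"
    importance: float  # importance factor on a 10-point scale
    publications: int  # number of publications referencing the SNP
    summary: str


# Hypothetical stand-in for SNPedia, keyed by (rsid, genotype).
ANNOTATIONS = {
    ("rs0000001", "AG"): Annotation("negative", 4.0, 12, "made-up elevated risk"),
    ("rs0000002", "CC"): Annotation("positive", 2.0, 3, "made-up protective effect"),
    ("rs0000003", "TT"): Annotation("neutral", 0.5, 1, "made-up common variant"),
    ("rs0000004", "AA"): Annotation("negative", 1.5, 2, "made-up slight risk"),
}


def normalize(genotype: str) -> str:
    """Treat genotypes as unordered pairs, so "GA" and "AG" match."""
    return "".join(sorted(genotype))


def build_report(genotypes, annotations):
    """Group annotated SNPs by impact, most important first in each group."""
    report = {"negative": [], "positive": [], "neutral": []}
    for rsid, genotype in genotypes.items():
        note = annotations.get((rsid, normalize(genotype)))
        if note is not None:
            report[note.impact].append((rsid, normalize(genotype), note))
    for hits in report.values():
        hits.sort(key=lambda hit: hit[2].importance, reverse=True)
    return report


my_genotypes = {"rs0000001": "GA", "rs0000002": "CC",
                "rs0000004": "AA", "rs0000099": "CT"}
report = build_report(my_genotypes, ANNOTATIONS)
for impact, hits in report.items():
    for rsid, genotype, note in hits:
        print(f"{impact:8s} {note.importance:4.1f} {rsid} ({genotype}) "
              f"{note.summary} [{note.publications} publications]")
```

SNPs with no annotation for the observed genotype (here the invented rs0000099) simply drop out of the report, which is why the vast majority of a half million tested SNPs never appear.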
I ran the program against our raw data. In Debby’s negative issues report segment, there were no entries referencing BRCA, confirming my initial manual observation that she was ‘normal’ for these genes. In each of our reports, there were just a few interesting findings. The remainder were a large number of SNPs with small (importance 2/10 or less) statistical evidence of an association with some trait, each likely having virtually zero predictive value by itself, and with no way to establish functional correlations among them.
The means of integrating all these minor statistical suggestions into meaningful health hypotheses remains far from our current grasp. Meanwhile, we can avail ourselves of the referenced documentation behind each study finding and see the directions of current research.
Technical Aside: Making Sense of Sense
When manually comparing SNP alleles as above, one must be aware of the sense (+, -) of the tested base. DNA occurs in two complementary strands, the positive (aka plus or forward) strand and the negative (aka minus or reverse) strand, labeled relative to the reference genome. An allele may be reported from either strand, so the strand sense must be known as well as the allele base.
All 23andMe raw allele data is conveniently expressed relative to the positive strand. But comparison data from SNPedia can be relative to either strand. Therefore, one needs to check the reported SNPedia sense and, if minus, convert each base to its complement (C to G, G to C, A to T, T to A) when comparing to a raw 23andMe allele.
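The strand conversion is mechanical, so it is worth scripting rather than doing by eye. A minimal sketch, assuming genotypes are written as two-letter strings and the orientation is given as the word "plus" or "minus"; the function names are mine.

```python
# Map each base to its complement on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_plus_strand(genotype: str, orientation: str) -> str:
    """Express a genotype relative to the plus strand."""
    if orientation == "plus":
        return genotype
    if orientation == "minus":
        return genotype.translate(COMPLEMENT)
    raise ValueError(f"unknown orientation: {orientation!r}")


def same_genotype(raw_genotype: str, reference_genotype: str,
                  reference_orientation: str) -> bool:
    """Compare a plus-strand raw genotype with a reference genotype
    reported on either strand, ignoring allele order."""
    flipped = to_plus_strand(reference_genotype, reference_orientation)
    return sorted(raw_genotype) == sorted(flipped)


print(to_plus_strand("CT", "minus"))
print(same_genotype("AG", "CT", "minus"))
```

One caveat: for SNPs whose two alleles are A and T, or C and G, complementing yields the same pair of letters, so a strand mix-up cannot be spotted by inspection and the stated orientation has to be trusted.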
Where’s the Meat?
Customers may wonder what to make of their results. I have discussed and tempered expectations in my DNA testing overview article, but one still asks ‘Where’s the meat?’
Part of the meat is the metric of actionable information quality of each report item. SNPedia authors use a magnitude scale of 0-10 to rank informational quality. For example, at the top of the scale, BRCA1 and BRCA2 alleles of bad repute are assigned a 10, the level of the most important findings with largest potential impact on health outcomes. For both of us, the most important findings were ranked at magnitude 4. We each have a couple of these, and have brought them to the attention of our physicians just as FYI.
Other meaty results provide information on one’s genetic tolerance for and sensitivity to medications, interesting information that should be shared with one’s physician. In the case of predicted high toxicity, the information could be life-changing. Mostly, it is not definitive enough to be actionable. One still might want to opt for the best medication for the circumstance, even if there is indication of reduced efficacy for one’s genotype.
Beyond the alleles of highly-ranked importance, there is scant meat on the bones. It becomes an educational exercise, with some expertise in medical genetics research being required to get much out of it.
My Own Take-Aways
In spite of the lack of meat, this is exciting stuff. The first viewing of my results was a great step up on my lifetime quest for knowledge. I have no experience with the company’s results viewer, since it is no longer available to me. But my raw data, as processed by third party software together with SNPedia look-up, provided detailed and prioritized information to the level of current state of the art.
I approached this as a general quest for knowledge, and not from a potential medical intervention perspective. I had realized from all I’d discovered that medically-actionable information would most likely be scarce or non-existent in the current time frame. But much of the information piqued my interest. I love learning such things about myself and can’t wait to digest as much as possible. Finally, they are talking about me, the real me. This is who I am.
This sport is in its infancy. Possibly only ~100 of my SNPs had any ascribed importance ranked above 2. That’s out of a half million SNPs. We have so much to learn. Some of my SNPs predict diagnoses I have already received. Some showed extra risk for things I expect I will never experience. Others offered potential explanations for personality traits I had never considered as having a logical explanation. There are intriguing genetic correlations.
For me, living on the older side of town, my results are most interesting because they confirm rather than predict who I am. It is their explanatory nature that makes them most relevant to me. A younger person might have a different perspective. My life experience can corroborate some predictive elements inherent in my data, and thus perhaps provide additional data points by which to judge the general utility of genetic testing for health prediction.
My testing revealed two findings of importance rank 4/10, my most important findings in SNPedia’s grading scale. Both predict health status that is characteristic of my current health state. There were other findings of lesser import which also seem accurate in their predictive potential.
The great bulk of other associations of SNPs with various types of condition are decidedly not actionable, but many seem interesting and I will learn more about them. Nothing else in my report matches anything of my current health history, which remains overall pretty uneventful.
One of the arguments used against DTC genetic health testing is that family history is more relevant and actionable. My family medical history tells me the exact same story as one of the major findings, but this is the only story that history consistently tells. Thus family history, while a valid proxy for some related genetic conditions, fails to divulge anything close to a complete story. The mechanics of autosomal DNA reinforce this family history limitation. Each parent could be heterozygous for a deleterious allele, and I could be homozygous for that allele. Typically, this means that whatever condition might result from that allele will be magnified as a personal risk factor for me, even though neither parent may have shown any sign of it.
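The heterozygous-parents case is the classic Punnett square, and the odds can be enumerated directly. This is a minimal sketch assuming simple Mendelian inheritance of a single gene, with "a" standing for the deleterious allele and "A" for the normal one; the function name is mine.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1: str, parent2: str):
    """Probability of each child genotype, given two parental genotypes.

    Each parent passes on one of its two alleles with equal chance.
    """
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


dist = offspring_distribution("Aa", "Aa")
for genotype, probability in dist.items():
    print(genotype, probability)
```

With two carrier parents, one child in four is expected to be homozygous for the deleterious allele. If the condition needs two copies to show itself, the family medical history can be entirely silent about it.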
There are lots of studies related to genotypes and reactions to medicines. Three well-researched associations applied directly to me, where two drugs were significantly more efficacious for my genotype and one was significantly less. I notified my cardiologist about the lessened response to one of my primary drugs, because it potentially leaves me open to bad events. We decided not to change course, since my time on the medication demonstrated useful efficacy for me, either by luck, or because the dosage permits some latitude.
My physicians had never mentioned to me a genetic variation in response to medicine. As far as current practice can distinguish, one dosage still fits all, even if the FDA notes the dangers on the packaging. They argue the data is complex and ambiguous, so it may be a while before protocols can be established. And since genetic information on most patients is unavailable, those protocols will be long in coming. Wide availability of genetic testing in utero likely will be necessary to jump-start personalized medicine.
The Future for 23andMe
I sense that 23andMe is now beginning to struggle with the research/commercial transition, evidenced by the recently obtained patent for a process for detecting DNA underpinnings of Parkinson’s. This trial balloon may have difficulty withstanding challenge as patent rules are slowly being modernized.
It appears their DTC testing is partly a marketing tool. 23andMe needs to generate sufficient funds and attract sufficient clients to keep them going at their primary task, collecting user-supplied health trait information to augment the client genotype information that supports their research. Once they have populated a sufficiently large database, their research, augmented by independent GWAS efforts, may aim to produce marketable, genetic-based health diagnostics and perhaps eventually genetic-based disease interventions. Or maybe they can just sell the raw information to big pharma. Large profit streams will potentially accrue from such products, services, and raw data.
23andMe reportedly is looking to regain FDA approval for its health analytics, perhaps with some modification to their marketing claims. Meanwhile, the FDA action may make their interim finances a shaky proposition. Fortunately, there are deep pockets behind their enterprise.
Even though they can no longer share health information explicitly with the client, they still collect client phenotypes, indicating their strategic goals remain in play. But the AMA, the dinosaur lurking in the shadows, may prove to be a bigger wrench than they anticipate. They may need stronger FDA lobbyists.
They are unlucky to be a start-up in a start-up industry that has raw edges, unanswered questions, yet unfulfilled promise, and a reactionary regulator. Rather than helping them shape the new industry into maturity, the regulator has chosen to shut them down. What’s behind such harsh judgement? Their mistake may have been failure to court the FDA/AMA from the earliest stages of business development. Now, seven years later, it is too late to avoid the current head-butting.
Improvement Suggestions (Aka Grousing)
Initially, before our data became available for presentation, the 23andMe web site was a mishmash of information representing different states of user data collection, sample data display, signup, registration, and support options. There should be process feedback to inform the user site experience, so that only information and user options relevant to one’s current state are presented. Our account home pages were still requesting us to register the kit even though it was registered before submission. The initial user site states could be: overview information and sales pitch; test drive; sample preparation and registration; test status (received, testing progress and expected completion date).
Their site became a slightly improved experience after our results passed final QA and became fully reflected on our site pages. A To-Do box was annoying when there were no to-do items (perhaps since removed?); it occupied prime real estate on the screen. The thinness of the information, absent health data, remained disappointing.
There is no tracking of sample status on their site other than the binary status: incomplete-complete. Just keep checking back, I guess. We submitted two samples together, and got an email when the first had completed processing. This was misleading, since the data was not yet through QA and available for presentation. That happened the following day, then it took another ten days before the other sample passed QA, and no email was sent. I finally sent them a status query. Then magically my results appeared the same day.
Amazingly, the easiest way I found for getting in touch with customer service requires leaving their site and performing a web search for 23andMe customer service. That’s yet another measure of how bad their site is, but they do not yet seem the least embarrassed about it.
One box on the site requests information from the client in the guise of quick questions, but offers no explanation of how much information is needed or what it will be used for. I abandoned one dialog after it had proceeded for several minutes (not my definition of quick); on a subsequent visit it seemed to revert to question 1, as if it had forgotten all that I had entered. Again, status feedback would help.
Ultimately, we early donors of genotype/phenotype information may hope to gain some reward from the companies we assist with our data. We are after all a crowd-funding source, supplying both fees and valuable data. To make us more willing to supply information, perhaps DTC companies such as 23andMe would consider giving us each a proportionally small piece of the company. Coupons for additional services, or a share of stock, would be a more meaningful reward than a fancy chart saying, for example, we share a heritage with 95% of other Europeans. We might even be tempted to pay a little more up front.
Additional testing capability would be welcomed, perhaps Y-chromosome SNPs. FTDNA has done extensive Y-chromosome mapping through their Big-Y tests. Perhaps it would be possible for 23andMe to offer a set of SNP tests specific to the customer’s Y-DNA haplogroup, based on SNPs discovered by Big-Y. Reward coupons could be used to obtain such results.
The Promethease software provider puts a button on their page that offers the best version of their app for an incentive fee of $2. Otherwise, they warn, the user will receive a sabotaged version that increases the run processing time by two orders of magnitude. (Note to provider: Perhaps it’s more productive to put an unconditional Donate button on your site and then provide the user with the best experience you can offer.) The Mac version of the software is several iterations behind other platforms and the authors note no plan for more Mac updates. Further compromising the user experience, the final HTML report from Promethease has embedded Google ads.
Since Promethease otherwise does a workmanlike job, we use the software, remove the ads, ignore the snark, and donate nada, thus compensating ourselves for the insult.