Archaic Hominin DNA Admixture with Human DNA

Abstract

A paper published in Science on 7 May 2010 reports the sequencing of about 60% of a reference Neandertal genome. A paper published in Science on 31 August 2012 reports complete sequencing of the genome of a Denisova hominin.

By comparing the archaic DNA to modern humans from varying locations, both these papers report archaic signatures in current people, indicating DNA transfer from these archaic hominins to modern humans in the distant past. The picture is somewhat clouded because in the most recent paper, the historical population analysis based on nuclear DNA is not consistent with that based on mitochondrial DNA. In the former paper, mitochondrial DNA showed no evidence of Neandertal transfer. This and other X-chromosome observations could be interpreted as showing it was archaic males that were passing their DNA to modern females.

The papers further characterize the DNA exchanges. Melanesian/Papuan and Australian aboriginals have ~6% of their nuclear DNA in common with Denisovans. All humans outside of Africa have ~1-4% of their nuclear DNA in common with Neandertal. Thus Melanesians have over 7% of their nuclear DNA in common with archaic hominins. There is a statistical oddity in the Neandertal admixture quantification, for Asian individuals seem to have more admixture than Europeans, a result not in line with Neandertal remains having been found only in Eurasia. The newness of the science and the lack of testable material mean we are not yet ready to say whether Neandertal and Denisovan interbred, but it seems likely considering their long period of overlap in Eurasia.

Aside: Statistical models can only give rough estimates of average individual retention of archaic DNA. Actual genetic studies across the population genome may reveal that different segments of the archaic genome may be retained in different individuals, so that the total genome retention of overlapping archaic signatures may sum to more than the current statistical admixture estimates.

When did these transfers occur? While the older study’s primary explanation of DNA exchange with Neandertal suggests gene sharing happened in the last 100ky outside of Africa, one finds little in the study to prefer this option over the study’s conservative secondary hypothesis that the sharing happened in NE Africa over the last 250ky, but is not observed in sub-Saharan Africa today due to a population substructure affecting gene flow in Africa over the last quarter million years. A more detailed study of modern humans in North and East Africa can perhaps decide which explanation is the more likely.

The newer study clearly shows DNA exchange between Denisovans and modern humans outside of Africa, probably within the last 50ky. Genetic population studies hypothesize that Y-DNA haplogroup C (and D?) comprised the earliest sustained human migration wave out of Africa into southeast Asia. Based on the geography and statistical timing of the Denisovan admixture observation, one could hypothesize that some of these Y-DNA haplogroup C/D populations would have experienced the admixture when passing through the subcontinent on their way to SE Asia 50ka.

The broad scientific benefits of these remarkable observations and technology include:

  • a more precise timeline for genus Homo through the last million years,
  • a clear picture of the changes in the genetic code that have established themselves in humans since their split with archaic forms,
  • in a broader view, illuminating insight into human developmental advances based on these genetic differences

Historical Context

Dating based on these genome examples points to a similar timeline for Denisovans and Neandertal, indicating they may be sibling species that each left Africa from 300-500ka, then further diverged due to drift. The Denisovans occupied central and southeast Asia, while the cold-adapted Neandertal occupied Eurasia.

  • NE Africa, ~1mya: The most recent common ancestor (MRCA) of modern humans and Denisovans lived. His descendants forked into separate clades leading to Denisovans and the human/Neandertal ancestral lineage.
  • NE Africa, ~825kya: The most recent common ancestor (MRCA) of modern humans and Neandertal lived. His descendants forked into separate clades, leading to Neandertals and modern humans. Note this date is based on an assumption that the MRCA of the chimpanzee and modern humans lived 6.5mya. If our MRCA with chimps was earlier than that, this MRCA date becomes proportionately earlier as well.
  • NE Africa, ~500kya: A Denisovan branch of genus Homo slides on over to colonize southeast Asia, probably leaving behind some kin in NE Africa.
  • NE Africa, ~355kya: A Neandertal branch of genus Homo slides on over to colonize Eurasia, probably leaving behind some kin in NE Africa. Speciation probably occurred after this time, creating Homo neanderthalensis. Archaic humans in the species Homo sapiens continue development in Africa, still possibly alongside Neandertal and Denisovan-branch descendants. Gene flow is to be expected, via which DNA is shared between sister populations remaining in Africa.
  • NE Africa, ~100kya: the first proto-modern humans begin excursions into the Middle East from Africa, where they encounter their sister species from whom they’ve been separated for around a quarter million years. Whether these early explorers’ descendants returned to Africa, or simply perished, is not yet known, but there is a paleontological evidence gap of some 30ky when no proto-modern-human remains are apparent outside of Africa.
  • NE Africa, ~70kya: The first modern human population leaves Africa permanently and goes on to populate all corners of the Earth beyond Africa. They live among and compete with the Neandertal in Eurasia and Denisovans in Asia for over 20ky until the archaic hominins go extinct.

Amazing Technology

Tremendous feats of science and engineering have occurred, enabling accurate gleaning of billions of nucleotide base pairs from tiny samples of highly (95%) contaminated bone powder from the late-Pleistocene remains of four archaic females, three Neandertal and one Denisovan girl. Part of the magic of the articles is learning some of the technical problems that were overcome, and speculating about the future knowledge that is just around the corner now. For in the near future, the technology advances described here will enable further testing of archaic bones, expected soon to support a much more specific map of genetic diversity in genus Homo over the last 200ky.

The Neandertal paper describes sequencing 60% of the genome from 400 milligrams of bone, a composite from three females. The Denisovan paper indicates that the technology has much improved in just two years, enabling complete genome analysis from only 10 milligrams of much older bone material. A main problem with the lower resolution Neandertal analysis was that the archaic DNA would break down into single strands of nucleotides. The more recent research developed a sequencing method that would work with just single strands of DNA. The new methods are now being applied to Neandertal samples as well, to bring that prior research up to par with the Denisovan results.

The challenges presented by archaic DNA are formidable. Only small segments survive intact, on average less than 200 base pairs per fragment. This creates a jigsaw puzzle of monstrous dimensions. Most of the Neandertal fragments discovered (95% to 99%) are not endogenous to the organism being studied, but of invasive microbial origin. The Denisovan DNA is reported to be considerably less contaminated from both microbial and human sources, contributing to the completeness of its decoding. Further, enzymatic removal of uracil residues improved accuracy by an order of magnitude beyond the earlier Neandertal sequencing technology.

Non-endogenous DNA fragments have to be filtered out of the analysis using restriction enzymes designed to preferentially cut DNA from microbial sources to enrich the endogenous DNA ratio by 5-fold (at the cost of some organism DNA that limits the ability to obtain total organism DNA coverage). Resulting sequences are compared against known primate model genomes, to identify those remaining sequence fragments likely belonging to the organism under study. In this study, the genomes of modern humans and other apes were used to filter and normalize the sequencing outcome. Contamination by human DNA, virtually identical to the DNA being sequenced, has been detected during the trial sequencing at the main laboratories; over 10% contamination has been detected during the Neandertal study. This has been cut to less than 1% by the time of the recent Denisovan study. To control for residual human contamination, the fragments are prepared in a clean environment and then tagged with special DNA codes to identify them as part of the sample before being sent out from the lab for sequencing.
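The tagging step can be pictured with a small sketch. It assumes, purely for illustration, that every fragment prepared in the clean room carries a short known tag at the start of its read; the tag sequence `ACGTAC` and the function name `keep_tagged_reads` are hypothetical and are not taken from either paper.

```python
# Hypothetical clean-room tag; the real tag sequences are not given in this article.
SAMPLE_TAG = "ACGTAC"


def keep_tagged_reads(reads, tag=SAMPLE_TAG):
    """Return only the reads that start with the tag, with the tag stripped off.

    Reads lacking the tag are assumed to have entered the workflow after
    library preparation (for example, modern human contamination picked up
    during sequencing) and are discarded.
    """
    kept = []
    for read in reads:
        if read.startswith(tag):
            kept.append(read[len(tag):])
    return kept


reads = ["ACGTACTTGACCA", "GGGTTTAAACCC", "ACGTACGATTACA"]
print(keep_tagged_reads(reads))  # ['TTGACCA', 'GATTACA']
```

The untagged middle read is dropped. A tag of this kind only protects against contamination introduced after tagging; it cannot identify human DNA that was already in the bone powder.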

Archaic DNA has further chemical decomposition problems, particularly at the ends of nucleic acid strands, where C-T false transitions and G-A false transitions can occur at rates up to 40%. To compensate, sequences are aligned with the model genome sequences and their terminal base substitutions corrected.
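A minimal sketch of that correction follows. It assumes the read has already been aligned to the reference without gaps, that C-to-T damage clusters at the start of the read and G-to-A damage at its end, and that only the last few bases are damage-prone. The window of 3 bases and the function name are illustrative assumptions, not parameters from the studies.

```python
def correct_terminal_damage(read, ref, window=3):
    """Revert likely damage-induced substitutions near the ends of an aligned read.

    read, ref: equal-length, ungapped aligned strings (an assumption of this sketch).
    window:    number of terminal bases treated as damage-prone (hypothetical value).
    """
    if len(read) != len(ref):
        raise ValueError("read and reference must be the same length")
    bases = list(read)
    n = len(bases)
    for i in range(n):
        near_start = i < window
        near_end = i >= n - window
        # A T in the read where the reference has C, close to the start, is treated as damage.
        if near_start and ref[i] == "C" and bases[i] == "T":
            bases[i] = "C"
        # An A in the read where the reference has G, close to the end, is treated as damage.
        elif near_end and ref[i] == "G" and bases[i] == "A":
            bases[i] = "G"
    return "".join(bases)


print(correct_terminal_damage("TTGACCTTAA", "CTGACCTTGA"))  # CTGACCTTGA
```

Mismatches in the interior of the read are left alone, since those are more likely to be real differences between the archaic and reference genomes.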

In the end, all the analysis is for naught if the error of the result cannot be estimated. The cleverness of the analysts did not fail here either, and independent estimates of error rate were devised that each estimated the cumulative error to be below 1%, rendering reliable the Neandertal genome thus far sequenced.

Neandertal Study Objectives

Three results are sought.

  • Dates are estimated for the branching point between the Neandertal and modern human genomes, and for the subsequent complete population separation of the genomes (speciation).
  • The study looks at gene flow and estimates how much of them is in us (1% to 4%) and of us is in them (0% in the small sample used).
  • The study looks for specific variations in genes between us and them.

Genetic and Population Divergence: Us Vs. Them

Two separate genomes differ by a set of accumulated base pair transitions (mutations). The count of base pair transitions between a subject genome and a reference genome provides a genetic divergence metric that has a direct time correlation, the time since common ancestor.
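As a toy illustration of such a metric, the sketch below counts the positions at which two already-aligned sequences differ and reports the fraction. The function name and the short example sequences are made up for illustration; real analyses work on carefully aligned genome segments and must also handle gaps and sequencing errors.

```python
def divergence(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


subject = "ACGTACGA"
reference = "ACGTACGT"
print(divergence(subject, reference))  # 0.125
```

Under the constant-rate assumption, a larger value of this fraction corresponds to a longer time since the two sequences shared a common ancestor.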

Relatedness of genomes can seem a little counter-intuitive because there are two different concepts of relatedness. The measure of relatedness discussed above tells how long two genomes have been diverging (time of MRCA genome). This metric is a simple function of difference in base pairs, proportional to time depth. Alternatively, our intuitive concept of genetic relatedness, the homogeneous external characteristics of a population, is determined by recent genetic differentiation via the SNPs that define our clades. Thus, intuitively Africans seem more different from Europeans than from each other. But the genetic difference among Africans themselves is considerably greater than genetic difference between Africans and Europeans, based on base pair transitions. In other words, Africans have been diverging from each other for much longer than non-Africans have been diverging from Africans.

The study establishes a hypothesized reference genome of the MRCA of chimpanzees and us, using it as a base against which a divergence metric can be established. The reference individual is hypothesized to have lived 6.5mya. The study did not state a confidence in this time frame, but so far as I know it has an uncertainty of perhaps -10%, +20%. Thus, all derived dates will be equally ‘fuzzy’. A further assumption underlying the divergence metric is that the rate of base pair transitions over time is constant on all lineages descending from the reference.

To compute the divergence, the study set up a three-way comparison between Neandertal, human, and chimpanzee genomes, calculating the unique base pair differences present in each lineage for a small, carefully aligned common segment of the genomes. The count of unique human changes with respect to Neandertal represents, in terms of base pair transitions, the time since the divergence of humans and Neandertal. The base pair change representation of the entire time frame back to the reference genome is the genetic distance between the human and chimpanzee genomes. Forming the ratio of these two base pair difference quantities, then multiplying by the reference genome age, gives the age of the Neandertal-human divergence. The study concluded this ratio is 0.127 with a 95% confidence. 12.7% of 6.5my is 825ky, the age attributed to divergence of the human and Neandertal clades. This date is older than for the divergence of any known human genotypes from each other. Five human genomes were tested for comparison. They show divergence from each other in the range 8.2% to 10.2% of the reference genome time frame.
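The arithmetic above can be written out as a short sketch. The ratio (0.127), the reference age (6.5my) and the human-to-human range (8.2% to 10.2%) come from the text; the function name and the sensitivity check using the rough -10%/+20% uncertainty mentioned earlier are illustrative additions.

```python
REFERENCE_AGE_YEARS = 6.5e6  # assumed age of the human-chimpanzee MRCA
NEANDERTAL_RATIO = 0.127     # Neandertal-human divergence as a fraction of that span


def divergence_age(ratio, reference_age=REFERENCE_AGE_YEARS):
    """Scale a divergence ratio by the reference age to get an age in years."""
    return ratio * reference_age


print(f"Neandertal-human: {divergence_age(NEANDERTAL_RATIO) / 1000:.1f} ky")

# Range found among the five modern human genomes.
for ratio in (0.082, 0.102):
    print(f"human-human at {ratio:.3f}: {divergence_age(ratio) / 1000:.0f} ky")

# Sensitivity to the reference age, using a rough -10% / +20% uncertainty.
for factor in (0.9, 1.2):
    age = divergence_age(NEANDERTAL_RATIO, REFERENCE_AGE_YEARS * factor)
    print(f"reference age x{factor}: {age / 1000:.0f} ky")
```

Because the result is a simple product, any error in the assumed reference age carries through proportionally: with these inputs the Neandertal-human date moves between roughly 743ky and 991ky.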

After Neandertal clade genetic divergence, there was a period of time that both ancestral and derived haplotypes remained part of the same ancestral population. Then came a population divergence point, the last time the ancestral and derived clades were able to exchange genes. The study estimates the separation of the Neandertal Eurasian population from the ancestral population in Africa to have been 355kya±85ky. This date range lies mostly within the date estimates derived from paleontological and archaeological study.

Gene Flow between Us and Them

For over a century, the great question for the genus Homo has been whether modern humans exchanged genes with Eurasian Neandertal. The study attempts to answer this question, and does so, although raising further questions in the process. The study estimates 1-4% of our DNA has Neandertal origin. However, this number seems dependent on when the purported gene flow took place. Since that timing is only a hypothesis in the study, the time frame and the amount of flow are linked, and both remain open questions.

Because the two genomes are siblings and thus have time depth in common (~500ky), it may be expected that Neandertal will appear as closely related to modern humans as they are to each other in terms of base pair differences. One study difficulty with the closeness of the genotypes has already been mentioned: prevention of sequencing confusion caused by contamination of the archaic DNA by modern human DNA required special clean room tagging.

As a further difficulty, closeness of the genotypes predicts the gene flow hypothesis likely will be represented by a weak signal in the data. To rise above the null hypothesis, the data must exhibit resemblances between Neandertal and human DNA across multiple independent DNA regions. Further, these resemblances should be found in some regions of the world, but not others. Two independent assessments of the probability of gene flow were made and deemed compatible in their indications:

  • The study looks for DNA segments with low divergence from the Neandertal state and high divergence from other humans not inheriting Neandertal DNA. Such evidence is distinguishable from DNA differences not related to gene flow, which would show low divergence in both cases above. Only haploid DNA sequences were used in this search for a genetic flow signal, so that only inheritance from one parent would still register a signal. This is possible within the reference genome, which is determined to be an individual who is half European and half sub-Saharan African. Separating out the African and European DNA produces the haploid DNA desired.
  • As a cross check, a search detected 12 regions of current human DNA that exhibit high divergence between sub-Saharan Africans and genomes with Eurasian ancestry. Ten of 12 identified regions show low divergence from Neandertal state, a signal for gene flow from Neandertal to human.
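To get a feel for how striking "ten of 12" is, one can run a simple sign test. The sketch below assumes, as a null hypothesis chosen only for illustration, that each region is equally likely to sit close to or far from the Neandertal state (p = 0.5). That null is an assumption of this example and is not the statistical model used by the study.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Probability of at least k successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of 10 or more Neandertal-like regions out of 12 under the illustrative null.
print(f"{binomial_tail(10, 12):.4f}")  # 0.0193
```

Under this deliberately crude null, a result at least this lopsided would arise about 2% of the time. The appropriate null rate depends on how the regions were chosen, so the number is indicative only.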

The study proposes incontestable evidence of one-way gene flow, from Neandertal to human, for all humans except those whose ancestors never left Africa. Two options are left open for when this flow occurred.

  • Primary (expansion bottleneck) conclusion: Gene flow happened when proto-modern humans left Africa between 80 and 50kya, received genes from Eurasian Neandertal soon after leaving Africa, then went on to populate the non-African world.
  • Alternative conclusion: After the divergence of the human and Neandertal populations some 355kya, some Neandertal-related archaic hominins remained in Africa and continued to exchange genes with the proto-human haplotype. It was such an admixture that was the source of population expansion(s) out of Africa. Further, this admixture remained geographically confined within Africa, so it has not yet been detected.

On the surface, there appear difficulties with the primary hypothesis that seem to render it less parsimonious.

  • The primary hypothesis predicts gene flow would have occurred between 50kya and 80kya in the Middle East. This encompasses a gap of more than 20ky for which there is little or no paleontological evidence for the existence of proto-modern humans outside of Africa.
  • It seems reasonable the population expansion out of Africa would have happened in multiple waves over 10-15ky. For instance, the peoples who migrated toward India and SE Asia could be different from and perhaps older than that which migrated throughout the Middle East and Europe and came later to Asia. The proposed explanation would have each of these waves accepting a homogeneous admixture of Neandertal DNA, increasing the speculative nature of this reasoning.
  • To achieve the observed, fully homogeneous result among all non-African populations via the purported outside-of-Africa event, the genetic admixture would necessarily have to fit in a small time window perhaps associated with a single bottleneck involving a very small population, which then gave rise to all subsequent non-African human populations. This is perhaps too tidy an explanation. Natural events are likely to assume a more ragged topology.
  • The lack of detected mtDNA overlap between Neandertal and humans, together with the absence of human autosomal material in the sampled Eurasian Neandertal DNA, seems a further damper on the proposed explanation that Eurasian Neandertal are the source of the Neandertal DNA expressed in modern non-African populations.

Perhaps the study could have done more to anticipate such questions and then to answer them preemptively through further analysis. It could have bolstered the contention that this was an entirely uni-directional, Neandertal M -> Human F gene flow, by citing other population expansion studies that confirm this is characteristic rather than unusual. Also, further study of more current human DNA from NE Africa might have been enlightening.

Unfortunately, definitive answers based on archaeo-evidence are beyond the reach of the study. Even if samples were available, DNA sequencing was not advanced enough to permit studying proto-modern human DNA from NE Africa (looking for ancestral Neandertal DNA), or Neandertal DNA from near the Arabian Peninsula dated to 60ka (looking for localized gene flow from humans in the putative admixture region).

The primary study hypothesis does have some indirect support. The predicted expansion bottleneck within a single exodus migration event ~60-70ka may indeed have occurred. Based on Y chromosomal and mitochondrial population analysis, such a bottleneck scenario from ~60ka is increasingly viewed as a possibility in the area of Yemen. But until archaeo-evidence and enhanced DNA sequencing technologies become available, this remains a best guess. Participants of this first and only migration wave may have resided along the southern coastal zone of the Arabian peninsula for some time, where they interbred with Neandertal residing there. Ultimately, climate permitting, the modern humans continued their migrations into Eurasia and beyond. But the local Neandertal population perished, leaving as their only trace the evidence of slight gene flow from Neandertal to human nuclear DNA.

The secondary hypothesis also seems worth pursuing, for it is hard to justify believing all sibling species to sapiens have been absent from Africa for over 100ky. Some intriguing results in support of the secondary hypothesis are beginning to materialize. An article in July 2012 Cell found archaic (non-Neandertal) introgression into some Tanzanian tribal DNA and also into Pygmies. But again, physical archaeo-evidence of more recent archaic forms in Africa is currently unknown.

Detailed Genetic Difference of Humans and Neandertal

The study found six gene changes that affect the protein-coding capacity of genes, for which the Neandertal is ancestral to the human genome. Apparently, relatively few amino acid changes have become fixed in the last few hundred thousand years of human evolution.

  • SPAG17 encodes a protein important for the beating of the sperm flagellum.
  • PCD16 encodes a cell-cell adhesion molecule that may be involved in wound healing.
  • TTF1 regulates ribosomal gene transcription.
  • CAN15 encodes a protein of currently unknown function.
  • RPTN encodes repetin, an extracellular epidermal matrix protein expressed in the epidermis and at high levels in eccrine sweat glands, the inner sheaths of hair roots, and the filiform papillae of the tongue.
  • TRPM1 encodes melastatin, an ion channel important for maintaining melanocyte pigmentation in the skin.

The study found it intriguing that skin-expressed genes comprise three out of six gene changes compiled above, suggesting “selection on skin morphology and physiology may have changed on the hominin lineage”.

Other gene regulatory changes were noted in microRNAs, likely to have altered target specificity. The study noted several other types of genetic change between Neandertal and human genomes whose significance is not yet clear.

The study looked at human accelerated regions of the genome (HARs), those regions that remained fixed for most all of vertebrate evolution, but then began to change frequently after the split of humans from chimpanzees. Neandertal represent the derived state for over 90% of the HARs, showing that most acceleration happened before the human-Neandertal split, though 45 HARs invite further study, where only humans show the derived state.

Of special interest to the study was detection of positive selection for SNPs for which humans are derived but Neandertal is ancestral. Such cases are termed selective sweeps because a new SNP that is selection-favored will increase in frequency rapidly in the population and become fixed (other alleles disappear).
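How quickly a favored variant can take over is easy to see in a toy model. The sketch below uses the textbook deterministic recursion for constant selection on a haploid allele, p' = p(1 + s) / (1 + s p). The starting frequency, the selection advantage s and the function name are illustrative assumptions; this is not the method the study used to detect sweeps.

```python
def sweep_trajectory(p0, s, target=0.99, max_generations=100_000):
    """Allele frequency per generation under constant haploid selection.

    p0:     starting frequency of the favored allele (0 < p0 < 1)
    s:      selective advantage per generation (s > 0), a hypothetical value
    target: frequency at which the allele is treated as effectively fixed
    """
    if not 0 < p0 < 1:
        raise ValueError("p0 must be strictly between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    freqs = [p0]
    p = p0
    while p < target and len(freqs) <= max_generations:
        p = p * (1 + s) / (1 + s * p)  # one generation of selection
        freqs.append(p)
    return freqs


trajectory = sweep_trajectory(p0=0.001, s=0.01)
print(f"generations to reach 99%: {len(trajectory) - 1}")
```

Even a 1% advantage carries a rare allele to near fixation within on the order of a thousand generations in this idealized setting, which ignores drift, recombination and population structure.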

Direct knowledge of the functions of the DNA regions where sweeps are suspected is beyond our means currently. But such functions can be inferred by studying human diseases that are associated with changes in or near these regions. Some of the sweep regions are thus inferred to be associated with human cognitive ability, energy metabolism, and morphology of the cranium and upper torso.

One sweep region is in the vicinity of the THADA gene. A defective THADA is associated with type II diabetes, implying an association between this region and energy metabolism. Genes in several sweep regions are implicated in cognitive disabilities, including autism, Down syndrome and schizophrenia. Mutations to the RUNX2 gene, which is in a suspected sweep region, cause cleidocranial dysplasia, characterized by delayed closure of cranial sutures and more prominent frontal bone, hypoplastic or aplastic clavicles, a bell-shaped rib cage, and dental abnormalities. These are the same morphologies underlying the most apparent physiological differences between humans and Neandertal.

Detailed Genetic Difference of Humans and Denisovans

The latest study found 23 highly conserved DNA regions showing significant differences between humans and Denisovans. These differences suggest the modern human brain has significantly evolved beyond the archaic forms in its ability to make new connections, advancing our brain’s network of functional linkages. For example, differences were found in genes SLITRK1, KATNA1, ARHGAP32, and HTR2B, all active in nerve growth and function. Modified genes ADSL and CNTNAP2 have been linked to language disorders involving our ability to see things from another person’s viewpoint and to consciously mediate when to conceal things (white lies).

The similarity of Denisovan and human DNA, combined with the completeness of the Denisovan sequencing, enabled researchers to determine the young female Denisovan lived as long ago as 80ka and had dark skin, brown hair, and brown eyes.

Comments Welcome