The following abbreviated information is intended as minimal preparation for reading any of the pages under the page heading Genetics.
We begin, both in the evolutionary and reproductive sense, as a single cell of type eukaryote. This cell type we share with all macroscopic plant and animal life on earth, and with a lot of microscopic life as well. Only bacteria and their fellow prokaryotes, the archaea, are fundamentally different (more primitive) at the cellular level.
Our eukaryotic cell structure first appeared on earth over 1.6 billion years ago. Bacteria are more than twice as old and remain largely as they were. Our eukaryote ancestors remained microscopic single-celled creatures for another billion years. Then 600 million years ago, the first multi-celled creatures evolved and we quickly went macroscopic.
A eukaryote has an outer cell membrane, a double layer of water-repellent lipid molecules amid numerous proteins. The outer membrane encloses cytoplasm, a viscous liquid with some small structures embedded. Whereas bacteria have only cytoplasm within their cell, in the center of a eukaryote resides a cell nucleus, encased in its own membrane and filled with chromatin, a dense substance.
Two macro-molecule types are found in cells: protein (polypeptide) and DNA/RNA (polynucleotide).
Protein is synthesized within the cell and can be used within a cell, or exported through the cell membrane to interact with other cells. Within a cell, proteins act either as enzymes, or as structural elements. Structural protein exists within cellular membranes or in the inter-cellular scaffolding. The protein structure of cells varies by cell type.
DNA is found in chromosomes, structures which resolve out of the chromatin residing in the nucleus of eukaryotes. DNA carries the genetic blueprint for the organism as a whole. DNA consists of genes and their control sequences. RNA is a related molecule that assists in reading out information encoded by gene segments within the DNA, facilitating protein synthesis.
Cytoplasm’s viscous liquid consists of high concentrations of proteins, fatty substances, carbohydrates, and salts. Floating within are specialized discrete components: ribosome and organelle. A ribosome is a combination of proteins and RNA in an orderly arrangement that chemically mechanizes the synthesis of protein out of amino acids. An organelle is a membrane-enclosed space containing high concentrations of enzymes, minute biochemical factories used principally for cell energy production and regulation.
Our adult bodies contain ~40*(10^12) eukaryotic cells of many different types. Our initial embryonic cells are totipotent stem cells. They have the potential to transform into all types of cells. They continue to increase in number by repeated cell division (mitosis), eventually losing their totipotent potential, transitioning to a more specialized, pluripotent character, and finally differentiating into the many unique types of tissue cells in the body.
During mitosis, the chromatin organizes into thread-like structures called chromosomes. In most human cells, there are 46 chromosomes (23 pairs) in the nucleus. Such somatic cells are termed diploid, containing homologous (different source, same function) paired chromosomes. Another cell type, the human germ cell called a gamete, is haploid, containing just 23 chromosomes.
Before mitosis begins, each chromosome is copied, yielding two identical sister chromatids joined together. During mitosis, the centrosome in the cytoplasm divides first, the resulting two centrosomes migrating to opposite poles of the cell, where radiating stellate fibers reach out and connect to the chromosomes, pulling one chromatid of each chromosome to each pole while the nucleus divides. Once polarization is completed, the cell with its two nuclei then divides itself. Each daughter cell contains a full diploid set of chromosomes, a copy of every chromosome in the precursor cell.
Chromosomes consist of strands of DNA, which provide the genetic blueprint both for our life form in general, and for each distinct individual. Our DNA makes us all roughly the same, but continually inserts small differences.
Existing in all living things, DNA has a ladder type structure twisted into a helix, where each rail is a long string of alternating sugars and phosphates. Each rung is a pair of nucleotide bases, called a base pair (bp), chosen from only four possible bases, thymine, cytosine, adenine, guanine, abbreviated T, C, A, G. The DNA ladder can be viewed as two complementary strands if split down the middle. The strands carry the same information because bases only pair A-T and C-G, so either strand determines the other.
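The pairing rule can be illustrated with a few lines of Python. This is only a sketch: the function name complement and the example sequence are made up for illustration, and the code simply applies the A-T and C-G rule base by base.

```python
# Pairing rule from the text: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner strand implied by the pairing rule, base by base."""
    return "".join(PAIRING[base] for base in strand)


example = "ATGCCG"             # hypothetical sequence, not a real gene
partner = complement(example)  # the strand on the other side of the ladder
print(example)
print(partner)

# Pairing the partner strand again recovers the original strand,
# which is why either strand alone carries the full information.
print(complement(partner) == example)
```

Because applying the rule twice returns the starting strand, a cell that has only one strand can rebuild the other.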
The order of the bases along the helix encodes all life information: the chemical composition of the organism, the processes to be used for building the organism, and the processes for regulating the life of the organism once built. Human DNA contains 3.1 billion bp, compared to single-celled life with typically a few million bp, a genome nearly a thousand-fold smaller than ours. But ours is not the largest genome; some plants have over a hundred billion bp, roughly fifty-fold more than our DNA. The longest human chromosome consists of about 250 million bp.
Not all our DNA resides in the nucleus. Mitochondria, organelles descended from ancient invading bacteria, reside in the cytoplasm, generate energy for the cell, and carry their own small strands of DNA (mtDNA). An exact copy (as well as nature can manage it) of mtDNA is passed from a female to all her children in the cytoplasm of her ova. The nuclear DNA together with the mitochondrial DNA of an organism make up its genotype.
Meiosis, cell division of sexual germ cells prior to reproduction, involves replicating the DNA of a diploid germ stem cell so that four copies of each chromosome are present (each homologous pair doubled), then dividing twice to form four haploid cells. In males all four become sperm; in females typically only one becomes an ovum. Fertilization is the reverse process, combining two gametes, sperm and ovum, to form a diploid cell, the fertilized ovum.
An autosome is a chromosome that exists in equal numbers in male and female. Human diploid cells contain 22 autosomal chromosome pairs and one pair of sex chromosomes. During the division steps of meiosis, autosomal pairs have various DNA segments mixed and matched, a DNA recombination process called chromosomal crossover. Crossover ensures the haploid set of chromosomes in each gamete is different from those of either father or mother.
Of a different nature are the two sex chromosomes, X and Y. They are paired as chromosomes in the nucleus, males X-Y, females X-X. In males there is no true chromosomal crossover between X and Y during meiosis, except for a small homologous pseudoautosomal region; in females the two X chromosomes cross over with each other much as autosomal pairs do. Although the X is inherited in females from both father and mother, one is randomly deactivated in each cell early in development so that both males and females have effectively one active X chromosome. The sex chromosome in a sperm, excepting the small pseudoautosomal region, therefore matches the X or Y that the father himself inherited. Thus the non-autosomal part of the Y chromosome is passed unchanged (as well as nature can manage it) from father to son.
A gene is a functional segment of DNA, from a few hundred bp long to over two million bp. Genes have differing functions, most not well understood. The best understood genes code for protein. Human DNA has only ~20,000 protein-coding genes, comprising ~30% of the genome; a roundworm has nearly as many. Gene DNA is copied (transcribed) within the nucleus into messenger RNA and transported to ribosome protein factories in the cytoplasm where the RNA ‘message’ (order of the bases, grouped in threes) is read (translated), informing the linking of amino acids that comprise the protein coded by the gene. Reading the three-letter words of RNA, the ribosomes interpret AAA as the code for lysine, AGA for arginine, and so forth. Mistakes in DNA or the transcribing/translating process cause the protein to be garbled, often resulting in harmful effect to the organism (disease/death).
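The transcribe-then-translate readout described above can be sketched in Python. The codon table below is a small excerpt of the standard genetic code (it includes the AAA and AGA examples from the text), the gene sequence is invented for illustration, and the function names are hypothetical. As a simplification, transcription is modeled by taking the coding strand and swapping T for U.

```python
# Small excerpt of the standard genetic code (RNA codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met",   # also the usual start signal
    "AAA": "Lys",   # lysine, as in the text
    "AGA": "Arg",   # arginine, as in the text
    "UUU": "Phe",
    "GGC": "Gly",
    "UAA": "STOP",  # one of the stop signals
}


def transcribe(coding_dna: str) -> str:
    """Simplified transcription: copy the coding strand, with U in place of T."""
    return coding_dna.replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the message three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


gene = "ATGAAAAGATTTGGCTAA"  # hypothetical mini-gene, not a real sequence
message = transcribe(gene)
print(message)
print(translate(message))
```

Changing a single base in the example gene and rerunning the sketch shows how one copy error can alter or truncate the resulting chain of amino acids.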
Random variations in DNA can occur due to copy errors during germ cell reproduction. The old and new variants of such a modified DNA sequence at any locus are called alleles. Frequently, allele is used to mean a variant of a gene.
Not all gene bps end up in the finished messenger RNA. A typical gene has ~8 disjoint bp sequences called exons; taken together, exons comprise only ~1.5% of the genome’s bps, and only the exons are typically retained in the message. The other 28.5% of the genome, DNA residing within known genes and called introns, is transcribed along with the exons but then spliced out of the RNA. The splicing process can be varied, sometimes skipping exons or including introns, so that a gene is able to code for a variety of proteins.
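Splicing, including the exon skipping mentioned above, can be pictured as cutting and joining segments of a string. The following Python sketch is illustrative only: the sequence, the exon coordinates, and the function name splice are all invented, and introns are written in lower case purely to make them visible.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]], skip=()) -> str:
    """Join the exon segments in order, omitting any exon whose index is in `skip`."""
    kept = [
        pre_mrna[start:end]
        for index, (start, end) in enumerate(exons)
        if index not in skip
    ]
    return "".join(kept)


# Hypothetical unspliced RNA: three exons (upper case) separated by two introns (lower case).
PRE_MRNA = "AUGGCC" + "guaaguuuag" + "AAAGGA" + "guccccag" + "UUUUAA"
EXONS = [(0, 6), (16, 22), (30, 36)]  # start/end positions of each exon

full_message = splice(PRE_MRNA, EXONS)             # all three exons kept
short_message = splice(PRE_MRNA, EXONS, skip={1})  # middle exon skipped

print(full_message)
print(short_message)
```

The two results differ only in whether the middle exon is present, which is how one gene can give rise to more than one protein.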
The remaining ~70% of inter-gene DNA was originally thought to be merely junk, ‘gene deserts’. It is just beginning to be understood that there may be thousands of genes lurking there that do not code for protein. They are transcribed, but their RNA serves other functions, perhaps a source of timing signals for when other genes are enabled (turned on) or disabled.
Each cell contains essentially the same DNA as all the rest. But each different type of cell may use different genes and different gene transcription to perform its function, as would be expected for widely different types such as liver, muscle, and brain cells. Each time a cell divides, its entire genome must be copied. The copies are not always faithful; mistakes creep in. External environmental effects such as radiation and cigarette smoking can increase the chance for error. One class of errors results in cancer; the cells reproduce endlessly with their turn-off switch disabled.
~50% of the human genome has been introduced over the eons via DNA parasites that insert their own sequences in our DNA. Once introduced, they can make copies of themselves and insert these randomly across the genome. A small percentage of these parasitic sequences have been found useful and thus were preserved and fixed by natural selection. Others have neutral effect on the organism and so remain as clutter. (One ultimate use of DNA editing technology may be to remove the true clutter.)
Specific gene alleles correspond to many observable organism traits (phenotypes). In peas, red and white flowers correspond to alternate alleles of a single gene. Sexual (diploid) organisms have somatic body cells containing two copies of each gene, one allele from the male parent, one from the female. Each parent provides one half of their diploid gene complement in their gametes, containing only a single allele of each gene. If the inherited alleles (one from sperm, one from ovum) are different, the organism is heterozygous for that gene; organisms with identical alleles are homozygous.
The gene combinations in the somatic cells, after crossover, determine the accumulated inherited phenotype. An allele’s contribution to trait determination is either dominant (upper case) or recessive (lower case). For pea blossom color genes, the red allele is dominant, the white recessive. Whenever one of the alleles carried is dominant, the dominant trait is expressed. Only if both alleles are recessive is the recessive trait expressed.
When homozygous individuals with different alleles (AA x aa) breed in generation F0, the offspring are identical (Aa) in generation F1 (Law of Uniformity). When F1 heterozygous individuals breed (Aa x Aa), F2 contains equal amounts AA, Aa, aA, and aa. Thus in peas, F2 contains 25% white flowers (Law of Segregation). If two independent traits such as color and smoothness (wrinkled is recessive) are tracked, F2 will contain only 1/16 peas that are both white and wrinkled (Law of Independent Assortment). Such cross-breeding will exhibit recessive trait combinations in F2 that were not visible in either F1 or F0.
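The ratios above can be checked by enumerating every pairing of one allele from each parent, which is what a Punnett square does. This Python sketch is a toy model with hypothetical function names; it assumes each parent passes on either of its two alleles with equal chance and that the two traits assort independently.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for one gene; a parent is a two-letter string like 'Aa'."""
    # Sorting puts the upper-case (dominant) letter first, so 'aA' is counted as 'Aa'.
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))


def recessive_fraction(parent1: str, parent2: str) -> Fraction:
    """Fraction of offspring showing the recessive trait (both alleles lower case)."""
    counts = cross(parent1, parent2)
    recessive = sum(n for genotype, n in counts.items() if genotype.islower())
    return Fraction(recessive, sum(counts.values()))


f1 = cross("AA", "aa")  # every F1 offspring is Aa
f2 = cross("Aa", "Aa")  # F2 genotypes in a 1:2:1 ratio
print(f1)
print(f2)

white = recessive_fraction("Aa", "Aa")         # one quarter white flowers
white_and_wrinkled = white * recessive_fraction("Bb", "Bb")
print(white, white_and_wrinkled)
```

Multiplying the two single-gene fractions is valid only under the independence assumption; with it, one quarter times one quarter gives the 1/16 quoted above.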
A single nucleotide polymorphism (SNP) is an allele formed by a change to a single nucleotide base, either by a substitution, insertion, or deletion. Some SNPs are variable, perhaps switching from one allele back to the other repeatedly. Others, though, are unique, occurring only once or very rarely. Such an SNP is sometimes called a unique event polymorphism (UEP). This is the type of SNP that is useful in defining groups of related populations. Persons carrying an SNP are said to be in the derived state for that SNP. Persons not carrying it are said to be in the ancestral state for that SNP.
A clade is a branch on the genetic tree consisting of an individual and all its descendants that have some genetic component that serves as a unique marker for clade members. The group of all people derived for an SNP form a clade. The SNP associated with the clade is called the haplotype (haploid genotype) of the clade’s members. (Genetic testing companies reserve the word haplogroup for this meaning, and use haplotype to refer to grouping by a different type of allele called a short tandem repeat (STR)).
When a new haplotype first appears in a generation, some parent generation contains the most recent common ancestor (MRCA) for both the descendants that are derived for the defining SNP, and the remaining descendant population that did not inherit the SNP. For a Y-DNA SNP, the MRCA will be the prior generation father that produced two sons, one ancestral to the derived SNP and one to the branch not containing the SNP. Looking forward, the MRCA is the point of clade divergence. Working backward through all the generations of a clade, the MRCA is the point of clade coalescence.
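Finding the point of coalescence for two men on a Y-DNA tree amounts to walking each paternal line backward until the lines meet. The Python sketch below uses an invented four-generation family; the names in FATHER and the function names are hypothetical.

```python
# Hypothetical paternal tree: each key is a man, each value is his father.
FATHER = {
    "father": "grandfather",
    "son_a": "father",
    "son_b": "father",
    "grandson_a": "son_a",
    "grandson_b": "son_b",
}


def paternal_line(person: str) -> list[str]:
    """List a man followed by his father, his father's father, and so on."""
    line = [person]
    while line[-1] in FATHER:
        line.append(FATHER[line[-1]])
    return line


def mrca(person1: str, person2: str):
    """Return the most recent man on both paternal lines, or None if the lines never meet."""
    line1 = set(paternal_line(person1))
    for candidate in paternal_line(person2):
        if candidate in line1:
            return candidate
    return None


print(paternal_line("grandson_a"))
print(mrca("grandson_a", "grandson_b"))
```

In this toy tree the two grandsons coalesce at the man labeled father, the man with two sons who each founded a separate branch.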
The migration of gene alleles throughout a population is called gene flow. Gene flow happens mostly within a species, but can in rare cases happen across species. Gene flow is moderated by the forces that drive organism evolution: genetic drift and natural selection.
The relative frequency of alleles across a population varies randomly from generation to generation, because chance plays a part in which individuals reproduce and which alleles they pass on. Genetic drift refers to these random changes in population allele frequencies over time. Two facets of genetic drift are important to understand. First, the magnitude of genetic drift varies inversely with population size. Second, the relative frequency of an allele in a population says nothing about the usefulness of that allele to the organism. Any allele may be functionally beneficial, neutral, or detrimental, and this affects its longevity in the population. External conditions may favor the reproductive success of organisms with one allele over those with another. This is Darwin’s Survival of the Fittest by Natural Selection.
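The dependence of drift on population size can be explored with a small simulation. The Python sketch below is a simplified Wright-Fisher-style model, a standard textbook idealization that is not named in the text above: each generation, every gene copy in a fixed-size population is drawn at random from the previous generation's allele frequencies, with no selection at all. The function name, parameters, and default values are assumptions chosen for illustration.

```python
import random


def simulate_drift(pop_size: int, generations: int,
                   start_freq: float = 0.5, seed: int = 0) -> list[float]:
    """Track one allele's frequency under pure random resampling (no selection)."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    freq = start_freq
    history = [freq]
    for _generation in range(generations):
        # Each of the pop_size gene copies inherits the allele with probability freq.
        copies = sum(1 for _copy in range(pop_size) if rng.random() < freq)
        freq = copies / pop_size
        history.append(freq)
    return history


small = simulate_drift(pop_size=20, generations=50)
large = simulate_drift(pop_size=2000, generations=50)
print("small population:", [round(f, 2) for f in small[::10]])
print("large population:", [round(f, 2) for f in large[::10]])
```

Results depend on the seed, but runs with the small population typically wander much further from the starting frequency, and an allele that reaches a frequency of 0 or 1 stays there, since no copies of the alternative remain to be drawn.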
When gene flow is interrupted between two populations, usually due to a physical barrier between populations, the separated populations’ genetic material drifts apart due to evolutionary forces. When the genetic distance reaches a certain threshold, speciation is said to occur and the two populations are deemed to be separate species.
Adverse physical environments can reduce populations drastically, an event called a population bottleneck, which tends to eliminate genetic diversity in the population. Upon subsequent more favorable conditions, the population expands from a smaller gene pool, which tends to accelerate genetic divergence from related but distinct populations, accelerating speciation. Recall, genetic drift is greater in smaller populations.
A gene pool constriction similar in effect to a bottleneck occurs when one segment of a population splinters off to form a new sub-population. The genetic material forming the base of the new population is restricted to the alleles carried by the founders of the new population, a phenomenon called the founder effect.