Evolution of DNA - C-Value Enigma

One of the many mysteries in modern genetics is called the C-value enigma (formerly called the C-Value Paradox). In a nutshell, the total amount of DNA in organisms is not directly related to their complexity, as you might think it should be.

For example, genome size varies as follows across different groups of organisms, with all measurements in billions of base pairs:

Group            Smallest                        Largest

Prokaryotes
  Archaea        0.0005 (Nanoarchaeum)           0.002 (Archaeoglobus)
  Bacteria       0.0006 (Mycoplasma)             0.007 (Rhizobacteria)

Invertebrates
  Protozoa       0.003 (Encephalitozoon)         0.02 (malaria)
  Nematodes      0.03 (root-knot nematode)       1.9 (horse roundworm)
  Molluscs       0.4 (owl limpet)                5.7 (Antarctic whelk)
  Echinoderms    0.5 (sea star)                  5.3 (sea cucumber)
  Crustaceans    0.2 (water flea)                36.7 (deep-sea shrimp)
  Insects        0.1 (Braconid wasp)             16.3 (mountain grasshopper)

Vertebrates
  Fish           0.4 (puffer fish)               128.3 (lung fish)
  Amphibians     0.9 (burrowing frog)            115.8 (mudpuppies)
  Reptiles       1.1 (skink)                     5.2 (Greek tortoise)
  Birds          1.0 (cut-throat weaver)         2.1 (ostrich)
  Mammals        1.6 (bent-wing bat)             6.1 (echimyid rodents)

Plants
  Algae          0.01 (phytoplankton)            19.6 (stonewort)
  Mosses         0.2 (guiana moss)               2.0 (horn calcareous moss)
  Ferns          0.06 (spikemoss)                72.7 (whisk fern)
  Gymnosperms    2.3 (gnetum vine)               36.0 (Mexican white pine)
  Angiosperms    0.1 (creamy strawberry)         127.4 (fritillaria)

Overall, the minimum genome size seems to increase with complexity as you would expect. However, the average and largest genome sizes are not at all logical. For example, they are much larger in many of the ‘simpler’ organisms (particularly in some fish and amphibians) than in the more ‘complex’ birds and mammals.
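To put numbers on that comparison, here is a small sketch that takes the vertebrate rows from the table above and computes how many times larger the biggest genome is than the smallest within each group. The dictionary and function names are made up for illustration; only the figures come from the table.

```python
# Smallest and largest genome sizes in billions of base pairs,
# copied from the vertebrate rows of the table above.
GENOME_RANGE_GBP = {
    "Fish": (0.4, 128.3),
    "Amphibians": (0.9, 115.8),
    "Reptiles": (1.1, 5.2),
    "Birds": (1.0, 2.1),
    "Mammals": (1.6, 6.1),
}


def spread(group: str) -> float:
    """Return largest / smallest genome size for one group."""
    smallest, largest = GENOME_RANGE_GBP[group]
    return largest / smallest


if __name__ == "__main__":
    for group in GENOME_RANGE_GBP:
        print(f"{group:<11} {spread(group):7.1f}x")
```

On these figures the spread within fish and amphibians is in the hundreds, while within birds and mammals it stays in the single digits.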

Humans are near the midpoint for mammals, with about 3.3 billion base pairs.

Most of the excess genome length in the ‘big genome’ organisms is made up of repetitive DNA, which we have already theorized is generally used for scripting data. So the question is, why does the C-value vary so much, and how might Foxy and Moxy be to blame?

Duplication and Debris

Before we look at the impact of script data on genome size, let’s first consider some alternate explanations for the variation between species.

Polyploidy

One condition which can add to the genome size is polyploidy— extra chromosomes which result when two species merge their entire genetic code. Polyploidy is rare in animals, but common in plants (for example, wheat is a hexaploid species which contains the merged genome from three different species of grasses, and maize is probably a tetraploid formed from the merger of two precursor species).

Over a long period of time, most polyploid species gradually lose some chromosomes or duplicated genes, but for a while they will contain two or three times the normal amount of DNA.

Duplicated Genes

Another condition which adds to genome size is simple duplication of genes, caused by transposons or replication accidents.

Many species carry multiple copies of important genes. The advantage is reduced lethality if one copy is accidentally disabled by a mutation or replication accident. The disadvantage is the higher metabolic cost of the extra DNA, plus the slower removal of bad copies of the genes (since individuals with a mutation in one copy don’t drop dead so promptly).

Duplicating a few protein-coding genes would probably not cause a huge increase in genome size, but it would have some effect.

Junk DNA

So far we have talked about the use of most repetitive or ‘junk’ DNA as data for Foxy or Moxy scripts. However it’s also likely that some of the DNA in most organisms really is junk— sequences inserted randomly by transposons or gene parasites, dud genes that are no longer functional, or any other genetic debris that hasn’t been weeded out by natural selection yet.

It is possible that organisms going through rapid evolutionary change will accumulate more junk than normal, since they’ll be changing scripts and enzymes at a more rapid rate than usual. Organisms in a cushy ecological niche may also be able to accumulate more ‘junk’, since they are under less competitive pressure.

Truly junk DNA probably can’t account for the huge variations in genome size, but it may account for some of the variation.

Script Sizing

Now let’s take a look at the impact of Foxy and Moxy scripts on genome size.

Number of Scripts

One way that the amount of script data might vary between species is in the number of scripts used by the organism. That could vary quite a bit, depending on the ‘algorithm’ used to develop the structures in each organism within a species.

For example, a ‘simple’ plant might simply send out new roots and leaves from a central point, and continue to do so in a loop for a fixed time interval or until some environmental condition appeared. Then it might switch to flower and seed production until its death. It would be a very short script. Arabidopsis thaliana, a simple plant used in many genetic experiments, would have a simple ‘life script’ like that, and it does have very little repetitive DNA in its genome.

A ‘complex’ and longer-lived plant might have many types of tissues, complex branching patterns, and different behavior under different environmental conditions, all controlled by multiple scripts telling each type of tissue what to do. That might result in hundreds or thousands of times as many scripts as the simple plant.

Script Sizes

Another factor that influences genome size is the length of each script.

A large and long-lived organism would probably have longer scripts, on average, though the script lengths would also depend on the ‘efficiency’ of programming found in each script. We’ll talk more about script efficiency next.

Alternate Scripts

Yet another possible source of extra genome size is that some species might include ‘alternative’ scripts that are not currently in use, but that are still conserved.

For example, many parts of the world suffer extreme changes of conditions during the ice age cycles. As an adaptation to that, some species may carry dormant copies of genes that apply only to a set of conditions in some other part of the cycle. That way, a simple mutation in a script ID could shift a population drastically and quickly.

Code Compression

Humans contain a total of about 6 x 10^13 (60 trillion) cells. With only about 3 x 10^9 (3 billion) DNA base pairs, that means there is only one nucleotide for every 20,000 cells.

Based on that math, it might appear that having full control over the placement of each cell would require a genome that is tens of thousands of times larger. However it is actually possible that our repetitive DNA does code for the placement of each individual cell.
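The ratio can be checked directly from the cell count and genome length quoted above (6 x 10^13 cells and 3 x 10^9 base pairs). This is only the arithmetic, with variable names chosen for illustration:

```python
HUMAN_CELLS = 6 * 10**13        # cell count quoted in the text
GENOME_BASE_PAIRS = 3 * 10**9   # approximate genome length quoted in the text

# How many cells each base pair would have to "account for"
# if placement were coded one position at a time.
cells_per_base_pair = HUMAN_CELLS // GENOME_BASE_PAIRS

if __name__ == "__main__":
    print(f"{cells_per_base_pair:,} cells per base pair")
```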

How could it do that?

As with computer programming, there are some ‘tricks of the trade’ that DNA might use to get maximum value from the data in Moxy scripts. Let’s take a look at some possibilities now.

Count Compression

In previous chapters we talked about Foxy and Moxy scripts using lengths of repetitive DNA as a way to set the length of a structure.

The simplest way to specify a mixture of cells in a tissue would be to have a long script, with one or more base pairs specifying each type of cell. It’s a system that encourages effective evolution, since there is a bit of slippage at each generation that would ‘nudge’ the length larger or smaller, so the structure could drift to its optimum size.
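As a rough illustration of how slippage could let a length-based script drift toward an optimum, here is a toy simulation. It assumes a very simple rule that is not spelled out in the text: each generation the script gains or loses one repeat unit at random, and the change is kept only if it brings the structure closer to a hypothetical optimum size. The function name, the numbers, and the selection rule are all assumptions.

```python
import random


def evolve_length(start: int, optimum: int, generations: int, seed: int = 0) -> int:
    """Toy model: random one-unit slippage, kept only when it moves closer to optimum."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    length = start
    for _ in range(generations):
        # Slippage nudges the script one unit longer or shorter.
        candidate = max(1, length + rng.choice((-1, 1)))
        # Assumed selection rule: keep the change only if it is an improvement.
        if abs(candidate - optimum) < abs(length - optimum):
            length = candidate
    return length


if __name__ == "__main__":
    print(evolve_length(start=50, optimum=80, generations=1000))
```

Under this rule the length climbs to the optimum and then stays there, since every further nudge makes things worse and is rejected.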

Once a species reached its ‘perfect’ size, it could switch to a different length-coding system which would be less evolvable, but more compact.

For example, an extremely clever reader of short scripts might be able to ‘parse’ a number from a short DNA chain, similar to the way modern computers read numerical data. In theory, a length of 5 nucleotides could code for up to 4^5 (1,024) different lengths, sufficient for any lengths under 1,000. 10 nucleotides could code for over one million values, and 15 nucleotides could manage over one billion.

Of course there may not be a protein capable of reading nucleotide sequences as true base-4 numbers like that (it would need to somehow give each possible nucleotide a value between 0 and 3, and then multiply the second nucleotide by 4, the third by 16, the fourth by 64 and so on).
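In computer terms, the reading scheme described above is a base-4 number with the first nucleotide as the lowest digit. The sketch below shows what such a parser would do. The particular assignment of A, C, G and T to the values 0 to 3 is an arbitrary assumption, as are the function names; nothing here suggests a real protein works this way.

```python
# Assumed digit values; any fixed assignment of 0..3 to the four bases would do.
DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}


def parse_length(seq: str) -> int:
    """Read a nucleotide string as a base-4 number, first nucleotide least significant."""
    total = 0
    for position, base in enumerate(seq):
        # Position 0 is multiplied by 1, position 1 by 4, position 2 by 16, ...
        total += DIGIT[base] * 4 ** position
    return total


def capacity(nucleotides: int) -> int:
    """Number of distinct values a sequence of this many nucleotides can encode."""
    return 4 ** nucleotides


if __name__ == "__main__":
    for n in (5, 10, 15):
        print(n, "nucleotides:", f"{capacity(n):,}", "values")
    print("TTTTT ->", parse_length("TTTTT"))
```

Five nucleotides give 1,024 possible values (0 through 1,023), ten give 1,048,576, and fifteen give 1,073,741,824.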

However an organism could also code for lengths compactly by storing a variety of ‘constant’ scripts, each with a different length and a different script ID. Any script that needs a length could then reference one of those constant sequences by its ID, and Moxy could grab a copy and use it as a traditional length-based script. That way a short gene ID of maybe 17 or 20 base pairs could ‘represent’ a script that might be thousands of elements long.
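A programmer would recognize this as a table of named constants. The sketch below is one way to picture it, with entirely hypothetical 20-nucleotide script IDs, an invented repeat unit, and invented lengths:

```python
# Hypothetical repeat unit and script IDs, invented for illustration only.
REPEAT_UNIT = "CA"
CONSTANT_SCRIPTS = {
    "ACGTTGCAACGTTGCAACGT": REPEAT_UNIT * 50,     # a 'constant' worth 50 units
    "TTGACCGTAGGCTAACGTCA": REPEAT_UNIT * 2000,   # a 'constant' worth 2,000 units
}


def resolve_length(script_id: str) -> int:
    """Fetch the constant script for an ID and return its length in repeat units."""
    stored = CONSTANT_SCRIPTS[script_id]
    return len(stored) // len(REPEAT_UNIT)


def reference_cost(script_id: str) -> int:
    """Base pairs a calling script spends to name the constant."""
    return len(script_id)


if __name__ == "__main__":
    long_id = "TTGACCGTAGGCTAACGTCA"
    print(reference_cost(long_id), "base pairs stand in for",
          resolve_length(long_id), "repeat units")
```

The constant itself still has to be stored once, so the saving only appears when several scripts reference the same constant.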

Script Looping

We’ve already talked about Moxy calling other Moxy scripts. The use of script ‘routines’ calling ‘subroutines’ also allows organisms to reduce the total length of scripts.

For example, one way to specify a stretch of 10,000 skin cells would be to use a script that is 10,000 base pairs long. Doing that offers a high degree of precision, but it may be much more detail than an organism would really need.

An alternate approach would be to run a master script that is 100 base pairs long, and use it to call a lesser script that is also 100 base pairs long— and run it 100 times.

By necessity, the result would have a ‘repeat pattern’ of 100 cells, with no way to vary the pattern individually in each block. On the other hand, the system would also require only 200 base pairs to manage the placement of 10,000 cells.
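The routine-and-subroutine arrangement described above can be sketched as a pair of nested loops. The script contents below are arbitrary placeholders and the function name is invented; the only point is the bookkeeping of 200 stored elements against 10,000 placed cells.

```python
def run_looped(master: str, sub: str) -> list:
    """Run the whole sub script once for every element of the master script."""
    placed = []
    for _ in master:         # each master element is one call to the sub script
        placed.extend(sub)   # each sub element places one cell
    return placed


MASTER = "A" * 100   # 100-element master script (contents arbitrary)
SUB = "AC" * 50      # 100-element sub script (contents arbitrary)

cells = run_looped(MASTER, SUB)
stored_base_pairs = len(MASTER) + len(SUB)

if __name__ == "__main__":
    print(stored_base_pairs, "base pairs place", len(cells), "cells")
    print("blocks identical:", cells[:100] == cells[100:200])
```

Every block of 100 cells comes out identical, which is the ‘repeat pattern’ limitation mentioned above.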

Analog Controls

Still another way to reduce the length of the genome would be to abandon precise ‘digital’ control, and go back to a more ‘analog’ system.

For example, to fill in a bunch of skin cells, an organism might simply repeat a 100-element script indefinitely, until it runs out of places to put the skin, or until some sort of timer runs out.
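Here is a minimal sketch of that ‘analog’ approach. It assumes the two stopping conditions can be expressed as simple counts (free places left, and timer steps left), which is a simplification made for this example; the function name and numbers are invented.

```python
from itertools import cycle, islice


def fill_analog(script: str, free_slots: int, timer_steps: int) -> list:
    """Repeat the script until either the space or the timer runs out."""
    limit = min(free_slots, timer_steps)
    # cycle() replays the script endlessly; islice() cuts it off at the limit.
    return list(islice(cycle(script), limit))


if __name__ == "__main__":
    skin_script = "AC" * 50  # 100-element script (contents arbitrary)
    print(len(fill_analog(skin_script, free_slots=250, timer_steps=10_000)))
    print(len(fill_analog(skin_script, free_slots=250, timer_steps=120)))
```

The script itself never records how many cells get placed; that number comes entirely from the surroundings or the timer.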

Script Reuse

Another way to reduce the volume of the genome is to use lengthy scripts for more than one function.

We previously talked about using a single script from two different Foxy or Moxy proteins as a way to ‘link’ two properties, but of course, the same thing would also shrink the number of base pairs required in the genome.

Compression and C-Value

All the script compression techniques have one thing in common: they reduce the size of the genome, but they also reduce the ease with which an organism can evolve.

Using a number or a constant instead of a full script means it’s impossible to throw one or two unusual cells into the middle of a tissue. In many settings that might actually be a good thing, but it does reduce an organism’s capacity to come up with an interesting new structure via random mutations.

Likewise, looping and analog controls mean that an organism loses some precision in the way it lays out its tissues.

Using the same script for more than one function doesn’t take away any details, but it causes its own evolutionary problems, since the two different tissues will be linked in the evolutionary sense. Any mutation that is beneficial for one of the tissues may be detrimental for the other.
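The linkage problem is easy to see in a toy model. Suppose, purely as an assumption for this example, that two tissues each read the same length-based script but place a different number of cells per script element. A single change to the shared script then resizes both tissues at once.

```python
def tissue_size(script: str, cells_per_element: int) -> int:
    """Cells placed by a reader that lays down a fixed number of cells per element."""
    return len(script) * cells_per_element


shared_script = "CA" * 40            # one 80-element script used by two tissues
mutated_script = shared_script + "CA"  # slippage adds one repeat unit

# Hypothetical readers: tissue A places 3 cells per element, tissue B places 5.
before = (tissue_size(shared_script, 3), tissue_size(shared_script, 5))
after = (tissue_size(mutated_script, 3), tissue_size(mutated_script, 5))

if __name__ == "__main__":
    print("before:", before)
    print("after: ", after)
```

There is no way in this arrangement to enlarge one tissue while leaving the other alone.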

Design Tradeoffs

The C-value enigma may simply represent differences in evolutionary strategy between different organisms, along with some possibly practical consequences of the varying amounts of genetic material.

An organism with small amounts of genetic material would represent a highly optimized species— with highly efficient coding of its organs and tissues, but with less potential to evolve quickly into new forms.

On the opposite extreme, organisms with huge amounts of genetic material would have more of their structural information coded in simple scripts, which could evolve more quickly if conditions changed.

Evolutionary Timing

Differences in the C-value may also represent organisms that are in different stages of evolutionary development.

Shortly after new Hoxy genes are introduced, a species may start using many new scripts, which might take up large amounts of genetic material simply because they haven’t had time to be optimized yet.

Later on, metabolic pressures would reward individuals that used more efficient coding, particularly after the optimum scripts and organ sizes had developed for the species.