Evolution of DNA - Complementary Triplets

As we've mentioned before, amino acid chains are very bendy. The carbon-to-carbon and carbon-to-nitrogen bonds make a sharp turn (roughly 109°) at each link in the peptide chain, and the single bonds on either side of each alpha carbon have a great deal of rotational flexibility (unless bulky side chains prevent it).

Proteins can form rigid structures, but they usually do so by wandering around and looping back and forth in 3D space, and then holding the structure together with linkages between amino acids in different parts of the chain (whether by hydrophobic interactions, polar bonds or cysteine sulfur bonds).

In general, protein enzymes work by having an occasional polar amino acid that sticks out and interacts with some other amino acid further down the chain (the two just happen to be physically close because the chain looped around and back). Enzymes rarely have more than one or two amino acids in a row that are part of an active group, simply because it's difficult to get the side chains from three consecutive amino acids physically close to each other.

Problems with Fred

What that flexibility means for protein transcription is that proteins are not the ideal structures for reading genetic chains. Fred and Roscoe managed to do it, but it was going to be a design challenge for them to ever expand into wider reading frames that were sufficient to code for proteins with a larger variety of amino acids.

It probably could be done, but once Cassius started using complementary nucleotides, there was a better way.

Chains Reading Chains

From a structural point of view, the ideal way to match up with a short stretch of backbone chain is to use another chain. It's equally compact and rigid, and thanks to complementary pairing, it's just the right shape to line up with the first chain. So this might (finally!) be the right time to start thinking about protein transcription that is closer to the modern system.

A few chapters back we looked at Fatcat managing versions of Fred that each acted as a 'transfer molecule', bringing in one amino acid.

What if Fatcat managed RNA-based carriers instead? And what if those carriers matched up to more than one chain molecule at a time? Matching two base pairs at a time would allow proteins to include up to 15 different amino acids (plus a stop codon), while a triplet system would allow up to 63 amino acid choices (again plus a stop codon).
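The arithmetic behind those numbers is simple: with four different chain molecules, a reading frame of n molecules can spell 4^n distinct 'words', and reserving one of them as a stop signal leaves 4^n - 1 for amino acids. A quick check, assuming one stop word as the text does:

```python
# Alphabet of four chain molecules (nucleotides), as in the text.
ALPHABET_SIZE = 4

def coding_capacity(frame_length: int, stop_codons: int = 1) -> int:
    """Distinct amino acids a reading frame could encode, after reserving stop words."""
    return ALPHABET_SIZE ** frame_length - stop_codons

for n in (2, 3):
    print(f"{n} molecules: {ALPHABET_SIZE ** n} words, {coding_capacity(n)} amino acids + 1 stop")
# 2 molecules: 16 words, 15 amino acids + 1 stop
# 3 molecules: 64 words, 63 amino acids + 1 stop
```

With three stop codons instead of one (the modern arrangement), `coding_capacity(3, stop_codons=3)` gives 61.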

That sounds great, but there was one big problem. For a while at least, the new system still needed to coexist with the older single-reading-frame system. Cassius still needed its 'legacy' genes such as Sofia and Sorrel, and any new triplet system that interfered with the transcription of Fred and Roscoe would be lethal.

Triplet Transition

There are two possible ways that Cassius could have switched to three-molecule reading frames, with no clear evidence for which way it actually went.

One possibility is that the transition happened very early, right after the assimilation of a Caleb with an alt-Caleb. There was a brief 'window of opportunity' when a new system could have slipped in, with reduced risk of interfering with legacy proteins.

Another, and probably more likely, option is that the shift happened later, with triplet coding arising from an entirely different pathway unrelated to the old Fatcat/Fred transcription system.

We'll now consider these two paths in more detail.

Gradual Triplets

If the transition to complementary base pairs and triplet coding happened soon enough after the assimilation of a Caleb and alt-Caleb, there was a relatively easy way to start using a few triplets to add a few new amino acids-- similar to the method that we mentioned in the previous chapter, when looking at double-wide Fred.

The 'legacy' genes that came from the original Calebs would have contained just the two nucleotides from Sofia, while the alt-Caleb genes would have contained just the two complementary nucleotides from alt-Sofia. So any new triplet-Freds that matched with a mixture of nucleotides and alt-nucleotides would have been guaranteed not to interfere with any of the original genes.
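That guarantee can be checked mechanically. The sketch below uses hypothetical symbols, since the text doesn't name the nucleotides: lowercase `a` and `b` stand for Sofia's two nucleotides, and uppercase `A` and `B` for alt-Sofia's two. It counts the triplets that mix the two sets and confirms that none of them can occur anywhere in a gene built from only one set.

```python
from itertools import product

# Hypothetical labels: the text does not name the nucleotides.
SOFIA = {"a", "b"}       # Sofia's two nucleotides (legacy Caleb genes)
ALT_SOFIA = {"A", "B"}   # alt-Sofia's two complementary nucleotides
ALL = sorted(SOFIA | ALT_SOFIA)

def is_mixed(triplet: str) -> bool:
    """True if the triplet uses at least one nucleotide from each set."""
    letters = set(triplet)
    return bool(letters & SOFIA) and bool(letters & ALT_SOFIA)

all_triplets = ["".join(t) for t in product(ALL, repeat=3)]
mixed = [t for t in all_triplets if is_mixed(t)]

legacy_gene = "abbabaab"  # made-up gene from Sofia nucleotides only
alt_gene = "BAABBABA"     # made-up gene from alt-Sofia nucleotides only

# Python's `in` checks every offset, so this covers all three reading frames.
collision = any(t in gene for gene in (legacy_gene, alt_gene) for t in mixed)

print(len(all_triplets), len(mixed), collision)  # 64 48 False
```

Of the 64 triplets, 8 are pure Sofia and 8 are pure alt-Sofia, leaving 48 mixed triplets that a triplet Fred could claim without ever matching inside a legacy gene.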

Let's look at how a fifth amino acid might end up in a four-molecule protein, via a combination of a triplet reading frame and complementary base pairing.

1. We'll start with the usual 'alternating Freds' transcription of a protein from a chain. Fatcat and Fred transcribe a bunch of amino acids using one chain molecule at a time, just like normal.

2. After a while, Fatcat comes to just the right chain spot, but a triplet Fred pops into the spot instead of a regular Fred. This particular Fred contains three chain molecules which happen to match up with the next three base pairs. It also contains a knee which attracts a new, fifth amino acid. Fred bonds complementarily and adds the brand new amino acid to the protein.

3. Fatcat and an old Fred then continue with normal transcription.

4. The result is a chain that contains a new, fifth amino acid.
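The four steps can be mimicked with a toy reader. In this sketch all letters and amino acid names are hypothetical placeholders: each old Fred maps one chain molecule to one of the four original amino acids, and a single triplet Fred recognizes one mixed triplet and brings in a fifth amino acid. The reader simply prefers the triplet Fred whenever the next three molecules match it; a real cell would be far messier and more probabilistic.

```python
# Old Freds: one chain molecule -> one of the four original amino acids.
OLD_FREDS = {"a": "aa1", "b": "aa2", "A": "aa3", "B": "aa4"}

# One hypothetical triplet Fred: a mixed triplet -> a new, fifth amino acid.
TRIPLET_FREDS = {"aBa": "aa5"}

def transcribe(chain: str) -> list[str]:
    protein = []
    i = 0
    while i < len(chain):
        triplet = chain[i:i + 3]
        if triplet in TRIPLET_FREDS:      # step 2: a triplet Fred pops in
            protein.append(TRIPLET_FREDS[triplet])
            i += 3
        else:                             # steps 1 and 3: one molecule at a time
            protein.append(OLD_FREDS[chain[i]])
            i += 1
    return protein

print(transcribe("abaBaA"))  # ['aa1', 'aa2', 'aa5', 'aa3']
```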

Chain Results

Of course, as with all the previous changes in protein transcription, this new protein was probably not beneficial, and may even have been lethal.

However, after many false attempts, the triplet-coding system eventually might have produced a useful new protein. So the organism that contained it would have thrived.

Over a long period of time, the beneficial effect of having more amino acid choices would have given a selective advantage to any Cassius which was capable of using the new triplet system. Because of that, the system would become established.

Triple Wide Origins

The jump from single-wide Fred to triple-wide Fred might seem like an implausibly large change, but there is a bit of existing cell chemistry that would make it much more possible.

A few chapters back, we mentioned the use of 'carrier' molecules that would bring in a small, raw material molecule, and link it up to a blueprint chain via complementary pairing.

Those carrier molecules had a function that was almost identical to Fred's, so it's quite possible that they intruded accidentally into a protein transcription, and then turned out to be beneficial.

Sixth to Twentieth Amino Acids

Once Fatcat and Fred established the fifth amino acid (and the first to be coded from triplets), the same system would have gradually become established in more and more polypeptides, introducing more and more new amino acids.

Some triplet Freds might have picked up a different chain triplet, adding additional triplets that would code for the same amino acid. Some of those new triplet Freds may then have mutated at the knee end, and started bringing in new amino acids.

Presumably, at some point the whole system gradually evolved into our current 20 amino acids coded from 61 of the 64 possible triplet combinations, with a few leftovers which we'll discuss later.
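The modern numbers are easy to verify. The standard genetic code can be written as a 64-character string, one one-letter amino acid code per codon, with the bases in U, C, A, G order at each position (first base varying slowest) and `*` marking a stop codon. Counting gives 61 sense codons, 3 stops and 20 distinct amino acids:

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks the stop codons.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last position fastest, matching the string's layout.
TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), CODE)}

stops = sorted(codon for codon, aa in TABLE.items() if aa == "*")
sense = [codon for codon, aa in TABLE.items() if aa != "*"]
amino_acids = {aa for aa in CODE if aa != "*"}

print(len(TABLE), len(sense), len(stops), len(amino_acids))  # 64 61 3 20
print(stops)  # ['UAA', 'UAG', 'UGA']
```

The three codons left over are the stop codons.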

Replacing Fred

Over time, the triplet-based amino acids would have gradually become the dominant components of proteins in early organisms, and the original single-frame transcriptions would have become less and less important. At some point the old Fred system would have probably become so non-essential that it could simply disappear.

The question is, what happened to the four original amino acids from the original Fred and alt-Fred polypeptides?

The odds are good that triplet coding would have already been introduced for those amino acids-- having two different systems coding for some presumably essential amino acids would not be harmful in any way.

It's also possible that one or more of the very original amino acids were not quite as useful to newer organisms, for whatever reason. In that case, they would have simply disappeared when the triplet system became the only form of protein coding. By then they would have been contained in only a few 'legacy' proteins anyhow.

Mixed Reading Frames

Unfortunately, the gradual approach to triplet encoding still has most of the disadvantages of the double-wide Fred system. The biggest problem is that there would sometimes be ambiguous reads, since the three base pairs for a triplet would sometimes be read by three old Freds, producing an entirely different protein with three 'old' amino acids substituting for the one new one.
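The ambiguity shows up clearly in a toy model. Using hypothetical labels again (lowercase for Sofia's nucleotides, uppercase for alt-Sofia's, made-up amino acid names), this sketch lists every way a short chain could be read when both single-molecule Freds and one triplet Fred are available. A chain containing the triplet has two readings, and therefore two different proteins.

```python
# Hypothetical readers: single-molecule (old) Freds and one triplet Fred.
OLD_FREDS = {"a": "aa1", "b": "aa2", "A": "aa3", "B": "aa4"}
TRIPLET_FREDS = {"aBa": "aa5"}

def all_readings(chain: str) -> list[list[str]]:
    """Every protein the chain could yield if either kind of Fred may bind at each spot."""
    if not chain:
        return [[]]
    readings = []
    # Option 1: an old Fred reads one molecule.
    first = OLD_FREDS[chain[0]]
    for rest in all_readings(chain[1:]):
        readings.append([first] + rest)
    # Option 2: a triplet Fred reads three molecules, if they match.
    if chain[:3] in TRIPLET_FREDS:
        for rest in all_readings(chain[3:]):
            readings.append([TRIPLET_FREDS[chain[:3]]] + rest)
    return readings

for protein in all_readings("baBa"):
    print(protein)
# ['aa2', 'aa1', 'aa4', 'aa1']
# ['aa2', 'aa5']
```

The first reading is the 'three old amino acids' version; the second is the intended triplet reading.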

There may have been clever ways for cells to prevent that, but it's also possible that the modern triplet-reading system evolved from a completely different pathway. Let's take a look at that now.

Independent Triplets

A while back, we mentioned the 'blueprint' chains, which used a short length of RNA and complementary base pairing to position several enzymes into a supercatalyst. We also talked about the use of RNA as a 'carrier' for bringing in coenzymes and raw materials for synthetic reactions, and positioning them via complementary base pairing. And we mentioned that the raw material carriers probably used a fairly short RNA match sequence, since they needed to 'pop' out of the complementary pairing, once they had delivered a molecule of raw material.

As we mentioned in the previous section, a blueprint carrier molecule might have introduced a new triplet-coded amino acid into the existing 'legacy' Fred/Fatcat system. But it's also possible that the blueprint system could have turned into a triplet-based transcription system, completely independently of Fred.

Let's take a look at how it would work.

First Triplet Transcription

The very first triplet blueprint might have looked something like this:

1. A Fatcat protein matches to the beginning of a blueprint chain. Some carrier molecules link up to the remainder of the blueprint chain, bringing in a series of amino acids.

2. Fatcat moves down the line of amino acids, and assembles them into a short polypeptide. Mission accomplished.
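Here is a rough sketch of those two steps, under stated assumptions: the blueprint is an RNA string read in triplets; each carrier is identified by a three-letter anticodon that pairs with a triplet using the modern A-U and G-C rules; and pairing is antiparallel, so the anticodon is the reverse complement of the triplet. The carrier table is hypothetical (its three assignments happen to follow the modern genetic code), and Fatcat's role is reduced to joining whatever the carriers bring, in order.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(triplet: str) -> str:
    """Reverse complement: the carrier pairs antiparallel with the blueprint."""
    return "".join(PAIR[base] for base in reversed(triplet))

# Hypothetical carriers, keyed by anticodon.
CARRIERS = {
    "ACC": "Gly",  # pairs with GGU
    "UGC": "Ala",  # pairs with GCA
    "GUC": "Asp",  # pairs with GAC
}

def assemble(blueprint: str) -> str:
    """Step 1: carriers bind each triplet. Step 2: Fatcat joins the amino acids."""
    amino_acids = []
    for i in range(0, len(blueprint) - 2, 3):
        carrier = CARRIERS.get(anticodon_for(blueprint[i:i + 3]))
        if carrier is None:  # no carrier matches this triplet, so synthesis stops
            break
        amino_acids.append(carrier)
    return "-".join(amino_acids)

print(assemble("GGUGCAGAC"))  # Gly-Ala-Asp
```

Stopping at an unmatched triplet is just one simple guess at what happens when no carrier fits.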

First Useful Triplet Proteins

The first polypeptide synthesized by the new triplet system was probably very short-- it may have been composed of just two or three amino acids. That wouldn't have been long enough to make an enzyme, but it might have served as a coenzyme, or a messenger molecule, or it may have been a component in some longer, structural polymer.

As a temporary expedient, cells may have also created short polypeptides with new amino acids, and then added them to the regular four-molecule proteins created by Fred and Fatcat.

For example, cells probably already had carriers for glycine, since it's very common and has some useful properties. They would probably have gained a huge advantage from adding glycine to existing proteins, since a Gly-xx-Gly-xx-xx-Gly sequence (where xx is any amino acid) is very good at linking to the phosphate portion of an ATP molecule. Glycine is a little too small and simple to make a good ingredient for a two-molecule or four-molecule protein chain, but it is so common that it was probably part of early cell metabolism, and would have been very beneficial to manage.
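Motifs like this are usually written with x standing for any amino acid, so Gly-xx-Gly-xx-xx-Gly becomes G-x-G-x-x-G. Using one-letter amino acid codes, a regular expression can find it; the sequence below is made up purely for illustration.

```python
import re

# G-x-G-x-x-G: glycine, any residue, glycine, any two residues, glycine.
# The lookahead lets overlapping hits be reported too.
GLY_MOTIF = re.compile(r"(?=(G.G..G))")

def find_gly_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched text) for every G-x-G-x-x-G hit."""
    return [(m.start(), m.group(1)) for m in GLY_MOTIF.finditer(sequence)]

print(find_gly_motifs("MKVAVLGAGGAGLA"))  # [(6, 'GAGGAG')]
```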

Once the triplet system was building a few useful short polypeptides, it could have gradually started reading longer and longer chains, and building longer proteins.

There would have been a period of overlap, when Cassius was building some proteins from the old four-molecule Fred/Fatcat system, and some multiple-molecule proteins using triplets. However, the triplet system would eventually have taken over completely, simply because it was able to build proteins from a much larger range of amino acid components.

Amino Acid Repertoire

It's quite possible that 8-molecule versions of Cassius were already using some additional amino acids-- as coenzymes, as a component in cell membranes, or for some other minor function.

In that case, Cassius probably already had carriers for those additional molecules. So the first triplet-based protein synthesis may have started right out with more than four amino acids to build with.

After that, to add a new amino acid to its repertoire, all Cassius really needed to do was develop a carrier for that new molecule, along with a different match sequence for it in the genetic chains.

Then a cellular Cassius just had to wait for a genetic chain to appear that created a useful polypeptide from that new amino acid, and one more molecule would be on its way to becoming the next fad in protein-building.

Each new amino acid would provide a significant advantage to the cells that could build it into proteins, but some would have given more of a boost than others. For example, amino acids with aromatic rings would have allowed proteins to manage electrons, which would have expanded their use as enzymes. The sulfur-containing amino acids would have added new chemistry that was not present in the early amino acids, nor in the nucleic acids.

Because there were 64 possible triplets, the new triplet system could handle a much wider range of amino acids than Fred ever could.

Triplets and Fred

The new triplet system could have started out creating just a few obscure, small polypeptides, since the most basic life proteins were still being created by Fred and Fatcat from the four original amino acids.

As the triplet system increased the number of amino acids it could handle, eventually it would have started producing a wider and wider range of useful proteins. At some point, it could have completely replaced the Fred-based transcription system. However, there was no need for that to occur until the triplet system had a full repertoire of proteins that was sufficient to take over from Fred and Roscoe.

Reading Frame

We were vague about the length of the match sequences for the blueprint chains and carrier molecules. There's no specific length that is necessary for either task, and it is possible that organisms developed protein-coding systems that used different reading frame lengths.

Organisms with a reading frame of four or more would have been able to uniquely identify a larger number of amino acid components, but that would only help if there was a selective advantage to using a larger choice of molecules, when building proteins.

The wider reading frame organisms would have had some mild disadvantages, as well. Their genetic chains would have to be longer, consuming more energy and raw materials than their shorter-chained cousins. They also would have had a more difficult time finding matches for all possible permutations (there are 256 choices for a reading frame of four, and 1024 choices for a reading frame of five).

On the other hand, a reading frame of two molecules would have only allowed 16 different amino acids (possibly fewer if some permutations were duplicates, or used for other tasks).

Since the triplet reading frame is 'just right', any organisms that used it would have had an eventual selective advantage, which would have established that system.
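The 'just right' argument can be put in numbers. Assuming the code must distinguish 20 amino acids plus at least one stop signal (21 meanings), the smallest reading frame over a four-letter alphabet that can do it is three molecules, while every extra molecule of frame width makes each gene proportionally longer. The 300-amino-acid protein length below is an arbitrary example.

```python
ALPHABET_SIZE = 4
MEANINGS_NEEDED = 20 + 1   # 20 amino acids plus one stop signal (assumption)
PROTEIN_LENGTH = 300       # arbitrary example protein length, in amino acids

def smallest_frame(meanings: int, alphabet: int = ALPHABET_SIZE) -> int:
    """Smallest frame width n with alphabet**n >= meanings."""
    n = 1
    while alphabet ** n < meanings:
        n += 1
    return n

for n in range(2, 6):
    words = ALPHABET_SIZE ** n
    enough = words >= MEANINGS_NEEDED
    gene_length = n * PROTEIN_LENGTH  # chain molecules needed, ignoring the stop
    print(f"frame {n}: {words:5d} words, enough={enough}, gene length {gene_length}")

print(smallest_frame(MEANINGS_NEEDED))  # 3
```

Frames of four or five give 256 and 1024 words, far more than needed, at the cost of genes a third or two-thirds longer than with triplets.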

Modern Transcription

The new triplet-coding system was basically the same as the modern transcription process. Later generations would refine many of the details of the process, but we're now close enough to the final version that we won't need any more creative molecular evolution, at least not for protein transcription.

Presumably the introduction of the 'final 20' amino acids took quite a while, and it's likely that there were many alternative organisms that used different sets of amino acids that eventually proved inferior.

The triplet system would have had some interesting times in its early days, since some RNA sequences would not match with any carrier molecules. But we won't try to follow the details of that chemistry in this story.

Transfer RNA

Whether triplet coding for proteins began as a gradual substitution or an independent process, it still needed some sort of carrier molecules to bring in each amino acid, and add them to a new protein chain.

The carriers could have been proteins, but because of the tight fit at the triplet end, RNA chains would have worked much better. That would have been easy to arrange, since this was the peak of the 'RNA world', and there were probably already plenty of RNA-based carriers for other molecules, that could have shifted into the role of amino acid carriers.

So it seems likely that the new triplet system would have started right out using some precursor to tRNA, the modern RNA chains which still carry amino acids as part of the transcription process.

Whenever a new tRNA sequence introduced a new amino acid, its Cassius would have either prospered (if it created useful new proteins) or died (if it interfered lethally with existing metabolism). Over time, that means that organisms using the 'best' amino acids would have increased in number, along with the tRNA genes that brought them.

Eventually, proteins would have bulked out to their full complement of 20 amino acids, with a full set of tRNA molecules to match any triplet in the genes. However, that process may have been extremely gradual, possibly extending over many millions of years.

tRNA Components

It is interesting that tRNA uses several non-standard nucleosides, such as pseudouridine and inosine (whose base is hypoxanthine). Modern cells first build their tRNA from normal RNA components, and then chemically modify some of them later.

There is some chance that these alternate RNA components are vestiges of earlier alt-Caleb chains that were assimilated prior to the final DNA/RNA system, or of 'RNA world' chains that were absorbed from separate, more RNA-based organisms and then used to help make the transition from Fred to modern ribosomes.

However, it seems more likely that the non-standard nucleosides are a later modification that allows some of the tRNA carriers to carry their amino acid load more effectively.

The Protein World

The first Cassius was probably built from just four amino acids-- and, most likely, fairly simple amino acids at that. So it had to rely heavily on its nucleic acids to handle most of the chemistry in its enzymatic actions.

However, once cells started to use triplets to code for amino acids, the chemical repertoire of proteins could gradually expand, until they took over much of the enzymatic chemistry that was formerly handled by the aromatic nucleotides.

Better Protein Chemistry

Once proteins included aromatic components (histidine, phenylalanine, tryptophan and tyrosine), they could start to take over the 'electron management' and 'proton management' that was necessary for most enzymatic reactions, and that formerly had been handled by the nucleic acids.

The addition of the two sulfur-containing amino acids (cysteine and methionine) added some extra chemical repertoire that wasn't available in the nucleic acids. Cysteine also provided better positioning and structural possibilities, since it's capable of forming strong cross-link 'bridges' between different parts of the protein chain.

Adding proline and glycine made it easier to 'design' proteins with a specific structural shape. And the eight remaining amino acids provided 'design flexibility' for the evolution of new proteins, since they had chemical properties that were similar to existing amino acids, but in a slightly larger or smaller size.

Protein Domination

In many cases, a purely protein enzyme could have been more effective than either a ribozyme or a combination of a protein with an RNA helper chain. That's because it is easier for a protein to precisely position each catalytic molecule in an active group, thanks to the extreme flexibility of the polypeptide backbone.

So it seems likely that proteins would have become increasingly dominant, as cells were built from a larger and larger selection of amino acids. Some ribozymes and helper chains continued to exist, but the 'RNA world' was gradually replaced by the modern 'protein world'.

Amino Acid Evolution

As organisms added new amino acid components, there would have been plain old Darwinian selection between them and the older organisms that were based on fewer components. There may also have been competition between different classes of cells that used different 'new' amino acids.

It seems likely that some cell lines used other compounds which simply didn't provide a sufficient advantage, so they were eventually eliminated.

At some unknown point in the early evolution of life, cells fixed on the current twenty amino acids that are used by all modern life forms (with a few minor exceptions). And modern, protein-based organisms took their place in the primordial seas.

Farewell to Helper Chains?

Once proteins started to include amino acids with aromatic side chains, it would have been possible to create fully functional enzymes without the use of RNA helper chains to provide their chemical oomph.

As enzymes grew larger and more sophisticated, it also would have stopped being so important to use RNA chains to position multiple small enzymes. Cells could use a plain old amino acid chain to contain multiple active groups and position them properly, rather than relying on several short chains positioned with RNA.

Likewise, cells could have started to replace ribozymes with proteins for many metabolic functions-- though not completely. RNA still remains as a carrier for amino acids in protein synthesis (called tRNA), and it's still used as an enzyme to this day in a few special roles.

Although it's unlikely that intron-based helper chains would have disappeared completely, it's also probable that they became less vital for cell functioning, and declined in number.

Farewell to Roscoe

Eventually, of course, it was time for cells to replace the 'alternating Roscoe' system with a modern system of DNA replication and RNA transcription, using complementary base pairs.

Most likely that change happened in parallel with the switch to triplet pairing, since many of the same enzyme systems could have worked for both transcription and replication.

However, there was no real pressure to make the switch, and it's possible that Roscoe continued for eons afterward. Since replication happens just one molecule at a time, there was never any need for a dramatic shift in the process, as there was with protein transcription.