Evolution of DNA - Complementary Base Pairs
We talked earlier about the ways that different versions of Caleb might 'drift' into the use of different amino acids and chain molecules, as they moved into new puddles that were stocked with different raw materials. As those Calebs merged with neighboring alt-Calebs, the result would have been many different versions of eight-molecule organisms, each based on a different permutation of four aromatic chain molecules and four amino acids.

Of course, there would have been serious 'chemical competition' among all those varying flavors of Caleb and Cassius. Those that contained amino acids and chain molecules that were particularly good at building enzymes would have prospered more than their neighbors, and expanded into more new puddles.

Out of all the possible permutations of chain molecules, eventually one Caleb or Cassius appeared that just happened to include a set of four aromatic chain molecules with a very interesting new property-- complementary pairing.

That means that its four aromatic chain molecules were shaped exactly right so they could link up in matching pairs, similar to the way that RNA and DNA work today. In fact, it's very likely that the first complementary molecules were actual RNA, or at least something very similar.

Let's look closer at the consequences of that first complementary pairing.

Meet the Nucleotides

Up until now we have been vague about the specific aromatic chain molecules found in Sofia and the other genetic chains. They really could have been any sort of molecules that were flat enough to link up into long, straight chains. And in fact, there were probably alt-Sofias built from many different combinations of aromatic molecules.

But chemical evolution has now proceeded long enough that we can be fairly certain at least one particular Cassius was built from the specific chain molecules found in modern life, namely the RNA nucleotides of adenosine, cytidine, guanosine and uridine (or possibly some very close cousins).

As a quick recap of some well-known chemistry: adenine and guanine are purines, with a large, double-ring structure. Cytosine and uracil are pyrimidines, with a smaller, single-ring structure. In a matched pair, guanine and cytosine form three hydrogen bonds with each other, while adenine and uracil form only two. The result is that when two RNA chains are close together, they will bond to each other wherever there is a C-G or an A-U match. The full chemical notation looks like this, with the dotted lines representing hydrogen bonds:

The simplified Sofia-style notation looks like this:

You might think of it as a chemical zipper, capable of linking up and letting go, as needed.
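The pairing rules just described are simple enough to state as a lookup table. The short Python sketch below is only an illustration, and its function names are hypothetical. It adds one detail from real RNA chemistry that the text does not spell out: two paired strands run in opposite directions, so a strand zips up with the reverse complement of its own sequence. Each position reports 3 for a G-C pair, 2 for an A-U pair, and 0 for a mismatch.

```python
# Watson-Crick pairs for RNA, with the number of hydrogen bonds each pair forms.
PAIRS = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2, ("U", "A"): 2}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """The strand that would zip up with seq (paired strands run antiparallel)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def zip_strands(top: str, bottom: str) -> list:
    """Line up top (read forwards) against bottom (read backwards) and return
    the hydrogen-bond count at each position, with 0 meaning a mismatch."""
    if len(top) != len(bottom):
        raise ValueError("strands must be the same length")
    return [PAIRS.get((a, b), 0) for a, b in zip(top, reversed(bottom))]

if __name__ == "__main__":
    top = "GAUCCA"
    bottom = reverse_complement(top)
    print(bottom, zip_strands(top, bottom))
```

Running it prints `UGGAUC [3, 2, 2, 3, 3, 2]`: every position of the zipper is closed, with the G-C positions holding a little more tightly than the A-U ones.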

Consequences of Complementary Pairs

What were the consequences of this pairing?

Well, at the beginning, it had absolutely no effect on either protein transcription or gene replication, contrary to what you might expect. Sure, that is its most important use in modern living organisms, but in Caleb and Cassius, neither Roscoe nor Fred cared whether Sofia's chain molecules came in pairs. It will be a few chapters yet before there is any advantage to having complementary base pairs in protein transcription or genetic chain replication.

However, complementary chain molecules would still have given some almost immediate advantages to the first Caleb or Cassius that managed to include them. Let's look at some of the possibilities.

Mixed Chain Helpers

One place where complementary chain molecules would have helped a Cassius was in the 'helper' chains that positioned multiple enzymes into a larger supercatalyst.

With complementary pairing, Cassius could link polypeptides together through the 'Velcro' action of the complementary pairs. That could easily have given it some sort of metabolic advantage, for example the ability to link up several enzymes quickly, or to pop different enzymes in and out of an enzyme complex very quickly.

Cassius could also manufacture each enzyme individually with a short length of chain attached to it, and then rely on complementary pairing to link them up properly. To do that, it could start with the individual enzymes, each with a short 'marker' sequence, plus a master chain that would position them.

Thanks to the complementary pairing, the master chain could easily bind and position the three individual enzymes into a larger complex.
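Here is a toy model of that self-assembly, written in Python. It assumes that each enzyme carries a short marker and that a marker docks wherever its reverse complement appears on the master chain; the marker sequences and the names `enzyme_A`, `enzyme_B` and `enzyme_C` are invented for illustration. Everything about real binding except exact sequence matching is ignored.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """The site a marker pairs with (paired RNA strands run antiparallel)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def assemble(master_chain: str, markers: dict) -> list:
    """Return (position, enzyme) for every place on the master chain where an
    enzyme's marker can pair, sorted by position along the chain."""
    placed = []
    for enzyme, marker in markers.items():
        site = reverse_complement(marker)
        start = master_chain.find(site)
        while start != -1:
            placed.append((start, enzyme))
            start = master_chain.find(site, start + 1)
    return sorted(placed)

# Hypothetical markers attached to three enzymes.
MARKERS = {"enzyme_A": "GGAC", "enzyme_B": "UUCG", "enzyme_C": "ACAG"}

# Hypothetical master chain: the three complementary sites, separated by 'AA' spacers.
MASTER = "GUCC" + "AA" + "CGAA" + "AA" + "CUGU"

if __name__ == "__main__":
    print(assemble(MASTER, MARKERS))
```

Here `assemble(MASTER, MARKERS)` returns `[(0, 'enzyme_A'), (6, 'enzyme_B'), (12, 'enzyme_C')]`. The order of the sites along the master chain, not anything in the enzymes themselves, decides the lineup.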

Enzyme Tertiary Structure

Short complementary chains might also be useful within the structure of a single enzyme. For example, complementary pairing in an attached backbone chain would be a good way to hold an enzyme in either of two different conformations, as shown below.

The weak hydrogen bonding between a few complementary pairs would be just strong enough to keep a protein enzyme in position for a short while, but not so strong that the protein couldn't flip from one state to the other.

In fact, changing the length of the complementary chains would adjust the binding force at each position, and make it possible to 'fine tune' the amount of time that the protein would stay in each of the two conformations.
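To make the 'fine tuning' idea concrete, the sketch below counts the hydrogen bonds in a paired stretch and turns the difference into a relative dwell time with a Boltzmann-style factor. This is a deliberately crude assumption: real RNA duplex stability is dominated by base stacking and is normally predicted with nearest-neighbor energy tables, and `E_BOND_KT` is an arbitrary illustrative value, not a measured constant. The point is only the shape of the relationship: under this model each added pair multiplies the dwell time rather than adding to it.

```python
import math

# Hydrogen bonds each base contributes when it sits in a Watson-Crick pair.
HBONDS = {"G": 3, "C": 3, "A": 2, "U": 2}

# Illustrative assumption: every hydrogen bond adds the same energy, in units of kT.
E_BOND_KT = 0.5

def hbond_count(stem: str) -> int:
    """Hydrogen bonds holding a fully paired stretch together, read from one strand."""
    return sum(HBONDS[base] for base in stem)

def relative_dwell_time(stem: str, reference: str) -> float:
    """How many times longer `stem` stays zipped than `reference`, assuming the
    rate of letting go falls off as exp(-bonds * E_BOND_KT)."""
    extra_bonds = hbond_count(stem) - hbond_count(reference)
    return math.exp(extra_bonds * E_BOND_KT)

if __name__ == "__main__":
    for stem in ("GCA", "GCAG", "GCAGC"):
        print(stem, hbond_count(stem), round(relative_dwell_time(stem, "GCA"), 1))
```

Under these assumptions, the five-pair stem `GCAGC` (14 bonds) stays bound about 20 times longer than the three-pair stem `GCA` (8 bonds).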

Larger Structures

The new complementary pairing may have also helped Cassius to build larger enzyme complexes and larger physical structures. It's almost like the invention of chemical Velcro-- a set of molecules that can easily bind proteins together, in ways they might not be able to accomplish on their own.

In fact, short backbone chains with complementary pairs are 'smart Velcro', since they'll only link up with the correct, matching combinations of nucleotides. Once Cassius built the right protein-chain combinations, they would pretty much assemble on their own into just the right positions and orientations. Not unlike a set of prefab Swedish furniture that's able to self-assemble, sticking tab A into slot B automatically!

Pure Chain Enzymes

With complementary pairing, Caleb could also have developed fully functional enzymes made entirely out of nucleotides-- something that wasn't possible with the earlier, non-complementary chains.

How could that happen? Well, let's take a look at the folding of a backbone chain, built from the four nucleotides, that contains a few stretches complementary to other nucleotides in the same chain.

After this particular chain is created by a Roscoe, the complementary regions in the chain link up and form relatively straight, rigid sections. That linkage forces the remaining parts of the chain into hairpin turns that impose a sharper bend than usual on the chain molecules.

Within those turns, the active parts of the chain molecules stick out in a way that makes them much more available for chemical action. With the right sequences, the RNA chain will bend just right, so several of those active groups wind up very close together. That allows them to work together chemically, and produce an enzymatic effect.
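The stem-and-turn pattern can be found mechanically: look for a stretch of the chain whose reverse complement appears further along, with a few unpaired molecules in between to form the turn. The Python sketch below assumes exact matches, a fixed stem length, and a minimum turn of three molecules, the usual lower limit for a hairpin loop in RNA folding models. `find_hairpins` is a hypothetical helper; real folding programs instead score many competing structures by free energy.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def find_hairpins(chain: str, stem_len: int = 4, min_loop: int = 3) -> list:
    """Return (i, j, loop_len) for each pair of stem_len-long stretches
    chain[i:i+stem_len] and chain[j:j+stem_len] that are reverse complements,
    separated by at least min_loop unpaired molecules (the hairpin turn)."""
    hits = []
    n = len(chain)
    for i in range(n - stem_len + 1):
        target = reverse_complement(chain[i:i + stem_len])
        # The right-hand arm must start after the left arm plus the turn.
        for j in range(i + stem_len + min_loop, n - stem_len + 1):
            if chain[j:j + stem_len] == target:
                hits.append((i, j, j - (i + stem_len)))
    return hits

if __name__ == "__main__":
    print(find_hairpins("GGACUUCGGUCC"))
```

`find_hairpins("GGACUUCGGUCC")` returns `[(0, 8, 4)]`: the first four molecules pair with the last four, leaving the middle four (`UUCG`) bent into the turn where their active groups stick out.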

RNA-based enzymes would have been particularly useful for our new 8-molecule organisms, since they probably didn't include any amino acids yet that contained aromatic rings (with their ability to donate or absorb electrons, protons and energy).

Chain Enzyme Production

These new RNA enzymes were produced differently from polypeptide enzymes. Fred was not involved at all-- instead, Roscoe replicated them, at which point they could start their enzymatic action immediately.

Chain enzymes might have helped provide some 'load balancing' for Caleb, especially if the local environment contained raw materials that made it easier to build chain molecules than amino acids. By using the chain molecules for enzymes as well as for genetic chains and small 'helper' chains, Cassius may have been able to conserve amino acids for use in structural and enzyme proteins.

Other Chain Advantages

Chain-based enzymes would have included active groups with aromatic rings, and would have been especially good at catalytic reactions involving the transfer of an electron or proton. They also would have worked together well with other compounds that included aromatic rings.

Because of their flatter shape, RNA-based enzymes might have also been able to fit into places where the more globular protein-based enzymes couldn't reach.

They also would have interacted very well with 'helper' chains, thanks to their complementary base pairs. For example, we talked earlier about ways that a 'positioning chain' could line up protein enzymes, as long as some short chains were attached to each protein as a sort of ID marker. Positioning an RNA-based enzyme would have been even easier, since the ID sequence could be built right into the chain sequence.

In general, RNA-based enzymes provided one more tool in the chemical toolbox.

Chain Enzyme Evolution

In the days when Caleb still had only four amino acids for building its proteins, it may have gotten an extremely valuable boost from enzymatic chains (called ribozymes if made from RNA). Either by themselves or in combination with proteins, they would have provided new enzymes at a time when Cassius still didn't have as many chemical options in its proteins as it would have later.

Overall, chain-based enzymes were rather more 'clunky' in shape than protein enzymes, and eventually they would be replaced almost entirely by modern proteins built from a full 20 amino acids. But before the full range of amino acids was available, ribozymes would have been a real lifesaver for the earliest versions of Cassius.

Complementary Headers

We talked earlier about the need for some sort of 'gene header', so Fred wouldn't transcribe helper chains into proteins (and also so Fred would have an easier time finding the beginning of each gene).

Complementary pairing provided a perfect solution to both problems. It might have worked like this:

1. When Fatcat is created, it picks up a short backbone chain which attaches loosely near the 'elbow' end of the Fatcat molecule.

2. The chain bonds to the beginning of a genetic RNA chain via complementary pairing.

3. Fatcat disconnects from the match sequence, and starts transcribing at the next chain molecule in the sequence, just like normal.
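The three steps above can be sketched as a single check. In this Python toy, the guide chain that Fatcat carries and the example chains are made-up sequences, and `transcription_start` is a hypothetical name. A chain is transcribed only if it begins with the reverse complement of the guide, and transcription starts with the first molecule after that header.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def transcription_start(chain: str, guide: str):
    """Index of the first molecule to transcribe, or None if the chain has no
    header that the guide chain can pair with."""
    header = reverse_complement(guide)
    if chain.startswith(header):
        return len(header)  # step 3: skip the landing site, start on the next molecule
    return None

GUIDE = "GCCA"  # hypothetical guide chain carried near Fatcat's 'elbow'
CHAINS = {
    "gene": "UGGCAUCGAU",      # begins with the header UGGC
    "helper": "GAUCGAUCCA",    # no header, so never transcribed
    "parasite": "AUGGCUUGGC",  # contains UGGC, but not at the start
}

if __name__ == "__main__":
    for name, chain in CHAINS.items():
        start = transcription_start(chain, GUIDE)
        print(name, None if start is None else chain[start:])
```

Only the gene is transcribed (from `AUCGAU` onward); the helper chain and the parasite are both ignored, which is the restriction the next section describes.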

Header Advantages

Having a 'landing site' at the beginning of each gene would ensure that Fatcat and Fred would start reliably at the correct location, every single time. Using a header sequence would have also allowed Cassius to restrict which polypeptides it actually created. It wouldn't have wasted time and resources by transcribing bogus polypeptides from helper chains and ribozymes.

Just as importantly, the header would also prevent Fred from transcribing any parasitic or just plain old random garbage chains that might be lurking in the neighborhood. Adding a barrier to parasitic infection would have been particularly useful if there were parasitic chains that created a particularly deleterious enzyme, for example a parasite version of Fred that only replicated its own chain at the expense of Cassius's own components.

Headers and Roscoe

Adding gene headers would have also been beneficial for chain replication, since it would have allowed Roscoe to distinguish between Cassius's own 'real' genes, and any stray or parasitic chains.

If chains used as a 'helper' or as a ribozyme contained a different header sequence, Roscoe could also have made a point of replicating them more frequently, since they were used in day-to-day metabolism as well as when creating new copies of Cassius.

Restricting replication and transcription to just 'known' genes would have made Roscoe and Fred more efficient, but it would also have slowed down the speed of evolutionary change. It's hard to know which effect would be dominant, although presumably if headers were too restrictive, they would not have had an evolutionary advantage, and would not have appeared yet.

Header Evolution

The 'landing site' header system was a great innovation, but Cassius couldn't have just jumped immediately to a system of 'required' headers. The problem is that its existing genes would have lacked the header, and wouldn't have been transcribed. That obviously would have been a quickly lethal condition.

The transition may have occurred gradually, with 'old' Fred still transcribing the original chains, and the 'new' Fred transcribing chains that included a landing site.

It would have been extremely convenient if Cassius could have regulated the addition of a 'landing site' to existing chains, so they could switch to the header system immediately. A polypeptide that could do that is plausible, but it is less certain whether it would have given Cassius an overall advantage, since it may also have marked some 'garbage' chains for replication and transcription.

As with headers in general, evolutionary pressure would presumably have controlled whether a 'header adder' protein would have appeared in early versions of Cassius.

The Complementary Socket Set

We've already mentioned how complementary nucleotides could help assemble proteins and ribozymes into a larger protein complex. But they could also have served a much larger role, by providing a faster way for enzymes and enzyme complexes to evolve.

You might think of it as an entirely new type of genetic inheritance, on a slightly larger scale than mere protein coding. And the only things it would require would be a positioning chain, and a marker sequence on each enzyme.

Here's one way it could work:

1. Caleb develops a 'library' of enzymes, each with a different 'match sequence' consisting of a few nucleotides.

2. The enzymes could be proteins with an attached chain, as shown above, or they could be RNA based ribozymes that had a match sequence built into their RNA chain.

3. Caleb also develops a helper chain that is designed to position enzymes into a larger structure. It contains sequences that match some of the enzymes.

4. The enzymes diffuse into place, and the chain lines them up into specific positions.

5. The combination of enzymes acts together as a supercatalyst, and does something useful for the cell.

6. If there is a mutation in the helper chain, it results in a different sequence of enzymes…

7. which has an entirely different function.
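Steps 1 to 7 can be played out in a few lines of Python. This is a sketch under simplifying assumptions: the enzyme names and 3-nucleotide markers are invented, and the helper chain is read as back-to-back fixed-width slots, whereas a real chain could separate its sites with spacers of any length. A single point mutation in the helper chain swaps one enzyme for another without touching any enzyme's own structure.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))

# Hypothetical enzyme library: 3-nucleotide marker -> enzyme carrying it.
LIBRARY = {"GAC": "oxidase", "UCG": "reductase", "AAG": "methylase", "GAG": "carboxylase"}

# Index the library by the helper-chain site each marker pairs with.
SITE_TO_ENZYME = {reverse_complement(m): e for m, e in LIBRARY.items()}

def read_blueprint(helper: str, slot_len: int = 3) -> list:
    """Split the helper chain into fixed-width slots (a simplifying assumption)
    and report which enzyme, if any, docks at each slot (None = empty slot)."""
    return [SITE_TO_ENZYME.get(helper[i:i + slot_len])
            for i in range(0, len(helper) - slot_len + 1, slot_len)]

def point_mutation(seq: str, pos: int, base: str) -> str:
    """Return seq with the molecule at pos replaced by base."""
    return seq[:pos] + base + seq[pos + 1:]

if __name__ == "__main__":
    helper = "GUCCGACUU"
    print(read_blueprint(helper))
    print(read_blueprint(point_mutation(helper, 8, "C")))
```

With the original helper chain `GUCCGACUU`, the lineup is oxidase, reductase, methylase; changing its last molecule from U to C gives oxidase, reductase, carboxylase. A mutation to a site that no marker pairs with (for example, changing the first G to A) leaves an empty slot, reported as `None`.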

Once Caleb produced a few good enzymes with a short 'marker' chain, it could combine them in several different ways, by using master chains with different sequences of nucleotides. It could also 'nudge' the position of each enzyme by adding or deleting 'spacer' nucleotides in the master chain.

Blueprint Chains

Something interesting has just happened here that is worth looking at a little more closely. A change in the helper chain has resulted in an entirely different supercatalyst, without changing the chemical structure of any of the individual enzymes. So there is useful genetic action happening on a slightly larger scale than the usual protein coding.

You might think of the helper chain as a 'blueprint', which assembles components into a larger structure that does something more than its individual components. Changing the sequence of the blueprint chain has a rather limited and directed effect on the Cassius that contains it-- all it will do is change the selection of enzymes in that larger structure.

A Cassius might only need a 'library' of a few dozen enzymes for common chemical reactions: oxidations, reductions, methylations, demethylations, carboxylations, decarboxylations, and so on. It could then create hundreds of useful supercatalysts from that library, by combining them into different arrangements that each perform slightly different catalytic actions. It's not unlike building a chemical factory, with each synthetic step arranged in sequence, perhaps in a tubular structure held together in the right positions by a blueprint chain of RNA.
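The arithmetic behind 'a few dozen enzymes, hundreds of supercatalysts' is easy to check. The sketch assumes, purely for illustration, a library of 30 enzymes and a blueprint chain with a handful of ordered slots. The results are upper bounds on possible lineups, not estimates of how many would be useful; the useful supercatalysts would be a small fraction of them.

```python
import math

def arrangements(library_size: int, slots: int, allow_repeats: bool = True) -> int:
    """Number of distinct ordered enzyme lineups a blueprint chain could specify."""
    if allow_repeats:
        return library_size ** slots           # any enzyme in any slot
    return math.perm(library_size, slots)      # each enzyme used at most once

if __name__ == "__main__":
    for slots in (2, 3, 4):
        print(slots, arrangements(30, slots), arrangements(30, slots, allow_repeats=False))
```

With 30 enzymes, two slots already allow 900 lineups (870 without repeats), and three slots allow 27,000 (24,360 without repeats), so even a small library leaves plenty of room for hundreds of useful combinations.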

That means that a blueprint sequence that puts some proteins into useful positions would have just as much survival value as a new gene that codes for a useful new enzyme.

And of course, plain old natural selection would work on the blueprints, just the same as it would on regular protein-coding or ribozyme RNA. Copies of Cassius with a script that produced a lethal combination of enzymes would die out, while those with a particularly useful new supercatalyst would thrive.

Advantages of Blueprints

From an evolutionary point of view, the blueprint sequences offer some very important advantages over the more traditional protein-specifying genes. You might say that they provide 'directed evolution', by limiting the effects of mutations, and improving the chances of beneficial changes.

Changing one molecule in a gene that is transcribed into a protein will have a more or less random effect. The amino acid substitution may do nothing, or it may completely change the shape and function of the protein. If the protein is already functional, then the odds of making an improvement are small, and the chances of making a damaging or lethal change are large.

Because of that, improvements in proteins are rather 'expensive' in an evolutionary sense. They happen very slowly, and they require a species to endure many lethal mutations for every good one. You might say that protein changes are a luxury that can only be enjoyed by a species with a large population and a short lifetime.

On the other hand, changing one molecule in a blueprint gene has a controlled and predictable impact. It will change one enzyme or one carrier protein in some process, and that is all it will do. The odds that it will be beneficial are much higher, while the odds it will be lethal are much lower.

Previous Blueprints

We talked earlier about helper chains that helped to line up molecules into a supercatalyst, by directly connecting to the components.

They also acted like a blueprint of sorts, but there's a big difference between those chains, and the newer blueprint chains that use complementary pairing and a match sequence.

Back in the old days of direct connections, it was hit or miss whether a chain would attach to an enzyme. The sequence of chain molecules needed to match something in each enzyme so it would bind, and a change in the enzyme's sequence of amino acids would probably require a matching change in the helper chain.

This new system of match sequences makes the whole enzyme-assembly process much more 'modular'. Once each enzyme is attached to a short nucleotide chain with a unique sequence, then Cassius can work it into supercatalysts, as soon as it has a blueprint chain with the matching sequence.

Molecular Carriers

The short 'script' genes could have done more than just assemble enzymes. Since one important part of the enzymatic process is delivering the right ingredients to the right place, Cassius could also have developed 'carriers' for enzymatic raw materials, or for important coenzymes such as ATP or NADH.

Here's how it might work:

1. Caleb develops a 'library' of carrier proteins (or ribozymes). At one end, each carrier has a 'match sequence' consisting of a few nucleotides. On the other, each carrier is shaped so it will link up with a specific compound.

2. The helper chain includes match sequences for a combination of carriers and enzymes.

3. Enzymes and carrier molecules diffuse into place, and the chain lines them up into specific positions.

4. Now the enzymes can work on the raw materials, or use the coenzymes that the carriers have delivered.

5. A mutation in the script chain results in a different carrier molecule, which delivers a different raw material…

6. So the same enzymes could produce an entirely different set of end products.
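The carrier scheme differs from the enzyme-only socket set in one respect: a mutation changes the ingredient, not the tool. In the toy model below (Python; every sequence, cargo label and enzyme name is hypothetical), the first three molecules of the helper chain form the carrier slot and the next three form the enzyme slot.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))

# Hypothetical libraries: marker sequence -> what the molecule carries or does.
CARRIERS = {"GAC": "acetate", "GAG": "pyruvate"}
ENZYMES = {"UCG": "methylase"}

# Index both libraries by the helper-chain site each marker pairs with.
CARRIER_SITES = {reverse_complement(m): cargo for m, cargo in CARRIERS.items()}
ENZYME_SITES = {reverse_complement(m): name for m, name in ENZYMES.items()}

def run_station(helper: str):
    """Read the carrier slot (helper[0:3]) and the enzyme slot (helper[3:6]) and
    describe the reaction, or return None if either slot stays empty."""
    cargo = CARRIER_SITES.get(helper[0:3])
    enzyme = ENZYME_SITES.get(helper[3:6])
    if cargo is None or enzyme is None:
        return None
    return f"{enzyme} acting on {cargo}"

if __name__ == "__main__":
    print(run_station("GUCCGA"))  # carrier site GUC pairs with marker GAC
    print(run_station("CUCCGA"))  # one mutation: site CUC pairs with marker GAG
```

The first call gives `methylase acting on acetate` and the second `methylase acting on pyruvate`: the enzyme slot is untouched, so the same enzyme simply receives a different raw material.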

Carrier Choices

In the examples above, we use protein chains to carry each molecule, mainly because they are more compact, and easier to draw.

However, ribozymes might have actually made the better carrier molecules. For one thing, they are already built from nucleotides, so all they need is a loose bit of the chain to act as a match sequence.

The relatively flat, planar structure of a ribozyme might also be better suited than a bulkier protein for tasks such as squeezing in between enzymes to deliver a molecule, or snuggling up close to an enzyme to provide an ATP and give some process a jolt of energy.

Carriers vs Blueprints

The raw material carriers could have worked along with the blueprint chains that we mentioned earlier. If the carrier delivered a coenzyme such as NADH, it would pretty much have been just another blueprint component, nestled amongst the regular enzymes.

For delivering raw materials or an energy jolt of ATP, it would help if the carrier could come and go easily. That way a single 'slot' in the blueprint chain could serve as a landing place for many deliveries of raw materials or energy.

The easiest way to accomplish that would be to have the 'temporary carriers' attached by a short sequence of RNA complements. That way, there would not be much of a binding force attaching the carrier to the blueprint chain, and it could easily leave (and be replaced by another carrier).

Cassius may have also evolved proteins that could assist in the attachment and removal of carriers, to more effectively deliver the correct raw materials to the correct locations in the enzyme complex.

Match Sequence Length

In the previous examples, we used a five-molecule sequence to bind each type of molecule to a blueprint chain.

There is no particular reason to use that exact length, and in fact, different processes may have used different lengths of 'match sequence' to connect the carriers to the blueprint chain.

A longer match sequence would bind more firmly, which would be a good idea for an enzyme or coenzyme that needed to stick around for a long time. A short sequence would be better for raw material carriers that needed to 'pop' in and out very quickly.

Chains carrying raw materials might have had a match sequence as short as three molecules in length. That would still provide 64 possible choices for incoming raw materials, which might have been sufficient for Caleb's relatively simple metabolism.
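The figure of 64 comes from four possible nucleotides at each of three positions, so the number of distinct match sequences grows as 4 to the power of the sequence length. The Python check below simply enumerates them for the three-molecule and five-molecule lengths mentioned in the text, plus the length in between; `match_sequences` is a hypothetical helper name.

```python
from itertools import product

BASES = "ACGU"

def match_sequences(length: int) -> list:
    """Every possible match sequence of the given length."""
    return ["".join(p) for p in product(BASES, repeat=length)]

if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, len(match_sequences(n)), 4 ** n)
```

Three molecules give 64 choices, four give 256, and a five-molecule match sequence like the one used in the earlier examples gives 1,024.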

The Saturated Socket Set

In fact, if Caleb managed to create a full set of carriers that filled up all possible permutations in its match chains, it could ensure that mutations would always produce an enzyme process that would do something. Some mutations would link up molecules and enzymes that wouldn't react, or that produced an end product that wasn't useful to the organism. But at least the 'saturated' system would never produce duds that wouldn't bind to any enzymes or raw materials.
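Saturation can be checked directly: take a helper-chain site, generate every single-molecule mutation of it, and count how many mutants still pair with some carrier in the library. In this Python sketch the libraries are hypothetical sets of 3-molecule sites, and the function names are illustrative only. A partial library leaves most mutants as duds, while a library covering all 64 sites leaves none.

```python
from itertools import product

BASES = "ACGU"

def all_sites(length: int = 3) -> set:
    """Every possible helper-chain site of the given length (64 for length 3)."""
    return {"".join(p) for p in product(BASES, repeat=length)}

def point_mutants(site: str):
    """Yield every sequence that differs from `site` at exactly one position."""
    for i, original in enumerate(site):
        for base in BASES:
            if base != original:
                yield site[:i] + base + site[i + 1:]

def dud_fraction(site: str, covered: set) -> float:
    """Fraction of single-molecule mutants of `site` that no carrier pairs with."""
    mutants = list(point_mutants(site))
    duds = sum(1 for m in mutants if m not in covered)
    return duds / len(mutants)

PARTIAL_LIBRARY = {"GUC", "CGA", "CUU", "CUC"}  # hypothetical: only four sites covered

if __name__ == "__main__":
    print(dud_fraction("GUC", PARTIAL_LIBRARY))  # 8 of 9 mutants bind nothing
    print(dud_fraction("GUC", all_sites()))      # saturated library: no duds
```

With the partial library, only one of the nine possible mutants of `GUC` (namely `CUC`) still lands on a carrier; with the saturated library, every mutant binds something, even if the resulting reaction turns out to be useless.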

The RNA World

True RNA, with its complementary base pairs, is quite an amazing molecule! Once some lucky Cassius was built from a modern set of nucleotides, it would have gained a wealth of new options for developing metabolic enzymes and physical structures, thanks to the rigidity and chain-matching properties of RNA. Cassius was still limited to proteins that were built from just four amino acids, so it seems likely that RNA could have filled in for many functions that proteins just couldn't handle yet.

Of course, with Roscoe on the scene to replicate the RNA chains, it was not difficult at all to create life forms that used RNA, whenever it was most convenient. So it would be reasonable to call this post-Cassius period the true 'RNA world'.

This version of the RNA world is much easier to explain than one based solely on RNA, since there are also proteins available to provide structure and replication, and to fill in for the chemical tasks for which RNA is not well-suited.

Later on, of course, organisms were able to build proteins from a much wider range of amino acids. Once that happened, proteins again became the dominant form of cell enzymes. But some RNA enzymes still live on in every cell, and if our hypothesis is correct, RNA still serves many smaller functions as well.

Meanwhile, RNA and its complementary pairing are not done just yet. There are still more useful things they will do for Cassius. But it's time to take a break from pure genetics, and look at some larger issues facing Cassius, as it made some additional changes that would help convert it into a true living organism.