| 
Beckman Research Institute of the City of Hope, Duarte, California
              91010, USA  
            A. Introduction  
While it is believed that life on this earth started a few billion
or more years ago, the number of true innovations in evolution
appears to have been rather dismally small. Most of the successful
adaptive radiation of living organisms has apparently been
accomplished by extensive plagiarization of those precious few
innovations via the mechanism of gene duplication [1]. Furthermore,
it appears that most of these true innovations occurred at the very
beginning, before the division of prokaryotes from eukaryotes. For
example, nearly all the sugar-metabolizing enzymes appear to have
achieved their inviolable functional competence at the above-noted
early date. Natural selection has since been spinning its wheels in
the air.
             
              B. The Story of Glyceraldehyde 3-Phosphate Dehydrogenase  
It will be noted in Fig. 1 that the 332-residue-long glyceraldehyde
3-phosphate dehydrogenase of the pig differs from the lobster enzyme
at only 86 positions. Inasmuch as vertebrates, or rather chordates,
diverged from crustaceans roughly 500 million years ago, one can
conclude from the above and similar data on additional species that
this enzyme has been undergoing 1% amino acid sequence divergence
every 20 million years, thus accumulating 26% amino acid sequence
difference in 500 million years. If such a rate calculation could
be extended indefinitely, however, even at this snail's pace one
would still expect this enzyme to have undergone 100% amino acid
sequence divergence in 2 billion years. Now, 2 billion years ago
would have been about the time prokaryotes diverged from eukaryotes.
Yet the bacterial amino acid sequence from Bacillus
stearothermophilus, also shown in Fig. 1, still maintains homology
with the pig enzyme at 177 out of the 332 sites (53%), and similarly
at 180 out of 332 sites with the lobster enzyme. In fact, there are
19 segments (tripeptidic or longer), comprising 92 residues in total,
that remain invariant in all three species. The longest conserved
segment, tridecapeptidic in length and occupying the 144th to 156th
positions, represents the most critical of the substrate-binding
sites, the 149th residue, Cys, forming the thiol linkage with
substrate intermediates [2].
Indeed, after achieving the appropriate degree of functional competence
2 billion or more years ago, glyceraldehyde 3-phosphate dehydrogenase
has not changed in its essence; the evolutionarily compatible amino
acid substitutions that accompanied successive diversification and
speciation merely symbolize futile spinning of the wheel. Such
futility is also evident in Fig. 1, for at 14 positions a eukaryote
(the pig) and a prokaryote (Bacillus stearothermophilus) share
the identical residue, while the other eukaryote (the lobster)
is left out as an oddball; e.g., the third position of the pig and
the bacillus is Val, while that of the lobster is Ile. At these
and many other positions, the game of musical chairs has apparently
been in play among a limited number of functionally compatible
amino acids.

Fig. 1. The amino acid sequences of glyceraldehyde 3-phosphate
dehydrogenases from three divergent species are compared. Bacillus
refers to Bacillus stearothermophilus. Discordant and identical
residues are shown slightly displaced from each other; discordant
ones are placed slightly above identical ones. Amino acid residues
of tripeptidic or longer conserved segments are shown in large capital
letters and the segments are boxed in. Deleted residues are identified
as black boxes.

Analogous situations have been found with
regard to other sugar-metabolizing enzymes, e.g., phosphoglycerate
kinase, triose isomerase, etc. Furthermore, all these
sugar-metabolizing enzymes are constructed from the same mould: the
amino-terminal half and the carboxyl-terminal half form two distinct
domains, a cleft between the two accommodating the substrate and the
coenzyme. The amino-terminal half is for coenzyme binding and the
carboxyl-terminal half is for substrate binding. Furthermore, Rossmann
[3], among others, has pointed out that in the case of kinases,
the mononucleotide (e.g., ATP) binding site of the amino-terminal
half is comprised of three β-sheet-forming segments and two
α-helix-forming segments in the following order from the amino
terminus: β α β α β. The dinucleotide (NAD or NADP) binding
site of dehydrogenases, on the other hand, evolved from the above
by duplication; thus, it can be expressed as 2 × β α β α β.
Inasmuch as the most critical portion of the substrate-binding site
evolved within the last segment of the duplicate (e.g., the 144th
to 156th tridecapeptide of Fig. 1), this intrusion of the
substrate-binding active site into the dinucleotide-binding
domain froze the dinucleotide-binding domain of each enzyme as uniquely
its own. Thus, there is no more than 20% amino acid sequence homology
between the dinucleotide-binding sites of different enzymes, in spite
of the fact that all are made from the same 2 × β α β α β mould.
It will be recalled that within the same enzyme, conservation of
greater than 50% homology is the rule for the whole enzyme and,
therefore, for the dinucleotide-binding amino-terminal half.
At any rate, two notable facts emerge from the above. First, coding
sequences for sugar-metabolizing enzymes, and probably for many other
enzymes (e.g., proteases), had already achieved the appropriate
degree of functional competence before the division of prokaryotes
from eukaryotes. Second, repetitions were the rule of the game from
the very onset of life on this earth: the dinucleotide-binding site
evolved from the mononucleotide-binding site by duplication, and
the mononucleotide-binding site itself is likely to have evolved
by 2.5-fold duplication of one β α or α β unit.
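
The rate argument at the start of this section is plain arithmetic, and a short sketch may help the reader check it. The counts (332 sites, 86 differences, 177 and 180 identities) and the rate of 1% per 20 million years are taken from the text; the function names and the linear extrapolation capped at 100% are assumptions made only for illustration.

```python
SITES = 332  # aligned residues compared in Fig. 1


def percent_of_sites(count: int, total: int = SITES) -> float:
    """Express a count of sites as a percentage of the aligned length."""
    return 100.0 * count / total


def extrapolated_divergence(million_years: float,
                            percent_per_20_my: float = 1.0) -> float:
    """Assumption: divergence grows linearly and is capped at 100%."""
    return min(100.0, percent_per_20_my * million_years / 20.0)


pig_vs_lobster_diff = percent_of_sites(86)            # observed difference
pig_vs_bacillus_identity = percent_of_sites(177)      # observed identity
lobster_vs_bacillus_identity = percent_of_sites(180)  # observed identity
predicted_at_2_gy = extrapolated_divergence(2000)     # naive prediction

print(f"pig vs lobster difference:    {pig_vs_lobster_diff:.1f}%")
print(f"pig vs bacillus identity:     {pig_vs_bacillus_identity:.1f}%")
print(f"lobster vs bacillus identity: {lobster_vs_bacillus_identity:.1f}%")
print(f"divergence predicted at 2 billion years: {predicted_at_2_gy:.0f}%")
```

The contrast of interest is between the last line, where the naive extrapolation leaves no identity at all, and the roughly 53% identity actually retained between the pig and the bacillus.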
             
              C. Ingeniousness Embodied in the First Set of Coding Sequences 
              that Were Repeats of Base Oligomers  
Orgel's group [4] has shown that in the presence of Zn ions, nonenzymatic
synthesis of nucleic acids occurs with the proper 3'-5' linkage,
provided that there is a template. Thus, it would appear that what
was in short supply in the prebiotic world, before the emergence
of life on this earth, was long templates from which copies could be
made. Put more succinctly, the first primordial question is:
"How did oligonucleotides manage to extend themselves to become
worthy coding sequences?" There is one simple answer: one tandem
duplication of a preexisting oligomer assures indefinite extension
of that template, as illustrated at the top of Fig. 2. What if the
heptameric template CAGCCTG duplicated to become a tetradecamer?
After completion of its complementary strand, the two might pair
in the manner shown, the second copy pairing with the first copy of
the complementary strand. The paired portion would now serve as
the primer for the next round of nucleic acid synthesis. At the
completion of the second round, the 14-mer template becomes a
21-mer. In this way, the indefinite extension of the template is
assured a priori, a paired segment always serving as a primer for
the next round of nucleic acid synthesis. The above, then, is the
first reason for believing that the first set of coding sequences,
or rather all nucleic acids in the prebiotic world that presaged
the emergence of life on this earth, were all repeats of various
base oligomers. How accurate was the copying function of nonenzymatic
nucleic acid replication? Of the various nucleic acid polymerases known,
the most error-prone appear to be the reverse transcriptases of
retroviruses, for their error rate has been estimated to be of the
order of 10⁻³ per base pair per year [5]. This is a one million times
higher error rate than that of the DNA polymerases of vertebrates, and
at this rate there would be 100% base sequence change every one
thousand years. The inherent
error rate of prebiotic nonenzymatic nucleic acid replication is,
therefore, expected to be higher than the above-noted 10⁻³; error
prone as they are, reverse transcriptases are, after all, enzymes of
a sort.

Fig. 2. Replication of nucleic acids is based upon the inherent
complementarity that exists between two purine-pyrimidine pairs;
A pairs with T or U, while G pairs with C. Accordingly, provided
that there is a template (the heptamer CAGCCTG shown at the top),
mononucleotides would readily assemble themselves in the 3', 5'
linkage to form a complementary strand in the presence of Zn [4],
as shown at the top. What was in short supply in the prebiotic world,
then, were templates of substantial length. What if the above-noted
heptamer repeated itself in tandem, or some of the base oligomers
were by chance tandem repeats (two copies of a shorter oligomer)
to begin with? It and its complementary strand can pair unequally
in the manner depicted in the middle. As a paired segment now functions
as a primer for the next round of nucleic acid synthesis, infinite
extension of templates is now assured. All it takes to start this
process is one tandem duplication. Of the long oligomeric repeats
thus formed, those that evolved to be the first set of coding sequences
likely started from oligomeric units whose numbers of bases were
not multiples of three. There were two distinct advantages: (1)
They gave longer periodicities to polypeptide chains; e.g., repeats
of a base octamer would have given octapeptidic periodicity, while
repeats of a base nonamer would have given only tripeptidic periodicity.
(2) They would have encoded polypeptide chains of identical periodicity
in all three reading frames. Within the periodic unit, such repeats
could have given both an α-helical segment and a β-sheet-forming
segment, as shown at the bottom. Such alternating α/β structures gave
rise to the mononucleotide-binding site [3], which would have been
utilized immediately as part of the primitive nucleic acid polymerase.
Later they gave rise to the ATP and NAD, NADP binding sites of many
enzymes, as discussed in the text.

Prebiotic coding sequences had to contend with this very
high replication error rate and should still have been able to encode
polypeptide chains of potential function. Provided that the number
of bases in the oligomeric unit was not a multiple of three, repeats
of the base oligomer would have been very stable under this most
trying circumstance of constant base substitutions, deletions, and
insertions. This is also illustrated at the bottom of Fig. 2. Since
the length of the monodecamer (11-base unit) CGAAGCTGCTG is not
divisible by 3, three consecutive copies of it, translated in three
different reading frames, give a monodecapeptidic periodicity to the
polypeptide chain. Contrast the above with repeats of a base
dodecamer, which can give only a tetrapeptidic periodicity to the
polypeptide chain. Furthermore, since within a given reading frame
three consecutive copies of the monodecamer are translated in all
three reading frames, such repeats encode polypeptide chains of
identical periodicity in all three reading frames. This openness of
all three reading frames gives them a great deal of imperviousness to
base substitutions, deletions, and insertions. Repeats of the
monodecamer shown at the bottom of Fig. 2 encode both a potentially
α-helix-forming segment and a potentially β-sheet-forming segment
within one monodecapeptidic unit. In fact, sugar-metabolizing enzymes
in general, and phosphoglycerate kinase in particular, might
originally have been encoded by repeats of such a monodecamer, for
the AAGCTGCTG portion of the monodecameric unit recurs in many
variations in the modern coding sequence (e.g., of man) for
phosphoglycerate kinase, as already noted in our previous paper [6].
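
The two mechanical claims of this section, that one tandem duplication lets a template grow by one unit per round and that a unit whose length is not a multiple of three reads with the same periodicity in all three frames, can be illustrated with a minimal sketch. It is only a string model under stated assumptions: the slipped pairing is taken to be exactly one unit out of register, chemistry is ignored, and all function names are hypothetical. The sequences CAGCCTG, CGAAGCTGCTG, and GTGGTCCTTTGG are those quoted in the text.

```python
from math import gcd

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Complementary strand, written 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]


def extend_by_slipped_pairing(template: str, unit_len: int) -> str:
    """One round of out-of-register pairing followed by fill-in synthesis."""
    partner = reverse_complement(template)
    # Assumption: the partner slips by exactly one unit, so its 5'-most
    # unit overhangs the template's 3' end and is copied onto that end.
    overhang = partner[:unit_len]
    return template + reverse_complement(overhang)


def codons_in_one_period(unit: str, frame: int) -> list[str]:
    """Codons read from `frame` until the codon pattern starts to repeat."""
    n = len(unit)
    period = n // gcd(n, 3)  # n codons if 3 does not divide n, else n / 3
    repeat = unit * 4        # long enough for every frame when n >= 2
    return [repeat[frame + 3 * i: frame + 3 * i + 3] for i in range(period)]


def is_rotation(a: list[str], b: list[str]) -> bool:
    """True if b is a cyclic shift of a."""
    return len(a) == len(b) and any(a[k:] + a[:k] == b for k in range(len(a)))


def same_periodicity_in_all_frames(unit: str) -> bool:
    """True if frames 1 and 2 read the same codon cycle as frame 0."""
    frames = [codons_in_one_period(unit, f) for f in range(3)]
    return all(is_rotation(frames[0], other) for other in frames[1:])


heptamer = "CAGCCTG"
template = heptamer * 2  # the 14-mer left by one tandem duplication
template = extend_by_slipped_pairing(template, len(heptamer))

eleven_mer = "CGAAGCTGCTG"   # unit shown at the bottom of Fig. 2
twelve_mer = "GTGGTCCTTTGG"  # a dodecamer quoted later in the text

print(len(template))                              # length after one round
print(len(codons_in_one_period(eleven_mer, 0)))   # codons per period
print(len(codons_in_one_period(twelve_mer, 0)))
print(same_periodicity_in_all_frames(eleven_mer),
      same_periodicity_in_all_frames(twelve_mer))
```

Under these assumptions the 14-mer becomes a 21-mer, the 11-base unit yields a period of 11 codons that is the same cycle in every frame, and the dodecamer yields a period of only 4 codons that differs from frame to frame.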
              
              D. Repetition as the Essence of Coding Sequences and Musical 
              Compositions   
The earth on which life has evolved has always been governed by a
hierarchy of periodicities. First, the earth rotates on its own axis
to create days, while the moon's revolution around the earth gives
months, with their neap tides and spring tides, to be topped by years,
reflecting the earth's travel around the sun. It is small wonder
if life itself was born out of periodicities embodied in repetitions
of unit base oligomers. Just as man eventually devised seconds,
minutes, and hours as arbitrary units of time measurement, one of
the periodicities embodied in polypeptide chains encoded by the
first set of coding sequences that were oligomeric repeats must
soon have been chosen as the arbitrary time-measuring unit by the
ancestral biological clock. It now appears that this arbitrarily
chosen unit was the simplest, dipeptidic periodicity. The polypeptide
chain encoded by the per locus of Drosophila melanogaster, fundamentally
involved in the expression of biological rhythms such as circadian
behaviors and the 55-s rhythm of the courtship song, is largely
comprised of Gly-Thr dipeptidic repeats interspersed with short
stretches of its deviant, Gly-Ser dipeptidic repeats, and the
homologous gene encoding a polypeptide chain of the above-noted
dipeptidic periodicity is conserved in the mouse as well [7].
Observing the per locus coding sequence, one notices that there have
been numerous neutral base substitutions, e.g., free base
substitutions at the redundant 3rd base position of glycine codons.
Thus, it would appear that the timekeeping was done from the beginning
at the polypeptide level rather than at the level of coding sequences,
although the initial periodicity of that polypeptide chain had to be
the consequence of its coding sequence being repeats of unit base
oligomers. Now
we come to the origin, shrouded in mist, of the prehistory of musical
compositions. Inasmuch as the songs of canaries and skylarks are as
pleasing to our ears as they must be to their mates as well as to
themselves, it is clear that melodies as such are no human invention.
Furthermore, the vocal cords and other sound-making apparatuses of
our immediate relatives (e.g., Homo neanderthalensis) appear to
have been rather underdeveloped. Accordingly, I wonder if early
Homo sapiens were capable of imitating the beautiful bird songs
noted above even if they wanted to. I would rather believe that
music as such was invented by primitive man as a purely rhythmic
timekeeping device. For example, a hunting party intent on bringing
down a mammoth or two would have had to coordinate the activities of
several cohorts spread over a wide arc surrounding the herd of
mammoths. This, I suspect, was done by rhythmic beating of hollowed
tree trunks, for example, fast repetitions of a given rhythm conveying
an urgent need to close in, whereas slow repetitions of the same
rhythm meant a cautious approach. It would thus appear that music,
too, was initially born out of repetitious rendition. Even in today's
age of wondrous melodies, music is still used as a timekeeping device,
as in dancing and military parades. The rhythm of the latter, marching
music, is essentially that of our heartbeat. Our heart beats slowly
in slumber and contemplation, while it beats uncontrollably fast
in fright. The rhythm of marching music should be somewhere in between,
to indicate willingness either to go forth against formidable adversaries
or to defend against adversaries until death. Because of this homage
to the periodicity inherent in both coding sequence construction
and musical composition, a way was sought to interconvert the
two.

Fig. 3. An initial part of the treble-clef musical score of
Prelude No. 1 from the Well-Tempered Clavichord by J. S. Bach,
accompanied by the base sequence and the amino acid sequence
transcribable from that base sequence.

The solution that we arrived at is to assign a space and a line on the octave
scale to each base in the ascending order of A, G, T, C in such
a way that the classical middle-C position is occupied by
C on the line, with A in the space occupying the position immediately
above [6]. In Fig. 3, the treble-clef musical score of Prelude No. 1
from the Well-Tempered Clavichord by J. S. Bach, the great master of
the early Baroque, is accompanied by the base sequence transcribed
from it according to the rule stated above. It will be noted that
with regard to every 4/4th or 8/8th time signature unit, the second
half is the exact repeat of the first half. Furthermore, until the
3rd line, each half is made of repeats of four notes, the four-note
subunit consisting of one 3/16th note and three 1/16th notes followed by
one 1/4th note and four 1/16th notes. Translated to base sequence,
the first time signature unit is comprised of four exact copies
of the AGCA tetramer, followed by four copies of ATCA, a
single-base-substituted deviant of the above-noted tetramer. AGCA
then recurs 8 times. Since 4 is not a multiple of three, these
tetrameric repeats are capable of giving tetrapeptidic periodicity to
a polypeptide chain, but alas, the chain terminators TAA and TAG come
in pairs at the extreme right of the 2nd line. From the 4th line
onward, one 3/16th note and a quarter note are relegated to the bass
clef; therefore, the treble-clef score becomes trimeric repeats. When
translated, this portion yields polyserine interspersed with
tetraisoleucine and tetraarginine. In general, I found musical
compositions of the early Baroque period to be repeats of short base
oligomers, these oligomers being single-base-substituted variants of
each other. Indeed, their resemblance to what I conceived as the first
set of coding sequences at the very beginning of life on this earth is
uncanny (see Fig. 2).

Fig. 4. The heart of the coding segment for the tyrosine kinase
domain of the human insulin receptor β-chain [8]. Amino acid residues
of the two active site segments are shown in large capital letters.
This musical transformation for violin of the coding segment is
in E minor, 4/4th or 8/8th time signature.

Most of the coding sequences possessed by modern organisms
have endured for hundreds of millions of years. In the case of
those for sugar-metabolizing enzymes, they have endured for 2 billion
years or more, as already noted. Thus, their original periodicities
are obvious only to discerning eyes. Not surprisingly, musical
compositions of the late Romantic period resemble these coding
sequences. We have previously shown that Frederic Chopin's Nocturne
Opus 55, No. 1, resembles the last exon for the largest subunit of
RNA polymerase II [6]. In Fig. 4, the musical transformation for
violin of the most functionally critical part of the tyrosine kinase
domain of the human insulin receptor β-chain [8] is shown. This
segment includes the two active site segments most critical for the
assigned function of tyrosine kinase. Amino acid residues of these
two active site oligopeptides are shown in large capital letters. It
will be noted that nearly all of the second active site is encoded by
tandem repeats of the dodecamer GTGGTCCTTTGG, thickly underlined by
solid bars (2nd from the last line of Fig. 4). Its two truncated
derivatives in the top line of Fig. 4 are also underlined by solid
bars. Other, more musically pertinent repeats are underlined by open
bars and shaded bars, e.g., the hexamer TCCCTG in the 3rd and 4th
lines of Fig. 4.
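
The interconversion rule used in this section can be sketched in a few lines, together with a check of the tetrapeptidic periodicity claimed for the AGCA repeats. This is one possible reading of the rule, not necessarily the exact procedure behind Fig. 3: it assumes that C sits on the middle-C line and that, going upward, each base takes a space and the line above it in the order A, G, T, C. The note names E4 G4 C5 E5 and D4 A4 D5 F5 are an assumption about which pitches of the opening two bars correspond to the tetramers AGCA and ATCA; all function names are hypothetical. Translation uses the standard genetic code.

```python
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code; '*' marks a chain terminator.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

STEP_OF_LETTER = {"C": 0, "D": 1, "E": 2, "F": 3, "G": 4, "A": 5, "B": 6}


def staff_steps_above_middle_c(note: str) -> int:
    """Diatonic steps between a note such as 'E5' and middle C ('C4')."""
    letter, octave = note[0], int(note[1:])
    return STEP_OF_LETTER[letter] + 7 * (octave - 4)


def note_to_base(note: str) -> str:
    """Assumed reading of the rule: C on the middle-C line, then each base
    occupies a space and the line above it, ascending A, G, T, C."""
    step = staff_steps_above_middle_c(note)
    return "CAGT"[((step + 1) // 2) % 4]


def translate(seq: str, frame: int = 0) -> str:
    """Translate the complete codons of `seq`, starting at `frame`."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(frame, len(seq) - 2, 3))


first_figure = "".join(note_to_base(n) for n in ["E4", "G4", "C5", "E5"])
second_figure = "".join(note_to_base(n) for n in ["D4", "A4", "D5", "F5"])
print(first_figure, second_figure)

for frame in range(3):
    # Each frame shows the same four residues, cyclically shifted.
    print(frame, translate(first_figure * 6, frame))
```

Under this reading the two figures come out as the tetramers quoted in the text, and six copies of the first tetramer translate to a tetrapeptide repeated in all three frames, as expected for a unit whose length is not a multiple of three.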
             
              E. Summary  
In prebiotic nucleic acid replication, templates appear to have
been in short supply. A single round of tandem duplication of existing
oligomers assured progressive extension of templates to a length
adequate for encoding polypeptide chains. Thus, the first set
of coding sequences had to be repeats of base oligomers encoding
polypeptide chains of various periodicities. On the one hand, the
readiness of these periodic polypeptide chains to assume α-helical
and/or β-sheet secondary structures contributed to the extremely rapid
initial functional diversification of these polypeptide chains.
It will be recalled that most, if not all, of the sugar-metabolizing
enzymes had already achieved inviolable functional competence
before the division of prokaryotes from eukaryotes. On the other
hand, a certain one (dipeptidic?) of the peptidic periodicities was
apparently chosen as the timekeeping unit by the biological clock.
Musical compositions, too, apparently evolved originally as a
timekeeping device. Accordingly, repetitiousness is evident in all
musical compositions. The evolution of musical compositions from the
early Baroque to the late Romantic parallels that of coding sequences
from rather exact repeats of base oligomers to more complex modern
coding sequences in which repetitious elements are less conspicuous
and more varied. Inasmuch as the earth is governed by a hierarchy of
periodicities (days, months, and years), such reliance on
periodicities is rather to be expected.
             
              References  
1. Ohno S (1970) Evolution by gene duplication. Springer-Verlag,
Berlin Heidelberg New York 518
2. Dayhoff MO (ed) (1972) Atlas of protein sequence and structure.
National Biomedical Research Foundation, Silver Spring, Maryland
3. Rossmann MG (1981) Evolution of glycolytic enzymes. Philos Trans
R Soc Lond [Biol] B293:191-203
4. Bridson PK, Orgel LE (1980) Catalysis of accurate poly(C)-directed
synthesis of 3'-5'-linked oligoguanylates by Zn2+. J Mol Biol 144:567-577
5. Gojobori T, Yokoyama S (1985) Rates of evolution of the retroviral
oncogene of Moloney murine sarcoma virus and of its cellular homologues.
Proc Natl Acad Sci USA 82:4198-4201
6. Ohno S, Ohno M (1985) The all-pervasive principle of repetitious
recurrence governs not only coding sequence construction but also
human endeavor in musical composition. J Immunogenet 24:71-78
7. Shin H-S, Bargiello TA, Clark BT, Jackson FR, Young MW (1985)
An unusual coding sequence from a Drosophila clock gene is conserved
in vertebrates. Nature 317:445-451
8. Ullrich A, Bell JR, Chen EY, Herrera R, Petruzzelli LM, Dull
TJ, Gray A, Coussens L, Liao Y-C, Tsubokawa M, Mason A, Seeburg
PH, Grunfeld C, Rosen OM, Ramachandran J (1985) Human insulin receptor
and its relationship to the tyrosine kinase family of oncogenes.
Nature 313:756-761
           |