|  
             Beckman Research Institute of The City of Hope, Duarte, California 
              91010, USA  
            A Introduction  
            The generally held belief that any gene whose expression is precisely 
              regulated in development ought to perform an indispensable function 
              to the host organism is not quite correct and, in fact, has no solid 
              foundation. It should be recalled that the DNA replication mechanism 
              of modern organisms has developed a high degree of precision (the 
              error rate is 10 -10 per 
              base pair per replication or thereabout) and that whatever damage 
              sustained by DNA is effectively rectified by multitudes of DNA repair 
              mechanisms. Accordingly, even dispensable pcptide chains such as 
              fibrinopeptides A and B do not change their amino acid sequences 
              very rapidly, a 1 % change in their amino acid sequences taking 
              roughly 1.0- 1.6 million years [ 1 ]. Under these circumstances, 
              a gene which has lost its usefulness to the host organism does not 
              disappear quite so readily. The half-life of an enzyme gene that 
              has become redundant has been estimatcd as 50 million years [2]. 
             
            B. Redundant and Useless Genes May Persist for 50 Million Years 
              or More  
            By becoming tetraploid, an organism initially gains four alleles 
              at every gene locus. Subsequently, each set of four homologous chromosomes 
              differentiates into two pairs, thus completing the process of diploidization. 
              At this stage, freshly diploidized tetraploid species are endowed 
              with twice the number of gene loci when compared with their diploid 
              counterparts. This is the stage, at which trout and salmon of the 
              teleost family Salmonidae, whitefish of the family Coregonidae, 
              and grayling of the family Thymallidae find themselves [3]. A few 
              of those duplicated, and therefore redundant, genes manage to acquire 
              anew role; e.g., of all the vertebrates, only diploidized tetraploid 
              teleost fish are endowed with liver-specific lactate dehydrogenase 
              (LDH), in addition to the customary skeletal muscle and heart LDH. 
              The mechanism of gene duplication as the means to acquire new genes 
              with previously nonexistent functions, however, is very inefficient, 
              having a very low success ratio: the phrase Salvandrum paucitas, 
              dammnundrum multitudo gives ample testimony to its high failure 
              ratio. Accordingly, older diploidized tetraploids of the teleost 
              family Cyprinidae as well as Catostomidae have lost progressively 
              larger numbers of these redundant, duplicated loci by silencing 
              mutations. Since the fossil record gives the origin of these diploidized 
              tetraploids, Ferris and Whitt [2] were able to calculate the average 
              half-life of enzyme loci that became redundant as 50 million years, 
              It should be noted here that this halflife refers to the average 
              time needed for half of the redundant enzyme loci to lose their 
              assigned functions. After losing their assigned functions, these 
              redundant enzyme loci may continue to code for functionless polypeptide 
              chains. The case in point is the murine Sip locus, situated in the 
              middle of the major histocompatibility (MHC) antigen gene complex 
              region of the mouse genome. While the neighboring Ss locus specifies 
              C4 (complement 4 of antibody-mediated lysis), a protein specified 
              by the Sip locus has already lost its assigned function as C4 owing 
              to accumulated mutations. Yet, this SIp locus was androgen dependent 
              in most mouse strains, and operator constitutive mutation of this 
              locus was found in wild mice [4]. It would thus appear that a substantial 
              portion of the redundant gene loci may continue to specify functionless 
              proteins even after 100 million years of independence from natural 
              selection. Although mammal-Iike reptiles were already an independent 
              lineage at the time of the dinosaurs, mammals as we know them came 
              into being only 70 million years ago. This should make us realize 
              the unreality of the statement that any gene loci with precisely 
              regulated expression must be indispensable to the host organism. 
              The fact is that genes that have outlived their usefulness may linger 
              on for 50-100 million years.  
            C. Most Oncogenes are Evolutionary Relics of the Cell Autonomous 
              Stage of Development  
            Although multitudes of cellular oncogenes perform divergent functions 
              (some of their products are found in the nucleus, while others are 
              found inside the plasma membrane), it is clear that all of them 
              function as intracellular cell growth factors. In unicellular eukaryotes 
              such as baker's yeast as well as in many of the multicellular eukaryotes 
              with underdeveloped circulatory systems such as insects, these intracellular 
              growth factors have apparently played a vital role, for it should 
              be recalled that embryonic development of insects is still largely 
              a cell autonomous process as discussed in detail elsewhere [5]. 
              With the advent of the cardiovascular system, development of vertebrates 
              became a centrally controlled affair via multitudes of peptide and 
              steroid hormones and cellular autonomy was suppressed. While intracellular 
              growth factors of earlier times served as ancestors of these peptide 
              hormones as well as of their plasma membrane receptors, they themselves 
              largely became evolutionary relics whose functions have become redundant 
              [5].  
            D. Near Immortality of Certain Oncogenes Conferred on Them by 
              Their Original Construction  
            If cellular oncogenes became redundant at the onset of vertebrate 
              evolution, most of them should have become silent by now, in spite 
              of their long estimated half-life of 50 million years, for primitive 
              vertebrates were already in evidence more than 300 million years 
              ago. However, the continued performance of essential functions need 
              not be invoked to explain this persistence for more than 300 million 
              years. The view first expressed in 1981 [6] that all the coding 
              sequences originally were repeats of base oligomers has found increasing 
              support from independent sources [7-9]. Provided that the number 
              of bases in the oligomeric unit is not a multiple of three, coding 
              sequences made of oligomeric repeats are inherently impervious to 
              normally very damaging base substitutions, deletions, and insertions, 
              thus possessing a near immortality. In this kind of oligomeric repeat, 
              three consecutive copies of the oligomeric unit translated in three 
              different reading frames gives the unit periodicity to their polypeptide 
              chains: while nonameric repeats, the unit sequence being a multiple 
              of three, can give only tripeptide periodicity to their peptide 
              chains, three consecutive copies of decameric repeats encode the 
              decapeptidic periodicity to its polypeptide chain. Thus, if one 
              reading frame of this kind of oligomeric repeat is open, the other 
              two are automatically open as well. It follows then that the potentially 
              most damaging base substitution that changes an amino acid-specifying 
              codon to the chain terminator ( e.g., Trp cod on TGG to chain-terminating 
              TAG or TGA) merely silences one of the three open reading frames. 
              Deletions or insertions of bases that are not multiples of three 
              are usually as damaging, for resulting frame shifts alter downstream 
              amino acid sequences and most often result in premature chain terminations. 
              In this type of oligomeric repeat, such insertions and deletions 
              are of no consequence either, for downstream amino acid sequences 
              are not at all affected by frame shifts. In a previous paper [10], 
              we have analyzed the published coding sequence of human c-myc gene 
              [II] in detail. Within the 5' half of c-myc coding sequence, we 
              identified one each of recurring base tetradecamer, duodecamer, 
              and two monodecamers. The significance of this becomes clear once 
              it is realized that if c-myc is a unique sequence sensu stricto, 
              even a given base decamer is expected to recur only once every 1048576 
              bases. Yet, here we found a recurring base tetradecamer within a 
              mere 687-base 5' half c-myc coding sequence. Furthermore, recurring 
              duodecamer and monodecamers were found to represent slightly modified 
              parts of the tetradecameric sequence GGCCGCCGCCTCCT. Thus, it was 
              concluded that the entire 5' half of the c-myc coding sequence originated 
              from repeats of the previously noted base tetradecamer. Since 14 
              is not a multiple of 3, three consecutive copies of it translated 
              in three different reading frames would have given the following 
              tetradecapeptidic periodicity to the original c-myc polypeptide 
              chain, at least the amino terminal half of it  
             Gly Arg Arg Leu Leu  
              GGC CGC CGC CTC CT/G  
               
              Ala Ala Ala Ser Trp  
              GCC GCC GCC TCC T/GG  
               
              Pro Pro Pro Pro  
              CCG CCG CCT CCT  
               
              Indeed, the human c-myc coding sequence, at least the 5' half of 
              it, apparently inherited a measure of immortality from its original 
              construction, for we found two long, alternative open reading frames, 
              one covering the first 30 I bases and the other from the 599th to 
              952nd bases. When this region of human c-myc coding sequence [ II] 
              was compared with the corresponding region of v-myc coding sequence 
              of avian retrovirus MC29 [12], we found that the two differed from 
              each other not so much by amino acid substitutions as by five stretches 
              of insertions and two stretches of deletions. Thus, c-myc's inherent 
              imperviousness to deletions and insertions was shown [ 10].  
            E. Resurrection of a Silenced v-src Gene by Utilization of its 
              Alternative Open Reading Frame  
            A measure of immortality inherited by some of the oncogenes from 
              their original construction was indeed shown by the following experiment 
              of Mardon and Varmus [ 13]. First, they established the rat cell 
              line that was transformed by the integration into the genome of 
              a single copy of strain 877 Rous sarcoma virus v-src coding sequence. 
              One of the defective mutations sustaincd by the integrated v-src 
              that deprived from the rat cell line of the transformed phenotype 
              was identified as an insertion of a single base A between 146th 
              Glu cod on GAA and 147th Glu codon GAG. A resulting frame shift 
              created anew chain terminator 51 bases further downstream, thus, 
              silencing a mutated v-src [ 13]. The surprise was the second mutation 
              that resurrected a silenced v-src as a transforming gene. This second 
              event was an insertion of a duplicated 242-base segment into the 
              position between T and GG of the 148th Trp cod on in the original 
              reading frame. This 242-base segment started from GAT representing 
              the 68th Asp codon in the original reading frame and ended in T 
              of 148th Trp also in the original reading frame, thus including 
              a previously inserted A. Since the inserted segment is now translated 
              in an alternative reading frame, the resulting double frame shifts 
              restored the original reading frame, starting from GAG of the 147th 
              Glu of the wild-type v-src and downward which in the resurrected 
              v-src became the 228th Glu. Such restoration of function by an insertion 
              in the midst of the polypeptide chain of an 81-residue new amino 
              acid sequence is hardly believable, unless anew sequence specified 
              by a repeated coding segment translated in an alternative open reading 
              frame resembles parts of the preexisting amino acid sequence. Such 
              a resemblance, in turn, is expected only if the coding sequence 
              itself still maintains a sufficient vestige of the ancestral construction; 
              i.e., the coding sequence originating from repeats of a base oligomer 
              in which the number of bases in the oligomeric unit was not a multiple 
              of three. Indeed, the existence of so long an alternative open reading 
              frame itself is a reflection of the v-src coding sequence's ultimate 
              derivation from oligomeric repeats, the number of bases in the oligomeric 
              unit not being a multiple of three. As might be expected, when translated 
              in an alternative open reading frame, amino acid residues 1-8 encoded 
              by a duplicated 242-base segment were Thr-ProSe-Arg-Arg-Arg-Ser-Val. 
              In the standard amino acid sequence of Rous sarcoma V-src, the very 
              similar nonapeptide,  
              Thr-Pro-Ser-(Gln)-Arg-Arg-Arg-Ser-Leu customarily occupies positions 
              10-18 [5].  
            F. Summary  
            Contrary to the popularly held view, genes that have lost their 
              usefulness to the host organism may continue to encode proteins 
              for 50 million years or longer. Accordingly, precisely regulated 
              expression of genes can not be taken as proof of their indispensability. 
              My view is that multitudes of oncogenes of vertebrates are evolutionary 
              relics harking back to the days of invertebrate ancestors in which 
              embryogenesis was still a cell autonomous process. Parts of certain 
              oncogene coding sequences originated from repeats of base oligomers 
              whose numbers of bases were not multiples of three. Thus, these 
              segments are still endowed with a measure of immortality in that 
              they are impervious to normally very deleterious base substitutions, 
              insertions, and deletions.  
            References  
            I. Dayhoff MO (ed) (1972) Atlas of protein sequences and structure. 
              National biomedical research foundation, Silver Springs. Maryland 
               
              2. Ferris SO, Whitt GS (1977) Loss of duplicated gene expression 
              after polyploidization. Nature 265:258-260  
              3. Ohno S (1970) Evolution by gene duplication. Springer, Heidelberg 
              Berlin New York  
              4. Hansen TH, Shreffler OC (1976) Characterizalion of a constitutive 
              variant of lhe murine serum prolein allolype, Slp. J Immunol 117: 
              1507-1513  
              5. Ohno S ( 1984) Repeats of base oligomers as the primordial coding 
              sequence of lhe primeval earlh and their vestiges in modern genes. 
              J Mol EvoI20.313-321 (1984)  
              6. Ohno S (1981) Original domain for the serum albumin family arose 
              from repealed sequences. Proc Natl Acad Sci USA 79.1999 –2002  
              7. Blake C ( 1983) Exon -present from the beginning? Nature 306: 
              535-537  
              8. Go M (1983) Modular structural unils, exonsand functions in chicken 
              lysozyme. Proc Natl Acad Sci USA 80: 1964-1968  
              9. Alexander F, Young PR, Tilghman SM (1984) Evolution of the albumin: 
              a-feloprotein ancestral gene from the amplification of a 27 nucleolide 
              sequence. J Mol BioI 173.159-174  
              10. Ohno S, Yazaki A (1983) Simple construction of human c-myc gene 
              implicated in B-cell neoplasms and ils relalionship with avian v-myc 
              and human lymphokines. Scand J ImmunoI18.373-388  
              11. Watt R, Stanton LW, Marcu KB, Gallo RC. Croce CN, Tovera G (1983) 
              Nucleotide sequence of cloned cONA of human c-myc oncogene. Nalure 
              303.725- 727  
              12. Alitalo K, Bishop MJ. Smilh OH, Chen E, Colby WW, Levinson AO 
              (1983) Nucleotide sequence of the v-myc oncogene of avian relrovirus 
              MC29. Proc Natl Acad Sci USA 80. 100-105  
              13. Mardon G, Vermus HE (1983) Frameshift and inlragenic suppressor 
              mutations in a Rous sarcoma provirus suggest SRC encodes two proteins. 
              CeI132:871-879  
           |