
The Kozak consensus sequence (Kozak consensus or Kozak sequence) is a nucleic acid motif that functions as the translation initiation site in most eukaryotic mRNA transcripts.[1] It is named after Marilyn Kozak, the scientist who discovered it. Regarded as the optimum sequence for initiating translation in eukaryotes, it is an integral aspect of protein regulation.[1] As the sequence has been studied further, extensions of the motif, bases of particular importance, and notable exceptions have been identified.[1][2][3]

The Kozak sequence was determined by analysing the sequences of 699 vertebrate mRNAs and verified by site-directed mutagenesis.[4] Although the analysis was initially limited to a subset of vertebrates (human, cow, cat, dog, chicken, guinea pig, hamster, mouse, pig, rabbit, sheep, and Xenopus), subsequent studies confirmed its conservation in higher eukaryotes generally.[1] The sequence was defined as 5'-(gcc)gccRccAUGG-3' where:[4]

  1. The AUG (underlined in the original notation) is the translation start codon, coding for methionine.
  2. Upper-case letters indicate highly conserved bases, i.e. the 'AUGG' sequence is constant or rarely, if ever, changes; the exception is 'R', which is an IUPAC ambiguity code.[5]
  3. 'R' indicates that a purine (adenine or guanine) is always observed at this position (with adenine reported by Kozak to be more frequent).
  4. A lower-case letter denotes the most common base at a position where the base can nevertheless vary.
  5. The sequence in parentheses (gcc) is of uncertain significance.
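
As a rough illustration of this notation, the consensus can be turned into a regular expression in which upper-case positions (including the ambiguity code R) are enforced and lower-case positions are free to vary. The helper name `consensus_to_regex`, the small IUPAC lookup table, and the decision to leave out the optional (gcc) prefix are assumptions made for this sketch, not part of Kozak's definition.

```python
import re

# Subset of IUPAC nucleotide codes needed here (RNA alphabet); assumed for this sketch.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "[AG]", "M": "[AC]", "N": "[ACGU]"}

def consensus_to_regex(consensus):
    """Build a regex that enforces only the upper-case (highly conserved) positions."""
    parts = []
    for base in consensus:
        if base.isupper():
            parts.append(IUPAC[base])  # conserved: must match (R expands to A or G)
        else:
            parts.append("[ACGU]")     # lower case: most common base, but may vary
    return re.compile("".join(parts))

# Core consensus without the "(gcc)" prefix of uncertain significance.
KOZAK = consensus_to_regex("gccRccAUGG")

print(bool(KOZAK.fullmatch("GACACCAUGG")))  # True: purine at the R position, AUGG intact
print(bool(KOZAK.fullmatch("GACUCCAUGG")))  # False: U at the R position
```

Treating lower-case positions as fully unconstrained is a deliberate simplification; a position weight matrix would be closer to how the consensus was derived, but needs frequency data that this article does not give.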

Introduction

This sequence on an mRNA molecule is recognized by the ribosome as the translational start site, from which the protein encoded by that mRNA is translated. The ribosome requires this sequence, or a variant of it (see below), to initiate translation. The Kozak sequence is not to be confused with the ribosomal binding site (RBS), which is either the 5' cap of a messenger RNA or an internal ribosome entry site (IRES).

In vivo, this site is often not matched exactly on different mRNAs, and the amount of protein synthesized from a given mRNA depends on the strength of the Kozak sequence.[6] Some nucleotides in this sequence are more important than others. The AUG is the most important because it is the actual initiation codon, encoding a methionine at the N-terminus of the protein. (Rarely, GUG is used as an initiation codon, but methionine is still the first amino acid because it is the Met-tRNA in the initiation complex that binds to the mRNA.) The A of the AUG is designated position +1 and the base immediately upstream is -1; there is no position 0. For a 'strong' consensus, the nucleotides at positions +4 (G in the consensus) and -3 (A or G in the consensus) must both match the consensus. An 'adequate' consensus matches at only one of these two positions, while a 'weak' consensus matches at neither. The cc at -1 and -2 are less conserved but contribute to the overall strength.[7] There is also evidence that a G at position -6 is important for the initiation of translation.[2]
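
The strong/adequate/weak rule described above can be sketched in a few lines of Python. The function name `kozak_strength` and its 0-based `aug_index` argument are illustrative choices, and only the -3 and +4 positions are checked; the contributions of -1, -2 and -6 mentioned in the text are ignored in this sketch.

```python
def kozak_strength(mrna, aug_index):
    """Classify the context of the AUG that starts at aug_index (0-based)."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    if aug_index < 3 or aug_index + 3 >= len(mrna):
        raise ValueError("need 3 bases upstream and 1 base downstream of the AUG")
    # The A of AUG is +1 and there is no position 0, so -3 lies three bases
    # upstream of the A, and +4 is the base immediately after the AUG.
    minus3 = mrna[aug_index - 3]
    plus4 = mrna[aug_index + 3]
    matches = (minus3 in "AG") + (plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[matches]

print(kozak_strength("GCCACCAUGG", 6))  # strong: A at -3 and G at +4
print(kozak_strength("GCCACCAUGU", 6))  # adequate: only -3 matches
print(kozak_strength("GCCUCCAUGU", 6))  # weak: neither position matches
```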

There are examples in vivo of each of these types of Kozak consensus, and they probably evolved as yet another mechanism of gene regulation. Lmx1b is an example of a gene with a weak Kozak consensus sequence.[8] For initiation of translation from such a site, other features are required in the mRNA sequence in order for the ribosome to recognize the initiation codon.

 
A sequence logo showing the most conserved bases around the initiation codon from 10 000 human mRNAs.

Mutations

A G→C mutation at position -6 of the human β-globin gene (β+45) has been shown to alter the haematological and biosynthetic phenotype. This was the first mutation reported in the Kozak sequence. It was found in a family from southeast Italy whose members had thalassaemia intermedia.[2]

Variations in the consensus sequence

The Kozak consensus has been variously described as:[9]

     65432-+234
(gcc)gccRccAUGG (Kozak 1987)
       AGNNAUGN
        ANNAUGG
        ACCAUGG (Spotts et al., 1997, mentioned in Kozak 2002)
     GACACCAUGG (H. sapiens HBB, HBD, R. norvegicus Hbb, etc.)
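
The digit row above the aligned variants gives the position of each base relative to the A of the AUG, with "-" and "+" standing for -1 and +1. A minimal sketch of that numbering, which skips position 0, is shown below; the function names `position_labels` and `annotate` are hypothetical and exist only for this illustration.

```python
def position_labels(upstream, downstream):
    """Positions around a start codon: -upstream..-1, then +1..+downstream (no 0)."""
    return list(range(-upstream, 0)) + list(range(1, downstream + 1))

def annotate(context, aug_offset):
    """Pair each base with its position, given the 0-based offset of the A of AUG."""
    labels = position_labels(aug_offset, len(context) - aug_offset)
    return list(zip(labels, context))

# The H. sapiens HBB context from the list above; the A of AUG is at offset 6.
for pos, base in annotate("GACACCAUGG", 6):
    print(f"{pos:+d} {base}")
```

Run on the HBB context, this labels the first G as -6, the A three bases upstream of the start codon as -3, and the final G as +4.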

Kozak-like sequences in various eukaryotes

Biota                                     Phylum       Consensus sequence
Vertebrate (Kozak 1987)                   -            gccRccATGG[4]
Fruit fly (Drosophila spp.)               Arthropoda   atMAAMATGamc[10]
Budding yeast (Saccharomyces cerevisiae)  Ascomycota   aAaAaAATGTCt[11]
Slime mold (Dictyostelium discoideum)     Amoebozoa    aaaAAAATGRna[12]
Ciliate                                   Ciliophora   nTaAAAATGRct[12]
Malarial protozoa (Plasmodium spp.)       Apicomplexa  taaAAAATGAan[12]
Toxoplasma (Toxoplasma gondii)            Apicomplexa  gncAaaATGg[13]
Trypanosomatidae                          Euglenozoa   nnnAnnATGnC[12]
Terrestrial plants                        -            acAACAATGGC[14]
Microalga (Dunaliella salina)             Chlorophyta  gccaagATGgcg[15]

References

  1. Kozak, M. (February 1989). "The scanning model for translation: an update". The Journal of Cell Biology. 108 (2): 229–241. doi:10.1083/jcb.108.2.229. ISSN 0021-9525. PMC 2115416. PMID 2645293.
  2. De Angioletti M, Lacerra G, Sabato V, Carestia C (2004). "Beta+45 G --> C: a novel silent beta-thalassaemia mutation, the first in the Kozak sequence". Br J Haematol. 124 (2): 224–31. doi:10.1046/j.1365-2141.2003.04754.x. PMID 14687034.
  3. Hernández, Greco; Osnaya, Vincent G.; Pérez-Martínez, Xochitl (2019-07-25). "Conservation and Variability of the AUG Initiation Codon Context in Eukaryotes". Trends in Biochemical Sciences. doi:10.1016/j.tibs.2019.07.001. ISSN 0968-0004. PMID 31353284.
  4. Kozak M (October 1987). "An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs". Nucleic Acids Res. 15 (20): 8125–8148. doi:10.1093/nar/15.20.8125. PMC 306349. PMID 3313277.
  5. Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences, NC-IUB, 1984.
  6. Kozak M (1984). "Point mutations close to the AUG initiator codon affect the efficiency of translation of rat preproinsulin in vivo". Nature. 308 (5956): 241–246. Bibcode:1984Natur.308..241K. doi:10.1038/308241a0. PMID 6700727.
  7. Kozak M (1986). "Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes". Cell. 44 (2): 283–92. doi:10.1016/0092-8674(86)90762-2. PMID 3943125.
  8. Dunston JA, Hamlington JD, Zaveri J, et al. (September 2004). "The human LMX1B gene: transcription unit, promoter, and pathogenic mutations". Genomics. 84 (3): 565–76. doi:10.1016/j.ygeno.2004.06.002. PMID 15498463.
  9. Tang, Sen-Lin; Chang, Bill C.H.; Halgamuge, Saman K. (August 2010). "Gene functionality's influence on the second codon: A large-scale survey of second codon composition in three domains". Genomics. 96 (2): 92–101. doi:10.1016/j.ygeno.2010.04.001. PMID 20417269.
  10. Cavener DR (February 1987). "Comparison of the consensus sequence flanking translational start sites in Drosophila and vertebrates". Nucleic Acids Res. 15 (4): 1353–61. doi:10.1093/nar/15.4.1353. PMC 340553. PMID 3822832.
  11. Hamilton R, Watanabe CK, de Boer HA (April 1987). "Compilation and comparison of the sequence context around the AUG startcodons in Saccharomyces cerevisiae mRNAs". Nucleic Acids Res. 15 (8): 3581–93. doi:10.1093/nar/15.8.3581. PMC 340751. PMID 3554144.
  12. Yamauchi K (May 1991). "The sequence flanking translational initiation site in protozoa". Nucleic Acids Res. 19 (10): 2715–20. doi:10.1093/nar/19.10.2715. PMC 328191. PMID 2041747.
  13. Seeber, F. (1997). "Consensus sequence of translational initiation sites from Toxoplasma gondii genes". Parasitology Research. 83 (3): 309–311. doi:10.1007/s004360050254. PMID 9089733.
  14. Lütcke HA, Chow KC, Mickel FS, Moss KA, Kern HF, Scheele GA (January 1987). "Selection of AUG initiation codons differs in plants and animals". EMBO J. 6 (1): 43–8. doi:10.1002/j.1460-2075.1987.tb04716.x. PMC 553354. PMID 3556162.
  15. Kadkhodaei, Saeid; Hashemi, Farahnaz S. Golestan; Rezaei, Morvarid Akhavan; Abbasiliasi, Sahar; Shun, Tan Joo; Memari, Hamid R. Rajabi; Moradpour, Mahdi; Ariff, Arbakariya B. (2016-07-05). "Cis/transgene optimization: systematic discovery of some key gene expression elements integrating bioinformatics and computational biology". bioRxiv 061945.

Further reading