Chimeric RNA, sometimes referred to as a fusion transcript, is composed of exons from two or more different genes and has the potential to encode novel proteins.[1] These mRNAs differ from those produced by conventional splicing because they derive from two or more gene loci.

Review of RNA Production

The pathway from DNA to protein expression is fundamental to the central dogma of biology.[2]

In 1956, Francis Crick proposed what is now known as the "central dogma" of biology.[3]

DNA encodes the genetic information required for an organism to carry out its life cycle. In effect, DNA serves as the "hard drive" that stores genetic data. DNA is replicated, serving as its own template for replication. DNA forms a double helix and is composed of a sugar-phosphate backbone and nitrogenous bases; it can be thought of as a ladder in which the sides are built of deoxyribose sugar and phosphate and the rungs are paired nitrogenous bases.[4] There are four bases in a DNA molecule: adenine (A), cytosine (C), thymine (T), and guanine (G). Nucleotides, the structural units of DNA and RNA, each consist of a sugar, a phosphate group, and a nitrogenous base. The two strands of the double helix are antiparallel, that is, they run in opposite directions. The strands are held together by base pairs in which adenine pairs with thymine and guanine pairs with cytosine.

While DNA serves as the template for production of ribonucleic acid (RNA), RNA is usually responsible for making protein. The process of making RNA from DNA is called transcription. RNA uses a similar set of bases, except that thymine is replaced with uracil. A group of enzymes called RNA polymerases (isolated by biochemists Jerard Hurwitz and Samuel B. Weiss) function in the presence of DNA. These enzymes produce RNA using segments of chromosomal DNA as a template. Unlike replication, where a complete copy of DNA is made, transcription copies only the gene that is to be expressed as a protein.[5]
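The base-pairing and transcription rules described above can be illustrated with a minimal Python sketch. The toy sequence and the function names are assumptions made for illustration only; real transcription involves promoters, RNA polymerase, and termination signals that are not modelled here.

```python
# Watson-Crick pairing: A pairs with T, and G pairs with C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the antiparallel partner strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def transcribe(coding_strand: str) -> str:
    """Return the RNA for a gene given its coding (sense) strand, 5'->3'.

    The RNA has the same sequence as the coding strand, except that
    thymine (T) is replaced with uracil (U).
    """
    return coding_strand.replace("T", "U")


gene = "ATGGCCTAA"                   # hypothetical toy sequence
template = reverse_complement(gene)  # strand the polymerase would read
rna = transcribe(gene)
print(template, rna)
```

Taking the reverse complement twice returns the original strand, which is a quick way to check that the pairing table is symmetric.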

Initially, it was thought that RNA served as a structural template for protein synthesis, essentially ordering amino acids by a series of cavities shaped specifically so that only specific amino acids would fit. Crick was not satisfied with this hypothesis given that the four bases of RNA are hydrophilic and that many amino acids prefer interactions with hydrophobic groups. Additionally, some amino acids are very structurally similar and Crick felt that accurate discrimination would not be possible given the similarities. Crick then proposed that prior to incorporation into proteins, amino acids are first attached to adapter molecules which have unique surface features that can bind to specific bases on the RNA templates.[5] These adapter molecules are called transfer RNA (tRNA).

Through a series of experiments involving E. coli and the T4 phage in 1960,[5] it was shown that messenger RNA (mRNA) carries information from DNA to the ribosomal sites of protein synthesis. Ribosomes bring the tRNA-amino acid precursors into position, where they read the information provided by the mRNA template to synthesize protein.
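As a rough illustration of how an mRNA template is read, the sketch below translates a toy mRNA three bases (one codon) at a time. The codon table is a deliberately small subset of the standard genetic code, and the function name and sequences are hypothetical; the lookup only stands in for the pairing of tRNA adapters with the mRNA.

```python
from typing import List

# Partial codon table (standard genetic code); only a few of the 64 codons.
CODON_TABLE = {
    "AUG": "Met", "GCC": "Ala", "UGG": "Trp", "AAA": "Lys",
    "UUU": "Phe", "GGC": "Gly",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(mrna: str) -> List[str]:
    """Translate from the first AUG until a stop codon or the end of the RNA.

    Codons missing from the partial table are reported as "Xaa" (unknown).
    """
    peptide: List[str] = []
    start = mrna.find("AUG")
    if start == -1:
        return peptide  # no start codon, nothing to translate
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "Xaa")
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide


print(translate("AUGGCCUGGAAAUAA"))
```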

RNA Splicing


Creating a protein consists of two main steps: transcription of DNA into RNA and translation of RNA into protein. After DNA is transcribed into RNA, the molecule is known as pre-messenger RNA (pre-mRNA); it consists of exons and introns that can be split apart and rearranged in many different ways. Historically, exons were considered the coding sequence and introns the “junk” DNA. Although this view has been shown to be false, it is true that exons are often joined together. Depending on the needs of the cell, regulatory mechanisms choose which exons, and sometimes introns, to join. This process of removing pieces of a pre-mRNA transcript and joining the remaining pieces is called splicing. The human genome encodes approximately 25,000 genes, but significantly more proteins are produced. This is accomplished through RNA splicing: the exons of these 25,000 genes can be spliced in many different ways to create countless combinations of RNA transcripts and ultimately countless proteins. Normally, exons from the same pre-mRNA transcript are spliced together. Occasionally, however, gene products or pre-mRNA transcripts are spliced together so that exons from different transcripts are mixed in a fusion product known as chimeric RNA. Chimeric RNA often incorporates exons from highly expressed genes,[1] but the chimeric transcript itself is usually expressed at low levels.
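The difference between conventional splicing, alternative splicing, and the formation of a chimeric RNA can be sketched in Python. Each pre-mRNA is modelled simply as a mapping from exon names to sequences, with introns left out; the gene names, exon names, and sequences are hypothetical, and the model ignores splice-site recognition entirely.

```python
from typing import Dict, List

# A pre-mRNA is modelled as exon name -> exon sequence; introns are omitted.
PreMRNA = Dict[str, str]


def splice(pre_mrna: PreMRNA, exon_order: List[str]) -> str:
    """Join the chosen exons of a single pre-mRNA in the given order."""
    return "".join(pre_mrna[name] for name in exon_order)


def trans_splice(five_prime: PreMRNA, five_exons: List[str],
                 three_prime: PreMRNA, three_exons: List[str]) -> str:
    """Join exons from two different pre-mRNAs into one chimeric RNA."""
    return splice(five_prime, five_exons) + splice(three_prime, three_exons)


# Hypothetical toy genes.
gene_a: PreMRNA = {"A1": "AUGGCC", "A2": "GGAUUC", "A3": "CCAUAA"}
gene_b: PreMRNA = {"B1": "AUGAAA", "B2": "UUUGGC", "B3": "UGGUGA"}

full_length = splice(gene_a, ["A1", "A2", "A3"])   # conventional splicing
exon_skipped = splice(gene_a, ["A1", "A3"])        # alternative splicing
chimera = trans_splice(gene_a, ["A1", "A2"], gene_b, ["B3"])
print(full_length, exon_skipped, chimera)
```

The first two transcripts draw only on exons of one gene, whereas the third combines the 5′ exons of one gene with a 3′ exon of another, which is the defining feature of a chimeric RNA.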

This chimeric RNA can then be translated into a fusion protein. Fusion proteins are highly tissue-specific[1] and are frequently associated with cancers such as colorectal cancer, prostate cancer,[6] and mesothelioma.[7] They frequently exploit signal peptides and transmembrane domains, which can alter the localization of proteins, possibly contributing to the disease phenotype.

Discovery of Chimeric RNA


One of the first studies to investigate the generation of chimeric RNA examined the fusion of the first three exons of a gene known as JAZF1 to the last 15 exons of a gene known as JJAZ1.[8] This exact transcript, and the resulting protein, was found specifically in endometrial tissue. While often found in endometrial cancers, these transcripts are expressed in normal tissue as well. The transcripts were originally thought to result from chromosomal fusions, and one group investigated whether this was accurate. Using Southern blotting and fluorescence in situ hybridization (FISH) on the genome, the researchers found no evidence of DNA rearrangement. They investigated further by combining human endometrial cells with rhesus fibroblasts and found chimeric products containing sequences from both species. These data suggested that chimeric RNA is generated by splicing parts of genes together rather than by chromosomal rearrangements. They also performed mass spectrometry on the translated protein to verify that the chimeric RNA is translated into protein.

Recently, advances in next-generation sequencing have decreased the cost of sequencing significantly, allowing more RNA-seq projects to be conducted. Unlike traditional microarrays, which can detect only known transcripts, RNA-seq can detect novel RNA transcripts. Deep sequencing enables detection of transcripts even at very low levels. This has allowed researchers to detect many more chimeric RNAs and fusion proteins and has improved understanding of their role in health and disease.

Chimeric protein products


Numerous putative chimeric transcripts have been identified among the expressed sequence tags using high-throughput RNA sequencing technology. In humans, chimeric transcripts can be generated in several ways: by trans-splicing of pre-mRNAs, by RNA transcription runoff, by other errors in RNA transcription, or by gene fusion following inter-chromosomal translocations or rearrangements. Among the few corresponding protein products that have been characterized so far, most result from chromosomal translocations and are associated with cancer. For instance, gene fusion in chronic myelogenous leukemia (CML) leads to an mRNA transcript that encompasses the 5′ end of the breakpoint cluster region protein (BCR) gene and the 3′ end of the Abelson murine leukemia viral oncogene homolog 1 (ABL) gene. Translation of this transcript results in a chimeric BCR–ABL protein that possesses increased tyrosine kinase activity. Chimeric transcripts characterize specific cellular phenotypes and are suspected to function not only in cancer but also in normal cells. One example of a chimera in normal human cells is generated by trans-splicing of the 5′ exons of the JAZF1 gene on chromosome 7p15 and the 3′ exons of JJAZ1 (SUZ12) on chromosome 17q1. This chimeric RNA is translated in endometrial stroma cells and encodes an anti-apoptotic protein. Notable examples of chimeric genes in cancer are the fused BCR-ABL, FUS-ERG, MLL-AF6, and MOZ-CBP genes expressed in acute myeloid leukemia (AML), and the TMPRSS2-ETS chimera associated with overexpression of the oncogene in prostate cancer.[1]

Characteristics of chimeric proteins


Frenkel-Morgenstern et al. have defined two main features of chimeric proteins. First, chimeras exploit signal peptides and transmembrane domains to alter the cellular localization of the associated activities. Second, chimeras incorporate parental genes that are expressed at a high level.[1] A survey of all the functional domains in proteins encoded by chimeric transcripts demonstrated that chimeras contain complete protein domains significantly more often than would be expected from random data sets.[9]

Databases of chimeric transcripts


Several databases have been constructed to incorporate chimeric transcripts from different resources using a variety of computational procedures, including ChiTaRS,[10][11][12] ChimerDB,[13] HYBRIDdb,[14] TICdb,[15] and dbCRID.[16]

Computational tools for detecting chimeric RNA


Recent advances in high throughput transcriptome sequencing have paved the way for new computational methods for fusion discovery. The following are computational tools available for detection of fusion transcripts from RNA-Seq data:

  • Fusim is a software tool for simulating fusion transcripts for comprehensive comparison across fusion discovery methods.[17]
  • CRAC integrates genomic locations and local coverage to enable splice junction or fusion RNA predictions directly from RNA-seq read analysis.[18]
  • TopHat-Fusion can discover fusion products deriving from known genes, unknown genes and unannotated splice variants of known genes.[19]
  • FusionAnalyser is a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data.[20]
  • ChimeraScan offers discovery of chimeric transcription between two independent transcripts in high-throughput transcriptome sequencing data by providing features such as the ability to process long (>75 bp) paired-end reads, processing of ambiguously mapping reads and detection of reads spanning a fusion junction.[21]
  • FusionHunter identifies fusion transcripts from transcriptional analysis of paired-end RNA-seq reads.[22]
  • SplitSeek allows de novo prediction of splice junctions in short-read RNA-seq data, suitable for detection of novel splicing events and chimeric transcripts.[23]
  • Trans-ABySS is a de novo short-read transcriptome assembly and analysis pipeline that helps in the identification of known, new and alternative structures in expressed transcripts such as chimeric transcripts.[24]
  • FusionSeq identifies fusion transcripts from paired-end RNA-sequencing. It includes filters to remove spurious candidate fusions with artifacts, such as misalignment or random pairing of transcript fragments.[25]

Some caution is needed in interpreting trans-splicing events detected in high-throughput sequencing experiments, as the reverse transcriptase enzymes ubiquitously used to determine RNA sequences can introduce apparent trans-splicing events that were not present in the original RNA.[26][27] Some chimeric RNAs have, however, been confirmed by other methods.[28]
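The idea of a read that spans a fusion junction, which several of the tools above rely on, can be illustrated with a toy Python sketch. This is not the algorithm of any listed tool: it uses exact substring matching against two hypothetical reference transcripts, whereas real pipelines align millions of reads with mismatches, use paired-end information, and filter artifacts. The function name, gene names, sequences, and the `min_anchor` parameter are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def find_fusion_junction(read: str, transcripts: Dict[str, str],
                         min_anchor: int = 8) -> Optional[Tuple[str, str, int]]:
    """Return (5' gene, 3' gene, split position) for a junction-spanning read.

    The read is cut at every position that leaves at least `min_anchor`
    bases on each side; a candidate is reported when the left part matches
    one transcript and the right part matches a different one.
    """
    # A read fully explained by a single transcript is not a candidate.
    if any(read in seq for seq in transcripts.values()):
        return None
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_genes = [g for g, s in transcripts.items() if left in s]
        right_genes = [g for g, s in transcripts.items() if right in s]
        for lg in left_genes:
            for rg in right_genes:
                if lg != rg:
                    return lg, rg, split
    return None


# Hypothetical reference transcripts.
references = {
    "GENE_X": "ACGTACGGTCAATGCCGTTAGC",
    "GENE_Y": "TTGACCGATAGGCTTACGGATC",
}
# Simulated chimeric read: 3' end of GENE_X joined to 5' end of GENE_Y.
chimeric_read = references["GENE_X"][-10:] + references["GENE_Y"][:10]
print(find_fusion_junction(chimeric_read, references))
```

A candidate found this way would still need independent confirmation, since template switching by reverse transcriptase can produce reads with the same appearance.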

Chimeric RNA in lower eukaryotes


Although rare in higher eukaryotes, trans-splicing is used extensively by various lower eukaryotes, including nematodes and trypanosomes, to generate chimeric RNAs.[29][30] In these organisms, splicing reactions between a protein-coding RNA and a universal sequence attach a spliced leader to the 5' end of the RNA, generating a functional messenger RNA. This system allows the use of operons: collections of protein-coding genes with a shared function that are simultaneously transcribed into a single RNA and then spliced into individual messenger RNAs, each of which codes for a single protein.
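The processing of an operon transcript by spliced-leader trans-splicing can be sketched as follows. The leader sequence, the operon sequence, and the gene start positions are hypothetical placeholders, and the sketch ignores 3' end formation and the intergenic sequence that real polycistronic transcripts contain.

```python
from typing import List

# Hypothetical placeholder, not a real spliced-leader sequence.
SPLICED_LEADER = "GGUUUAAUUACC"


def process_operon(polycistronic_rna: str, gene_starts: List[int],
                   leader: str = SPLICED_LEADER) -> List[str]:
    """Cut a polycistronic RNA at each gene start and add the leader.

    Each resulting mRNA carries the same leader at its 5' end and
    the sequence of a single gene after it.
    """
    bounds = sorted(gene_starts) + [len(polycistronic_rna)]
    return [leader + polycistronic_rna[begin:end]
            for begin, end in zip(bounds, bounds[1:])]


# Toy operon with two short genes transcribed as a single RNA.
operon_rna = "AUGAAAUAA" + "AUGUUUUGA"
mrnas = process_operon(operon_rna, gene_starts=[0, 9])
print(mrnas)
```

Every output mRNA is itself a chimeric RNA in the sense used here, because its 5' end comes from the leader RNA and the remainder from the operon transcript.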

References

  1. ^ a b c d e Frenkel-Morgenstern, M.; Lacroix, V.; Ezkurdia, I.; Levin, Y.; Gabashvili, A.; Prilusky, J.; del Pozo, A.; Tress, M.; Johnson, R.; Guigo, R.; Valencia, A. (15 May 2012). "Chimeras taking shape: Potential functions of proteins encoded by chimeric RNA transcripts". Genome Research. 22 (7): 1231–1242. doi:10.1101/gr.130062.111. PMC 3396365. PMID 22588898.
  2. ^ Horspool, Daniel (2008-11-28). "Central Dogma of Molecular Biochemistry with Enzymes". Retrieved 22 July 2013.
  3. ^ Crick, Francis (August 1970). "Central Dogma of Molecular Biology". Nature. 227 (5258): 561–563. Bibcode:1970Natur.227..561C. doi:10.1038/227561a0. PMID 4913914. S2CID 4164029.
  4. ^ Geer, R.C. "Introduction to Molecular Biology Resources". Retrieved 22 July 2013.
  5. ^ a b c James D. Watson; et al. (2007). Molecular biology of the gene (6th ed.). San Francisco, Calif.: Benjamin Cummings. ISBN 9780805395921.
  6. ^ Tomlins, SA; Mehra, R; Rhodes, DR; Smith, LR; Roulston, D; Helgeson, BE; Cao, X; Wei, JT; Rubin, MA; Shah, RB; Chinnaiyan, AM (Apr 1, 2006). "TMPRSS2:ETV4 gene fusions define a third molecular subtype of prostate cancer". Cancer Research. 66 (7): 3396–400. doi:10.1158/0008-5472.CAN-06-0168. PMID 16585160.
  7. ^ Panagopoulos, Ioannis; Thorsen, Jim; Gorunova, Ludmila; Micci, Francesca; Haugom, Lisbeth; Davidson, Ben; Heim, Sverre (1 August 2013). "RNA sequencing identifies fusion of the EWSR1 and YY1 genes in mesothelioma with t(14;22)(q32;q12)". Genes, Chromosomes and Cancer. 52 (8): 733–740. doi:10.1002/gcc.22068. PMID 23630070. S2CID 28377909.
  8. ^ Koontz, J. I.; Soreng, A. L.; Nucci, M.; Kuo, F. C.; Pauwels, P.; van den Berghe, H.; Cin, P. D.; Fletcher, J. A.; Sklar, J. (22 May 2001). "Frequent fusion of the JAZF1 and JJAZ1 genes in endometrial stromal tumors". Proceedings of the National Academy of Sciences. 98 (11): 6348–6353. Bibcode:2001PNAS...98.6348K. doi:10.1073/pnas.101132598. PMC 33471. PMID 11371647.
  9. ^ Frenkel-Morgenstern, M.; Valencia, A. (11 June 2012). "Novel domain combinations in proteins encoded by chimeric transcripts". Bioinformatics. 28 (12): i67–i74. doi:10.1093/bioinformatics/bts216. PMC 3371848. PMID 22689780.
  10. ^ Gorohovski, A.; Tagore, S.; Palande, V.; Malka, A.; Raviv-Shay, D.; Frenkel-Morgenstern, M. (4 January 2017). "ChiTaRS-3.1—the enhanced chimeric transcripts and RNA-seq database matched with protein–protein interactions". Nucleic Acids Research. 45 (D1): D790–D795. doi:10.1093/nar/gkw1127. PMC 5210585. PMID 27899596.
  11. ^ Frenkel-Morgenstern, M.; Gorohovski, A.; Vucenovic, D.; Maestre, L.; Valencia, A. (28 January 2015). "ChiTaRS 2.1--an improved database of the chimeric transcripts and RNA-seq data with novel sense-antisense chimeric RNA transcripts". Nucleic Acids Research. 43 (D1): D68–D75. doi:10.1093/nar/gku1199. PMC 4383979. PMID 25414346.
  12. ^ Frenkel-Morgenstern, M.; Gorohovski, A.; Lacroix, V.; Rogers, M.; Ibanez, K.; Boullosa, C.; Andres Leon, E.; Ben-Hur, A.; Valencia, A. (9 November 2012). "ChiTaRS: a database of human, mouse and fruit fly chimeric transcripts and RNA-sequencing data". Nucleic Acids Research. 41 (D1): D142–D151. doi:10.1093/nar/gks1041. PMC 3531201. PMID 23143107.
  13. ^ Kim, P.; Yoon, S.; Kim, N.; Lee, S.; Ko, M.; Lee, H.; Kang, H.; Kim, J.; Lee, S. (11 November 2009). "ChimerDB 2.0--a knowledgebase for fusion genes updated". Nucleic Acids Research. 38 (Database): D81–D85. doi:10.1093/nar/gkp982. PMC 2808913. PMID 19906715.
  14. ^ Kim, Dae-Soo; Huh, Jae-Won; Kim, Heui-Soo (1 January 2007). "HYBRIDdb: a database of hybrid genes in the human genome". BMC Genomics. 8 (1): 128. doi:10.1186/1471-2164-8-128. PMC 1890557. PMID 17519042.
  15. ^ Novo, FJ; de Mendíbil, IO; Vizmanos, JL (Jan 26, 2007). "TICdb: a collection of gene-mapped translocation breakpoints in cancer". BMC Genomics. 8: 33. doi:10.1186/1471-2164-8-33. PMC 1794234. PMID 17257420.
  16. ^ Kong, F.; Zhu, J.; Wu, J.; Peng, J.; Wang, Y.; Wang, Q.; Fu, S.; Yuan, L.-L.; Li, T. (4 November 2010). "dbCRID: a database of chromosomal rearrangements in human diseases". Nucleic Acids Research. 39 (Database): D895–D900. doi:10.1093/nar/gkq1038. PMC 3013658. PMID 21051346.
  17. ^ Bruno, Andrew; Jeffrey C Miecznikowski; Maochun Qin; Jianmin Wang; Song Liu (January 2013). "FUSIM: a Software Tool for Simulating Fusion Transcripts". BMC Bioinformatics. 14 (13): 13. doi:10.1186/1471-2105-14-13. PMC 3637076. PMID 23323884.
  18. ^ Philippe, Nicolas; Salson, Mikaël; Commes, Thérèse; Rivals, Eric (1 January 2013). "CRAC: an integrated approach to the analysis of RNA-seq reads". Genome Biology. 14 (3): R30. doi:10.1186/gb-2013-14-3-r30. PMC 4053775. PMID 23537109.
  19. ^ Kim, Daehwan; Salzberg, Steven L (1 January 2011). "TopHat-Fusion: an algorithm for discovery of novel fusion transcripts". Genome Biology. 12 (8): R72. doi:10.1186/gb-2011-12-8-r72. PMC 3245612. PMID 21835007.
  20. ^ Piazza, R.; Pirola, A.; Spinelli, R.; Valletta, S.; Redaelli, S.; Magistroni, V.; Gambacorti-Passerini, C. (8 May 2012). "FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery". Nucleic Acids Research. 40 (16): e123. doi:10.1093/nar/gks394. PMC 3439881. PMID 22570408.
  21. ^ Iyer, M. K.; Chinnaiyan, A. M.; Maher, C. A. (11 August 2011). "ChimeraScan: a tool for identifying chimeric transcription in sequencing data". Bioinformatics. 27 (20): 2903–2904. doi:10.1093/bioinformatics/btr467. PMC 3187648. PMID 21840877.
  22. ^ Li, Y.; Chien, J.; Smith, D. I.; Ma, J. (5 May 2011). "FusionHunter: identifying fusion transcripts in cancer using paired-end RNA-seq". Bioinformatics. 27 (12): 1708–1710. doi:10.1093/bioinformatics/btr265. PMID 21546395.
  23. ^ Ameur, Adam; Wetterbom, Anna; Feuk, Lars; Gyllensten, Ulf (1 January 2010). "Global and unbiased detection of splice junctions from RNA-seq data". Genome Biology. 11 (3): R34. doi:10.1186/gb-2010-11-3-r34. PMC 2864574. PMID 20236510.
  24. ^ Robertson, Gordon; Schein, Jacqueline; Chiu, Readman; Corbett, Richard; Field, Matthew; Jackman, Shaun D; Mungall, Karen; Lee, Sam; Okada, Hisanaga Mark; Qian, Jenny Q; Griffith, Malachi; Raymond, Anthony; Thiessen, Nina; Cezard, Timothee; Butterfield, Yaron S; Newsome, Richard; Chan, Simon K; She, Rong; Varhol, Richard; Kamoh, Baljit; Prabhu, Anna-Liisa; Tam, Angela; Zhao, YongJun; Moore, Richard A; Hirst, Martin; Marra, Marco A; Jones, Steven J M; Hoodless, Pamela A; Birol, Inanc (10 October 2010). "De novo assembly and analysis of RNA-seq data". Nature Methods. 7 (11): 909–912. doi:10.1038/nmeth.1517. PMID 20935650. S2CID 1034682.
  25. ^ Sboner, Andrea; Habegger, Lukas; Pflueger, Dorothee; Terry, Stephane; Chen, David Z; Rozowsky, Joel S; Tewari, Ashutosh K; Kitabayashi, Naoki; Moss, Benjamin J; Chee, Mark S; Demichelis, Francesca; Rubin, Mark A; Gerstein, Mark B (1 January 2010). "FusionSeq: a modular framework for finding gene fusions by analyzing Paired-End RNA-Sequencing data". Genome Biology. 11 (10): R104. doi:10.1186/gb-2010-11-10-r104. PMC 3218660. PMID 20964841.
  26. ^ Houseley, J; Tollervey, D (Aug 18, 2010). "Apparent non-canonical trans-splicing is generated by reverse transcriptase in vitro". PLOS ONE. 5 (8): e12271. Bibcode:2010PLoSO...512271H. doi:10.1371/journal.pone.0012271. PMC 2923612. PMID 20805885.
  27. ^ McManus, CJ; Duff, MO; Eipper-Mains, J; Graveley, BR (Jul 20, 2010). "Global analysis of trans-splicing in Drosophila". Proceedings of the National Academy of Sciences of the United States of America. 107 (29): 12975–9. Bibcode:2010PNAS..10712975M. doi:10.1073/pnas.1007586107. PMC 2919919. PMID 20615941.
  28. ^ Djebali, S; Lagarde, J; Kapranov, P; Lacroix, V; Borel, C; Mudge, JM; Howald, C; Foissac, S; Ucla, C; Chrast, J; Ribeca, P; Martin, D; Murray, RR; Yang, X; Ghamsari, L; Lin, C; Bell, I; Dumais, E; Drenkow, J; Tress, ML; Gelpí, JL; Orozco, M; Valencia, A; van Berkum, NL; Lajoie, BR; Vidal, M; Stamatoyannopoulos, J; Batut, P; Dobin, A; Harrow, J; Hubbard, T; Dekker, J; Frankish, A; Salehi-Ashtiani, K; Reymond, A; Antonarakis, SE; Guigó, R; Gingeras, TR (2012). "Evidence for transcript networks composed of chimeric RNAs in human cells". PLOS ONE. 7 (1): e28213. Bibcode:2012PLoSO...728213D. doi:10.1371/journal.pone.0028213. PMC 3251577. PMID 22238572.
  29. ^ Blumenthal, T (Jun 25, 2005). "Trans-splicing and operons". WormBook: 1–9. doi:10.1895/wormbook.1.5.1. PMID 18050426.
  30. ^ Michaeli, S (Apr 2011). "Trans-splicing in trypanosomes: machinery and its impact on the parasite transcriptome". Future Microbiology. 6 (4): 459–74. doi:10.2217/fmb.11.20. PMID 21526946.