In biochemistry, a hypothetical protein is a protein whose existence has been predicted but for which there is no experimental evidence that it is expressed in vivo. Genome sequencing has produced numerous predicted open reading frames to which functions cannot be readily assigned. These proteins, either orphan or conserved hypothetical proteins, make up an estimated 20% to 40% of the proteins encoded in each newly sequenced genome. The likely role of a hypothetical protein in the metabolism of the organism can be predicted from its sequence or structural homology to known proteins, for example through conserved domain analysis.[1] Even when techniques such as microarray analysis and mass spectrometry provide enough evidence that the gene product is expressed, it is difficult to assign a function to it if its sequence lacks identity to protein sequences with annotated biochemical function. Most protein sequences are now inferred from computational analysis of genomic DNA sequence, and hypothetical proteins arise as annotations made by gene prediction software during genome analysis. When the bioinformatic tool used for gene identification finds a large open reading frame without a characterised homologue in the protein database, it returns "hypothetical protein" as an annotation remark.
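The annotation step described above can be illustrated with a minimal Python sketch. It is not the method of any particular gene prediction program: the function names `find_orfs` and `annotate`, the `min_codons` threshold, and the `known_homologues` dictionary are all assumptions made for illustration. In particular, the exact-match dictionary lookup is only a stand-in for a real homology search against a protein database, and only the forward strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=100):
    """Return (start, end) pairs for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon
    (stop included in the span) and must contain at least
    `min_codons` codons before the stop. The threshold is an
    illustrative assumption, not a standard value.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def annotate(dna, known_homologues, min_codons=100):
    """Label each ORF, falling back to "hypothetical protein".

    `known_homologues` maps ORF sequences to annotations. The exact
    string match is a hypothetical placeholder for a homology search.
    """
    dna = dna.upper()
    annotations = []
    for start, end in find_orfs(dna, min_codons):
        label = known_homologues.get(dna[start:end], "hypothetical protein")
        annotations.append((start, end, label))
    return annotations
```

Called with an empty `known_homologues` dictionary, `annotate` labels every sufficiently long open reading frame as a hypothetical protein, which mirrors the case of a newly sequenced genome with no characterised homologues.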

The function of a hypothetical protein can be predicted by domain homology searches, with varying levels of confidence.[2] Conserved domains found in a hypothetical protein are compared with the domains of known families, which allows the protein to be classified into a particular protein family even though it has not been investigated in vivo. Function can also be predicted by homology modelling, in which the hypothetical protein is aligned with a protein sequence whose three-dimensional structure is known; if a structure can be modelled from this alignment, the protein's capacity to function can be assessed computationally.[2][3][4] Further approaches to annotating the function of hypothetical proteins include determining their three-dimensional structure through structural genomics initiatives, characterising the nature and mode of prosthetic group or metal ion binding, identifying fold similarity with proteins of known function, and annotating possible catalytic and regulatory sites.[5] Structure prediction combined with biochemical function assessment, by screening against various substrates, is another promising approach to annotating function.[2]
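Classification by conserved domain can be sketched in Python as a search for family signatures in a protein sequence. This is a deliberately simplified illustration: the `SIGNATURES` table and the `classify` function are hypothetical names, and the table holds a single entry, the Walker A (P-loop) ATP/GTP-binding motif written as a regular expression. Real domain searches generally rely on profile-based models with statistical scores rather than one short pattern, so a hit here would be weak evidence at best.

```python
import re

# Toy signature table (an assumption for illustration): one family,
# keyed by the Walker A motif [AG]-x(4)-G-K-[ST].
SIGNATURES = {
    "P-loop NTPase (Walker A motif)": re.compile(r"[AG].{4}GK[ST]"),
}


def classify(protein_seq):
    """Return (family, start, matched_text) for each signature found.

    Only the first match per family is reported. An empty list means
    no signature matched, so the protein stays unclassified.
    """
    seq = protein_seq.upper()
    hits = []
    for family, pattern in SIGNATURES.items():
        match = pattern.search(seq)
        if match:
            hits.append((family, match.start(), match.group()))
    return hits
```

A sequence with no hits simply remains a hypothetical protein under this sketch, while a hit suggests a candidate family that would still need confirmation by structural or biochemical analysis.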


References

  1. Galperin MY (2001). "Conserved 'hypothetical' proteins: new hints and new puzzles". Comparative and Functional Genomics. 2 (1): 14–18. doi:10.1002/cfg.66. PMC 2447192. PMID 18628897.
  2. Srinivasan B; et al. (2015). "Prediction of substrate specificity and preliminary kinetic characterization of the hypothetical protein PVX_123945 from Plasmodium vivax". Exp. Parasitol. 151–152: 56–63. doi:10.1016/j.exppara.2015.01.013. PMID 25655405.
  3. Kewate PS; Urade RC; Gore DG; Soni MA; Kopulwar AP (2015). "In silico enzyme function prediction in hypothetical proteins of Mycobacterium bovis AF2122/97". Journal of Pharmacy Research. 9 (3): 182–189.
  4. Gore D (2009). "In silico Prediction of Structure and Enzymatic Activity for Hypothetical Proteins of Shigella flexneri". Biofrontiers. 1 (2): 1–10.
  5. Eisenstein E; et al. (2000). "Biological function made crystal clear - annotation of hypothetical proteins via structural genomics". Curr Opin Biotechnol. 11 (1): 25–30. PMID 10679350.
