Cheminformatics (also known as chemoinformatics, chemioinformatics and chemical informatics) is the application of computational and information techniques to a range of problems in the field of chemistry. These in silico techniques are used, for example, by pharmaceutical companies and academic groups in the process of drug discovery.[1] The same methods are also used, in various forms, in the chemical and allied industries.



The term chemoinformatics was defined by F.K. Brown[2][3] in 1998:

Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization.

Since then, both spellings have been used. Some authors have settled on cheminformatics,[4] while European academia agreed on chemoinformatics in 2006.[5] The establishment of the Journal of Cheminformatics has been a strong push towards the shorter variant.


Cheminformatics combines the scientific fields of chemistry, computer science and information science, for example in the areas of topology, chemical graph theory, information retrieval and data mining in chemical space.[6][1][7][8][9][10][11][12][13][14] Cheminformatics can also be applied to data analysis in industries such as paper and pulp, dyes and other allied industries.


Storage and retrieval

The primary application of cheminformatics is the storage, indexing and search of information relating to compounds. Efficient search of such stored information draws on topics studied in computer science, such as data mining, information retrieval, information extraction and machine learning. Related research topics include:

File formats

The in silico representation of chemical structures uses specialized formats such as the XML-based Chemical Markup Language or SMILES. These representations are often used for storage in large chemical databases. While some formats are suited to visual representation in two or three dimensions, others are better suited to studying physical interactions, modeling and docking.
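
As an illustration of how a line notation such as SMILES encodes a molecular graph, the following minimal sketch parses a small subset of the notation into lists of atoms and bonds. It is not a full SMILES implementation: the function names (parse_smiles, bond_order) are hypothetical, and the sketch assumes only unbracketed organic atoms, explicit bond symbols, parenthesised branches and single-digit ring closures.

```python
import re

# Assumed subset only: unbracketed organic atoms, explicit bond symbols,
# parentheses for branches and single-digit ring closures.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[-=#:]|[()]|[0-9]")
BOND_ORDER = {"-": 1, "=": 2, "#": 3, ":": 1.5}


def bond_order(atoms, a, b, symbol):
    """An explicit symbol wins; otherwise aromatic pairs get 1.5, others 1."""
    if symbol is not None:
        return BOND_ORDER[symbol]
    return 1.5 if atoms[a].islower() and atoms[b].islower() else 1


def parse_smiles(smiles):
    """Return (atoms, bonds); bonds are (index, index, order) tuples."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES syntax: {smiles!r}")
    atoms, bonds = [], []
    prev = None        # atom that the next atom will be bonded to
    pending = None     # explicit bond symbol waiting for its second atom
    branches = []      # atoms to return to when a branch closes
    ring_open = {}     # ring-closure digit -> (atom index, bond symbol)
    for tok in tokens:
        if tok == "(":
            branches.append(prev)
        elif tok == ")":
            if not branches:
                raise ValueError("unbalanced ')'")
            prev = branches.pop()
        elif tok in BOND_ORDER:
            pending = tok
        elif tok.isdigit():
            if prev is None:
                raise ValueError("ring closure before any atom")
            if tok in ring_open:
                start, symbol = ring_open.pop(tok)
                order = bond_order(atoms, start, prev, pending or symbol)
                bonds.append((start, prev, order))
            else:
                ring_open[tok] = (prev, pending)
            pending = None
        else:  # an atom symbol
            atoms.append(tok)
            index = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, index, bond_order(atoms, prev, index, pending)))
            prev, pending = index, None
    if branches or ring_open:
        raise ValueError("unclosed branch or ring")
    return atoms, bonds


# Acetic acid: a methyl carbon, a carbonyl carbon, and two oxygens.
atoms, bonds = parse_smiles("CC(=O)O")
print(atoms, bonds)
```

Production software would normally rely on an established cheminformatics toolkit rather than a hand-written parser; the sketch only shows that the string is a compact serialization of an atom-and-bond graph.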

Virtual libraries

Chemical data can pertain to real or virtual molecules. Virtual libraries of compounds may be generated in various ways to explore chemical space and hypothesize novel compounds with desired properties.

Virtual libraries of classes of compounds (drugs, natural products, diversity-oriented synthetic products) were recently generated using the FOG (fragment optimized growth) algorithm.[15] This was done by using cheminformatic tools to train transition probabilities of a Markov chain on authentic classes of compounds, and then using the Markov chain to generate novel compounds that were similar to the training database.
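
The general idea of training Markov-chain transition probabilities on known compounds and then sampling new ones can be sketched as follows. This is a toy first-order chain over fragment labels, not the FOG algorithm itself: the fragment names, the training sequences and the function names (train_transitions, generate) are all hypothetical, and a real generator would grow chemically valid molecular graphs rather than lists of labels.

```python
import random
from collections import Counter, defaultdict

START, END = "<start>", "<end>"


def train_transitions(sequences):
    """Estimate first-order transition probabilities from fragment sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        path = [START, *seq, END]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def generate(transitions, rng, max_len=10):
    """Sample one fragment sequence by walking the chain from START."""
    state, out = START, []
    while len(out) < max_len:
        options = transitions[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        if state == END:
            break
        out.append(state)
    return out


# Hypothetical training set: each "molecule" is an ordered list of fragments.
training = [
    ["phenyl", "amide", "methyl"],
    ["phenyl", "ether", "methyl"],
    ["pyridyl", "amide", "phenyl"],
]
transitions = train_transitions(training)
rng = random.Random(0)  # fixed seed so repeated runs sample the same sequences
for _ in range(3):
    print(generate(transitions, rng))
```

Because every sampled step follows a transition observed in the training set, the generated sequences resemble the training data locally while still being able to combine fragments in orders that never appeared together.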

Virtual screening

In contrast to high-throughput screening, virtual screening involves computationally screening in silico libraries of compounds, by means of various methods such as docking, to identify members likely to possess desired properties such as biological activity against a given target.[1][6][7][8][9][12] In some cases, combinatorial chemistry is used in the development of the library to increase the efficiency of mining chemical space. More commonly, a diverse library of small molecules or natural products is screened.
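
Docking requires three-dimensional structures and dedicated software, so the sketch below illustrates a simpler, ligand-based form of virtual screening instead: ranking library compounds by Tanimoto similarity to a known active. The compound names, the feature sets standing in for fingerprints, and the 0.5 threshold are hypothetical values chosen for illustration only.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def screen(query, library, threshold=0.5):
    """Return (name, similarity) pairs at or above threshold, best first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one structural feature.
library = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {1, 2, 3, 5, 6},
    "cmpd_C": {7, 8, 9},
}
known_active = {1, 2, 3, 5}

for name, score in screen(known_active, library):
    print(f"{name}: {score:.2f}")
```

Here cmpd_B shares four of five combined features with the query and cmpd_A three of five, so both pass the threshold, while cmpd_C shares none and is filtered out.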

Quantitative structure-activity relationship (QSAR)

This is the calculation of quantitative structure–activity relationship and quantitative structure–property relationship values, used to predict the activity of compounds from their structures. In this context there is also a strong relationship to chemometrics. Chemical expert systems are also relevant, since they represent parts of chemical knowledge as an in silico representation. A relatively new concept is matched molecular pair analysis (MMPA), or prediction-driven MMPA, which is coupled with a QSAR model in order to identify activity cliffs.[16]
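
In its simplest form, a QSAR model is a regression from a numerical descriptor of the structure to a measured activity. The sketch below fits a one-descriptor linear model by ordinary least squares. The descriptor values and activities are synthetic numbers invented for illustration, not measurements, and the function names (fit_line, predict) are hypothetical; practical QSAR models use many descriptors and are validated on held-out compounds.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Predict activity for a descriptor value using a fitted (slope, intercept)."""
    slope, intercept = model
    return slope * x + intercept


# Synthetic training data: one descriptor (for example a lipophilicity value)
# and one activity (for example a pIC50) per compound.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [5.1, 5.9, 7.1, 7.9]

model = fit_line(descriptor, activity)
print("slope, intercept:", model)
print("predicted activity at descriptor 2.5:", predict(model, 2.5))
```

The fitted slope summarises how strongly the activity responds to the descriptor across the training set; pairs of similar compounds whose measured activities deviate sharply from such a trend are the kind of case that activity-cliff analysis looks for.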

References


  1. Srinivasan, Bharath; Zhou, Hongyi; Kubanek, Julia; Skolnick, Jeffrey (2014-04-26). "Experimental validation of FINDSITEcomb virtual ligand screening results for eight proteins yields novel nanomolar and micromolar binders". Journal of Cheminformatics. 6 (1): 16. doi:10.1186/1758-2946-6-16. ISSN 1758-2946. PMC 4038399. PMID 24936211.
  2. Brown, F.K. (1998). "Chapter 35. Chemoinformatics: What is it and How does it Impact Drug Discovery". Annual Reports in Medicinal Chemistry. 33. pp. 375–384. doi:10.1016/S0065-7743(08)61100-8. ISBN 978-0-12-040533-6.
  3. Brown, Frank (2005). "Editorial Opinion: Chemoinformatics – a ten year update". Current Opinion in Drug Discovery & Development. 8 (3): 296–302.
  4. Cheminformatics or Chemoinformatics?
  5. Obernai Declaration
  6. Roy, Ambrish; Srinivasan, Bharath; Skolnick, Jeffrey (2015-08-12). "PoLi: A Virtual Screening Pipeline Based on Template Pocket and Ligand Similarity". Journal of Chemical Information and Modeling. 55 (8): 1757–1770. doi:10.1021/acs.jcim.5b00232. ISSN 1549-9596. PMC 4593500. PMID 26225536.
  7. Srinivasan, Bharath; Tonddast-Navaei, Sam; Roy, Ambrish; Zhou, Hongyi; Skolnick, Jeffrey (2018-09-07). "Chemical space of Escherichia coli dihydrofolate reductase inhibitors: New approaches for discovering novel drugs for old bugs". Medicinal Research Reviews. doi:10.1002/med.21538. ISSN 0198-6325. PMID 30192413.
  8. Srinivasan, Bharath; Tonddast-Navaei, Sam; Skolnick, Jeffrey (September 2017). "Pocket detection and interaction-weighted ligand-similarity search yields novel high-affinity binders for Myocilin-OLF, a protein implicated in glaucoma". Bioorganic & Medicinal Chemistry Letters. 27 (17): 4133–4139. doi:10.1016/j.bmcl.2017.07.035. ISSN 0960-894X. PMC 5568477. PMID 28739043.
  9. Srinivasan, Bharath; Zhou, Hongyi; Mitra, Sreyoshi; Skolnick, Jeffrey (October 2016). "Novel small molecule binders of human N-glycanase 1, a key player in the endoplasmic reticulum associated degradation pathway". Bioorganic & Medicinal Chemistry. 24 (19): 4750–4758. doi:10.1016/j.bmc.2016.08.019. ISSN 0968-0896. PMC 5015769. PMID 27567076.
  10. Gasteiger, J.; Engel, T. (eds.) (2004). Chemoinformatics: A Textbook. John Wiley & Sons. ISBN 3-527-30681-1.
  11. Leach, A.R.; Gillet, V.J. (2003). An Introduction to Chemoinformatics. Springer. ISBN 1-4020-1347-7.
  12. Snell, Terry W.; Johnston, Rachel K.; Srinivasan, Bharath; Zhou, Hongyi; Gao, Mu; Skolnick, Jeffrey (2016-08-02). "Repurposing FDA-approved drugs for anti-aging therapies". Biogerontology. 17 (5–6): 907–920. doi:10.1007/s10522-016-9660-x. ISSN 1389-5729. PMC 5065615. PMID 27484416.
  13. Varnek, Alexandre; Baskin, Igor (2011). "Chemoinformatics as a Theoretical Chemistry Discipline". Molecular Informatics. 30 (1): 20–32. doi:10.1002/minf.201000100. PMID 27467875.
  14. Bunin, Barry A.; Siesel, Brian; Morales, Guillermo; Bajorath, Jürgen (2006). Chemoinformatics: Theory, Practice, & Products. Springer. ISBN 978-1402050008.
  15. Kutchukian, Peter; Lou, David; Shakhnovich, Eugene (2009). "FOG: Fragment Optimized Growth Algorithm for the de Novo Generation of Molecules occupying Druglike Chemical". Journal of Chemical Information and Modeling. 49 (7): 1630–1642. doi:10.1021/ci9000458. PMID 19527020.
  16. Sushko, Yurii; Novotarskyi, Sergii; Körner, Robert; Vogt, Joachim; Abdelaziz, Ahmed; Tetko, Igor V. (2014). "Prediction-driven matched molecular pairs to interpret QSARs and aid the molecular optimization process". Journal of Cheminformatics. 6 (1): 48. doi:10.1186/s13321-014-0048-0. PMC 4272757. PMID 25544551.

External links