Latent semantic structure indexing

Latent semantic structure indexing (LaSSI) is a technique, derived from latent semantic analysis (LSA), for calculating chemical similarity.

LaSSI was developed at Merck & Co. and patented in 2007[1] by Richard Hull, Eugene Fluder, Suresh Singh, Robert Sheridan, Robert Nachbar and Simon Kearsley.

Overview

LaSSI resembles LSA in that it constructs an occurrence matrix from a corpus of items and applies singular value decomposition to that matrix to derive latent features. The difference is that the occurrence matrix records the frequency of two- and three-dimensional chemical descriptors (rather than natural language terms) found in a database of chemical structures. This process derives latent chemical structure concepts that can be used to calculate chemical similarities and structure–activity relationships for drug discovery.
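The general idea can be illustrated with a minimal sketch in Python using NumPy. This is not the published LaSSI implementation: the descriptor counts are invented toy data, the number of retained latent dimensions k is an arbitrary choice, and cosine similarity is assumed as the comparison measure because it is the usual convention in LSA.

```python
import numpy as np

# Hypothetical toy occurrence matrix (invented for illustration).
# Rows are chemical descriptors, columns are molecules;
# entry [i, j] is how often descriptor i occurs in molecule j.
occurrence = np.array([
    [2, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 1, 2],
], dtype=float)


def latent_molecule_vectors(matrix, k):
    """Map each molecule (column) to a k-dimensional latent vector via truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one molecule.
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine_similarity(a, b):
    """Cosine of the angle between two latent vectors (assumed similarity measure)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


vectors = latent_molecule_vectors(occurrence, k=2)

# Molecules 0 and 1 share the same descriptors; molecules 0 and 2 share none.
sim_same = cosine_similarity(vectors[0], vectors[1])
sim_diff = cosine_similarity(vectors[0], vectors[2])
print(f"similar pair: {sim_same:.3f}, dissimilar pair: {sim_diff:.3f}")
```

In this toy example, molecules with identical descriptor profiles end up pointing in the same direction in the latent space, while molecules with no descriptors in common end up orthogonal. With real data the matrix would be far larger and sparser, and the choice of k would need to be tuned.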

References

  • Hull, R.D., Fluder, E.M., Singh, S.B., Nachbar, R.B., Sheridan, R.P. and Kearsley, S.K. (2001) "Latent semantic structure indexing (LaSSI) for defining chemical similarity." J Med Chem, 2001 Apr 12;44(8):1177–84. doi:10.1021/jm000393c
  • Hull, R.D., Singh, S.B., Nachbar, R.B., Sheridan, R.P., Kearsley, S.K. and Fluder, E.M. (2001) "Chemical similarity searches using latent semantic structure indexing (LaSSI) and comparison to TOPOSIM." J Med Chem, 2001 Apr 12;44(8):1185–91.
  • Singh, S.B., Sheridan, R.P., Fluder, E.M. and Hull, R.D. (2001) "Mining the chemical quarry with joint chemical probes: an application of latent semantic structure indexing (LaSSI) and TOPOSIM (Dice) to chemical database mining." J Med Chem, 2001 May 10;44(10):1564–75.