About

Introduction

SimilarityLab provides molecular similarity predictions using USRCAT [1]. This rotation-invariant molecular similarity measure can be used to identify candidate molecules similar to a given query. USRCAT is an extension of USR [2] and UFSRAT [3].

The SimilarityLab website allows two things:

  1. Discovery of commercially available similars.
  2. Compound target prediction.
Further details of both functions and their operation can be found below. Source code is available in the following repository: https://github.com/stevenshave/SimilarityLab

1. Discovery of commercially available similars.

Collaborators often want a quick way to explore the SAR (structure-activity relationships) around a compound of particular interest, whether it comes from their internal screens or from the literature. Applying USRCAT to the eMolecules database allows similar compounds to be sourced for purchase. We periodically update SimilarityLab's eMolecules datasets so that the listed compounds are purchasable (although even with frequent updating, we find some suppliers do not hold constant stock of their advertised compounds), and we hold 3D representations of low-energy conformers for each molecule. These low-energy conformers are generated using the protocol outlined by Ebejer [4] (see repository for code listing). For cheminformatics tasks that require querying against specific small-molecule conformations, we advise users to seek other tools and experts capable of running the analysis, which should be simple to do using the code in the accompanying SimilarityLab source repository, or simply using RDKit. The stored conformers are used to pre-generate descriptors. Once conformers and descriptors have been generated for the user-supplied query molecule, the pre-generated descriptors are scored against it. The top N similars are returned to the user, where N is set when the job is defined.
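As a rough illustration of the scoring and top-N step described above, the following Python sketch ranks a library of pre-generated descriptors against a query descriptor. It assumes a USR-style score of 1 / (1 + mean absolute difference), which may differ from the exact USRCAT weighting used by SimilarityLab. The function names usr_style_score and top_n_similars, the molecule identifiers, and the toy three-value descriptors are hypothetical (real USRCAT descriptors have 60 values).

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def usr_style_score(query: Sequence[float], candidate: Sequence[float]) -> float:
    """Return 1 / (1 + mean absolute difference); 1.0 means identical descriptors.

    Assumption: an unweighted USR-style score, not necessarily the
    exact scoring used by SimilarityLab.
    """
    if len(query) != len(candidate):
        raise ValueError("descriptor vectors must have the same length")
    mean_abs_diff = sum(abs(q - c) for q, c in zip(query, candidate)) / len(query)
    return 1.0 / (1.0 + mean_abs_diff)


def top_n_similars(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    n: int,
) -> List[Tuple[str, float]]:
    """Score every library descriptor against the query and keep the n best."""
    scored = ((mol_id, usr_style_score(query, desc)) for mol_id, desc in library.items())
    # nlargest returns the n highest-scoring pairs, best first.
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Toy data: hypothetical identifiers and shortened descriptors.
toy_library = {
    "mol_a": [1.0, 2.0, 3.0],
    "mol_b": [1.0, 2.0, 6.0],
    "mol_c": [4.0, 5.0, 6.0],
}
toy_query = [1.0, 2.0, 3.0]
print(top_n_similars(toy_query, toy_library, n=2))
# prints [('mol_a', 1.0), ('mol_b', 0.5)]
```

Using a heap-based selection keeps only the N best scores in memory at once, which suits a large precomputed library; a full sort would give the same result on small inputs.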

The authors of SimilarityLab will always endeavour to refresh the website's eMolecules version periodically.

2. Compound target prediction.

Hits from phenotypic screening campaigns require target deconvolution: determining the protein targets or pathways likely to be perturbed. Using the same approach as in 1 (Discovery of commercially available similars), the USRCAT molecular similarity technique is applied to "active" molecules within the most recent release of ChEMBL [5]. Active in this sense is defined as having an IC50/KD of 10 µM or better (that is, 10 µM or lower) against a protein target. The 100 most similar active molecules then have their activities against all protein targets counted. The protein targets are sorted by the number of times they are hit by these 100 similar compounds, and this list of targets is returned to the user.
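The counting and sorting step can be sketched in a few lines of Python. This is a minimal illustration rather than the SimilarityLab implementation: the function name rank_targets, the identifiers, and the data layout (a mapping from each active molecule to the targets it is active against) are hypothetical. It also assumes that each similar molecule contributes at most one count per target and that ties are broken alphabetically.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def rank_targets(
    similar_ids: Iterable[str],
    activities: Dict[str, Iterable[str]],
) -> List[Tuple[str, int]]:
    """Count how often each target is hit by the similar actives, most hit first."""
    counts: Counter = Counter()
    for mol_id in similar_ids:
        # set() so a molecule counts once per target (an assumption).
        counts.update(set(activities.get(mol_id, ())))
    # Sort by descending hit count, then by target name for stable ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy data: hypothetical molecule and target identifiers.
toy_similars = ["m1", "m2", "m3"]
toy_activities = {
    "m1": ["T1", "T2"],
    "m2": ["T1"],
    "m3": ["T2", "T1", "T3"],
}
print(rank_targets(toy_similars, toy_activities))
# prints [('T1', 3), ('T2', 2), ('T3', 1)]
```

In practice the list of similar identifiers would hold the 100 most similar active molecules described above.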

Funding

SimilarityLab was made possible by the Wellcome Trust Institutional Strategic Support Fund and was developed by Dr. Steven Shave during his time in the Auer Lab at the University of Edinburgh. This site and its associated code are available under an open-source-compatible license from https://github.com/stevenshave/SimilarityLab.