Introduction
SimilarityLab provides molecular similarity predictions using USRCAT [1]. This rotation-invariant
molecular similarity measure can be used to identify candidate molecules similar to a given query.
USRCAT is an extension of USR [2] and UFSRAT [3].
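To make the idea of a rotation-invariant shape descriptor concrete, the following minimal Python sketch computes the 12 USR-style shape moments that USRCAT builds on, starting from a list of 3D atom coordinates. It is illustrative only and is not the SimilarityLab implementation. The function names are hypothetical, the moment normalisation (standard deviation and cube root of the third central moment) is one common convention assumed here, and the extra pharmacophoric atom subsets that USRCAT adds are omitted.

```python
import math


def _moments(distances):
    """Return mean, standard deviation and cube-root third central moment."""
    n = len(distances)
    mean = sum(distances) / n
    variance = sum((d - mean) ** 2 for d in distances) / n
    third = sum((d - mean) ** 3 for d in distances) / n
    # Signed cube root keeps all three moments in units of distance.
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(variance), skew]


def usr_descriptor(coords):
    """12-element USR-style descriptor for a list of (x, y, z) positions."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_centroid = [math.dist(centroid, c) for c in coords]
    closest = coords[d_centroid.index(min(d_centroid))]    # atom nearest the centroid
    farthest = coords[d_centroid.index(max(d_centroid))]   # atom farthest from the centroid
    d_farthest = [math.dist(farthest, c) for c in coords]
    far_from_far = coords[d_farthest.index(max(d_farthest))]  # atom farthest from that atom
    descriptor = []
    for reference in (centroid, closest, farthest, far_from_far):
        descriptor += _moments([math.dist(reference, c) for c in coords])
    return descriptor


def rotate_z(coords, angle):
    """Rotate all positions about the z axis by angle (radians)."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(x * cos_a - y * sin_a, x * sin_a + y * cos_a, z) for x, y, z in coords]


# Toy coordinates (made up for illustration, not a real molecule).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.9, 1.3, 0.2),
         (3.1, 1.1, -0.7), (-0.8, 0.9, 1.4)]
original = usr_descriptor(atoms)
rotated = usr_descriptor(rotate_z(atoms, 1.0))
# Largest difference between the two descriptors; only floating-point noise is expected.
print(max(abs(a - b) for a, b in zip(original, rotated)))
```

Because every value is derived from interatomic distances only, rotating or translating the molecule leaves the descriptor unchanged, so no structural alignment is needed before comparing two molecules.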
The SimilarityLab website allows two things:
- Discovery of commercially available similars.
- Compound target prediction.
Further details of both, and of how they operate, can be found below. Source code can be found in the following
repository:
https://github.com/stevenshave/SimilarityLab
1. Discovery of commercially available similars.
We have often found that collaborators want a quick way to explore SAR (structure-activity relationships) around a
compound of particular interest. This may come from their internal screens or from the
literature. The application of USRCAT to the eMolecules database allows sourcing of similar compounds for
purchase. Internally, the eMolecules database is kept up to date so that compounds are purchasable
(although even with frequent updating, we find that some suppliers do not hold constant stock of their advertised
compounds). We periodically update SimilarityLab's eMolecules datasets and hold 3D representations of
low-energy conformers for each molecule. These low-energy conformers are generated using the protocol outlined
by Ebejer et al. [4] (see repository for code listing). For cheminformatics tasks requiring querying against specific
small molecule conformations, we advise users to seek other tools or experts capable of running the analysis;
this should be simple to do using code in the accompanying SimilarityLab source repository, or simply using
RDKit. The stored conformers are used to pre-generate descriptors. Once conformers and descriptors have been
generated for the user-supplied query molecule, the pre-generated descriptors are scored against it. The top N
similars are returned to the user, where N is set when the job is defined.
The authors of SimilarityLab will endeavour to refresh the website's eMolecules version periodically.
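The scoring and top-N selection step described in this section can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used by the website: the names usr_score and top_n_similars are hypothetical, the score is the usual USR-style form 1 / (1 + mean absolute difference), each compound is assumed to be represented by its best-scoring conformer, and the short three-element descriptors are toy values standing in for real USRCAT descriptors.

```python
import heapq


def usr_score(query, candidate):
    """Similarity in (0, 1]: 1 / (1 + mean absolute descriptor difference)."""
    if len(query) != len(candidate):
        raise ValueError("descriptors must have the same length")
    total = sum(abs(q - c) for q, c in zip(query, candidate))
    return 1.0 / (1.0 + total / len(query))


def top_n_similars(query, library, n):
    """Return the n best (score, compound_id) pairs, best first.

    library maps a compound ID to a list of pre-generated descriptors, one
    per stored conformer. Assumption: a compound is scored by its
    best-matching conformer.
    """
    scored = (
        (max(usr_score(query, d) for d in descriptors), compound_id)
        for compound_id, descriptors in library.items()
    )
    return heapq.nlargest(n, scored)


# Toy library with made-up IDs and descriptors.
library = {
    "cpd-A": [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1]],
    "cpd-B": [[4.0, 4.0, 4.0]],
    "cpd-C": [[1.0, 2.0, 3.5]],
}
query = [1.0, 2.0, 3.0]
hits = top_n_similars(query, library, n=2)
for score, compound_id in hits:
    print(compound_id, round(score, 3))
```

Using heapq.nlargest avoids sorting the whole library when only a small N is requested, which matters when the pre-generated descriptor set covers millions of purchasable compounds.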
2. Compound target prediction.
Hits from phenotypic screening campaigns require target deconvolution: determining the protein targets or pathways
likely to be perturbed. Using the same approach as in section 1 (Discovery of commercially available similars), the
USRCAT molecular similarity technique is applied to "active" molecules within the most recent release of
ChEMBL [5]. Active in this sense is defined as having an IC50/KD of 10 µM or lower against a protein target.
For the top 100 similar active molecules, activities against all protein targets are counted. The protein
targets are then sorted by the number of times they are hit by this list of 100 similar compounds, and the
sorted list of targets is returned to the user.
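The counting and ranking step can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the site's actual code: the function names, compound IDs, target names and activity values are all made up for illustration, the activity cutoff is read as "10 µM or lower", and each similar compound is assumed to contribute at most one count per target.

```python
from collections import Counter

ACTIVITY_CUTOFF_UM = 10.0  # IC50/KD threshold from the text, in micromolar


def active_targets(activities, cutoff_um=ACTIVITY_CUTOFF_UM):
    """Return the set of targets with an IC50/KD at or below the cutoff."""
    return {target for target, value_um in activities if value_um <= cutoff_um}


def rank_targets(similar_ids, activity_table, top_k=100):
    """Count how often each target is hit by the top_k most similar actives.

    similar_ids is ordered best match first; activity_table maps a compound
    ID to a list of (target, IC50_or_KD_in_uM) records.
    """
    counts = Counter()
    for compound_id in similar_ids[:top_k]:
        counts.update(active_targets(activity_table.get(compound_id, [])))
    # Most frequently hit targets first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy data (hypothetical compounds, targets and potencies).
activity_table = {
    "mol-1": [("Kinase X", 0.5), ("Protease Y", 50.0)],
    "mol-2": [("Kinase X", 8.0), ("GPCR Z", 2.0)],
    "mol-3": [("GPCR Z", 10.0), ("Kinase X", 0.02)],
}
similar_ids = ["mol-1", "mol-2", "mol-3"]
ranking = rank_targets(similar_ids, activity_table)
for target, hits in ranking:
    print(target, hits)
```

In this toy example "Protease Y" never appears in the ranking because its only measurement (50 µM) is weaker than the cutoff.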
Funding
SimilarityLab was made possible by the Wellcome Trust Institutional Strategic Support Fund and was developed by Dr. Steven Shave during his time in the Auer Lab at the University of Edinburgh. This site and its associated code are available under an open-source-compatible license from
https://github.com/stevenshave/SimilarityLab.
References
- [1] Schreyer, Adrian M., and Tom Blundell. "USRCAT: real-time ultrafast shape recognition with
pharmacophoric constraints." Journal of cheminformatics 4.1 (2012): 1-12.
- [2] Ballester, Pedro J., and W. Graham Richards. "Ultrafast shape recognition to search compound
databases for similar molecular shapes." Journal of computational chemistry 28.10 (2007): 1711-1723.
- [3] Shave, Steven, et al. "UFSRAT: ultra-fast shape recognition with atom types–the discovery of novel
bioactive small molecular scaffolds for FKBP12 and 11βHSD1." PloS one 10.2 (2015): e0116570.
- [4] Ebejer, Jean-Paul, Garrett M. Morris, and Charlotte M. Deane. "Freely available conformer generation
methods: how good are they?" Journal of chemical information and modeling 52.5 (2012): 1146-1158.
- [5] Gaulton, Anna, et al. "ChEMBL: a large-scale bioactivity database for drug discovery." Nucleic acids
research 40.D1 (2012): D1100-D1107.