# Bibliography of Computer-Aided Drug Design

Updated on 7/18/2014. Currently 2130 references

## Target prediction

2014 / 2013 / 2012 / 2011 / 2010 / 2009 / 2008 / 2007 / 2006 / 2004 / 2003 / 2002

## 2014

• Identifying the macromolecular targets of de novo-designed chemical entities through self-organizing map consensus.
Reker, Daniel and Rodrigues, Tiago and Schneider, Petra and Schneider, Gisbert
PNAS, 2014, 111(11), 4067-4072
PMID: 24591595     doi: 10.1073/pnas.1320001111

De novo molecular design and in silico prediction of polypharmacological profiles are emerging research topics that will profoundly affect the future of drug discovery and chemical biology. The goal is to identify the macromolecular targets of new chemical agents. Although several computational tools for predicting such targets are publicly available, none of these methods was explicitly designed to predict target engagement by de novo-designed molecules. Here we present the development and practical application of a unique technique, self-organizing map-based prediction of drug equivalence relationships (SPiDER), that merges the concepts of self-organizing maps, consensus scoring, and statistical analysis to successfully identify targets for both known drugs and computer-generated molecular scaffolds. We discovered a potential off-target liability of fenofibrate-related compounds, and in a comprehensive prospective application, we identified a multitarget-modulating profile of de novo designed molecules. These results demonstrate that SPiDER may be used to identify innovative compounds in chemical biology and in the early stages of drug discovery, and help investigate the potential side effects of drugs and their repurposing options.

## 2013

• Drug Promiscuity in PDB: Protein Binding Site Similarity Is Key.
Haupt, V Joachim and Daminelli, Simone and Schroeder, Michael
PLoS ONE, 2013, 8(6), e65894
PMID: 23805191     doi: 10.1371/journal.pone.0065894

Drug repositioning applies established drugs to new disease indications with increasing success. A pre-requisite for drug repurposing is drug promiscuity (polypharmacology) - a drug's ability to bind to several targets. There is a long standing debate on the reasons for drug promiscuity. Based on large compound screens, hydrophobicity and molecular weight have been suggested as key reasons. However, the results are sometimes contradictory and leave space for further analysis. Protein structures offer a structural dimension to explain promiscuity: Can a drug bind multiple targets because the drug is flexible or because the targets are structurally similar or even share similar binding sites? We present a systematic study of drug promiscuity based on structural data of PDB target proteins with a set of 164 promiscuous drugs. We show that there is no correlation between the degree of promiscuity and ligand properties such as hydrophobicity or molecular weight but a weak correlation to conformational flexibility. However, we do find a correlation between promiscuity and structural similarity as well as binding site similarity of protein targets. In particular, 71% of the drugs have at least two targets with similar binding sites. In order to overcome issues in detection of remotely similar binding sites, we employed a score for binding site similarity: LigandRMSD measures the similarity of the aligned ligands and uncovers remote local similarities in proteins. It can be applied to arbitrary structural binding site alignments. Three representative examples, namely the anti-cancer drug methotrexate, the natural product quercetin and the anti-diabetic drug acarbose are discussed in detail. Our findings suggest that global structural and binding site similarity play a more important role to explain the observed drug promiscuity in the PDB than physicochemical drug properties like hydrophobicity or molecular weight. Additionally, we find ligand flexibility to have a minor influence.
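
The abstract describes LigandRMSD as a score computed on the ligands after the binding sites have been aligned. The following is only a minimal sketch of a plain root-mean-square deviation between two ligand poses. It assumes the atoms are already matched one-to-one and expressed in a common coordinate frame; the function name `ligand_rmsd` and the toy coordinates are illustrative and not taken from the paper, whose exact scoring procedure may differ.

```python
import math


def ligand_rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates.

    Assumes both ligands were already placed in the same frame by a
    binding-site alignment and that atom i in A corresponds to atom i in B.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: the same three-atom pose shifted by 1 unit along x.
pose_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pose_2 = [(x + 1.0, y, z) for x, y, z in pose_1]
rmsd = ligand_rmsd(pose_1, pose_2)
```

A low value indicates that the two ligands occupy nearly the same position once the pockets are superposed, which is the intuition behind using ligand overlap as a proxy for binding site similarity.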

• ChemMapper: a versatile web server for exploring pharmacology and chemical structure association based on molecular 3D similarity method
Gong, J and Cai, C and Liu, X and Ku, X and Jiang, H and Gao, D and Li, H
Bioinformatics (Oxford, England), 2013, 29(14), 1827-1829
PMID: 23712658     doi: 10.1093/bioinformatics/btt270

SUMMARY: ChemMapper is an online platform to predict polypharmacology effect and mode of action for small molecules based on 3D similarity computation. ChemMapper collects >350 000 chemical structures with bioactivities and associated target annotations (as well as >3 000 000 non-annotated compounds for virtual screening). Taking the user-provided chemical structure as the query, the top most similar compounds in terms of 3D similarity are returned with associated pharmacology annotations. ChemMapper is designed to provide versatile services in a variety of chemogenomics, drug repurposing, polypharmacology, novel bioactive compounds identification and scaffold hopping studies. AVAILABILITY: http://lilab.ecust.edu.cn/chemmapper/. CONTACT: xfliu@ecust.edu.cn or hlli@ecust.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

• HitPick: a web server for hit identification and target prediction of chemical screenings
Liu, X and Vogt, I and Haque, T and Campillos, M
Bioinformatics (Oxford, England), 2013, 29(15), 1910-1912
PMID: 23716196     doi: 10.1093/bioinformatics/btt303

MOTIVATION: High-throughput phenotypic assays reveal information about the molecules that modulate biological processes, such as a disease phenotype and a signaling pathway. In these assays, the identification of hits along with their molecular targets is critical to understand the chemical activities modulating the biological system. Here, we present HitPick, a web server for identification of hits in high-throughput chemical screenings and prediction of their molecular targets. HitPick applies the B-score method for hit identification and a newly developed approach combining 1-nearest-neighbor (1NN) similarity searching and Laplacian-modified naïve Bayesian target models to predict targets of identified hits. The performance of the HitPick web server is presented and discussed. AVAILABILITY: The server can be accessed at http://mips.helmholtz-muenchen.de/proj/hitpick. CONTACT: monica.campillos@helmholtz-muenchen.de.
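
As a rough illustration of the Laplacian-modified naïve Bayesian idea mentioned above, the sketch below uses one commonly cited formulation: each fingerprint feature receives the weight log((A + 1) / (T * P + 1)), where T is the number of training compounds containing the feature, A the number of those that are active, and P the overall fraction of actives. Whether HitPick uses exactly this form is an assumption; the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def train_laplacian_nb(samples):
    """samples: list of (set_of_features, is_active). Returns feature -> weight."""
    n_total = len(samples)
    n_active = sum(1 for _, active in samples if active)
    base_rate = n_active / n_total  # P: overall fraction of actives
    seen = Counter()         # T: compounds containing each feature
    seen_active = Counter()  # A: active compounds containing each feature
    for features, active in samples:
        for f in features:
            seen[f] += 1
            if active:
                seen_active[f] += 1
    # Laplacian-corrected log ratio; features seen rarely are pulled toward 0.
    return {
        f: math.log((seen_active[f] + 1) / (seen[f] * base_rate + 1))
        for f in seen
    }


def score(weights, features):
    """Sum the weights of the known features present in a query compound."""
    return sum(weights.get(f, 0.0) for f in features)


# Toy training set with made-up feature labels.
training = [
    ({"f1", "f2"}, True),
    ({"f1", "f3"}, True),
    ({"f3", "f4"}, False),
    ({"f4"}, False),
]
weights = train_laplacian_nb(training)
```

In this toy set, a feature seen only in actives gets a positive weight, one seen only in inactives a negative weight, and one split evenly a weight of zero.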

• In silico Target Fishing for the Potential Targets and Molecular Mechanisms of Baicalein as an Antiparkinsonian Agent: Discovery of the Protective Effects on NMDA Receptor-Mediated Neurotoxicity.
Gao, Li and Fang, Jian-Song and Bai, Xiao-Yu and Zhou, Dan and Wang, Yi-Tao and Liu, Ai-Lin and Du, Guan-Hua
Chemical biology & drug design, 2013, 81(6), 675-687
PMID: 23461900     doi: 10.1111/cbdd.12127

The flavonoid baicalein has been proven effective in animal models of Parkinson's disease; however, the potential biological targets and molecular mechanisms underlying the antiparkinsonian action of baicalein have not been fully clarified. In the present study, the potential targets of baicalein were predicted by in silico target fishing approaches including database mining, molecular docking, structure-based pharmacophore searching, and chemical similarity searching. A consensus scoring formula has been developed and validated to objectively rank the targets. The top two ranked targets catechol-O-methyltransferase (COMT) and monoamine oxidase B (MAO-B) have been proposed as targets of baicalein in the literature. The third-ranked one (N-methyl-d-aspartic acid receptor, NMDAR) with relatively low consensus score was further experimentally tested. Although our results suggested that baicalein significantly attenuated NMDA-induced neurotoxicity including cell death, intracellular nitric oxide (NO) and reactive oxygen species (ROS) generation, extracellular NO reduction in human SH-SY5Y neuroblastoma cells, baicalein exhibited no inhibitory effect on [3H]MK-801 binding study, indicating that NMDAR might not be the target of baicalein. In conclusion, the results indicate that in silico target fishing is an effective method for drug target discovery, and the protective role of baicalein against NMDA-induced neurotoxicity supports our previous research that baicalein possesses antiparkinsonian activity.

• TargetHunter: An In Silico Target Identification Tool for Predicting Therapeutic Potential of Small Organic Molecules Based on Chemogenomic Database
Wang, Lirong and Ma, Chao and Wipf, Peter and Liu, Haibin and Su, Weiwei and Xie, Xiang-Qun
The AAPS journal, 2013, 15(2), 395-406
PMID: 23292636     doi: 10.1208/s12248-012-9449-z

Target identification of the known bioactive compounds and novel synthetic analogs is a very important research field in medicinal chemistry, biochemistry, and pharmacology. It is also a challenging and costly step towards chemical biology and phenotypic screening. In silico identification of potential biological targets for chemical compounds offers an alternative avenue for the exploration of ligand-target interactions and biochemical mechanisms, as well as for investigation of drug repurposing. Computational target fishing mines biologically annotated chemical databases and then maps compound structures into chemogenomical space in order to predict the biological targets. We summarize the recent advances and applications in computational target fishing, such as chemical similarity searching, data mining/machine learning, panel docking, and the bioactivity spectral analysis for target identification. We then described in detail a new web-based target prediction tool, TargetHunter ( http://www.cbligand.org/TargetHunter ). This web portal implements a novel in silico target prediction algorithm, the Targets Associated with its MOst SImilar Counterparts, by exploring the largest chemogenomical databases, ChEMBL. Prediction accuracy reached 91.1% from the top 3 guesses on a subset of high-potency compounds from the ChEMBL database, which outperformed a published algorithm, multiple-category models. TargetHunter also features an embedded geography tool, BioassayGeoMap, developed to allow the user easily to search for potential collaborators that can experimentally validate the predicted biological target(s) or off target(s). TargetHunter therefore provides a promising alternative to bridge the knowledge gap between biology and chemistry, and significantly boost the productivity of chemogenomics researchers for in silico drug design and discovery.
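
The "most similar counterparts" idea can be illustrated with a nearest-neighbour lookup: the query compound inherits the target annotations of its most similar annotated compound. The sketch below assumes fingerprints are represented as sets of on-bit indices and uses the Tanimoto coefficient; the compound names, bit sets and target labels are invented for illustration, and the actual TargetHunter implementation is not reproduced here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def predict_targets(query_bits, annotated):
    """Return (name, similarity, targets) of the most similar annotated compound.

    annotated: list of (name, fingerprint_bits, set_of_targets).
    """
    best = max(annotated, key=lambda entry: tanimoto(query_bits, entry[1]))
    return best[0], tanimoto(query_bits, best[1]), best[2]


# Hypothetical reference set with made-up fingerprints and target labels.
reference = [
    ("cmpd_A", {1, 2, 3, 4}, {"T1"}),
    ("cmpd_B", {3, 4, 5, 6, 7}, {"T2", "T3"}),
]
name, similarity, targets = predict_targets({1, 2, 3, 9}, reference)
```

In practice one would usually keep the top few neighbours and apply a similarity cut-off rather than trust a single hit; the abstract's "top 3 guesses" evaluation suggests a ranked list of this kind.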

• Chemogenomics Approaches to Rationalizing the Mode-of-Action of Traditional Chinese and Ayurvedic Medicines
Mohd Fauzi, Fazlin and Koutsoukas, Alexios and Lowe, Robert and Joshi, Kalpana and Fan, Tai-Ping and Glen, Robert C and Bender, Andreas
Journal of chemical information and modeling, 2013, 53(3), 661-673

Traditional Chinese medicine (TCM) and Ayurveda have been used in humans for thousands of years. While the link to a particular indication has been established in man, the mode-of-action (MOA) of the formulations often remains unknown. In this study, we aim to understand the MOA of formulations used in traditional medicine using an in silico target prediction algorithm, which aims to predict protein targets (and hence MOAs), given the chemical structure of a compound. Following this approach we were able to establish several links between suggested MOAs and experimental evidence. In particular, compounds from the 'tonifying and replenishing medicinal' class from TCM exhibit a hypoglycemic effect which can be related to activity of the ingredients against the Sodium-Glucose Transporters (SGLT) 1 and 2 as well as Protein Tyrosine Phosphatase (PTP). Similar results were obtained for Ayurvedic anticancer drugs. Here, both primary anticancer targets (those directly involved in cancer pathogenesis) such as steroid-5-alpha-reductase 1 and 2 were predicted as well as targets which act synergistically with the primary target, such as the efflux pump P-glycoprotein (P-gp). In addition, we were able to elucidate some targets which may point us to novel MOAs as well as explain side effects. Most notably, GPBAR1, which was predicted as a target for both 'tonifying and replenishing medicinal' and anticancer classes, suggests an influence of the compounds on metabolism. Understanding the MOA of these compounds is beneficial as it provides a resource for NMEs with possibly higher efficacy in the clinic than those identified by single-target biochemical assays.

• A combined molecular docking-based and pharmacophore-based target prediction strategy with a probabilistic fusion method for target ranking.
Li, Guo-Bo and Yang, Ling-Ling and Xu, Yong and Wang, Wen-Jing and Li, Lin-Li and Yang, Sheng-Yong
Journal of molecular graphics & modelling, 2013, 44, 278-285
PMID: 23933279     doi: 10.1016/j.jmgm.2013.07.005

Herein, a combined molecular docking-based and pharmacophore-based target prediction strategy is presented, in which a probabilistic fusion method is suggested for target ranking. Establishment and validation of the combined strategy are described. A target database, termed TargetDB, was firstly constructed, which contains 1105 drug targets. Based on TargetDB, the molecular docking-based target prediction and pharmacophore-based target prediction protocols were established. A probabilistic fusion method was then developed by constructing probability assignment curves (PACs) against a set of selected targets. Finally the workflow for the combined molecular docking-based and pharmacophore-based target prediction strategy was established. Evaluations of the performance of the combined strategy were carried out against a set of structurally different single-target compounds and a well-known multi-target drug, 4H-tamoxifen, which results showed that the combined strategy consistently outperformed the sole use of docking-based and pharmacophore-based methods. Overall, this investigation provides a possible way for improving the accuracy of in silico target prediction and a method for target ranking.

• Chemically Advanced Template Search (CATS) for Scaffold-Hopping and Prospective Target Prediction for 'Orphan' Molecules.
Reutlinger, Michael and Koch, Christian P and Reker, Daniel and Todoroff, Nickolay and Schneider, Petra and Rodrigues, Tiago and Schneider, Gisbert
Molecular Informatics, 2013, 32(2), 133-138
PMID: 23956801     doi: 10.1002/minf.201200141

• Identification of distant drug off-targets by direct superposition of binding pocket surfaces.
Schumann, Marcel and Armen, Roger S
PLoS ONE, 2013, 8(12), e83533
PMID: 24391782     doi: 10.1371/journal.pone.0083533

Correctly predicting off-targets for a given molecular structure, which would have the ability to bind a large range of ligands, is both particularly difficult and important if they share no significant sequence or fold similarity with the respective molecular target ("distant off-targets"). A novel approach for identification of off-targets by direct superposition of protein binding pocket surfaces is presented and applied to a set of well-studied and highly relevant drug targets, including representative kinases and nuclear hormone receptors. The entire Protein Data Bank is searched for similar binding pockets and convincing distant off-target candidates were identified that share no significant sequence or fold similarity with the respective target structure. These putative target off-target pairs are further supported by the existence of compounds that bind strongly to both with high topological similarity, and in some cases, literature examples of individual compounds that bind to both. Also, our results clearly show that it is possible for binding pockets to exhibit a striking surface similarity, while the respective off-target shares neither significant sequence nor significant fold similarity with the respective molecular target ("distant off-target").

• Virtual affinity fingerprints for target fishing: a new application of drug profile matching.
Peragovics, Agnes and Simon, Zoltán and Tombor, László and Jelinek, Balázs and Hári, Péter and Czobor, Pál and Málnási-Csizmadia, András
Journal of chemical information and modeling, 2013, 53(1), 103-113
PMID: 23215025     doi: 10.1021/ci3004489

We recently introduced Drug Profile Matching (DPM), a novel virtual affinity fingerprinting bioactivity prediction method. DPM is based on the docking profiles of ca. 1200 FDA-approved small-molecule drugs against a set of nontarget proteins and creates bioactivity predictions based on this pattern. The effectiveness of this approach was previously demonstrated for therapeutic effect prediction of drug molecules. In the current work, we investigated the applicability of DPM for target fishing, i.e. for the prediction of biological targets for compounds. Predictions were made for 77 targets, and their accuracy was measured by Receiver Operating Characteristic (ROC) analysis. Robustness was tested by a rigorous 10-fold cross-validation procedure. This procedure identified targets (N

• Drug repositioning by structure-based virtual screening.
Ma, Dik-Lung and Chan, Daniel Shiu-Hin and Leung, Chung-Hang
Chemical Society reviews, 2013, 42(5), 2130-2141
PMID: 23288298     doi: 10.1039/c2cs35357a

Approved drugs have favourable or validated pharmacokinetic properties and toxicological profiles, and the repositioning of existing drugs for new indications can potentially avoid expensive costs associated with early-stage testing of the hit compounds. In recent years, technological advances in virtual screening methodologies have allowed medicinal chemists to rapidly screen drug libraries for therapeutic activity against new biomolecular targets in a cost-effective manner. This review article outlines the basic principles and recent advances in structure-based virtual screening and highlights the powerful synergy of in silico techniques in drug repositioning as demonstrated in several recent reports.

• Network-based drug repositioning.
Wu, Zikai and Wang, Yong and Chen, Luonan
Molecular bioSystems, 2013, 9(6), 1268-1281
PMID: 23493874     doi: 10.1039/c3mb25382a

Network-based computational biology, with the emphasis on biomolecular interactions and omics-data integration, has had success in drug development and created new directions such as drug repositioning and drug combination. Drug repositioning, i.e., revealing a drug's new roles, is increasingly attracting much attention from the pharmaceutical community to tackle the problems of high failure rate and long-term development in drug discovery. While drug combination or drug cocktails, i.e., combining multiple drugs against diseases, mainly aims to alleviate the problems of the recurrent emergence of drug resistance and also reveal their synergistic effects. In this paper, we unify the two topics to reveal new roles of drug interactions from a network perspective by treating drug combination as another form of drug repositioning. In particular, first, we emphasize that rationally repositioning drugs in the large scale is driven by the accumulation of various high-throughput genome-wide data. These data can be utilized to capture the interplay among targets and biological molecules, uncover the resulting network structures, and further bridge molecular profiles and phenotypes. This motivates many network-based computational methods on these topics. Second, we organize these existing methods into two categories, i.e., single drug repositioning and drug combination, and further depict their main features by three data sources. Finally, we discuss the merits and shortcomings of these methods and pinpoint some future topics in this promising field.

• Progress in the Visualization and Mining of Chemical and Target Spaces
Medina-Franco, José L and Aguayo-Ortiz, Rodrigo
Molecular Informatics, 2013, 32(11-12), 942-953
doi: 10.1002/minf.201300041

Chemogenomics is a growing field that aims to integrate the chemical and target spaces. As part of a multi-disciplinary effort to achieve this goal, computational methods initially developed to visualize the chemical space of compound collections and mine single-target structure-activity relationships, are being adapted to visualize and mine complex relationships in chemogenomics data sets. Similarly, the growing evidence that clinical effects are many times due to the interaction of single or multiple drugs with multiple targets, is encouraging the development of novel methodologies that are integrated in multi-target drug discovery endeavors. Herein we review advances in the development and application of approaches to generate visual representations of chemical space with particular emphasis on methods that aim to explore and uncover relationships between chemical and target spaces. Also, progress in the data mining of the structure-activity relationships of sets of compounds screened across multiple targets are discussed in light of the concept of activity landscape modeling.

• Prediction of polypharmacological profiles of drugs by the integration of chemical, side effect, and therapeutic space.
Cheng, Feixiong and Li, Weihua and Wu, Zengrui and Wang, Xichuan and Zhang, Chen and Li, Jie and Liu, Guixia and Tang, Yun
Journal of chemical information and modeling, 2013, 53(4), 753-762
PMID: 23527559     doi: 10.1021/ci400010x

Prediction of polypharmacological profiles of drugs enables us to investigate drug side effects and further find their new indications, i.e. drug repositioning, which could reduce the costs while increase the productivity of drug discovery. Here we describe a new computational framework to predict polypharmacological profiles of drugs by the integration of chemical, side effect, and therapeutic space. On the basis of our previous developed drug side effects database, named MetaADEDB, a drug side effect similarity inference (DSESI) method was developed for drug-target interaction (DTI) prediction on a known DTI network connecting 621 approved drugs and 893 target proteins. The area under the receiver operating characteristic curve was 0.882 ± 0.011 averaged from 100 simulated tests of 10-fold cross-validation for the DSESI method, which is comparative with drug structural similarity inference and drug therapeutic similarity inference methods. Seven new predicted candidate target proteins for seven approved drugs were confirmed by published experiments, with the successful hit rate more than 15.9%. Moreover, network visualization of drug-target interactions and off-target side effect associations provide new mechanism-of-action of three approved antipsychotic drugs in a case study. The results indicated that the proposed methods could be helpful for prediction of polypharmacological profiles of drugs.
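
Several entries in this section report performance as the area under the ROC curve. As a reminder of what that number means, the sketch below computes it as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as one half. The scores and labels are toy values, not data from the paper.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy predictions: 1 = known drug-target interaction, 0 = no known interaction.
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 0, 1, 1, 0]
auc = roc_auc(scores, labels)
```

Repeating this over the held-out folds of a cross-validation and averaging gives a mean and spread of the kind quoted in the abstract.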

• Drug target prediction and repositioning using an integrated network-based approach.
Emig, Dorothea and Ivliev, Alexander and Pustovalova, Olga and Lancashire, Lee and Bureeva, Svetlana and Nikolsky, Yuri and Bessarabova, Marina
PLoS ONE, 2013, 8(4), e60618
PMID: 23593264     doi: 10.1371/journal.pone.0060618

The discovery of novel drug targets is a significant challenge in drug development. Although the human genome comprises approximately 30,000 genes, proteins encoded by fewer than 400 are used as drug targets in the treatment of diseases. Therefore, novel drug targets are extremely valuable as the source for first in class drugs. On the other hand, many of the currently known drug targets are functionally pleiotropic and involved in multiple pathologies. Several of them are exploited for treating multiple diseases, which highlights the need for methods to reliably reposition drug targets to new indications. Network-based methods have been successfully applied to prioritize novel disease-associated genes. In recent years, several such algorithms have been developed, some focusing on local network properties only, and others taking the complete network topology into account. Common to all approaches is the understanding that novel disease-associated candidates are in close overall proximity to known disease genes. However, the relevance of these methods to the prediction of novel drug targets has not yet been assessed. Here, we present a network-based approach for the prediction of drug targets for a given disease. The method allows both repositioning drug targets known for other diseases to the given disease and the prediction of unexploited drug targets which are not used for treatment of any disease. Our approach takes as input a disease gene expression signature and a high-quality interaction network and outputs a prioritized list of drug targets. We demonstrate the high performance of our method and highlight the usefulness of the predictions in three case studies. We present novel drug targets for scleroderma and different types of cancer with their underlying biological processes. Furthermore, we demonstrate the ability of our method to identify non-suspected repositioning candidates using diabetes type 1 as an example.

• Predicting drug-target interactions through integrative analysis of chemogenetic assays in yeast.
Heiskanen, Marja A and Aittokallio, Tero
Molecular bioSystems, 2013, 9(4), 768-779
PMID: 23420501     doi: 10.1039/c3mb25591c

Chemical-genomic and genetic interaction profiling approaches are widely used to study mechanisms of drug action and resistance. However, there exist a number of scoring algorithms customized to different experimental assays, the relative performance of which remains poorly understood, especially with respect to different types of chemogenetic assays. Using yeast Saccharomyces cerevisiae as a test bed, we carried out a systematic evaluation among the main drug target analysis approaches in terms of predicting global drug-target interaction networks. We found drastic differences in their performance across different chemical-genomic assay types, such as those based on heterozygous and homozygous diploid or haploid deletion mutant libraries. Moreover, a relatively small overlap in the predicted targets was observed between those approaches that use either chemical-genomic screening alone or combined with genetic interaction profiling. A rank-based integration of the complementary scoring approaches led to improved overall performance, demonstrating that genetic interaction profiling provides added information on drug target prediction. Optimal performance was achieved when focusing specifically on the negative tail of the genetic interactions, suggesting that combining synthetic lethal interactions with chemical-genetic interactions provides highest information on drug-target interactions. A network view of rapamycin-interacting genes, pathways and complexes was used as an example to demonstrate the benefits of such integrated and optimized analysis of chemogenetic assays in yeast.
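
The rank-based integration described above can be pictured as averaging, for each gene, its rank under each scoring approach. The sketch below is one simple way to do this and is an assumption rather than the authors' exact procedure; the gene names and scores are made up, and higher scores are taken to mean stronger evidence.

```python
def rank_positions(scores):
    """Map each gene to its rank under one scoring method (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def integrate_by_mean_rank(*score_tables):
    """Order genes by their mean rank across several scoring methods."""
    ranks = [rank_positions(table) for table in score_tables]
    genes = set.intersection(*(set(r) for r in ranks))
    mean_rank = {g: sum(r[g] for r in ranks) / len(ranks) for g in genes}
    # Ties in mean rank are broken alphabetically to keep the output stable.
    return sorted(genes, key=lambda g: (mean_rank[g], g))


# Toy scores from two hypothetical approaches.
chemical_genomic = {"geneA": 3.1, "geneB": 2.0, "geneC": 0.5}
genetic_interaction = {"geneA": 0.2, "geneB": 0.9, "geneC": 0.6}
ranking = integrate_by_mean_rank(chemical_genomic, genetic_interaction)
```

Working on ranks rather than raw scores sidesteps the fact that the individual scoring algorithms produce values on different scales.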

## 2012

• Large-scale prediction and testing of drug activity on side-effect targets.
Lounkine, Eugen and Keiser, Michael J and Whitebread, Steven and Mikhailov, Dmitri and Hamon, Jacques and Jenkins, Jeremy L and Lavan, Paul and Weber, Eckhard and Doak, Allison K and Côté, Serge and Shoichet, Brian K and Urban, Laszlo
Nature, 2012, 486(7403), 361-367
PMID: 22722194     doi: 10.1038/nature11159

Discovering the unintended 'off-targets' that predict adverse drug reactions is daunting by empirical methods alone. Drugs can act on several protein targets, some of which can be unrelated by conventional molecular metrics, and hundreds of proteins have been implicated in side effects. Here we use a computational strategy to predict the activity of 656 marketed drugs on 73 unintended 'side-effect' targets. Approximately half of the predictions were confirmed, either from proprietary databases unknown to the method or by new experimental assays. Affinities for these new off-targets ranged from 1 nM to 30 µM. To explore relevance, we developed an association metric to prioritize those new off-targets that explained side effects better than any known target of a given drug, creating a drug-target-adverse drug reaction network. Among these new associations was the prediction that the abdominal pain side effect of the synthetic oestrogen chlorotrianisene was mediated through its newly discovered inhibition of the enzyme cyclooxygenase-1. The clinical relevance of this inhibition was borne out in whole human blood platelet aggregation assays. This approach may have wide application to de-risking toxicological liabilities in drug discovery.

• Effects of protein interaction data integration, representation and reliability on the use of network properties for drug target prediction.
Mora, Antonio and Donaldson, Ian M
BMC Bioinformatics, 2012, 13(1), 294
PMID: 23146171     doi: 10.1186/1471-2105-13-294

BACKGROUND: Previous studies have noted that drug targets appear to be associated with higher-degree or higher-centrality proteins in interaction networks. These studies explicitly or tacitly make choices of different source databases, data integration strategies, representation of proteins and complexes, and data reliability assumptions. Here we examined how the use of different data integration and representation techniques, or different notions of reliability, may affect the efficacy of degree and centrality as features in drug target prediction. RESULTS: Fifty percent of drug targets have a degree of less than nine, and ninety-five percent have a degree of less than ninety. We found that drug targets are over-represented in higher degree bins - this relationship is only seen for the consolidated interactome and it is not dependent on n-ary interaction data or its representation. Degree acts as a weak predictive feature for drug-target status and using more reliable subsets of the data does not increase this performance. However, performance does increase if only cancer-related drug targets are considered. We also note that a protein's membership in pathway records can act as a predictive feature that is better than degree and that high-centrality may be an indicator of a drug that is more likely to be withdrawn. CONCLUSIONS: These results show that protein interaction data integration and cleaning is an important consideration when incorporating network properties as predictive features for drug-target status. The provided scripts and data sets offer a starting point for further studies and cross-comparison of methods.
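
For readers unfamiliar with the network feature discussed above, the degree of a protein is simply its number of distinct interaction partners. The sketch below computes it from a list of binary interactions; the protein identifiers are placeholders, and the handling of duplicates and self-interactions is an assumption about how one might consolidate an interactome, not the authors' published scripts.

```python
from collections import defaultdict


def node_degrees(edges):
    """Number of distinct interaction partners per protein.

    Duplicate edges (in either orientation) and self-interactions are ignored.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {protein: len(partners) for protein, partners in neighbours.items()}


# Toy interaction list; ("P2", "P1") duplicates ("P1", "P2").
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"), ("P2", "P1"),
]
degrees = node_degrees(edges)
```

Because the result depends on which interactions are merged or discarded, the same protein can land in different degree bins under different integration strategies, which is the sensitivity the study examines.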

• Molecular fingerprint-based artificial neural networks QSAR for ligand biological activity predictions.
Myint, Kyaw-Zeyar and Wang, Lirong and Tong, Qin and Xie, Xiang-Qun
Molecular Pharmaceutics, 2012, 9(10), 2912-2923
PMID: 22937990     doi: 10.1021/mp300237z

In this manuscript, we have reported a novel 2D fingerprint-based artificial neural network QSAR (FANN-QSAR) method in order to effectively predict biological activities of structurally diverse chemical ligands. Three different types of fingerprints, namely, ECFP6, FP2 and MACCS, were used in FANN-QSAR algorithm development, and FANN-QSAR models were compared to known 3D and 2D QSAR methods using five data sets previously reported. In addition, the derived models were used to predict GPCR cannabinoid ligand binding affinities using our manually curated cannabinoid ligand database containing 1699 structurally diverse compounds with reported cannabinoid receptor subtype CB2 activities. To demonstrate its useful applications, the established FANN-QSAR algorithm was used as a virtual screening tool to search a large NCI compound database for lead cannabinoid compounds, and we have discovered several compounds with good CB2 binding affinities ranging from 6.70 nM to 3.75 µM. To the best of our knowledge, this is the first report for a fingerprint-based neural network approach validated with a successful virtual screening application in identifying lead compounds. The studies proved that the FANN-QSAR method is a useful approach to predict bioactivities or properties of ligands and to find novel lead compounds for drug discovery research.

• Compound activity prediction using models of binding pockets or ligand properties in 3D.
Kufareva, Irina and Chen, Yu-Chen and Ilatovskiy, Andrey V and Abagyan, Ruben
Current topics in medicinal chemistry, 2012, 12(17), 1869-1882
PMID: 23116466

Transient interactions of endogenous and exogenous small molecules with flexible binding sites in proteins or macromolecular assemblies play a critical role in all biological processes. Current advances in high-resolution protein structure determination, database development, and docking methodology make it possible to design three-dimensional models for prediction of such interactions with increasing accuracy and specificity. Using the data collected in the Pocketome encyclopedia, we here provide an overview of two types of the three-dimensional ligand activity models, pocket-based and ligand property-based, for two important classes of proteins, nuclear and G-protein coupled receptors. For half the targets, the pocket models discriminate actives from property matched decoys with acceptable accuracy (the area under ROC curve, AUC, exceeding 84%) and for about one fifth of the targets with high accuracy (AUC > 95%). The 3D ligand property field models performed better than 95% in half of the cases. The high performance models can already become a basis of activity predictions for new chemicals. Family-wide benchmarking of the models highlights strengths of both approaches and helps identify their inherent bottlenecks and challenges.

• Growth of Ligand-Target Interaction Data in ChEMBL Is Associated with Increasing and Activity Measurement-Dependent Compound Promiscuity.
Hu, Ye and Bajorath, Jürgen
Journal of chemical information and modeling, 2012, 52(10), 2550-2558
PMID: 22978710     doi: 10.1021/ci3003304

Compounds with high-confidence target annotations and activity measurements in the original and current release of the ChEMBL database have been compared to better understand how the growth of compound activity data might influence the spectrum of ligand-target interactions and the degree of target promiscuity among active compounds. Compared to the original ChEMBL release, a significant increase in the proportion of target promiscuous compounds was observed in the current version. The presence of these compounds led to large-magnitude changes in compound activity-based target and target family relationships and to a reorganization of major target communities. Surprisingly, however, this strong trend toward increasing target promiscuity was largely caused by growth of compounds with exclusive IC50 measurements. By contrast, compounds with available equilibrium constants, which were also added in large amounts, did not substantially alter compound-based target relationships and notably contribute to increasing target promiscuity. These findings suggest that apparent compound promiscuity is much dependent on experimental conditions under which activities are determined and that care should be taken when evaluating promiscuity and polypharmacology on the basis of assay-dependent activity measurements.

• Matched Molecular Pair Analysis of Small Molecule Microarray Data Identifies Promiscuity Cliffs and Reveals Molecular Origins of Extreme Compound Promiscuity.
Dimova, Dilyana and Hu, Ye and Bajorath, Jürgen
Journal of medicinal chemistry, 2012, 55(22), 10220-10228
PMID: 23050678     doi: 10.1021/jm301292a

The study of compound promiscuity is a hot topic in medicinal chemistry and drug discovery research. Promiscuous compounds are increasingly identified, but the molecular basis of promiscuity is currently only little understood. Utilizing the matched molecular pair formalism, we have analyzed patterns of compound promiscuity in a publicly available small molecule microarray data set. On the basis of our analysis, we introduce "promiscuity cliffs" as pairs of structural analogs with single-site substitutions that lead to large-magnitude differences in apparent compound promiscuity involving between 50 and 97 unrelated targets. No substructures or substructure transformations have been detected that are generally responsible for introducing promiscuity. However, within a given structural context, small chemical replacements were found to lead to dramatic promiscuity effects. On the basis of our analysis, promiscuity is not an inherent feature of molecular scaffolds but can be induced by small chemical substitutions. Promiscuity cliffs provide immediate access to such modifications.

• Drug target prediction using adverse event report systems: a pharmacogenomic approach.
Takarabe, Masataka and Kotera, Masaaki and Nishimura, Yosuke and Goto, Susumu and Yamanishi, Yoshihiro
Bioinformatics (Oxford, England), 2012, 28(18), i611-i618
PMID: 22962489     doi: 10.1093/bioinformatics/bts413

MOTIVATION: Unexpected drug activities derived from off-targets are usually undesired and harmful; however, they can occasionally be beneficial for different therapeutic indications. There are many uncharacterized drugs whose target proteins (including the primary target and off-targets) remain unknown. The identification of all potential drug targets has become an important issue in drug repositioning to reuse known drugs for new therapeutic indications.

• Structural insights into the molecular basis of the ligand promiscuity.
Sturm, Noé and Desaphy, Jérémy and Quinn, Ronald J and Rognan, Didier and Kellenberger, Esther
Journal of chemical information and modeling, 2012, 52(9), 2410-2421
PMID: 22920885     doi: 10.1021/ci300196g

Selectivity is a key factor in drug development. In this paper, we questioned the Protein Data Bank to better understand the reasons for the promiscuity of bioactive compounds. We assembled a data set of >1000 pairs of three-dimensional structures of complexes between a "drug-like" ligand (as its physicochemical properties overlap that of approved drugs) and two distinct "druggable" protein targets (as their binding sites are likely to accommodate "drug-like" ligands). Studying the similarity between the ligand-binding sites in the different targets revealed that the lack of selectivity of a ligand can be due (i) to the fact that Nature has created the same binding pocket in different proteins, which do not necessarily have otherwise sequence or fold similarity, or (ii) to specific characteristics of the ligand itself. In particular, we demonstrated that many ligands can adapt to different protein environments by changing their conformation, by using different chemical moieties to anchor to different targets, or by adopting unusual extreme binding modes (e.g., only apolar contact between the ligand and the protein, even though polar groups are present on the ligand or at the protein surface). Lastly, we provided new elements in support to the recent studies which suggest that the promiscuity of a ligand might be inferred from its molecular complexity.

• Exploring polypharmacology using a ROCS-based target fishing approach.
Abdulhameed, Mohamed Diwan M and Chaudhury, Sidhartha and Singh, Narender and Sun, Hongmao and Wallqvist, Anders and Tawa, Gregory J
Journal of chemical information and modeling, 2012, 52(2), 492-505
PMID: 22196353     doi: 10.1021/ci2003544

Polypharmacology has emerged as a new theme in drug discovery. In this paper, we studied polypharmacology using a ligand-based target fishing (LBTF) protocol. To implement the protocol, we first generated a chemogenomic database that links individual protein targets with a specified set of drugs or target representatives. Target profiles were then generated for a given query molecule by computing maximal shape/chemistry overlap between the query molecule and the drug sets assigned to each protein target. The overlap was computed using the program ROCS (Rapid Overlay of Chemical Structures). We validated this approach using the Directory of Useful Decoys (DUD). DUD contains 2950 active compounds, each with 36 property-matched decoys, against 40 protein targets. We chose a set of known drugs to represent each DUD target, and we carried out ligand-based virtual screens using data sets of DUD actives seeded into DUD decoys for each target. We computed Receiver Operator Characteristic (ROC) curves and associated area under the curve (AUC) values. For the majority of targets studied, the AUC values were significantly better than for the case of a random selection of compounds. In a second test, the method successfully identified off-targets for drugs such as rimantadine, propranolol, and domperidone that were consistent with those identified by recent experiments. The results from our ROCS-based target fishing approach are promising and have potential application in drug repurposing for single and multiple targets, identifying targets for orphan compounds, and adverse effect prediction.
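
The ROC/AUC evaluation mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each screened compound carries one overlap score (higher meaning closer to the target's drug set) and a binary active/decoy label. The function name `roc_auc` and the toy scores are hypothetical; nothing here is taken from ROCS or from the authors' protocol.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: overlap scores, higher = more similar to the target's drug set
    labels: 1 for a known active, 0 for a decoy
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    # Count active/decoy pairs in which the active outranks the decoy;
    # ties contribute half a win.
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Toy screen with made-up scores: two actives seeded into three decoys.
scores = [1.6, 1.1, 1.3, 0.9, 0.7]
labels = [1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random selection of compounds, which is the baseline the abstract compares against.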

• Identifying multiple-target ligands via computational chemogenomics approaches.
Peng, Shiming and Lin, Xingyu and Guo, Zongru and Huang, Niu
Current topics in medicinal chemistry, 2012, 12(12), 1363-1375
PMID: 22690683

Despite the rapidly growing knowledge of functional and structural information regarding pharmaceutically relevant targets during the past decade, target-based drug discovery has remained a high-cost and low-yield process. Particularly, single-target drugs often turn out to be less effective in treating complicated diseases such as cancers, metabolic disorders and CNS diseases. However, discovering compounds that are effective against multiple desired targets raises an enormous challenge to the current mode of drug innovation. Computational chemogenomics approaches aim at predicting all potential interactions between small molecular ligands and biomolecular targets, thus the derived information can be directly applied to "design in" (i.e. engineer desirable binding spectrum) and "design out" (i.e. eliminate the unwanted interactions) specific biological activities. The present review will focus on introducing the recent methodological development and successful applications of structure-based and ligand-based approaches on predicting the ligand binding profiles, which is the very first and essential step toward rationally designing the multiple-target ligands. Structure-based methods (e.g. binding site mapping and inverse molecular docking) generally require the structures of known targets to navigate the receptor-ligand binding space, while ligand-based approaches (e.g. chemical similarity analysis and pharmacophore search) can only rely on the series of active compounds to derive the structural characteristics for describing certain biological activities.

• Chemogenomics in drug discovery: computational methods based on the comparison of binding sites.
Vulpetti, Anna and Kalliokoski, Tuomo and Milletti, Francesca
Future medicinal chemistry, 2012, 4(15), 1971-1979
PMID: 23088277     doi: 10.4155/fmc.12.147

Novel computational methods for understanding relationships between ligands and all possible biological targets have emerged in recent years. Proteins are connected to each other based on the similarity of their ligands or based on the similarity of their binding sites. The assumption is that compounds sharing chemical similarity should share targets and that targets with a similar binding site should also share ligands. A large number of computational techniques have been developed to assess ligand and binding site similarity, which can be used to make quantitative predictions of the most probable biological target of a given compound. This review covers the recent advances in new computational methods for relating biological targets based on the similarity of their binding sites. Binding site comparisons are used for the prediction of their most likely ligands, their possible cross reactivity and selectivity. These comparisons can also be used to infer the function of novel uncharacterized proteins.

• Comprehensive predictions of target proteins based on protein-chemical interaction using virtual screening and experimental verifications.
Kobayashi, Hiroki and Harada, Hiroko and Nakamura, Masaomi and Futamura, Yushi and Ito, Akihiro and Yoshida, Minoru and Iemura, Shun-Ichiro and Shin-Ya, Kazuo and Doi, Takayuki and Takahashi, Takashi and Natsume, Tohru and Imoto, Masaya and Sakakibara, Yasubumi
BMC chemical biology, 2012, 12(1), 2
PMID: 22480302     doi: 10.1186/1472-6769-12-2

BACKGROUND: Identification of the target proteins of bioactive compounds is critical for elucidating the mode of action; however, target identification has been difficult in general, mostly due to the low sensitivity of detection using affinity chromatography followed by CBB staining and MS/MS analysis. RESULTS: We applied our protocol of predicting target proteins combining in silico screening and experimental verification for incednine, which inhibits the anti-apoptotic function of Bcl-xL by an unknown mechanism. One hundred eighty-two target protein candidates were computationally predicted to bind to incednine by the statistical prediction method, and the predictions were verified by in vitro binding of incednine to seven proteins, whose expression can be confirmed in our cell system. As a result, 40% accuracy of the computational predictions was achieved successfully, and we newly found 3 incednine-binding proteins. CONCLUSIONS: This study revealed that our proposed protocol of predicting target protein combining in silico screening and experimental verification is useful, and provides new insight into a strategy for identifying target proteins of small molecules.

• Predicting new indications for approved drugs using a proteochemometric method.
Dakshanamurthy, Sivanesan and Issa, Naiem T and Assefnia, Shahin and Seshasayee, Ashwini and Peters, Oakland J and Madhavan, Subha and Uren, Aykut and Brown, Milton L and Byers, Stephen W
Journal of medicinal chemistry, 2012, 55(15), 6832-6848
PMID: 22780961     doi: 10.1021/jm300576q

The most effective way to move from target identification to the clinic is to identify already approved drugs with the potential for activating or inhibiting unintended targets (repurposing or repositioning). This is usually achieved by high throughput chemical screening, transcriptome matching, or simple in silico ligand docking. We now describe a novel rapid computational proteochemometric method called "train, match, fit, streamline" (TMFS) to map new drug-target interaction space and predict new uses. The TMFS method combines shape, topology, and chemical signatures, including docking score and functional contact points of the ligand, to predict potential drug-target interactions with remarkable accuracy. Using the TMFS method, we performed extensive molecular fit computations on 3671 FDA approved drugs across 2335 human protein crystal structures. The TMFS method predicts drug-target associations with 91% accuracy for the majority of drugs. Over 58% of the known best ligands for each target were correctly predicted as top ranked, followed by 66%, 76%, 84%, and 91% for agents ranked in the top 10, 20, 30, and 40, respectively, out of all 3671 drugs. Drugs ranked in the top 1-40 that have not been experimentally validated for a particular target now become candidates for repositioning. Furthermore, we used the TMFS method to discover that mebendazole, an antiparasitic with recently discovered and unexpected anticancer properties, has the structural potential to inhibit VEGFR2. We confirmed experimentally that mebendazole inhibits VEGFR2 kinase activity and angiogenesis at doses comparable with its known effects on hookworm. TMFS also predicted, and was confirmed with surface plasmon resonance, that dimethyl celecoxib and the anti-inflammatory agent celecoxib can bind cadherin-11, an adhesion molecule important in rheumatoid arthritis and poor prognosis malignancies for which no targeted therapies exist. We anticipate that expanding our TMFS method to the >27 000 clinically active agents available worldwide across all targets will be most useful in the repositioning of existing drugs for new therapeutic targets.

• Detecting Drug Promiscuity Using Gaussian Ensemble Screening.
Pérez-Nueno, Violeta I and Venkatraman, Vishwesh and Mavridis, Lazaros and Ritchie, David W
Journal of chemical information and modeling, 2012, 52(8), 1948-1961
PMID: 22747187     doi: 10.1021/ci3000979

Polypharmacology describes the binding of a ligand to multiple protein targets (a promiscuous ligand) or multiple diverse ligands binding to a given target (a promiscuous target). Pharmaceutical companies are discovering increasing numbers of both promiscuous drugs and drug targets. Hence, polypharmacology is now recognized as an important aspect of drug design. Here, we describe a new and fast way to predict polypharmacological relationships between drug classes quantitatively, which we call Gaussian Ensemble Screening (GES). This approach represents a cluster of molecules with similar spherical harmonic surface shapes as a Gaussian distribution with respect to a selected center molecule. Calculating the Gaussian overlap between pairs of such clusters allows the similarity between drug classes to be calculated analytically without requiring thousands of bootstrap comparisons, as in current promiscuity prediction approaches. We find that such cluster similarity scores also follow a Gaussian distribution. Hence, a cluster similarity score may be transformed into a probability value, or "p-value", in order to quantify the relationships between drug classes. We present results obtained when using the GES approach to predict relationships between drug classes in a subset of the MDL Drug Data Report (MDDR) database. Our results indicate that GES is a useful way to study polypharmacology relationships, and it could provide a novel way to propose new targets for drug repositioning.

• Virtual Target Screening: Validation Using Kinase Inhibitors.
Santiago, Daniel N and Pevzner, Yuri and Durand, Ashley A and Tran, Minhphuong and Scheerer, Rachel R and Daniel, Kenyon and Sung, Shen-Shu and Lee Woodcock, H and Guida, Wayne C and Brooks, Wesley H
Journal of chemical information and modeling, 2012, 52(8), 2192-2203
PMID: 22747098     doi: 10.1021/ci300073m

Computational methods involving virtual screening could potentially be employed to discover new biomolecular targets for an individual molecule of interest (MOI). However, existing scoring functions may not accurately differentiate proteins to which the MOI binds from a larger set of macromolecules in a protein structural database. An MOI will most likely have varying degrees of predicted binding affinities to many protein targets. However, correctly interpreting a docking score as a hit for the MOI docked to any individual protein can be problematic. In our method, which we term "Virtual Target Screening (VTS)", a set of small drug-like molecules is docked against each structure in the protein library to produce benchmark statistics. This calibration provides a reference for each protein so that hits can be identified for an MOI. VTS can then be used as a tool for: drug repositioning (repurposing), specificity and toxicity testing, identifying potential metabolites, probing protein structures for allosteric sites, and testing focused libraries (collection of MOIs with similar chemotypes) for selectivity. To validate our VTS method, twenty kinase inhibitors were docked to a collection of calibrated protein structures. Here, we report our results where VTS predicted protein kinases as hits in preference to other proteins in our database. Concurrently, a graphical interface for VTS was developed.
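
The calibration idea in this abstract, judging a docking score against per-protein benchmark statistics instead of in isolation, might be sketched as follows. The abstract does not say which statistics VTS uses, so the z-score, the cutoff of -2.0, the convention that more negative scores are better, and all names and numbers below are assumptions made for illustration only.

```python
from statistics import mean, stdev


def calibrated_hits(moi_scores, benchmark_scores, z_cutoff=-2.0):
    """Return proteins where the MOI docks unusually well for that protein.

    moi_scores: {protein: docking score of the molecule of interest}
    benchmark_scores: {protein: scores of the drug-like calibration set}
    Assumption: more negative docking scores indicate better predicted binding.
    """
    hits = {}
    for protein, score in moi_scores.items():
        reference = benchmark_scores[protein]
        mu, sigma = mean(reference), stdev(reference)
        if sigma == 0:
            continue  # no spread in the benchmark, z-score undefined
        z = (score - mu) / sigma
        if z <= z_cutoff:
            hits[protein] = z
    return hits


# Made-up numbers: the same raw score (-10) is a hit for one protein
# and unremarkable for another whose benchmark set already scores well.
benchmark = {
    "kinase_A": [-6.0, -7.0, -8.0],
    "protease_B": [-9.0, -10.0, -11.0],
}
moi = {"kinase_A": -10.0, "protease_B": -10.0}
print(calibrated_hits(moi, benchmark))
```

The toy data shows why a raw docking score alone is hard to interpret: whether -10 is notable depends on how the calibration set scores against the same structure.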

• Assessing drug target association using semantic linked data.
Chen, Bin and Ding, Ying and Wild, David J
PLoS computational biology, 2012, 8(7), e1002574
PMID: 22859915     doi: 10.1371/journal.pcbi.1002574

The rapidly increasing amount of public data in chemistry and biology provides new opportunities for large-scale data mining for drug discovery. Systematic integration of these heterogeneous sets and provision of algorithms to data mine the integrated sets would permit investigation of complex mechanisms of action of drugs. In this work we integrated and annotated data from public datasets relating to drugs, chemical compounds, protein targets, diseases, side effects and pathways, building a semantic linked network consisting of over 290,000 nodes and 720,000 edges. We developed a statistical model to assess the association of drug target pairs based on their relation with other linked objects. Validation experiments demonstrate the model can correctly identify known direct drug target pairs with high precision. Indirect drug target pairs (for example drugs which change gene expression level) are also identified but not as strongly as direct pairs. We further calculated the association scores for 157 drugs from 10 disease areas against 1683 human targets, and measured their similarity using a 157 × 1683 score matrix. The similarity network indicates that drugs from the same disease area tend to cluster together in ways that are not captured by structural similarity, with several potential new drug pairings being identified. This work thus provides a novel, validated alternative to existing drug target prediction algorithms. The web service is freely available at: http://chem2bio2rdf.org/slap.

• idTarget: a web server for identifying protein targets of small chemical molecules with robust scoring functions and a divide-and-conquer docking approach.
Wang, Jui-Chih and Chu, Pei-Ying and Chen, Chung-Ming and Lin, Jung-Hsin
Nucleic acids research, 2012, 40(Web Server issue), W393-W399
PMID: 22649057     doi: 10.1093/nar/gks496

Identification of possible protein targets of small chemical molecules is an important step for unravelling their underlying causes of actions at the molecular level. To this end, we construct a web server, idTarget, which can predict possible binding targets of a small chemical molecule via a divide-and-conquer docking approach, in combination with our recently developed scoring functions based on robust regression analysis and quantum chemical charge models. Affinity profiles of the protein targets are used to provide the confidence levels of prediction. The divide-and-conquer docking approach uses adaptively constructed small overlapping grids to constrain the searching space, thereby achieving better docking efficiency. Unlike previous approaches that screen against a specific class of targets or a limited number of targets, idTarget screens against nearly all protein structures deposited in the Protein Data Bank (PDB). We show that idTarget is able to reproduce known off-targets of drugs or drug-like compounds, and the suggested new targets could be prioritized for further investigation. idTarget is freely available as a web-based server at http://idtarget.rcas.sinica.edu.tw.

## 2011

• Old friends in new guise: repositioning of known drugs with structural bioinformatics.
Haupt, V Joachim and Schroeder, Michael
Briefings in bioinformatics, 2011, 12(4), 312-326
PMID: 21441562     doi: 10.1093/bib/bbr011

Developing a drug de novo is a laborious and costly endeavor. Thus, the repositioning of already approved drugs for the treatment of new diseases is promising and valuable. One computational approach to repositioning exploits the structural similarity of binding sites of known and new targets. Here, we review computational methods to represent and align binding sites. We review available tools, present success stories and discuss limits of the approach.

• PROMISCUOUS: a database for network-based drug-repositioning.
von Eichborn, Joachim and Murgueitio, Manuela S and Dunkel, Mathias and Koerner, Soeren and Bourne, Philip E and Preissner, Robert
Nucleic acids research, 2011, 39(Database issue), D1060-D1066
PMID: 21071407     doi: 10.1093/nar/gkq1037

The procedure of drug approval is time-consuming, costly and risky. Accidental findings regarding multi-specificity of approved drugs led to block-busters in new indication areas. Therefore, the interest in systematically elucidating new areas of application for known drugs is rising. Furthermore, the knowledge, understanding and prediction of so-called off-target effects allow a rational approach to the understanding of side-effects. With PROMISCUOUS we provide an exhaustive set of drugs (25,000), including withdrawn or experimental drugs, annotated with drug-protein and protein-protein relationships (21,500/104,000) compiled from public resources via text and data mining including manual curation. Measures of structural similarity for drugs as well as known side-effects can be easily connected to protein-protein interactions to establish and analyse networks responsible for multi-pharmacology. This network-based approach can provide a starting point for drug-repositioning. PROMISCUOUS is publicly available at http://bioinformatics.charite.de/promiscuous.

• Chemical structural novelty: on-targets and off-targets.
Yera, Emmanuel R and Cleves, Ann E and Jain, Ajay N
Journal of medicinal chemistry, 2011, 54(19), 6771-6785
PMID: 21916467     doi: 10.1021/jm200666a

Drug structures may be quantitatively compared based on 2D topological structural considerations and based on 3D characteristics directly related to binding. A framework for combining multiple similarity computations is presented along with its systematic application to 358 drugs with overlapping pharmacology. Given a new molecule along with a set of molecules sharing some biological effect, a single score based on comparison to the known set is produced, reflecting either 2D similarity, 3D similarity, or their combination. For prediction of primary targets, the benefit of 3D over 2D was relatively small, but for prediction of off-targets, the added benefit was large. In addition to assessing prediction, the relationship between chemical similarity and pharmacological novelty was studied. Drug pairs that shared high 3D similarity but low 2D similarity (i.e., a novel scaffold) were shown to be much more likely to exhibit pharmacologically relevant differences in terms of specific protein target modulation.

• ReverseScreen3D: a structure-based ligand matching method to identify protein targets.
Kinnings, Sarah L and Jackson, Richard M
Journal of chemical information and modeling, 2011, 51(3), 624-634
PMID: 21361385     doi: 10.1021/ci1003174

Ligand promiscuity, which is now recognized as an extremely common phenomenon, is a major underlying cause of drug toxicity. We have developed a new reverse virtual screening (VS) method called ReverseScreen3D, which can be used to predict the potential protein targets of a query compound of interest. The method uses a 2D fingerprint-based method to select a ligand template from each unique binding site of each protein within a target database. The target database contains only the structurally determined bioactive conformations of known ligands. The 2D comparison is followed by a 3D structural comparison to the selected query ligand using a geometric matching method, in order to prioritize each target binding site in the database. We have evaluated the performance of the ReverseScreen2D and 3D methods using a diverse set of small molecule protein inhibitors known to have multiple targets, and have shown that they are able to provide a highly significant enrichment of true targets in the database. Furthermore, we have shown that the 3D structural comparison improves early enrichment when compared with the 2D method alone, and that the 3D method performs well even in the absence of 2D similarity to the template ligands. By carrying out further experimental screening on the prioritized list of targets, it may be possible to determine the potential targets of a new compound or determine the off-targets of an existing drug. The ReverseScreen3D method has been incorporated into a Web server, which is freely available at http://www.modelling.leeds.ac.uk/ReverseScreen3D.
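
The first stage described in this abstract, choosing one ligand template per binding site by 2D fingerprint comparison, could look roughly like the sketch below. The abstract does not name the similarity measure, so the Tanimoto coefficient, the representation of fingerprints as sets of on-bit indices, and every identifier and value here are assumptions. The subsequent 3D geometric matching step is not shown.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def select_templates(query_fp, binding_sites):
    """Pick, for each binding site, the bound ligand most similar to the query.

    binding_sites: {site_id: {ligand_name: fingerprint_set}}
    Returns {site_id: (ligand_name, similarity)}.
    """
    templates = {}
    for site, ligands in binding_sites.items():
        name, fp = max(ligands.items(), key=lambda kv: tanimoto(query_fp, kv[1]))
        templates[site] = (name, tanimoto(query_fp, fp))
    return templates


# Hypothetical fingerprints (on-bit indices) for a query and two binding sites.
query = {1, 2, 3, 4}
sites = {
    "site_1": {"lig_A": {1, 2, 3}, "lig_B": {1, 5}},
    "site_2": {"lig_C": {7, 8}, "lig_D": {4, 9}},
}
for site, (ligand, sim) in select_templates(query, sites).items():
    print(site, ligand, round(sim, 2))
```

In a full pipeline the selected templates would then be passed to a 3D comparison to rank the binding sites, as the abstract outlines.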

• From in silico target prediction to multi-target drug design: Current databases, methods and applications.
Koutsoukas, Alexios and Simms, Benjamin and Kirchmair, Johannes and Bond, Peter J and Whitmore, Alan V and Zimmer, Steven and Young, Malcolm P and Jenkins, Jeremy L and Glick, Meir and Glen, Robert C and Bender, Andreas
Journal of proteomics, 2011, 74(12), 2554-2574
PMID: 21621023     doi: 10.1016/j.jprot.2011.05.011

Given the tremendous growth of bioactivity databases, the use of computational tools to predict protein targets of small molecules has been gaining importance in recent years. Applications span a wide range, from the 'designed polypharmacology' of compounds to mode-of-action analysis. In this review, we firstly survey databases that can be used for ligand-based target prediction and which have grown tremendously in size in the past. We furthermore outline methods for target prediction that exist, both based on the knowledge of bioactivities from the ligand side and methods that can be applied in situations when a protein structure is known. Applications of successful in silico target identification attempts are discussed in detail, which were based partly or in whole on computational target predictions in the first instance. This includes the authors' own experience using target prediction tools, in this case considering phenotypic antibacterial screens and the analysis of high-throughput screening data. Finally, we will conclude with the prospective application of databases to not only predict, retrospectively, the protein targets of a small molecule, but also how to design ligands with desired polypharmacology in a prospective manner.

• Mapping of pharmacological space.
Nisius, Britta and Bajorath, Jürgen
Expert opinion on drug discovery, 2011, 6(1), 1-7
PMID: 22646823     doi: 10.1517/17460441.2011.533654

The analysis of pharmacological space is becoming highly relevant in light of the emerging polypharmacology paradigm, that is, the increasing evidence that many drugs elicit therapeutic effects and adverse drug reactions through interactions with multiple targets. To better understand desired and undesired polypharmacology and identify new targets for existing drugs, computational methods are of critical importance. Herein we provide an overview of computational approaches for analyzing pharmacological space and put their opportunities and limitations in perspective. Insights into computational approaches for the study of target-ligand interactions and polypharmacology are provided and put into scientific context. The interplay between computational and experimental approaches is rationalized. Computational methods have become indispensable tools for the systematic analysis of drug-target interactions. Because currently most prominent predictive methods are knowledge-based, they are affected by data bias and sparseness. Predictions of drug-target interactions are already carried out on a large scale, but experimentally validated to a much lesser extent. In order to demonstrate true utility of pharmacological space analysis for drug discovery, it will be essential to closely interface computational and experimental target profiling efforts.

## 2010

• The chemical basis of pharmacology.
Keiser, Michael J and Irwin, John J and Shoichet, Brian K
Biochemistry, 2010, 49(48), 10267-10276
PMID: 21058655     doi: 10.1021/bi101540g

Molecular biology now dominates pharmacology so thoroughly that it is difficult to recall that only a generation ago the field was very different. To understand drug action today, we characterize the targets through which they act and new drug leads are discovered on the basis of target structure and function. Until the mid-1980s the information often flowed in reverse: investigators began with organic molecules and sought targets, relating receptors not by sequence or structure but by their ligands. Recently, investigators have returned to this chemical view of biology, bringing to it systematic and quantitative methods of relating targets by their ligands. This has allowed the discovery of new targets for established drugs, suggested the bases for their side effects, and predicted the molecular targets underlying phenotypic screens. The bases for these new methods, some of their successes and liabilities, and new opportunities for their use are described.

• Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework.
Yamanishi, Yoshihiro and Kotera, Masaaki and Kanehisa, Minoru and Goto, Susumu
Bioinformatics (Oxford, England), 2010, 26(12), i246-54
PMID: 20529913     doi: 10.1093/bioinformatics/btq176

MOTIVATION: In silico prediction of drug-target interactions from heterogeneous biological data is critical in the search for drugs and therapeutic targets for known diseases such as cancers. There is therefore a strong incentive to develop new methods capable of detecting these potential drug-target interactions efficiently.

• Rational approaches to targeted polypharmacology: creating and navigating protein-ligand interaction networks.
Metz, James T and Hajduk, Philip J
Current opinion in chemical biology, 2010, 14(4), 498-504
PMID: 20609615     doi: 10.1016/j.cbpa.2010.06.166

Many successful drugs bind to and modulate multiple targets in vivo. Successfully navigating protein-ligand polypharmacology will be a crucial and increasingly utilized component of pharmaceutical research. As publicly available databases of ligand activity values continue to grow in size and quality, infrastructure is needed to enable scientists to create and interact with these networks to fuel hypothesis-driven science. While most of the individual tools for creating this infrastructure exist, effectively connecting the data to the network to the scientist is very much a work in progress. Standards for publishing network data are also important to facilitate the analysis and comparison of networks from different research groups using different methods.

• Biochemical network-based drug-target prediction.
Klipp, Edda and Wade, Rebecca C and Kummer, Ursula
Current Opinion in Biotechnology, 2010, 21(4), 511-516
PMID: 20554441     doi: 10.1016/j.copbio.2010.05.004

The use of networks to aid the drug discovery process is a rather new but booming endeavor. A vast variety of different types of networks are being constructed and analyzed for various different tasks in drug discovery. The analysis may be at the level of establishing connectivity, topology, and graphs, or may go to a more quantitative level. We discuss here how computational systems biology approaches can aid the quantitative analysis of biochemical networks for drug-target prediction. We focus on networks and pathways in which the components are related by physical interactions or biochemical processes. We particularly discuss the potential of mathematical modeling to aid the analysis of proteins for druggability.

• PharmMapper server: a web server for potential drug target identification using pharmacophore mapping approach.
Liu, Xiaofeng and Ouyang, Sisheng and Yu, Biao and Liu, Yabo and Huang, Kai and Gong, Jiayu and Zheng, Siyuan and Li, Zhihua and Li, Honglin and Jiang, Hualiang
Nucleic acids research, 2010, 38(Web Server issue), W609-14
PMID: 20430828     doi: 10.1093/nar/gkq300

In silico drug target identification, which includes many distinct algorithms for finding disease genes and proteins, is the first step in the drug discovery pipeline. When the 3D structures of the targets are available, the problem of target identification is usually converted to finding the best interaction mode between the potential target candidates and small molecule probes. Pharmacophore, which is the spatial arrangement of features essential for a molecule to interact with a specific target receptor, is an alternative method for achieving this goal apart from molecular docking method. PharmMapper server is a freely accessed web server designed to identify potential target candidates for the given small molecules (drugs, natural products or other newly discovered compounds with unidentified binding targets) using pharmacophore mapping approach. PharmMapper hosts a large, in-house repertoire of pharmacophore database (namely PharmTargetDB) annotated from all the targets information in TargetBank, BindingDB, DrugBank and potential drug target database, including over 7000 receptor-based pharmacophore models (covering over 1500 drug targets information). PharmMapper automatically finds the best mapping poses of the query molecule against all the pharmacophore models in PharmTargetDB and lists the top N best-fitted hits with appropriate target annotations, as well as respective molecule's aligned poses are presented. Benefited from the highly efficient and robust triangle hashing mapping method, PharmMapper bears high throughput ability and only costs 1 h averagely to screen the whole PharmTargetDB. The protocol was successful in finding the proper targets among the top 300 pharmacophore candidates in the retrospective benchmarking test of tamoxifen. PharmMapper is available at http://59.78.96.61/pharmmapper.

## 2009

• Predicting new molecular targets for known drugs
Keiser, Michael J and Setola, Vincent and Irwin, John J and Laggner, Christian and Abbas, Atheir I and Hufeisen, Sandra J and Jensen, Niels H and Kuijer, Michael B and Matos, Roberto C and Tran, Thuy B and Whaley, Ryan and Glennon, Richard A and Hert, Jérôme and Thomas, Kelan L H and Edwards, Douglas D and Shoichet, Brian K and Roth, Bryan L
Nature, 2009, 462(7270), 175-181
doi: 10.1038/nature08506

Although drugs are intended to be selective, at least some bind to several physiological targets, explaining side effects and efficacy. Because many drug-target combinations exist, it would be useful to explore possible interactions computationally. ...

• Off-target networks derived from ligand set similarity.
Keiser, Michael J and Hert, Jérôme
Methods in molecular biology (Clifton, N.J.), 2009, 575, 195-205
PMID: 19727616     doi: 10.1007/978-1-60761-274-2_8

Chemically similar drugs often bind biologically diverse protein targets, and proteins with similar sequences or structures do not always recognize the same ligands. How can we uncover the pharmacological relationships among proteins, when drugs may bind them in defiance of bioinformatic criteria? Here we consider a technique that quantitatively relates proteins based on the chemical similarity of their ligands. Starting with tens of thousands of ligands organized into sets for hundreds of drug targets, we calculated the similarity among sets using ligand topology. We developed a statistical model to rank the resulting scores, which were then expressed in minimum spanning trees. We have shown that biologically sensible groups of targets emerged from these maps, as well as experimentally validated predictions of drug off-target effects.

## 2008

• Binding similarity network of ligand.
Park, Keunwan and Kim, Dongsup
Proteins, 2008, 71(2), 960-971
PMID: 18004762     doi: 10.1002/prot.21780

The protein and ligand interaction takes an important part in protein function. Both ligand and its binding site are essential components for understanding how the protein-ligand complex functions. Until now, there have been many studies about protein function and evolution, but they usually lacked ligand information. Accordingly, in this study, we tried to answer the following questions: how much ligand and binding site are associated with protein function, and how ligands themselves are related to each other in terms of binding site. To answer the questions, we presented binding similarity network of ligand. Through the network analysis, we attempted to reveal systematic relationship between the ligand and binding site. The results showed that ligand binding site and function were closely related (conservation ratio, 81%). We also showed conservative tendency of function in line with ligand structure similarity with some exceptional cases. In addition, the binding similarity network of ligand revealed scale-free property to some degree like other biological networks. Since most nodes formed highly connected cluster, a clustering coefficient was very high compared with random. All the highly connected ligands (hubs) were involved in various functions forming large cluster and tended to act as a bridge between modular clusters in the network.

• SuperPred: drug classification and target prediction.
Dunkel, Mathias and Günther, Stefan and Ahmed, Jessica and Wittig, Burghardt and Preissner, Robert
Nucleic acids research, 2008, 36(Web Server issue), W55-9
PMID: 18499712     doi: 10.1093/nar/gkn307

The drug classification scheme of the World Health Organization (WHO) [Anatomical Therapeutic Chemical (ATC)-code] connects chemical classification and therapeutic approach. It is generally accepted that compounds with similar physicochemical properties exhibit similar biological activity. If this hypothesis holds true for drugs, then the ATC-code, the putative medical indication area and potentially the medical target should be predictable on the basis of structural similarity. We have validated that the prediction of the drug class is reliable for WHO-classified drugs. The reliability of the predicted medical effects of the compounds increases with a rising number of (physico-) chemical properties similar to a drug with known function. The web-server translates a user-defined molecule into a structural fingerprint that is compared to about 6300 drugs, which are enriched by 7300 links to molecular targets of the drugs, derived through text mining followed by manual curation. Links to the affected pathways are provided. The similarity to the medical compounds is expressed by the Tanimoto coefficient that gives the structural similarity of two compounds. A similarity score higher than 0.85 results in correct ATC prediction for 81% of all cases. As the biological effect is well predictable, if the structural similarity is sufficient, the web-server allows prognoses about the medical indication area of novel compounds and to find new leads for known targets. Availability: the system is freely accessible at http://bioinformatics.charite.de/superpred. SuperPred can be obtained via a Creative Commons Attribution Noncommercial-Share Alike 3.0 License.
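
The similarity measure named in this abstract, the Tanimoto coefficient, can be illustrated with a minimal sketch. It assumes fingerprints are represented as sets of on-bit indices; the function names `tanimoto` and `hits_above`, the toy library, and the representation are illustrative assumptions, not SuperPred's actual implementation. Only the 0.85 cutoff comes from the abstract.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    return shared / union if union else 0.0


def hits_above(query_fp, library, threshold=0.85):
    """Return (name, score) pairs scoring above the threshold, best first."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in library.items())
    return sorted((s for s in scored if s[1] > threshold),
                  key=lambda s: s[1], reverse=True)


# Toy fingerprints (hypothetical, not real drug data).
library = {"drug_a": set(range(19)), "drug_b": set(range(10))}
query = set(range(20))
print(hits_above(query, library))
```

In this toy case only `drug_a` shares enough bits with the query to pass the cutoff; `drug_b` shares half of them and is dropped.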

• Prediction of drug-target interaction networks from the integration of chemical and genomic spaces.
Yamanishi, Yoshihiro and Araki, Michihiro and Gutteridge, Alex and Honda, Wataru and Kanehisa, Minoru
Bioinformatics (Oxford, England), 2008, 24(13), i232-40
PMID: 18586719     doi: 10.1093/bioinformatics/btn162

MOTIVATION: The identification of interactions between drugs and target proteins is a key area in genomic drug discovery. Therefore, there is a strong incentive to develop new methods capable of detecting these potential drug-target interactions efficiently.

• Computer-aided drug discovery and development (CADDD): In silico-chemico-biological approach
Kapetanovic, I. M.
Chemico-Biological Interactions, 2008, 171(2), 165-176
PMID: 19764745     doi: 10.1016/j.cbi.2006.12.006

It is generally recognized that drug discovery and development are very time and resources consuming processes. There is an ever growing effort to apply computational power to the combined chemical and biological space in order to streamline drug discovery, design, development and optimization. In biomedical arena, computer-aided or in silico design is being utilized to expedite and facilitate hit identification, hit-to-lead selection, optimize the absorption, distribution, metabolism, excretion and toxicity profile and avoid safety issues. Commonly used computational approaches include ligand-based drug design (pharmacophore, a 3D spatial arrangement of chemical features essential for biological activity), structure-based drug design (drug-target docking), and quantitative structure-activity and quantitative structure-property relationships. Regulatory agencies as well as pharmaceutical industry are actively involved in development of computational tools that will improve effectiveness and efficiency of drug discovery and development process, decrease use of animals, and increase predictability. It is expected that the power of CADDD will grow as the technology continues to evolve. Published by Elsevier Ireland Ltd.

• Computational approaches in chemogenomics and chemical biology: current and future impact on drug discovery.
Bajorath, Jürgen
Expert opinion on drug discovery, 2008, 3(12), 1371-1376
PMID: 23506102     doi: 10.1517/17460440802536496

Background: Chemical biology and chemogenomics are rapidly evolving disciplines at interfaces between chemistry and the life sciences and are highly interdisciplinary in nature. Chemogenomics has a strong conceptional link to modern drug discovery research, whereas chemical biology focuses more on the use of small molecules as probes for exploring biological functions, rather than drug candidates. However, the boundaries between these areas are fluid, as they should be, given their strong interdisciplinary orientation. Objective: Recently, computational approaches have been introduced for the analysis of research topics that are of considerable relevance for these disciplines including, for example, the systematic study of ligand-target interactions or mapping of pharmacologically relevant chemical space. This contribution introduces key investigations in computational chemical biology and chemogenomics and critically evaluates their current and future potential to impact drug discovery. Conclusions: Computational methods of high relevance for chemogenomics and chemical biology either derive knowledge from large-scale analysis of available drug and target data or interface experimental programs with predictive methods. Approaches for drug target prediction and the systematic analysis of polypharmacology substantially impact research in this area.

## 2007

• Relating protein pharmacology by ligand chemistry.
Keiser, Michael J and Roth, Bryan L and Armbruster, Blaine N and Ernsberger, Paul and Irwin, John J and Shoichet, Brian K
Nature biotechnology, 2007, 25(2), 197-206
PMID: 17287757     doi: 10.1038/nbt1284

The identification of protein function based on biological information is an area of intense research. Here we consider a complementary technique that quantitatively groups and relates proteins based on the chemical similarity of their ligands. We began with 65,000 ligands annotated into sets for hundreds of drug targets. The similarity score between each set was calculated using ligand topology. A statistical model was developed to rank the significance of the resulting similarity scores, which are expressed as a minimum spanning tree to map the sets together. Although these maps are connected solely by chemical similarity, biologically sensible clusters nevertheless emerged. Links among unexpected targets also emerged, among them that methadone, emetine and loperamide (Imodium) may antagonize muscarinic M3, alpha2 adrenergic and neurokinin NK2 receptors, respectively. These predictions were subsequently confirmed experimentally. Relating receptors by ligand chemistry organizes biology to reveal unexpected relationships that may be assayed using the ligands themselves.
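
The mapping step described in this abstract, expressing pairwise set similarities as a minimum spanning tree, can be sketched with Kruskal's algorithm. This is a generic illustration only: the target labels, the distance values, and the choice of a distance (lower means more similar) as edge weight are assumptions, and the paper's statistical model for scoring set similarity is not reproduced here.

```python
def minimum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm. weighted_edges holds (weight, node_a, node_b) tuples."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the component root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for weight, a, b in sorted(weighted_edges):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Hypothetical ligand sets for four targets and made-up pairwise distances.
targets = ["T1", "T2", "T3", "T4"]
distances = [(0.2, "T1", "T2"), (0.9, "T1", "T3"), (0.3, "T2", "T3"),
             (0.8, "T3", "T4"), (0.95, "T1", "T4")]
for a, b, w in minimum_spanning_tree(targets, distances):
    print(a, b, w)
```

The tree keeps one fewer edge than there are targets, linking each target set to the rest of the map through its closest available neighbor.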

• Modeling promiscuity based on in vitro safety pharmacology profiling data.
Azzaoui, Kamal and Hamon, Jacques and Faller, Bernard and Whitebread, Steven and Jacoby, Edgar and Bender, Andreas and Jenkins, Jeremy L and Urban, Laszlo
Chemmedchem, 2007, 2(6), 874-880
PMID: 17492703     doi: 10.1002/cmdc.200700036

This study describes a method for mining and modeling binding data obtained from a large panel of targets (in vitro safety pharmacology) to distinguish differences between promiscuous and selective compounds. Two naïve Bayes models for promiscuity and selectivity were generated and validated on a test set as well as publicly available drug databases. The model shows a higher score (lower promiscuity) for marketed drugs than for compounds in early development or compounds that failed during clinical development. Such models can be used in triaging high-throughput screening data or for lead optimization.

## 2006

• Can we rationally design promiscuous drugs?
Hopkins, Andrew L and Mason, Jonathan S and Overington, John P
Current opinion in structural biology, 2006, 16(1), 127-136
PMID: 16442279     doi: 10.1016/j.sbi.2006.01.013

Structure-based drug design is now used widely in modern medicinal chemistry. The application of structural biology to medicinal chemistry has heralded the "rational drug design" vision of discovering exquisitely selective ligands. However, recent advances in post-genomic biology are indicating that polypharmacology may be a necessary trait for the efficacy of many drugs, therefore questioning the "one drug, one target" assumption of current rational drug design. By combining advances in chemoinformatics and structural biology, it might be possible to rationally design the next generation of promiscuous drugs with polypharmacology.

• Bridging chemical and biological space: "target fishing" using 2D and 3D molecular descriptors.
Nettles, James H and Jenkins, Jeremy L and Bender, Andreas and Deng, Zhan and Davies, John W and Glick, Meir
Journal of medicinal chemistry, 2006, 49(23), 6802-6810
PMID: 17154510     doi: 10.1021/jm060902w

Bridging chemical and biological space is the key to drug discovery and development. Typically, cheminformatics methods operate under the assumption that similar chemicals have similar biological activity. Ideally then, one could predict a drug's biological function(s) given only its chemical structure by similarity searching in libraries of compounds with known activities. In practice, effectively choosing a similarity metric is case dependent. This work compares both 2D and 3D chemical descriptors as tools for predicting the biological targets of ligand probes, on the basis of their similarity to reference molecules in a 46,000 compound, biologically annotated chemical database. Overall, we found that the 2D methods employed here outperform the 3D (88% vs 67% success) in correct target prediction. However, the 3D descriptors proved superior in cases of probes with low structural similarity to other compounds in the database (singletons). Additionally, the 3D method (FEPOPS) shows promise for providing pharmacophoric alignment of the small molecules' chemical features consistent with those seen in experimental ligand/ receptor complexes. These results suggest that querying annotated chemical databases with a systematic combination of both 2D and 3D descriptors will prove more effective than employing single methods.

• TarFisDock: a web server for identifying drug targets with docking approach
Li, Honglin and Gao, Zhenting and Kang, Ling and Zhang, Hailei and Yang, Kun and Yu, Kunqian and Luo, Xiaomin and Zhu, Weiliang and Chen, Kaixian and Shen, Jianhua and Wang, Xicheng and Jiang, Hualiang
Nucleic acids research, 2006, 34(Web Server issue), W219-W224
PMID: 16844997     doi: 10.1093/nar/gkl114

TarFisDock is a web-based tool for automating the procedure of searching for small molecule-protein interactions over a large repertoire of protein structures. It offers PDTD (potential drug target database), a target database containing 698 protein structures covering 15 therapeutic areas and a reverse ligand protein docking program. In contrast to conventional ligand-protein docking, reverse ligand-protein docking aims to seek potential protein targets by screening an appropriate protein database. The input file of this web server is the small molecule to be tested, in standard mol2 format; TarFisDock then searches for possible binding proteins for the given small molecule by use of a docking approach. The ligand-protein interaction energy terms of the program DOCK are adopted for ranking the proteins. To test the reliability of the TarFisDock server, we searched the PDTD for putative binding proteins for vitamin E and 4H-tamoxifen. The top 2 and 10% candidates of vitamin E binding proteins identified by TarFisDock respectively cover 30 and 50% of reported targets verified or implicated by experiments; and 30 and 50% of experimentally confirmed targets for 4H-tamoxifen appear amongst the top 2 and 5% of the TarFisDock predicted candidates, respectively. Therefore, TarFisDock may be a useful tool for target identification, mechanism study of old drugs and probes discovered from natural products. TarFisDock and PDTD are available at http://www.dddc.ac.cn/tarfisdock/.

## 2004

• Recognition of functional sites in protein structures.
Shulman-Peleg, Alexandra and Nussinov, Ruth and Wolfson, Haim J
Journal of molecular biology, 2004, 339(3), 607-633
PMID: 15147845     doi: 10.1016/j.jmb.2004.04.012

Recognition of regions on the surface of one protein, that are similar to a binding site of another is crucial for the prediction of molecular interactions and for functional classifications. We first describe a novel method, SiteEngine, that assumes no sequence or fold similarities and is able to recognize proteins that have similar binding sites and may perform similar functions. We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways: first, we search a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites. Our method is robust and efficient enough to allow computationally demanding applications such as the first and the third. From the biological standpoint, the first application may identify secondary binding sites of drugs that may lead to side-effects. The third application finds new potential sites on the protein that may provide targets for drug design. Each of the three applications may aid in assigning a function and in classification of binding patterns. We highlight the advantages and disadvantages of each type of search, provide examples of large-scale searches of the entire Protein Data Base and make functional predictions.

• Recovering the true targets of specific ligands by virtual screening of the protein data bank.
Paul, Nicodéme and Kellenberger, Esther and Bret, Guillaume and Muller, Pascal and Rognan, Didier
Proteins, 2004, 54(4), 671-680
PMID: 14997563     doi: 10.1002/prot.10625

The Protein Data Bank (PDB) has been processed to extract a screening protein library (sc-PDB) of 2148 entries. A knowledge-based detection algorithm has been applied to 18,000 PDB files to find regular expressions corresponding to either protein, ions, co-factors, solvent, or ligand atoms. The sc-PDB database comprises high-resolution X-ray structures of proteins for which (i) a well-defined active site exists, (ii) the bound-ligand is a small molecular weight molecule. The database has been screened by an inverse docking tool derived from the GOLD program to recover the known target of four unrelated ligands. Both the database and the inverse screening procedures are accurate enough to rank the true target of the four investigated ligands among the top 1% scorers, with 70-100 fold enrichment with respect to random screening. Applying the proposed screening procedure to a small-sized generic ligand was much less accurate suggesting that inverse screening shall be reserved to rather selective compounds.
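
Fold enrichment over random screening, as reported in this abstract, is commonly computed as the hit rate in the top-ranked fraction divided by the hit rate in the whole library. The sketch below uses that common definition, which may differ from the one used in the paper; the function name `enrichment_factor` and the toy ranking are hypothetical.

```python
def enrichment_factor(ranked_ids, true_ids, top_fraction=0.01):
    """Hit rate in the top fraction of a ranking divided by the overall hit rate."""
    n_top = max(1, round(len(ranked_ids) * top_fraction))
    hits_top = sum(1 for entry in ranked_ids[:n_top] if entry in true_ids)
    return (hits_top / n_top) / (len(true_ids) / len(ranked_ids))


# Toy ranking of 1000 protein entries, best score first (hypothetical data).
ranked = [f"p{i}" for i in range(1000)]
# Ten true targets, seven of which fall within the top 1% (first 10 entries).
true_targets = {f"p{i}" for i in range(7)} | {"p500", "p600", "p700"}
print(enrichment_factor(ranked, true_targets))
```

A value of 1 corresponds to random ranking; larger values mean true targets are concentrated near the top of the list.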

## 2003

• Similarity metrics for ligands reflecting the similarity of the target proteins.
Schuffenhauer, Ansgar and Floersheim, Philipp and Acklin, Pierre and Jacoby, Edgar
Journal of Chemical Information and Computer Sciences, 2003, 43(2), 391-405
PMID: 12653501     doi: 10.1021/ci025569t

In this study we evaluate how far the scope of similarity searching can be extended to identify not only ligands binding to the same target as the reference ligand(s) but also ligands of other homologous targets without initially known ligands. This "homology-based similarity searching" requires molecular representations reflecting the ability of a molecule to interact with target proteins. The Similog keys, which are introduced here as a new molecular representation, were designed to fulfill such requirements. They are based only on the molecular constitution and are counts of atom triplets. Each triplet is characterized by the graph distances and the types of its atoms. The atom-typing scheme classifies each atom by its function as H-bond donor or acceptor and by its electronegativity and bulkiness. In this study the Similog keys are investigated in retrospective in silico screening experiments and compared with other conformation independent molecular representations. Studied were molecules of the MDDR database for which the activity data was augmented by standardized target classification information from public protein classification databases. The MDDR molecule set was split randomly into two halves. The first half formed the candidate set. Ligands of four targets (dopamine D2 receptor, opioid delta-receptor, factor Xa serine protease, and progesterone receptor) were taken from the second half to form the respective reference sets. Different similarity calculation methods are used to rank the molecules of the candidate set by their similarity to each of the four reference sets. The accumulated counts of molecules binding to the reference target and groups of targets with decreasing homology to it were examined as a function of the similarity rank for each reference set and similarity method. 
In summary, similarity searching based on Unity 2D-fingerprints or Similog keys are found to be equally effective in the identification of molecules binding to the same target as the reference set. However, the application of the Similog keys is more effective in comparison with the other investigated methods in the identification of ligands binding to any target belonging to the same family as the reference target. We attribute this superiority to the fact that the Similog keys provide a generalization of the chemical elements and that the keys are counted instead of merely noting their presence or absence in a binary form. The second most effective molecular representation are the occurrence counts of the public ISIS key fragments, which like the Similog method, incorporates key counting as well as a generalization of the chemical elements. The results obtained suggest that ligands for a new target can be identified by the following three-step procedure: 1. Select at least one target with known ligands which is homologous to the new target. 2. Combine the known ligands of the selected target(s) to a reference set. 3. Search candidate ligands for the new targets by their similarity to the reference set using the Similog method. This clearly enlarges the scope of similarity searching from the classical application for a single target to the identification of candidate ligands for whole target families and is expected to be of key utility for further systematic chemogenomics exploration of previously well explored target families.
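
The abstract attributes part of the Similog advantage to counting keys rather than recording only their presence or absence. The sketch below contrasts the two on toy data. It assumes a min/max (count-based Tanimoto) similarity, which is one common way to compare count vectors and not necessarily the formula used in the paper; the key labels are made up.

```python
from collections import Counter


def count_tanimoto(a, b):
    """Min/max similarity of two key-count vectors (Counter objects)."""
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total if total else 0.0


def binary_tanimoto(a, b):
    """Tanimoto similarity using only the presence or absence of each key."""
    on_a = {k for k, v in a.items() if v > 0}
    on_b = {k for k, v in b.items() if v > 0}
    union = on_a | on_b
    return len(on_a & on_b) / len(union) if union else 0.0


# Hypothetical triplet-key counts for two molecules.
mol_a = Counter({"donor-acceptor-bulky": 3, "acceptor-acceptor-bulky": 1})
mol_b = Counter({"donor-acceptor-bulky": 1, "acceptor-acceptor-bulky": 1})
print(binary_tanimoto(mol_a, mol_b), count_tanimoto(mol_a, mol_b))
```

The two molecules contain the same keys, so the binary comparison treats them as identical, while the counted comparison registers that one key occurs three times in one molecule and once in the other.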

## 2002

• Do Structurally Similar Molecules Have Similar Biological Activity?
Martin, Yvonne C and Kofron, James L and Traphagen, Linda M
Journal of medicinal chemistry, 2002, 45(19), 4350-4358
PMID: 12213076     doi: 10.1021/jm020155c

To design diverse combinatorial libraries or to select diverse compounds to augment a screening collection, computational chemists frequently reject compounds that are > or