research-article

Accelerating Prototype-Based Drug Discovery using Conditional Diversity Networks

Authors:

Kira Radinsky

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 331 - 339

https://doi.org/10.1145/3219819.3219882

Published: 19 July 2018

Abstract

Designing a new drug is a lengthy and expensive process. As the space of potential molecules is very large (10^23 to 10^60), a common technique during drug discovery is to start from a molecule that already has some of the desired properties. An interdisciplinary team of scientists generates hypotheses about the required changes to the prototype. In this work, we develop an algorithmic unsupervised approach that automatically generates potential drug molecules given a prototype drug. We show that the molecules generated by the system are valid molecules and significantly different from the prototype drug. Out of the compounds generated by the system, we identified 35 FDA-approved drugs. As an example, our system generated Isoniazid, one of the main drugs for tuberculosis. The system is currently being deployed for use in collaboration with pharmaceutical companies to further analyze the additional generated molecules.

Supplementary Material

MP4 File (harel_conditional_diversity_networks.mp4), 474.62 MB

Cited By

Bhadwal A, Kumar K (2024). Nc-vae: normalised conditional diverse variational autoencoder guided de novo molecule generation. The Journal of Supercomputing 80:14 (21207-21228). Online publication date: 6-Jun-2024.
https://doi.org/10.1007/s11227-024-06250-2
Nemoto S, Mizuno T, Kusuhara H (2023). Investigation of chemical structure recognition by encoder–decoder models in learning progress. Journal of Cheminformatics 15:1. Online publication date: 12-Apr-2023.
https://doi.org/10.1186/s13321-023-00713-z
Bhadwal A, Kumar K, Kumar N (2023). GMG-NCDVAE: Guided de novo Molecule Generation using NLP Techniques and Constrained Diverse Variational Autoencoder. ACM Transactions on Asian and Low-Resource Language Information Processing. Online publication date: 2-Aug-2023.
https://dl.acm.org/doi/10.1145/3610533

Index Terms

Accelerating Prototype-Based Drug Discovery using Conditional Diversity Networks
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
    2. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Structure and multilingual text search
        1. Chemical and biochemical retrieval

Information & Contributors

Information

Published In

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

July 2018

2925 pages

ISBN: 9781450355520

DOI: 10.1145/3219819

General Chairs: Yike Guo (Imperial College London), Faisal Farooq (IBM)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD '18

KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 19 - 23, 2018

London, United Kingdom

Acceptance Rates

KDD '18 Paper Acceptance Rate 107 of 983 submissions, 11%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Article Metrics

Total Citations: 13
Total Downloads: 569
Downloads (Last 12 months): 17
Downloads (Last 6 weeks): 4

Reflects downloads up to 03 Jan 2025

Cited By

Kaminsky N, Singer U, Radinsky K, Frommholz I, Hopfgartner F, Lee M, Oakes M, Lalmas M, Zhang M, Santos R (2023). CFOM: Lead Optimization For Drug Discovery With Limited Data. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (1056-1066). Online publication date: 21-Oct-2023.
https://dl.acm.org/doi/10.1145/3583780.3614807
Bhadwal A, Kumar K, Kumar N (2023). GenSMILES: An enhanced validity conscious representation for inverse design of molecules. Knowledge-Based Systems 268 (110429). Online publication date: May-2023.
https://doi.org/10.1016/j.knosys.2023.110429
Benjamin R, Singer U, Radinsky K, Al Hasan M, Xiong L (2022). Graph Neural Networks Pretraining Through Inherent Supervision for Molecular Property Prediction. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (2903-2912). Online publication date: 17-Oct-2022.
https://dl.acm.org/doi/10.1145/3511808.3557085
Barshatski G, Nordon G, Radinsky K, Demartini G, Zuccon G, Culpepper J, Huang Z, Tong H (2021). Multi-Property Molecular Optimization using an Integrated Poly-Cycle Architecture. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (3727-3736). Online publication date: 26-Oct-2021.
https://dl.acm.org/doi/10.1145/3459637.3481938
Barshatski G, Radinsky K, Zhu F, Chin Ooi B, Miao C, Wang H, Skrypnyk I, Hsu W, Chawla S (2021). Unpaired Generative Molecule-to-Molecule Translation for Lead Optimization. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (2554-2564). Online publication date: 14-Aug-2021.
https://dl.acm.org/doi/10.1145/3447548.3467120
Zhumagambetov R, Molnár F, Peshkov V, Fazli S (2021). Transmol: repurposing a language model for molecular generation. RSC Advances 11:42 (25921-25932). Online publication date: 2021.
https://doi.org/10.1039/D1RA03086H
Kim H, Jo C, Altmann J, Egger B (2021). RapidSwap: a Hierarchical Far Memory. Economics of Grids, Clouds, Systems, and Services (143-151). Online publication date: 9-Dec-2021.
https://doi.org/10.1007/978-3-030-92916-9_12
