DOI: 10.1145/3219819.3219882
research-article

Accelerating Prototype-Based Drug Discovery using Conditional Diversity Networks

Published: 19 July 2018

Abstract

Designing a new drug is a lengthy and expensive process. Because the space of potential molecules is very large (10^23 to 10^60), a common technique in drug discovery is to start from a molecule that already has some of the desired properties. An interdisciplinary team of scientists then generates hypotheses about the changes required to this prototype. In this work, we develop an unsupervised algorithmic approach that automatically generates potential drug molecules given a prototype drug. We show that the molecules generated by the system are valid molecules and significantly different from the prototype drug. Among the compounds generated by the system, we identified 35 FDA-approved drugs. For example, our system generated Isoniazid, one of the main drugs for tuberculosis. The system is currently being deployed in collaboration with pharmaceutical companies to further analyze the additional generated molecules.
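The claim that generated molecules are valid yet significantly different from the prototype presupposes some way of scoring molecular similarity. This page does not say which measure the authors used, so what follows is only a minimal sketch under stated assumptions: it uses the Tanimoto (Jaccard) coefficient, a common similarity score in cheminformatics, but computes it over character 3-grams of the SMILES string as a hypothetical stand-in for a real chemical fingerprint such as those produced by RDKit. The helper names (`ngram_fingerprint`, `tanimoto`, `diverse_candidates`) and the 0.5 cutoff are illustrative assumptions, not part of the paper.

```python
from __future__ import annotations


def ngram_fingerprint(smiles: str, n: int = 3) -> frozenset[str]:
    """Hypothetical stand-in for a chemical fingerprint: the set of
    length-n character substrings of a SMILES string."""
    if len(smiles) < n:
        # Strings shorter than n contribute themselves as a single feature.
        return frozenset({smiles})
    return frozenset(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def tanimoto(a: frozenset[str], b: frozenset[str]) -> float:
    """Tanimoto (Jaccard) similarity: size of the intersection of two
    feature sets divided by the size of their union."""
    union = a | b
    if not union:
        return 1.0  # two empty feature sets are treated as identical
    return len(a & b) / len(union)


def diverse_candidates(prototype: str, candidates: list[str],
                       max_similarity: float = 0.5) -> list[str]:
    """Keep only candidates whose similarity to the prototype is at most
    max_similarity (0.5 is an illustrative assumption, not from the paper)."""
    proto_fp = ngram_fingerprint(prototype)
    return [c for c in candidates
            if tanimoto(proto_fp, ngram_fingerprint(c)) <= max_similarity]


if __name__ == "__main__":
    isoniazid = "NNC(=O)c1ccncc1"  # prototype example taken from the abstract
    print(diverse_candidates(isoniazid, [isoniazid, "CCO"]))
```

For instance, `diverse_candidates("NNC(=O)c1ccncc1", ["NNC(=O)c1ccncc1", "CCO"])` returns `["CCO"]`: the isoniazid string has similarity 1.0 to itself and is rejected, while `CCO` shares no 3-grams with it. Because one molecule can be written as several different SMILES strings, a real pipeline would first parse and canonicalize each string with a cheminformatics toolkit (which also checks validity) before comparing; string n-grams are used here only to keep the sketch dependency-free.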

Supplementary Material

MP4 File (harel_conditional_diversity_networks.mp4)

Cited By

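  • (2024) Nc-vae: normalised conditional diverse variational autoencoder guided de novo molecule generation. The Journal of Supercomputing 80:14 (21207-21228). DOI: 10.1007/s11227-024-06250-2. Online publication date: 6-Jun-2024.
  • (2023) Investigation of chemical structure recognition by encoder–decoder models in learning progress. Journal of Cheminformatics 15:1. DOI: 10.1186/s13321-023-00713-z. Online publication date: 12-Apr-2023.
  • (2023) GMG-NCDVAE: Guided de novo Molecule Generation using NLP Techniques and Constrained Diverse Variational Autoencoder. ACM Transactions on Asian and Low-Resource Language Information Processing. DOI: 10.1145/3610533. Online publication date: 2-Aug-2023.
  • (2023) CFOM: Lead Optimization For Drug Discovery With Limited Data. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (1056-1066). DOI: 10.1145/3583780.3614807. Online publication date: 21-Oct-2023.
  • (2023) GenSMILES: An enhanced validity conscious representation for inverse design of molecules. Knowledge-Based Systems 268 (110429). DOI: 10.1016/j.knosys.2023.110429. Online publication date: May-2023.
  • (2022) Graph Neural Networks Pretraining Through Inherent Supervision for Molecular Property Prediction. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (2903-2912). DOI: 10.1145/3511808.3557085. Online publication date: 17-Oct-2022.
  • (2021) Multi-Property Molecular Optimization using an Integrated Poly-Cycle Architecture. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (3727-3736). DOI: 10.1145/3459637.3481938. Online publication date: 26-Oct-2021.
  • (2021) Unpaired Generative Molecule-to-Molecule Translation for Lead Optimization. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (2554-2564). DOI: 10.1145/3447548.3467120. Online publication date: 14-Aug-2021.
  • (2021) Transmol: repurposing a language model for molecular generation. RSC Advances 11:42 (25921-25932). DOI: 10.1039/D1RA03086H. Online publication date: 2021.
  • (2021) RapidSwap: a Hierarchical Far Memory. Economics of Grids, Clouds, Systems, and Services (143-151). DOI: 10.1007/978-3-030-92916-9_12. Online publication date: 9-Dec-2021.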

Published In

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning for medicine
  2. drug design
  3. prototype-based drug discovery

Conference

KDD '18

Acceptance Rates

KDD '18 Paper Acceptance Rate 107 of 983 submissions, 11%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (last 12 months): 17
  • Downloads (last 6 weeks): 4
Reflects downloads up to 03 Jan 2025