ABSTRACT
This paper addresses the task of recommending to a chemist the candidate molecules (reactants) needed to synthesize a given target molecule (product). This is a novel application and an important step in finding a synthesis route for the product. We formulate the task as a link-prediction problem over a Network of Organic Chemistry (NOC) that we constructed from 8 million chemical reactions described in the US patent literature between 1976 and 2013. We apply state-of-the-art factorization algorithms from recommender systems to solve it. Our empirical evaluation shows that Factorization Machines trained with chemistry-specific knowledge outperform current methods based on chemical-structure similarity.
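As a rough illustration of the link-prediction formulation, the sketch below scores (product, reactant) pairs with the standard second-order Factorization Machine equation and ranks candidate reactants by that score. The feature encoding (one indicator feature for the product, one for the candidate reactant), the toy parameter values, and the names `fm_score` and `rank_reactants` are assumptions made for this example. The abstract does not specify the chemistry-specific features or the training procedure, so neither is modeled here.

```python
def fm_score(x, w0, w, V):
    """Second-order Factorization Machine score.

    x  : dict {feature index: value}, nonzero entries only
    w0 : global bias
    w  : list of linear weights, one per feature
    V  : list of latent vectors, V[i] has k floats
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # Sum over all pairs i<j of <V[i], V[j]> * x_i * x_j, computed in
        # O(k * nnz) via 0.5 * ((sum v x)^2 - sum (v x)^2).
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def rank_reactants(product_idx, candidates, w0, w, V):
    """Rank candidate reactants for one product, best first.

    candidates : dict {reactant name: feature index} (hypothetical encoding)
    """
    scored = [
        (fm_score({product_idx: 1.0, idx: 1.0}, w0, w, V), name)
        for name, idx in candidates.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy parameters (made up, not learned): feature 0 is a product,
# features 1 and 2 are two candidate reactants.
W0 = 0.0
W = [0.0, 0.1, 0.1]
V = [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.2]]

ranking = rank_reactants(0, {"reactant_A": 1, "reactant_B": 2}, W0, W, V)
print(ranking)
```

In this formulation a high score for a (product, reactant) pair is read as a predicted link in the network, so recommending reactants amounts to sorting candidates by score.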
Index Terms
- Chemical Reactant Recommendation Using a Network of Organic Chemistry