DOI: 10.1145/3219819.3219961

Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation

Published: 19 July 2018

ABSTRACT

Dynamic treatment recommendation systems based on large-scale electronic health records (EHRs) have become key to improving practical clinical outcomes. Prior studies recommend treatments using either supervised learning (e.g., matching an indicator signal that denotes doctor prescriptions) or reinforcement learning (e.g., maximizing an evaluation signal that reflects cumulative reward derived from survival rates). However, none of these studies has combined the benefits of supervised learning and reinforcement learning. In this paper, we propose Supervised Reinforcement Learning with Recurrent Neural Network (SRL-RNN), which fuses the two into a synergistic learning framework. Specifically, SRL-RNN applies an off-policy actor-critic framework to handle the complex relations among multiple medications, diseases, and individual patient characteristics. The "actor" in the framework is adjusted by both the indicator signal and the evaluation signal to ensure effective prescriptions and low mortality. An RNN is further utilized to address the Partially Observable Markov Decision Process (POMDP) problem arising from the lack of fully observed states in real-world applications. Experiments on the publicly available real-world dataset MIMIC-III show that our model can reduce estimated mortality while providing promising accuracy in matching doctors' prescriptions.
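To make the fused objective concrete, the sketch below shows one way an actor update could blend the two signals, written in PyTorch. This is a minimal illustration under stated assumptions, not the paper's exact implementation: the names RecurrentActor and actor_loss, the multi-label sigmoid output over medications, and the mixing weight epsilon are all illustrative choices, and the critic network that would supply q_values is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentActor(nn.Module):
    """LSTM-based policy: summarizes the observation history (the POMDP
    workaround the abstract describes) and outputs per-medication
    prescription probabilities."""
    def __init__(self, obs_dim: int, num_meds: int, hidden_dim: int = 128):
        super().__init__()
        self.rnn = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_meds)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim); the recurrent hidden state
        # stands in for the unobserved true patient state.
        h, _ = self.rnn(obs_seq)
        return torch.sigmoid(self.head(h))  # multi-label medication probs

def actor_loss(pred_meds: torch.Tensor,
               doctor_meds: torch.Tensor,
               q_values: torch.Tensor,
               epsilon: float = 0.5) -> torch.Tensor:
    """Blend the evaluation signal (critic's Q-value, to be maximized)
    with the indicator signal (cross-entropy against the doctor's actual
    prescription, to be minimized). epsilon is a hypothetical trade-off
    hyperparameter."""
    rl_term = -q_values.mean()  # actor-critic term from the critic
    sl_term = F.binary_cross_entropy(pred_meds, doctor_meds.float())
    return epsilon * rl_term + (1.0 - epsilon) * sl_term
```

Under this reading, setting epsilon near 1 recovers a pure off-policy actor-critic update, while epsilon near 0 reduces to supervised imitation of doctors' prescriptions; intermediate values trade matching accuracy against estimated mortality.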


Supplemental Material

wang_dynamic_treatment.mp4 (MP4, 286.7 MB)


Published in

        KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
        July 2018
        2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 19 July 2018


        Qualifiers

        • research-article

        Acceptance Rates

KDD '18 Paper Acceptance Rate: 107 of 983 submissions, 11%. Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%.
