ABSTRACT
Dynamic treatment recommendation systems based on large-scale electronic health records (EHRs) have become key to improving practical clinical outcomes. Prior studies recommend treatments using either supervised learning (e.g., matching an indicator signal that denotes doctors' prescriptions) or reinforcement learning (e.g., maximizing an evaluation signal that reflects cumulative reward from survival rates). However, none of these studies combine the benefits of supervised learning and reinforcement learning. In this paper, we propose Supervised Reinforcement Learning with Recurrent Neural Network (SRL-RNN), which fuses the two into a synergistic learning framework. Specifically, SRL-RNN applies an off-policy actor-critic framework to handle the complex relations among multiple medications, diseases, and individual patient characteristics. The "actor" in the framework is adjusted by both the indicator signal and the evaluation signal to ensure effective prescriptions and low mortality. An RNN is further utilized to address the Partially Observed Markov Decision Process (POMDP) problem arising from the lack of fully observed states in real-world applications. Experiments on the publicly available real-world dataset MIMIC-3 show that our model reduces estimated mortality while providing promising accuracy in matching doctors' prescriptions.
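The central idea of adjusting the actor by both signals can be illustrated with a minimal sketch. This is not the paper's implementation: the linear per-medication actor, the stubbed critic gradient `q_grad_a`, and the mixing weight `eps` are all illustrative assumptions; the sketch only shows how a supervised (indicator-signal) gradient and a deterministic-policy-gradient-style (evaluation-signal) term can be blended into one actor update.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linear "actor": maps a patient state vector to per-medication
# prescription probabilities. Dimensions are illustrative only.
n_state, n_meds = 8, 5
W = rng.normal(scale=0.1, size=(n_meds, n_state))

state = rng.normal(size=n_state)           # observed patient features
doctor = np.array([1.0, 0, 1, 0, 0])       # indicator signal: doctor's prescription

def actor(W, s):
    return sigmoid(W @ s)

def supervised_grad(W, s, y):
    # Gradient of binary cross-entropy between the actor's output and
    # the doctor's prescription (the indicator signal).
    p = actor(W, s)
    return np.outer(p - y, s)

def rl_grad(W, s, q_grad_a):
    # Deterministic-policy-gradient-style term: chain rule through the
    # actor, with dQ/da supplied by a critic (stubbed out here).
    # Negated so that gradient *descent* ascends the critic's Q.
    p = actor(W, s)
    return -np.outer(q_grad_a * p * (1 - p), s)

q_grad_a = rng.normal(size=n_meds)         # stand-in for a critic's dQ/da
eps, lr = 0.5, 0.1                         # eps trades off the two signals

grad = eps * rl_grad(W, state, q_grad_a) + (1 - eps) * supervised_grad(W, state, doctor)
W_new = W - lr * grad                      # one combined actor update
```

Setting `eps = 0` recovers pure supervised imitation of the doctor, while `eps = 1` recovers a pure actor-critic update; intermediate values balance prescription accuracy against the mortality-based evaluation signal, which is the trade-off the abstract describes.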
Index Terms
- Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation