ABSTRACT
Dynamic treatment recommendation systems based on large-scale electronic health records (EHRs) have become key to improving practical clinical outcomes. Prior studies recommend treatments using either supervised learning (e.g., matching an indicator signal that denotes doctors' prescriptions) or reinforcement learning (e.g., maximizing an evaluation signal that reflects cumulative reward from survival rates). However, none of these studies combine the benefits of supervised learning and reinforcement learning. In this paper, we propose Supervised Reinforcement Learning with Recurrent Neural Network (SRL-RNN), which fuses the two into a synergistic learning framework. Specifically, SRL-RNN applies an off-policy actor-critic framework to handle the complex relations among multiple medications, diseases, and individual patient characteristics. The "actor" in the framework is adjusted by both the indicator signal and the evaluation signal to ensure effective prescriptions and low mortality. An RNN is further utilized to address the Partially Observed Markov Decision Process (POMDP) problem arising from the lack of fully observed states in real-world applications. Experiments on the publicly available real-world dataset MIMIC-3 show that our model reduces estimated mortality while providing promising accuracy in matching doctors' prescriptions.
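The central idea of adjusting the actor by both signals can be illustrated with a minimal sketch. This is not the paper's implementation: the linear per-medication actor, the stubbed critic gradient `q_grad_a`, and the mixing weight `eps` are all illustrative assumptions; the sketch only shows how a supervised (indicator-signal) gradient and a deterministic-policy-gradient-style (evaluation-signal) term can be blended into one actor update.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linear "actor": maps a patient state vector to per-medication
# prescription probabilities. Dimensions are illustrative only.
n_state, n_meds = 8, 5
W = rng.normal(scale=0.1, size=(n_meds, n_state))

state = rng.normal(size=n_state)           # observed patient features
doctor = np.array([1.0, 0, 1, 0, 0])       # indicator signal: doctor's prescription

def actor(W, s):
    return sigmoid(W @ s)

def supervised_grad(W, s, y):
    # Gradient of binary cross-entropy between the actor's output and
    # the doctor's prescription (the indicator signal).
    p = actor(W, s)
    return np.outer(p - y, s)

def rl_grad(W, s, q_grad_a):
    # Deterministic-policy-gradient-style term: chain rule through the
    # actor, with dQ/da supplied by a critic (stubbed out here).
    # Negated so that gradient *descent* ascends the critic's Q.
    p = actor(W, s)
    return -np.outer(q_grad_a * p * (1 - p), s)

q_grad_a = rng.normal(size=n_meds)         # stand-in for a critic's dQ/da
eps, lr = 0.5, 0.1                         # eps trades off the two signals

grad = eps * rl_grad(W, state, q_grad_a) + (1 - eps) * supervised_grad(W, state, doctor)
W_new = W - lr * grad                      # one combined actor update
```

Setting `eps = 0` recovers pure supervised imitation of the doctor, while `eps = 1` recovers a pure actor-critic update; intermediate values balance prescription accuracy against the mortality-based evaluation signal, which is the trade-off the abstract describes.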
Index Terms
- Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation