DOI: 10.1145/3097983.3098088 · research-article

Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks

Published: 13 August 2017

ABSTRACT

Predicting patients' future health information from historical Electronic Health Records (EHR) is a core research task in the development of personalized healthcare. Patient EHR data consist of sequences of visits over time, where each visit contains multiple medical codes, including diagnosis, medication, and procedure codes. The most important challenges for this task are to model the temporality and high dimensionality of sequential EHR data and to interpret the prediction results. Existing work addresses this problem by employing recurrent neural networks (RNNs) to model EHR data and a simple attention mechanism to interpret the results. However, RNN-based approaches suffer from two problems: the performance of RNNs drops as sequences grow long, and current RNN-based approaches ignore the relationships between subsequent visits. To address these issues, we propose Dipole, an end-to-end, simple, and robust model for predicting patients' future health information. Dipole employs bidirectional recurrent neural networks to remember all the information from both past and future visits, and it introduces three attention mechanisms to measure the relationships among different visits for prediction. With these attention mechanisms, Dipole can interpret its prediction results effectively. Dipole also allows us to interpret the learned medical code representations, which were positively confirmed by medical experts. Experimental results on two real-world EHR datasets show that Dipole significantly improves prediction accuracy compared with state-of-the-art diagnosis prediction approaches and provides clinically meaningful interpretation.
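The abstract describes the pipeline only in outline: each visit is a set of medical codes, a bidirectional RNN reads the visit sequence, and attention weighs earlier visits when predicting the next one. The pure-Python sketch below shows how those pieces could fit together for a single patient. It is not the authors' implementation. The plain tanh RNN cell (rather than a gated cell), the single attention form shown (a learned weight vector dotted with each hidden state; the paper describes three mechanisms), the omitted bias terms, the random untrained weights, and names such as `dipole_sketch` are all assumptions made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Small random weights: untrained, for shape illustration only."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def rnn_pass(inputs, W_x, W_h, hidden):
    """Plain tanh RNN over a sequence (assumed simplification, not a gated cell)."""
    h = [0.0] * hidden
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(W_x, x), matvec(W_h, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


def dipole_sketch(visits, num_codes, emb=4, hidden=3, seed=0):
    """Hypothetical helper: score every code for visit t+1 from multi-hot visits 1..t."""
    if len(visits) < 2:
        raise ValueError("need at least two visits to attend over earlier ones")
    rng = random.Random(seed)
    W_emb = rand_matrix(emb, num_codes, rng)      # multi-hot codes -> embedding
    Wf_x, Wf_h = rand_matrix(hidden, emb, rng), rand_matrix(hidden, hidden, rng)
    Wb_x, Wb_h = rand_matrix(hidden, emb, rng), rand_matrix(hidden, hidden, rng)
    w_att = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden)]
    W_c = rand_matrix(hidden, 4 * hidden, rng)    # combines [c_t; h_t]
    W_s = rand_matrix(num_codes, hidden, rng)     # output layer over codes

    # 1. Embed each multi-hot visit vector.
    v = [matvec(W_emb, x) for x in visits]
    # 2. Bidirectional pass: forward over visits, backward over the reversed
    #    sequence (re-reversed so states line up by visit), then concatenate.
    h_fwd = rnn_pass(v, Wf_x, Wf_h, hidden)
    h_bwd = rnn_pass(v[::-1], Wb_x, Wb_h, hidden)[::-1]
    h = [f + b for f, b in zip(h_fwd, h_bwd)]
    # 3. Attention over earlier visits 1..t-1: one score each, softmax-normalized.
    past, h_t = h[:-1], h[-1]
    alpha = softmax([sum(w * x for w, x in zip(w_att, h_i)) for h_i in past])
    c_t = [sum(a * h_i[k] for a, h_i in zip(alpha, past)) for k in range(2 * hidden)]
    # 4. Combine the context vector with the latest state and score all codes.
    h_tilde = [math.tanh(z) for z in matvec(W_c, c_t + h_t)]
    y_hat = softmax(matvec(W_s, h_tilde))
    return alpha, y_hat


# Three visits over a toy vocabulary of five codes.
visits = [[1, 0, 1, 0, 0], [0, 1, 0, 0, 1], [0, 0, 1, 1, 0]]
alpha, y_hat = dipole_sketch(visits, num_codes=5)
print("attention over visits 1-2:", [round(a, 3) for a in alpha])
print("next-visit code scores:", [round(p, 3) for p in y_hat])
```

Because the attention weights sum to one over the earlier visits, they can be read as how much each past visit contributed to the prediction, which is the kind of interpretability the abstract emphasizes. In a trained model these weights, and all the matrices above, would be learned by minimizing a prediction loss rather than drawn at random.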

Published in

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

KDD '17 paper acceptance rate: 64 of 748 submissions (9%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
