DOI: 10.1145/3209626.3209706
research-article

A Predictive Method to Determine Incomplete Electronic Medical Records

Published: 18 June 2018

ABSTRACT

This paper uses predictive models to detect missing electronic medical records (EMRs) at general practice offices. Prior research has addressed the problem of missing values in EMRs used for secondary analysis. However, health care providers have overlooked the problem of missing records, that is, records of patients' medical visits that are absent from the EMR altogether. Our study provides a technique to predict the number of EMR entries for each practice from its past records. If the number of recorded entries is lower than predicted, the technique warns of missing records using a 95% confidence interval. The study uses seven years of EMRs from 14 general practice offices to train the predictive model. The model predicts EMR entries for the following year and thereby identifies missing EMRs. We compared the model's predictions against actual visits, as recorded in de-identified billing data. The study found that the autocorrelation method improves the identification of missing records by detecting the period used for prediction. In addition, artificial neural networks and support vector machines perform better than the other predictive methods, depending on whether the analysis aims to detect missing EMRs or to identify complete EMRs with no missing records. The results suggest that clinicians and medical professionals should be mindful of potentially missing EMRs before conducting any secondary analysis.
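The detection idea described in the abstract (estimate the period with autocorrelation, forecast each practice's expected number of EMR entries, and flag a shortfall against a 95% bound) can be illustrated with a minimal Python sketch. This is not the authors' method: the paper compares ANNs, SVMs and other predictive models, whereas this sketch substitutes a simple seasonal-mean forecast with a normal-approximation 95% prediction interval. The function names (`autocorrelation`, `detect_period`, `predict_next_cycle`, `flag_missing`) and the daily counts in the demo are hypothetical.

```python
import math
import statistics


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag` (standard biased estimator)."""
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(len(series) - lag))
    return num / denom


def detect_period(series, max_lag):
    """Pick the lag in [2, max_lag] with the strongest autocorrelation."""
    return max(range(2, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


def predict_next_cycle(series, period, z=1.96):
    """Forecast each slot of the next cycle as the mean of the same slot in past
    cycles, with an approximate 95% prediction interval (assumes normal errors).
    Returns a list of (mean, lower, upper) tuples."""
    n = len(series)
    forecasts = []
    for k in range(period):
        same_slot = series[(n + k) % period::period]  # past values at this phase
        mean = statistics.fmean(same_slot)
        sd = statistics.stdev(same_slot) if len(same_slot) > 1 else 0.0
        half_width = z * sd * math.sqrt(1 + 1 / len(same_slot))
        forecasts.append((mean, mean - half_width, mean + half_width))
    return forecasts


def flag_missing(observed, forecasts):
    """Indices where the observed entry count falls below the lower bound."""
    return [i for i, (count, (_, lower, _)) in enumerate(zip(observed, forecasts))
            if count < lower]


if __name__ == "__main__":
    # Hypothetical daily EMR entry counts: 8 weeks of a weekday/weekend pattern
    pattern = [120, 110, 115, 118, 105, 40, 10]
    history = [pattern[i % 7] + (i % 3) - 1 for i in range(56)]

    period = detect_period(history, max_lag=20)
    print(period)  # 7
    forecasts = predict_next_cycle(history, period)
    next_week = [120, 110, 60, 118, 105, 40, 10]  # day 2 looks under-recorded
    print(flag_missing(next_week, forecasts))  # [2]
```

Only the lower bound is used because the concern is under-recording; a count above the upper bound would point to a different problem (for example duplicate entries) and is not flagged here.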

  • Published in

    ACM Conferences
    SIGMIS-CPR'18: Proceedings of the 2018 ACM SIGMIS Conference on Computers and People Research
    June 2018
    216 pages
    ISBN: 9781450357685
    DOI: 10.1145/3209626

    Copyright © 2018 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States




    Acceptance Rates

    Overall Acceptance Rate: 300 of 480 submissions, 63%
