ABSTRACT
This paper uses predictive models to detect missing electronic medical records (EMRs) at general practice offices. Prior research has addressed the problem of missing values in EMRs used for secondary analysis. However, health care providers tend to overlook the problem of missing records, that is, entire entries documenting patients' medical visits that are absent from the EMR. Our study provides a technique that predicts the number of EMR entries for each practice from its past data records. If the actual number of EMR entries is lower than predicted, the technique warns of missing records with a 95% confidence interval. The study uses seven years of EMRs from 14 general practice offices to train the predictive model. The model predicts EMR data entries and accordingly identifies missing EMRs for the following year. We compared the model's predictions with the actual visits recorded in de-identified billing data. The study found that the auto-correlation method improves the performance of identifying missing records by detecting the period of prediction. In addition, artificial neural networks and support vector machines perform better than other predictive methods, depending on whether the analysis aims at detecting missing EMRs or at identifying complete EMRs with no missing records. The results suggest that clinicians and medical professionals should be mindful of potentially missing EMRs prior to any secondary analysis.
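The detection idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a simple seasonal-mean baseline stands in for the neural network and support vector machine models, the lower bound uses a normal approximation (z = 1.96) for the 95% interval, and all function names and the daily visit counts are hypothetical.

```python
import statistics


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(len(series) - lag))
    return num / denom


def detect_period(series, max_lag):
    """Return the lag in 2..max_lag with the highest autocorrelation."""
    return max(range(2, max_lag + 1),
               key=lambda lag: autocorrelation(series, lag))


def flag_missing(history, actual, period, z=1.96):
    """Flag entries of `actual` that fall below a lower prediction bound.

    Assumption: the prediction for each position is the mean of past
    counts at the same phase of the period (a stand-in for a trained
    model), and the bound is mean - z * sample standard deviation.
    """
    flags = []
    for i, count in enumerate(actual):
        phase = (len(history) + i) % period
        same_phase = history[phase::period]
        predicted = statistics.fmean(same_phase)
        spread = statistics.stdev(same_phase) if len(same_phase) > 1 else 0.0
        flags.append(count < predicted - z * spread)
    return flags


# Hypothetical daily EMR entry counts: eight weeks with a weekly pattern
# and a small deterministic perturbation.
week = [50, 52, 48, 51, 49, 10, 5]
history = [v + (i % 3) - 1 for i, v in enumerate(week * 8)]

# Following week, with far fewer entries than usual on the third day.
actual = [50, 52, 30, 51, 49, 10, 5]

period = detect_period(history, max_lag=14)
flags = flag_missing(history, actual, period)
print(period, flags)
```

In this sketch the period is estimated first, so that each new count is compared only with past counts from the same position in the cycle; comparing a weekend count against a weekday average would otherwise raise false warnings.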