A Predictive Method to Determine Incomplete Electronic Medical Records

Authors:
Amir Talaei-Khoei, University of Nevada Reno & University of Technology Sydney, Reno, NV, USA
Luvai F. Motiwalla, University of Massachusetts Lowell, Lowell, MA, USA
S. Farzan Kazemi, University of Nevada Reno, Reno, NV, USA

SIGMIS-CPR'18: Proceedings of the 2018 ACM SIGMIS Conference on Computers and People Research, June 2018, Pages 99–106
https://doi.org/10.1145/3209626.3209706

Published: 18 June 2018

ABSTRACT

This paper uses predictive models to detect missing electronic medical records (EMRs) at general practice offices. Prior research has addressed the problem of missing values in EMRs used for secondary analysis. However, health care providers are overlooking the problem of missing records, that is, absent EMR entries that should store patients' medical visit information. Our study provides a technique to predict the number of EMR entries for each practice based on its past data records. If the actual number of EMR entries is lower than predicted, the technique warns of possible missing records using a 95% confidence interval. The study uses seven years of EMRs from 14 general practice offices to train the predictive model. The model predicts the number of EMR entries and accordingly identifies missing EMRs for the following year. We compared the model's predictions with the actual visits recorded in de-identified billing data. The study found that the auto-correlation method improves the identification of missing records by detecting the period of prediction. In addition, artificial neural networks and support vector machines perform better than other predictive methods, depending on whether the analysis aims to detect missing EMRs or to identify complete EMRs with no missing records. Results suggest that clinicians and medical professionals should be mindful of potentially missing EMRs prior to any secondary analysis.
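The workflow summarized in the abstract (detect the period with auto-correlation, predict the entry count, warn when the actual count falls below a 95% bound) can be illustrated with a minimal sketch. It is not the authors' implementation. It swaps in a simple seasonal-mean forecast where the paper evaluates artificial neural networks and support vector machines, and it approximates the 95% bound as the prediction minus 1.96 residual standard deviations, which assumes roughly normal errors. All function names and the entry counts are hypothetical and synthetic.

```python
from statistics import mean, stdev


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[i] - m) * (series[i + lag] - m)
              for i in range(len(series) - lag))
    return num / denom


def detect_period(series, max_lag):
    """Return the lag in 2..max_lag with the highest autocorrelation."""
    return max(range(2, max_lag + 1),
               key=lambda lag: autocorrelation(series, lag))


def seasonal_forecast(history, period):
    """Predict one full period as the mean of past values at each phase.

    Also returns the standard deviation of the in-sample residuals.
    This is a stand-in for the ANN/SVM predictors named in the abstract.
    """
    phase_means = [mean(history[p::period]) for p in range(period)]
    residuals = [x - phase_means[i % period] for i, x in enumerate(history)]
    return phase_means, stdev(residuals)


def flag_missing(actual, predicted, sigma, z=1.96):
    """Flag intervals whose actual count is below the lower 95% bound."""
    return [a < p - z * sigma for a, p in zip(actual, predicted)]


# Synthetic entry counts for one hypothetical practice: seven "years"
# of four intervals each, with small deterministic noise.
pattern = [100, 120, 90, 110]
noise = [2, -1, 0, -2, 1, 0, -1]
history = [pattern[i % 4] + noise[i % 7] for i in range(28)]

period = detect_period(history, max_lag=6)
predicted, sigma = seasonal_forecast(history, period)

# Counts observed in the following year; the third interval is suspiciously low.
actual_next_year = [101, 119, 60, 110]
warnings = flag_missing(actual_next_year, predicted, sigma)
```

In this toy data, only the third interval of the following year falls below the lower bound, so it is the only one flagged as possibly having missing records. With real data, the flagged intervals would then be checked against an independent source, such as the billing data the abstract mentions.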

Published in
SIGMIS-CPR'18: Proceedings of the 2018 ACM SIGMIS Conference on Computers and People Research
June 2018
216 pages
ISBN: 9781450357685
DOI: 10.1145/3209626
General Chairs:
Rajiv Kishore, University of Nevada, Las Vegas, USA
Daniel Beimborn, Frankfurt School of Finance & Management, Germany
Rajendra K. Bandi, Indian Institute of Management Bangalore, India
Program Chairs:
Benoit Aubert, Dalhousie University, Canada
Deborah Compeau, Washington State University, USA
Monideepa Tarafdar, Lancaster University, UK
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data completeness
electronic medical records
missing records
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 300 of 480 submissions, 63%