research-article

Improving the Data Quality of Drug Databases using Conditional Dependencies and Ontologies

Published: 01 October 2012

Abstract

Many health care systems and services rely on drug-related information stored in databases. Poor data quality in these databases, e.g., inaccurate drug contraindications, can have catastrophic consequences for patients' health. Hence, it is important to ensure their quality in terms of data completeness and soundness.

In the database domain, standard functional dependencies (FDs) and inclusion dependencies (INDs) have been proposed to prevent the insertion of incorrect data, but they are generally not expressive enough to represent a domain-specific set of constraints. To this end, conditional dependencies, i.e., standard dependencies extended with tableau patterns containing constant values, have been introduced, and several methods have been proposed for their discovery and representation. Using them can considerably improve the quality of drug databases.
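
As an illustration of how a tableau pattern restricts a functional dependency to part of a relation, the following minimal sketch checks a conditional functional dependency over in-memory rows. It is not the article's algorithm: the relation, the attribute names (country, code, name), the sample rows, and the wildcard symbol "_" are all assumptions made for this example.

```python
from itertools import combinations

WILDCARD = "_"  # assumed symbol for "any value" in a pattern


def matches(row, pattern):
    """True if the row agrees with every constant in the pattern."""
    return all(v == WILDCARD or row[a] == v for a, v in pattern.items())


def cfd_violations(rows, lhs, rhs):
    """Return tuples of row indexes that violate the CFD (lhs -> rhs).

    lhs and rhs map attribute names to a constant or to WILDCARD.
    """
    violations = []
    # Only rows matching the left-hand pattern are constrained at all.
    selected = [(i, r) for i, r in enumerate(rows) if matches(r, lhs)]
    # Single-row violations: a right-hand constant is not satisfied.
    for i, r in selected:
        if not matches(r, rhs):
            violations.append((i,))
    # Pair violations: equal left-hand values but different right-hand values.
    for (i, r), (j, s) in combinations(selected, 2):
        if all(r[a] == s[a] for a in lhs) and any(r[a] != s[a] for a in rhs):
            violations.append((i, j))
    return violations


# Hypothetical rows: within country "FR", a product code determines the name.
rows = [
    {"country": "FR", "code": "100", "name": "paracetamol"},
    {"country": "FR", "code": "100", "name": "ibuprofen"},  # conflicts with row 0
    {"country": "UK", "code": "100", "name": "aspirin"},    # outside the pattern
]
lhs = {"country": "FR", "code": WILDCARD}
rhs = {"name": WILDCARD}
result = cfd_violations(rows, lhs, rhs)
```

The plain FD (country, code -> name) would also flag the third row against the first two; the constant "FR" in the pattern is what limits the constraint to the subset of rows where it is meant to hold.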

Moreover, pharmacology information is inherently hierarchical, and many standards propose graph structures to represent it, e.g., the Anatomical Therapeutic Chemical classification (ATC) or OpenGalen’s terminology. In this article, we emphasize that Semantic Web technologies, namely RDFS and OWL, are well suited to representing these hierarchical structures. We also present a solution for representing conditional dependencies using SPARQL, a query language defined for these graph-oriented structures. The benefits of this approach are interoperability with applications and ontologies of the Semantic Web, as well as a reasoning-based query execution solution for cleaning the underlying databases.
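
To make the combination of a hierarchy and a constant pattern concrete, here is a small sketch under stated assumptions. It evaluates a rule of the form "every drug classified under a given ATC class must carry a given attribute value" over a tiny hand-written hierarchy, and it builds a SPARQL 1.1 query string expressing the same condition. The `ex:` namespace, the property names `atcClass` and `category`, the sample drugs, and the rule itself are hypothetical; the article's actual SPARQL encoding may well differ.

```python
# Hand-written fragment of the ATC hierarchy: child code -> parent code.
ATC_PARENT = {
    "N02": "N",          # analgesics
    "N02B": "N02",
    "N02BE": "N02B",
    "N02BE01": "N02BE",  # paracetamol
    "R05": "R",
    "R05D": "R05",       # cough suppressants
}


def is_under(code, ancestor):
    """True if code equals ancestor or descends from it in ATC_PARENT."""
    while code is not None:
        if code == ancestor:
            return True
        code = ATC_PARENT.get(code)
    return False


def violations(drugs, ancestor, attribute, expected):
    """Ids of drugs under `ancestor` whose attribute differs from `expected`."""
    return [
        d["id"]
        for d in drugs
        if is_under(d["atc"], ancestor) and d.get(attribute) != expected
    ]


def violation_query(ancestor, attribute, expected):
    """Build a SPARQL 1.1 query string that selects the violating drugs."""
    return "\n".join([
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        "PREFIX ex: <http://example.org/drug#>",  # hypothetical namespace
        "SELECT ?drug WHERE {",
        "  ?drug ex:atcClass ?c .",
        f"  ?c rdfs:subClassOf* ex:{ancestor} .",
        f'  FILTER NOT EXISTS {{ ?drug ex:{attribute} "{expected}" }}',
        "}",
    ])


# Hypothetical drug records.
drugs = [
    {"id": "d1", "atc": "N02BE01", "category": "analgesic"},
    {"id": "d2", "atc": "N02BE", "category": "antitussive"},  # inconsistent
    {"id": "d3", "atc": "R05D", "category": "antitussive"},   # not under N02
]
bad = violations(drugs, "N02", "category", "analgesic")
query = violation_query("N02", "category", "analgesic")
```

The property path `rdfs:subClassOf*` plays the role of `is_under`: the condition is stated once at the level of class N02 and applies to every descendant class, which is the kind of hierarchy-aware constraint that a flat relational pattern cannot express without enumerating all subclasses.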


    Reviews

    Kalman Balogh

    Discovering and analyzing dependencies in data has proven to be an effective approach for enhancing the quality of database content. This paper describes a method that uses special conditional counterparts of functional and inclusion dependencies (CFDs and CINDs) to improve the expressive power of the unconditional constraints. For the adequate representation of ontology-based data and relationship description, access, and integration, the author applies semantic web notions, including the web ontology language (OWL), the resource description framework schema (RDFS), and the SPARQL query language.

    The first step of the method searches for conditional dependencies in an automatic way. The author presents a new algorithm for the discovery of CFDs, including a sound and complete method for CINDs. This method enables the detection of erroneous, inconsistent, and missing data within a set of CFDs and CINDs, and with the help of generated SPARQL queries, it enables the explanation of violations in a declarative way. Thus, it may be used to guide experts in cleansing and completing databases. The theory and the method are domain independent. The approach is explained and demonstrated by examples in the drug database application domain.

    The paper sketches some features of the implementation, and describes and evaluates experiments to test the practical value of the theory. The author has been developing web applications to enable the general public to self-medicate efficiently and safely. Both of these systems are implemented using Java Platform, Enterprise Edition (Java EE) technologies to map ontology-level queries to the relational drug databases. I recommend this paper for semantic web researchers and implementers, as well as those developing applications in different domains, especially drug databases and medical systems.

    Online Computing Reviews Service

    • Published in

      Journal of Data and Information Quality  Volume 4, Issue 1
      October 2012
      69 pages
      ISSN:1936-1955
      EISSN:1936-1963
      DOI:10.1145/2378016

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 October 2012
      • Accepted: 1 January 2012
      • Received: 1 October 2010

      Qualifiers

      • research-article
      • Research
      • Refereed
