research-article

Improving the Data Quality of Drug Databases using Conditional Dependencies and Ontologies

Author:
Olivier Curé

Université Paris-Est

Journal of Data and Information Quality, Volume 4, Issue 1, Article No. 3, pp. 1–21. https://doi.org/10.1145/2378016.2378019

Published: 01 October 2012

Abstract

Many health care systems and services exploit drug-related information stored in databases. Poor data quality in these databases, e.g., inaccurate drug contraindications, can have catastrophic consequences for patients' health. Hence it is important to ensure their quality in terms of data completeness and soundness.

In the database domain, standard Functional Dependencies (FDs) and Inclusion Dependencies (INDs) have been proposed to prevent the insertion of incorrect data. However, they are generally not expressive enough to represent a domain-specific set of constraints. To this end, conditional dependencies, i.e., standard dependencies extended with tableau patterns containing constant values, have been introduced, and several methods have been proposed for their discovery and representation. Using them can considerably improve the quality of drug databases.
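To make the role of a tableau pattern concrete, here is a minimal Python sketch of checking a conditional functional dependency (CFD) under the usual semantics from the data-cleaning literature: a pattern cell is either a constant or the wildcard `_`, a constant on the right-hand side must hold for each matching tuple on its own, and a wildcard right-hand side behaves like an ordinary FD restricted to the matching tuples. The relation, its attributes (`name`, `form`, `route`) and the patterns are hypothetical, and this is not the article's implementation.

```python
from itertools import combinations

WILDCARD = "_"  # unnamed variable: matches any value in a pattern cell


def matches(value, cell):
    """A value matches a pattern cell if the cell is the wildcard or equals the value."""
    return cell == WILDCARD or value == cell


def cfd_violations(rows, lhs, rhs, tableau):
    """Return (pattern_index, row_indices) for violations of the CFD (lhs -> rhs, tableau).

    rows    : list of dicts, one per tuple of the relation
    lhs     : list of left-hand-side attribute names
    rhs     : a single right-hand-side attribute name
    tableau : list of pattern tuples, each a dict over lhs + [rhs]
    """
    violations = []
    for p_idx, pattern in enumerate(tableau):
        # Tuples whose LHS values match the pattern's LHS cells.
        scope = [i for i, row in enumerate(rows)
                 if all(matches(row[a], pattern[a]) for a in lhs)]
        if pattern[rhs] != WILDCARD:
            # Constant RHS: each matching tuple on its own must carry that constant.
            for i in scope:
                if rows[i][rhs] != pattern[rhs]:
                    violations.append((p_idx, (i,)))
        else:
            # Wildcard RHS: an ordinary FD restricted to the matching tuples.
            for i, j in combinations(scope, 2):
                same_lhs = all(rows[i][a] == rows[j][a] for a in lhs)
                if same_lhs and rows[i][rhs] != rows[j][rhs]:
                    violations.append((p_idx, (i, j)))
    return violations


# Hypothetical drug relation: names and values are illustrative only.
drugs = [
    {"name": "DrugA", "form": "tablet", "route": "oral"},
    {"name": "DrugB", "form": "tablet", "route": "injection"},
    {"name": "DrugC", "form": "syrup", "route": "oral"},
    {"name": "DrugC", "form": "syrup", "route": "rectal"},
]
# CFD: [name, form] -> route, with one constant and one all-wildcard pattern.
tableau = [
    {"name": WILDCARD, "form": "tablet", "route": "oral"},
    {"name": WILDCARD, "form": WILDCARD, "route": WILDCARD},
]
print(cfd_violations(drugs, ["name", "form"], "route", tableau))
# → [(0, (1,)), (1, (2, 3))]
```

The constant pattern flags DrugB alone, while the all-wildcard pattern, which reduces to the plain FD (name, form) → route, flags the two DrugC tuples together. The pairwise loop is quadratic and only illustrates the semantics; it is not meant to scale.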

Moreover, pharmacology information is inherently hierarchical, and many standards propose graph structures to represent it, e.g., the Anatomical Therapeutic Chemical (ATC) classification or OpenGalen's terminology. In this article, we emphasize that Semantic Web technologies, i.e., RDFS and OWL, are well suited to representing these hierarchical structures. We also present a solution for representing conditional dependencies using SPARQL, a query language defined for these graph-oriented structures. The benefits of this approach are interoperability with Semantic Web applications and ontologies, and a reasoning-based query execution solution for cleaning the underlying databases.
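As a rough illustration of expressing a conditional dependency as a SPARQL query over a class hierarchy, the Python sketch below builds a SELECT query that lists the resources violating a constant CFD ("every instance of a class must have a given property value"). It is an assumption-laden sketch, not the article's encoding: the `ex:` namespace and the terms `ex:Tablet`, `ex:route` and `ex:Oral` are hypothetical, and the SPARQL 1.1 property path `rdf:type/rdfs:subClassOf*` stands in for the reasoning step, so that instances of subclasses (e.g., a hypothetical film-coated tablet class) are also checked. An RDFS/OWL reasoner could supply those entailments instead.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def constant_cfd_to_sparql(ns, lhs_class, rhs_property, rhs_value):
    """Build a SPARQL SELECT listing resources that violate a constant CFD.

    The CFD reads: every instance of ex:lhs_class must have ex:rhs_property
    equal to ex:rhs_value. Instances of subclasses are included through the
    property path rdf:type/rdfs:subClassOf*, and a missing value counts as a
    violation (OPTIONAL + !BOUND), so incomplete data is reported too.
    """
    return (
        f"PREFIX ex: <{ns}>\n"
        f"PREFIX rdf: <{RDF}>\n"
        f"PREFIX rdfs: <{RDFS}>\n"
        "SELECT ?drug ?value WHERE {\n"
        f"  ?drug rdf:type/rdfs:subClassOf* ex:{lhs_class} .\n"
        f"  OPTIONAL {{ ?drug ex:{rhs_property} ?value }}\n"
        f"  FILTER (!BOUND(?value) || ?value != ex:{rhs_value})\n"
        "}"
    )


# Hypothetical vocabulary: ex:Tablet, ex:route and ex:Oral are illustrative names.
print(constant_cfd_to_sparql("http://example.org/drug#", "Tablet", "route", "Oral"))
```

Because the `||` in the filter evaluates to true whenever `!BOUND(?value)` is true, resources with no `ex:route` at all are returned alongside those with a wrong value, which covers both the soundness and the completeness concerns raised in the abstract.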

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Reviews

Reviewer: Kalman Balogh

Discovering and analyzing dependencies in data has proven to be an effective approach for enhancing the quality of database content. This paper describes a method that uses conditional counterparts of functional and inclusion dependencies (CFDs and CINDs), which are more expressive than the unconditional constraints. To represent, access, and integrate ontology-based data and relationships adequately, the author applies Semantic Web notions, including the Web Ontology Language (OWL), the Resource Description Framework Schema (RDFS), and the SPARQL query language.

The first step of the method discovers conditional dependencies automatically. The author presents a new algorithm for discovering CFDs, as well as a sound and complete method for CINDs. Given a set of CFDs and CINDs, the method detects erroneous, inconsistent, and missing data, and the generated SPARQL queries explain violations declaratively. It may therefore be used to guide experts in cleansing and completing databases. The theory and the method are domain independent; the approach is explained and demonstrated with examples from the drug database domain. The paper sketches some features of the implementation, and describes and evaluates experiments that test the practical value of the theory.

The author has been developing web applications that enable the general public to self-medicate efficiently and safely. These applications are implemented using Java Platform, Enterprise Edition (Java EE) technologies to map ontology-level queries to the relational drug databases. I recommend this paper to Semantic Web researchers and implementers, as well as to those developing applications in other domains, especially drug databases and medical systems.

Online Computing Reviews Service
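The review notes that CINDs help detect missing data. A minimal Python sketch of checking a single-pattern conditional inclusion dependency follows, under the usual CIND semantics: every tuple of the first relation that satisfies the condition must have a counterpart in the second relation that agrees on the linked attributes and carries the pattern constants. The relations, attribute names and the rule "every prescription drug needs a contraindications entry" are hypothetical, and this is not the author's algorithm, which the review describes as relying on generated SPARQL queries.

```python
def cind_violations(r1, r2, x, xp, y, yp):
    """Indices of r1 tuples violating the CIND (r1[x; xp] ⊆ r2[y; yp]).

    x, y   : equal-length attribute lists compared position by position
    xp, yp : dicts mapping condition attributes to pattern constants
    A tuple t1 whose xp attributes equal the pattern needs some t2 in r2 with
    t2[y] == t1[x] and t2[yp] equal to the pattern; otherwise it is reported
    as (possibly) missing data.
    """
    violating = []
    for i, t1 in enumerate(r1):
        if any(t1[a] != c for a, c in xp.items()):
            continue  # the condition does not apply to this tuple
        found = any(
            all(t2[b] == t1[a] for a, b in zip(x, y))
            and all(t2[b] == c for b, c in yp.items())
            for t2 in r2
        )
        if not found:
            violating.append(i)
    return violating


# Hypothetical relations: every prescription drug needs a contraindications entry.
drugs = [
    {"name": "DrugA", "prescription": "required"},
    {"name": "DrugB", "prescription": "required"},
    {"name": "DrugC", "prescription": "none"},
]
monograph = [
    {"drug": "DrugA", "section": "contraindications"},
    {"drug": "DrugB", "section": "dosage"},
]
print(cind_violations(drugs, monograph,
                      x=["name"], xp={"prescription": "required"},
                      y=["drug"], yp={"section": "contraindications"}))
# → [1]
```

DrugB is reported because its only monograph entry is a dosage section, while DrugC is skipped because the condition does not apply to it. With empty `xp` and `yp`, the same function checks a plain inclusion dependency.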

Published in

Journal of Data and Information Quality Volume 4, Issue 1
October 2012
69 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/2378016

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 October 2012
- Accepted: 1 January 2012
- Received: 1 October 2010
Author Tags
Data quality
conditional dependencies
description logics
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 14
- Total Downloads: 735
- Downloads (Last 12 months): 11
- Downloads (Last 6 weeks): 1