Abstract
Many health care systems and services rely on drug-related information stored in databases. Poor data quality in these databases, e.g. inaccurate drug contraindications, can have catastrophic consequences for patients' health. Hence it is important to ensure their quality in terms of data completeness and soundness.
In the database domain, standard Functional Dependencies (FDs) and INclusion Dependencies (INDs) have been proposed to prevent the insertion of incorrect data, but they are generally not expressive enough to represent a domain-specific set of constraints. To this end, conditional dependencies, i.e. standard dependencies extended with tableau patterns containing constant values, have been introduced, and several methods have been proposed for their discovery and representation. Their use can considerably improve the quality of drug databases.
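To make the notion of a conditional dependency concrete, the following is a minimal sketch of checking a conditional functional dependency against a handful of rows. It is an illustration under stated assumptions, not the method of the article: the function names, the attribute names (`country`, `product`, `rate`) and all data values are hypothetical, and `_` is used as the wildcard of the pattern tableau.

```python
from collections import defaultdict

WILDCARD = "_"  # hypothetical marker for "any value" in a pattern tableau row


def matches(row, attrs, pattern):
    """True if the row agrees with every constant in the pattern."""
    return all(p == WILDCARD or row[a] == p for a, p in zip(attrs, pattern))


def cfd_violations(rows, lhs, rhs, lhs_pattern, rhs_pattern):
    """Return sorted indices of rows violating the CFD (lhs -> rhs, pattern)."""
    violations = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if not matches(row, lhs, lhs_pattern):
            continue  # the condition does not apply to this row
        if not matches(row, rhs, rhs_pattern):
            violations.add(i)  # a single row contradicts a constant on the right
        groups[tuple(row[a] for a in lhs)].append(i)
    # Rows that agree on the left-hand side must also agree on the right-hand side.
    for idxs in groups.values():
        if len({tuple(rows[i][a] for a in rhs) for i in idxs}) > 1:
            violations.update(idxs)
    return sorted(violations)


# Made-up rows: the dependency [country, product] -> [rate] is only
# required to hold where country = "FR".
rows = [
    {"country": "FR", "product": "P1", "rate": "65"},
    {"country": "FR", "product": "P1", "rate": "30"},  # conflicts with row 0
    {"country": "UK", "product": "P1", "rate": "0"},   # outside the condition
    {"country": "FR", "product": "P2", "rate": "65"},
]
conflicting = cfd_violations(rows, ["country", "product"], ["rate"], ["FR", "_"], ["_"])
```

With a pattern made only of wildcards the check reduces to a standard FD; the constants are what restrict it to a subset of the data.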
Moreover, pharmacology information is inherently hierarchical, and many standards propose graph structures to represent it, e.g. the Anatomical Therapeutic Chemical classification (ATC) or OpenGalen’s terminology. In this article, we emphasize that Semantic Web technologies, i.e. RDFS and OWL, are well suited to representing these hierarchical structures. We also present a solution for representing conditional dependencies using a query language defined for these graph-oriented structures, namely SPARQL. The benefits of this approach are interoperability with Semantic Web applications and ontologies, as well as a reasoning-based query execution solution for cleaning the underlying databases.
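The interplay between a hierarchy and a conditional constraint can be sketched as follows. This only mimics the idea with a plain dictionary and is not the SPARQL encoding the article presents: the ATC codes are real, but the function names, the `warning` attribute, its value `W1` and the drug records are hypothetical, and no pharmacological claim is intended.

```python
# Small fragment of the ATC hierarchy, stored as child code -> parent code.
ATC_PARENT = {
    "N02": "N",
    "N02B": "N02",
    "N02BA": "N02B",
    "N02BA01": "N02BA",
    "N02BE": "N02B",
    "N02BE01": "N02BE",
}


def ancestors_or_self(code, parent=ATC_PARENT):
    """Yield the code itself, then each ancestor up to the root."""
    while code is not None:
        yield code
        code = parent.get(code)


def is_subsumed_by(code, cls):
    """True if `code` equals `cls` or lies below it in the hierarchy."""
    return cls in ancestors_or_self(code)


def hierarchy_violations(drugs, condition_class, attribute, required_value):
    """Names of drugs under `condition_class` whose attribute is not the required value."""
    return [
        d["name"]
        for d in drugs
        if is_subsumed_by(d["atc"], condition_class)
        and d.get(attribute) != required_value
    ]


# Made-up records: every drug classified under N02BA is required to carry "W1".
drugs = [
    {"name": "drug_a", "atc": "N02BA01", "warning": "W1"},
    {"name": "drug_b", "atc": "N02BA01", "warning": None},  # missing value
    {"name": "drug_c", "atc": "N02BE01", "warning": None},  # outside N02BA
]
missing = hierarchy_violations(drugs, "N02BA", "warning", "W1")
```

Stating the condition on a class rather than on a leaf code means one rule covers every drug that the hierarchy places below that class, which is the kind of inference a reasoning-based query execution would supply.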
Index Terms
- Improving the Data Quality of Drug Databases using Conditional Dependencies and Ontologies