DOI: 10.1145/2566486.2568002
research-article

Test-driven evaluation of linked data quality

Published: 07 April 2014

Abstract

Linked Open Data (LOD) comprises an unprecedented volume of structured data on the Web. However, these datasets are of varying quality, ranging from extensively curated datasets to crowdsourced or extracted data of often relatively low quality. We present a methodology for test-driven quality assessment of Linked Data, inspired by test-driven software development. We argue that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality. The methodology is based on a formalization of bad smells and data quality problems. This formalization employs SPARQL query templates, which are instantiated into concrete quality test case queries. Based on an extensive survey, we compile a comprehensive library of data quality test case patterns. We perform automatic test case instantiation based on schema constraints or semi-automatically enriched schemata, and we allow the user to generate specific test case instantiations that are applicable to a schema or dataset. We provide an extensive evaluation of five LOD datasets, manual test case instantiation for five schemas, and automatic test case instantiations for all available schemata registered with Linked Open Vocabularies (LOV). One of the main advantages of our approach is that domain-specific semantics can be encoded in the data quality test cases, which makes it possible to discover data quality problems beyond conventional quality heuristics.
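
The template instantiation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `%%NAME%%` placeholder syntax, the pattern text, the `ex:` terms, and the function names are all assumptions made for this sketch. It shows one pattern being turned into concrete test queries, one per `rdfs:domain` axiom, where each query selects the resources that violate the constraint.

```python
import re

# Hypothetical pattern: resources that use a property but lack the expected type.
# The %%NAME%% placeholder syntax is an assumption made for this sketch.
DOMAIN_PATTERN = (
    "SELECT DISTINCT ?s WHERE { "
    "?s %%P1%% ?v . "
    "FILTER NOT EXISTS { ?s a %%T1%% } }"
)

PLACEHOLDER = re.compile(r"%%([A-Z0-9]+)%%")


def instantiate(pattern: str, bindings: dict[str, str]) -> str:
    """Replace every %%NAME%% placeholder with its bound value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"no binding for placeholder {name}")
        return bindings[name]
    return PLACEHOLDER.sub(substitute, pattern)


def tests_from_domain_axioms(axioms):
    """Yield one test query per (property, domain class) pair."""
    for prop, cls in axioms:
        yield instantiate(DOMAIN_PATTERN, {"P1": prop, "T1": cls})


# Illustrative axioms only, standing in for a schema's rdfs:domain statements.
axioms = [("ex:birthDate", "ex:Person"), ("ex:isbn", "ex:Book")]
queries = list(tests_from_domain_axioms(axioms))
print(queries[0])
```

In this sketch a test case would count as passing when its query returns no rows against the dataset. The pattern ignores subclass inference, so a resource typed only with a subclass of the domain class would be reported as a violation.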

Published In

WWW '14: Proceedings of the 23rd international conference on World wide web
April 2014
926 pages
ISBN: 9781450327442
DOI: 10.1145/2566486

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data quality
  2. DBpedia
  3. linked data

Qualifiers

  • Research-article

Conference

WWW '14
Sponsor:
  • IW3C2

Acceptance Rates

WWW '14 Paper Acceptance Rate 84 of 645 submissions, 13%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 66
  • Downloads (last 6 weeks): 7
Reflects downloads up to 10 Feb 2025.

Citations

Cited By

  • (2024) A Deep Learning-Based Framework for Handling Incompleteness and Detecting Errors in Linked Data Applied to the UniProt Dataset. 2024 8th International Artificial Intelligence and Data Processing Symposium (IDAP), pp. 1-8. DOI: 10.1109/IDAP64064.2024.10710995. Online publication date: 21-Sep-2024
  • (2024) SeSICL: Semantic and Structural Integrated Contrastive Learning for Knowledge Graph Error Detection. IEEE Access 12, pp. 56088-56096. DOI: 10.1109/ACCESS.2024.3384543. Online publication date: 2024
  • (2024) Knowledge graph accuracy evaluation: an LLM-enhanced embedding approach. International Journal of Data Science and Analytics. DOI: 10.1007/s41060-024-00661-3. Online publication date: 8-Oct-2024
  • (2023) Qualidade de dados Linked Data. Informação@Profissões 11:2, pp. 153-169. DOI: 10.5433/2317-4390.2022v11n2p153. Online publication date: 20-Sep-2023
  • (2023) Automatic transparency evaluation for open knowledge extraction systems. Journal of Biomedical Semantics 14:1. DOI: 10.1186/s13326-023-00293-9. Online publication date: 31-Aug-2023
  • (2023) Automated approach for quality assessment of RDF resources. BMC Medical Informatics and Decision Making 23:S1. DOI: 10.1186/s12911-023-02182-8. Online publication date: 10-May-2023
  • (2023) Automated Ontology Evaluation: Evaluating Coverage and Correctness using a Domain Corpus. Companion Proceedings of the ACM Web Conference 2023, pp. 1127-1137. DOI: 10.1145/3543873.3587617. Online publication date: 30-Apr-2023
  • (2023) Web accessibility automatic evaluation tools: to what extent can they be automated? CCF Transactions on Pervasive Computing and Interaction 5:3, pp. 288-320. DOI: 10.1007/s42486-023-00127-8. Online publication date: 14-Mar-2023
  • (2022) An assertion and alignment correction framework for large scale knowledge bases. Semantic Web 14:1, pp. 29-53. DOI: 10.3233/SW-210448. Online publication date: 30-Nov-2022
  • (2022) A Shape Expression approach for assessing the quality of Linked Open Data in libraries. Semantic Web 14:2, pp. 159-179. DOI: 10.3233/SW-210441. Online publication date: 15-Dec-2022
