ABSTRACT
The number of RDF knowledge bases published on the Internet has recently increased. These rich RDF data sets are useful for answering many queries, but far more interesting queries can be answered by integrating information from different data sets. This has given rise to research on automatically linking RDF data sets that represent different knowledge bases, a task made challenging by their scale and semantic heterogeneity. Various approaches have been proposed, but there is still room to improve the quality of the generated links.
In this paper, we present ALEX, a system that aims to improve the quality of links between RDF data sets by using feedback that users provide on the answers to linked data queries. ALEX starts with a set of candidate links obtained from any automatic linking algorithm. It then uses user feedback to discover new links that were not in the candidate set, while preserving link precision. ALEX discovers these new links by finding links that are similar to a link the user has approved through feedback on queries. It uses a Monte-Carlo reinforcement learning method to learn how to explore the space of possible links around a given link. Our experiments on real-world data sets show that ALEX is efficient and significantly improves the quality of links.
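The abstract does not describe how the Monte-Carlo reinforcement learning method is set up, so the following is only a minimal sketch of the general technique under stated assumptions, not ALEX's actual design. It assumes a state is a label for an approved link, an action is a label for one way of exploring around that link, and the reward is +1 when the user approves a discovered link and -1 when the user rejects it. The class name `MonteCarloLinkExplorer`, the action names, and the approval rates are all hypothetical.

```python
import random
from collections import defaultdict


class MonteCarloLinkExplorer:
    """Every-visit Monte-Carlo action-value estimates with an epsilon-greedy policy.

    Hypothetical model (an assumption, not taken from the paper): states and
    actions are plain labels, and rewards encode user approval or rejection.
    """

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.q = defaultdict(float)     # (state, action) -> mean observed return
        self.counts = defaultdict(int)  # (state, action) -> number of returns seen

    def choose(self, state):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, episode):
        """Update estimates from one finished episode.

        episode is a list of (state, action, reward) tuples. Returns are
        undiscounted and accumulated backwards from the end of the episode.
        """
        g = 0.0
        for state, action, reward in reversed(episode):
            g += reward
            key = (state, action)
            self.counts[key] += 1
            # Incremental mean of all returns observed for this pair.
            self.q[key] += (g - self.q[key]) / self.counts[key]


def simulate(explorer, approval_rate, episodes, seed=1):
    """Train against simulated feedback; approval_rate values are made up."""
    rng = random.Random(seed)
    for _ in range(episodes):
        action = explorer.choose("approved_link")
        reward = 1.0 if rng.random() < approval_rate[action] else -1.0
        explorer.update([("approved_link", action, reward)])


if __name__ == "__main__":
    # Hypothetical exploration actions and approval probabilities.
    rates = {"same_name": 0.9, "same_birth_year": 0.3}
    explorer = MonteCarloLinkExplorer(rates.keys(), epsilon=0.2)
    simulate(explorer, rates, episodes=500)
    for action in rates:
        print(action, round(explorer.q[("approved_link", action)], 2))
```

In this sketch each round of user feedback is treated as a one-step episode, so the return equals the immediate reward. The `update` method also accepts longer episodes, in which case earlier actions are credited with the feedback that follows them.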
Index Terms
- ALEX: Automatic Link Exploration in Linked Data