DOI:10.1145/3078714.3078717
research-article

Entity-centric Data Fusion on the Web

Published: 04 July 2017

Abstract

Many web pages now embed structured data that can be processed and used directly. Search engines, in particular, collect this structured data and offer question answering over the integrated data, presenting the results in an entity-centric way. Because the web is decentralized, multiple structured data sources can provide similar information about the same entity. However, data from different sources may use different vocabularies and model information at different levels of granularity, which makes integration difficult. We present an approach that identifies similar entity-specific data across sources, independent of vocabulary and data modeling choices. We apply our method to the scenario of a trustable knowledge panel, conduct experiments in which we identify and process entity data from web sources, and compare the output to that of a competing system. The results underline the advantages of the presented entity-centric data fusion approach.
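The abstract does not describe the matching procedure itself. Purely as an illustration of what "vocabulary-independent" grouping of entity data can mean, the following minimal Python sketch ignores predicate names entirely and clusters facts about one entity by the string similarity of their values, recording which sources and predicates support each value. Everything here is an assumption for illustration: the `(source, predicate, value)` fact format, the hypothetical helpers `normalize`, `similar`, and `fuse`, the 0.85 threshold, and the use of the standard library's `difflib.SequenceMatcher`. It is not the authors' method and does not handle n-ary relations or datatype-aware comparison.

```python
from difflib import SequenceMatcher

# Hypothetical fact format: (source, predicate, value) about ONE entity.
# Predicates may come from different vocabularies (schema.org, DBpedia, FOAF, ...).
Fact = tuple[str, str, str]


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences vanish."""
    return " ".join(value.lower().split())


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Character-level similarity of two literal values (the threshold is an assumption)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def fuse(facts: list[Fact], threshold: float = 0.85) -> list[dict]:
    """Greedily cluster facts by value similarity, ignoring predicate names.

    Each cluster keeps the first value seen as its representative and records
    which sources and which predicates contributed to it (simple provenance).
    """
    clusters: list[dict] = []
    for source, predicate, value in facts:
        for cluster in clusters:
            if similar(cluster["value"], value, threshold):
                cluster["sources"].add(source)
                cluster["predicates"].add(predicate)
                break
        else:  # no existing cluster matched: start a new one
            clusters.append(
                {"value": value, "sources": {source}, "predicates": {predicate}}
            )
    # Values backed by more distinct sources come first.
    return sorted(clusters, key=lambda c: len(c["sources"]), reverse=True)


if __name__ == "__main__":
    # Hypothetical facts from three websites using three different vocabularies.
    facts = [
        ("site-a.example", "schema:birthDate", "1879-03-14"),
        ("site-b.example", "dbo:birthDate", "1879-3-14"),
        ("site-c.example", "foaf:birthday", "1879-03-14"),
        ("site-a.example", "schema:name", "Albert Einstein"),
        ("site-b.example", "rdfs:label", "Albert  Einstein"),
    ]
    for cluster in fuse(facts):
        print(cluster["value"], sorted(cluster["sources"]), sorted(cluster["predicates"]))
```

In this toy input, the three birth-date facts end up in one cluster even though each source uses a different predicate, and the two name variants form a second cluster. Because the clustering is greedy and compares each value only against a cluster's first value, results can depend on input order, and unrelated attributes that happen to share a value could be merged; a realistic system would need type-aware comparison (dates, numbers, units) to avoid both problems.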


Cited By

  • (2021) The Method of User Information Fusion Oriented to Manufacturing Service Value Net. Human Centered Computing, 68-74. DOI: 10.1007/978-3-030-70626-5_7. Online publication date: 12-Mar-2021
  • (2020) Interacting with Linked Data: A Survey from the SIGCHI Perspective. Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, 1-12. DOI: 10.1145/3334480.3382909. Online publication date: 25-Apr-2020
  • (2019) FusE. ACM Transactions on the Web 13, 2, 1-36. DOI: 10.1145/3306128. Online publication date: 17-Feb-2019
  • (2017) Distributed Holistic Clustering on Linked Data. On the Move to Meaningful Internet Systems. OTM 2017 Conferences, 371-382. DOI: 10.1007/978-3-319-69459-7_25. Online publication date: 23-Oct-2017

Information

Published In

HT '17: Proceedings of the 28th ACM Conference on Hypertext and Social Media
July 2017
336 pages
ISBN:9781450347082
DOI:10.1145/3078714
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data provenance
  2. data/knowledge fusion
  3. entity data fusion
  4. entity-centric data fusion
  5. linked data
  6. n-ary relations
  7. structured data

Qualifiers

  • Research-article

Funding Sources

  • German Federal Ministry of Education and Research (BMBF) within the Software Campus project SumOn
  • Marie Curie International Research Staff Exchange Scheme (IRSES) of the European Union Seventh Framework Programme (FP7/2007-2013)

Conference

HT'17: 28th Conference on Hypertext and Social Media
July 4 - 7, 2017
Prague, Czech Republic

Acceptance Rates

HT '17 Paper Acceptance Rate 19 of 69 submissions, 28%;
Overall Acceptance Rate 342 of 1,022 submissions, 33%

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0
Reflects downloads up to 02 Mar 2025
