research-article
DOI: 10.1145/2983323.2983798

Discovering Entities with Just a Little Help from You

Published: 24 October 2016

Abstract

Linking entities like people, organizations, books, music groups and their songs in text to knowledge bases (KBs) is a fundamental task for many downstream search and mining applications. Achieving high disambiguation accuracy crucially depends on a rich and holistic representation of the entities in the KB. For popular entities, such a representation can be easily mined from Wikipedia, and many current entity disambiguation and linking methods make use of this fact. However, Wikipedia does not contain long-tail entities that only a few people are interested in, and it is at times slow to add newly emerging entities. For such entities, mining a suitable representation in a fully automated fashion is very difficult, resulting in poor linking accuracy.
What can automatically be mined, though, is a high-quality representation given the context of a new entity occurring in any text. Due to the lack of knowledge about the entity, no method can retrieve these occurrences automatically with high precision, resulting in a chicken-and-egg problem. To address this, our approach automatically generates candidate occurrences of entities and prompts the user for feedback on whether each occurrence refers to the entity in question. This feedback gradually improves the knowledge about the entity and allows our methods to provide better candidate suggestions, which keeps the user engaged. We propose novel human-in-the-loop retrieval methods for generating candidates based on gradient interleaving of diversification and textual relevance approaches.
We conducted extensive experiments on the FACC dataset, showing that our approaches convincingly outperform carefully selected baselines in both intrinsic and extrinsic measures while keeping users engaged.
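
The feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not define "gradient interleaving", so the sketch substitutes a simple linear mix of a term-overlap relevance score and a Jaccard-based novelty score, with a weight that moves toward relevance as confirmed occurrences accumulate. All function names, the scoring functions, and the weight schedule are assumptions made for illustration only.

```python
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()


def relevance(candidate, profile):
    """Fraction of candidate tokens already seen in confirmed contexts."""
    toks = tokens(candidate)
    if not toks:
        return 0.0
    return sum(1 for t in toks if profile[t] > 0) / len(toks)


def novelty(candidate, shown):
    """1 minus the highest Jaccard overlap with any candidate already shown."""
    cand = set(tokens(candidate))
    best = 0.0
    for s in shown:
        other = set(tokens(s))
        union = cand | other
        if union:
            best = max(best, len(cand & other) / len(union))
    return 1.0 - best


def next_candidate(pool, profile, shown, weight):
    """Pick the not-yet-shown candidate with the best relevance/novelty mix."""
    remaining = [c for c in pool if c not in shown]
    if not remaining:
        return None
    return max(
        remaining,
        key=lambda c: weight * relevance(c, profile)
        + (1 - weight) * novelty(c, shown),
    )


def feedback_loop(pool, seed_context, judge, rounds):
    """Show up to `rounds` candidates; `judge` stands in for the user's yes/no."""
    profile = Counter(tokens(seed_context))  # term profile of the new entity
    shown, accepted = [], []
    for _ in range(rounds):
        # Assumed schedule: equal mix at first, more weight on relevance
        # as the number of confirmed occurrences grows.
        weight = (len(accepted) + 1) / (len(accepted) + 2)
        cand = next_candidate(pool, profile, shown, weight)
        if cand is None:
            break
        shown.append(cand)
        if judge(cand):
            accepted.append(cand)
            profile.update(tokens(cand))  # feedback enriches the profile
    return accepted, shown


if __name__ == "__main__":
    pool = [
        "the band nova released a new album",
        "nova is a science show on television",
        "the band nova toured europe last year",
    ]
    accepted, shown = feedback_loop(
        pool, "nova band album", lambda c: "band" in tokens(c), rounds=5
    )
    print(accepted)
```

In this sketch the `judge` callback plays the role of the user; in an interactive setting it would be replaced by a prompt, and in an offline evaluation by a simulated user backed by ground-truth annotations.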

Published In

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN: 9781450340731
DOI: 10.1145/2983323

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. diversity
  2. human-in-the-loop
  3. knowledge base acceleration
  4. named entity disambiguation
  5. relevance feedback
  6. retrieval model
  7. user simulation

Qualifiers

  • Research-article

Conference

CIKM'16: ACM Conference on Information and Knowledge Management
October 24 - 28, 2016
Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
