ABSTRACT
Knowledge bases about entities are an important part of modern information retrieval systems. A strong ranking of entities can be used to enhance query understanding and document retrieval, or can be presented to the user as another vertical. Given a keyword query, our task is to provide a ranking of the entities present in the collection of interest. We are particularly interested in approaches to this problem that generalize to different knowledge bases and different collections. In the past, this kind of problem has been explored in the enterprise domain through Expert Search. Recently, a dataset was introduced for ranking entities for news and web queries over more general TREC collections.
Approaches from prior work leverage a wide variety of lexical resources, such as natural language processing and relations in the knowledge base. We address the question of whether we can achieve competitive performance with minimal linguistic resources.
We propose a set of features that do not require index-time entity linking, and demonstrate competitive performance on the new dataset. As this paper is the first non-introductory work to leverage this new dataset, we also find and correct certain issues with the benchmark. To support a fair evaluation, we collect 38% more judgments and contribute annotator agreement information.