DOI: 10.1145/1321440.1321537
research-article

Leveraging context in user-centric entity detection systems

Published: 06 November 2007

Abstract

A user-centric entity detection system is one in which the primary consumer of the detected entities is a person who can act on them (e.g., perform a search, view a map, or shop). We contrast this with machine-centric detection systems, where the primary consumer of the detected entities is a machine. Machine-centric detection systems typically focus on the quantity of detected entities, measured by precision and recall metrics, with the goal of correctly identifying every single entity in a document.
However, the simple precision/recall scores of machine-centric entity detection systems fail to accurately reflect the quality of detected entities in user-centric systems, where users may not necessarily want to "see" every possible entity. We posit that not all of the detected entities in a given piece of text are necessarily relevant to the main topic of the text, nor are they necessarily interesting enough to the user to warrant further action. In fact, presenting all of the detected entities may annoy the user to the point of turning this capability off completely, an undesirable outcome. Therefore, we propose to measure the quality and utility of user-centric entity detection systems in three core dimensions: the accuracy, the interestingness, and the relevance of the entities they present to the user. We show that leveraging surrounding context, through novel algorithms for generating a concept vector and for finding concept extensions using search query logs, can greatly improve the performance of such systems in all three dimensions.
We extensively evaluate the proposed algorithms within Contextual Shortcuts, a large-scale user-centric entity detection platform, using 1,586 entities detected over 1,519 documents. The results confirm the importance of using context within user-centric entity detection systems, and validate the usefulness of the proposed algorithms by showing how they improve the overall entity detection quality within Contextual Shortcuts.
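
The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the entity names, the per-entity scores, the `threshold` value, and the helper `select_for_user` are all hypothetical, invented here only to show how a detector can score perfectly on precision and recall while still presenting entities a reader would likely ignore.

```python
def precision_recall(detected, gold):
    """Set-based precision and recall over entity strings."""
    detected, gold = set(detected), set(gold)
    true_positives = len(detected & gold)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def select_for_user(scored_entities, threshold=0.5):
    """Keep entities whose (hypothetical) relevance score clears the threshold."""
    return [name for name, score in scored_entities if score >= threshold]


# Toy data, invented for illustration only.
gold = ["Paris", "Eiffel Tower", "Monday", "Reuters"]
detected = ["Paris", "Eiffel Tower", "Monday", "Reuters"]

# Assumed relevance-to-topic scores; the abstract does not specify how
# such scores are computed.
scores = [("Paris", 0.9), ("Eiffel Tower", 0.8), ("Monday", 0.1), ("Reuters", 0.2)]

precision, recall = precision_recall(detected, gold)
shown = select_for_user(scores)

print(f"precision={precision:.2f} recall={recall:.2f}")
print("shown to user:", shown)
```

In this toy case every entity is detected correctly, yet only two of the four would be surfaced to the reader, which is the gap between machine-centric and user-centric evaluation that the abstract describes.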


    Published In

    CIKM '07: Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
    November 2007
    1048 pages
    ISBN: 9781595938039
    DOI: 10.1145/1321440

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. content syndication
    2. context
    3. contextual
    4. contextual shortcuts
    5. entity detection
    6. information extraction
