DOI: 10.1145/1031453.1031472
Article

Grouping search-engine returned citations for person-name queries

Published: 12 November 2004

Abstract

We present a technique for grouping the citations that a search engine returns for a person-name query, such that all citations in a group refer to the same person. The technique is multi-faceted: it considers evidence from (1) attributes, (2) links, and (3) page similarity. From these three facets we construct a relatedness confidence matrix over pairs of citations, then merge pairs whose matching confidence exceeds an empirically determined threshold. Experimental results from our implementation of this multi-faceted approach are promising.
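The pipeline in the abstract (per-facet evidence, a pairwise confidence matrix, threshold-based merging) can be illustrated with a minimal sketch. The abstract does not say how the three facets are combined or how merges propagate, so the noisy-OR combination in `combine`, the transitive merging via union-find, and every name and number below are assumptions made for illustration, not the authors' method.

```python
from itertools import combinations


def combine(scores):
    """Noisy-OR of per-facet confidences (an assumed combination rule)."""
    miss = 1.0
    for s in scores:
        miss *= 1.0 - s
    return 1.0 - miss


def group_citations(n, facet_scores, threshold):
    """Group n citations from pairwise facet confidences.

    facet_scores: list of dicts, one per facet, mapping a pair (i, j)
    with i < j to a confidence in [0, 1]; missing pairs count as 0.
    Pairs whose combined confidence exceeds the threshold are merged,
    and merges are treated as transitive (an assumption).
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        confidence = combine(f.get((i, j), 0.0) for f in facet_scores)
        if confidence > threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical scores for five citations returned for one name query.
attributes = {(0, 1): 0.9}
links = {(2, 3): 0.5}
similarity = {(2, 3): 0.6, (0, 4): 0.2}

result = group_citations(5, [attributes, links, similarity], threshold=0.7)
print(result)  # [[0, 1], [2, 3], [4]]
```

In this toy run, citations 2 and 3 are merged only because two moderately confident facets reinforce each other (combined confidence 0.8), which is the kind of effect a multi-faceted approach is meant to capture. In practice the threshold would be tuned empirically, as the abstract notes.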


Published In

WIDM '04: Proceedings of the 6th annual ACM international workshop on Web information and data management
November 2004
168 pages
ISBN:1581139780
DOI:10.1145/1031453

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. citation grouping
  2. multi-faceted approach
  3. person-name
  4. queries
  5. search engines

Qualifiers

  • Article

Conference

CIKM04
CIKM04: Conference on Information and Knowledge Management
November 12 - 13, 2004
Washington DC, USA

