DOI: 10.1145/1378889.1378940

Using web information for creating publication venue authority files

Published: 16 June 2008

Abstract

Citations to publication venues such as journals, conferences, and workshops contain spelling variants, acronyms, abbreviated forms, and misspellings, all of which make it more difficult to retrieve the item of interest. The task of discovering and reconciling these variant forms of bibliographic references is known as authority work. Its key goal is to create so-called authority files, which maintain, for any given bibliographic item, a list of the variant labels (i.e., variant strings) used to refer to it. In this paper we propose using information available on the Web to create high-quality publication venue authority files. Our idea is to recognize and extract references to publication venues in the text snippets of the answers returned by a search engine. References to the same publication venue are then reconciled in an authority file. Each entry in this file is composed of a canonical name for the venue, an acronym, the venue type (i.e., journal, conference, or workshop), and a mapping to the various forms in which its name is written in bibliographic citations. Experimental results show that our Web-based method for creating authority files is superior to previous work based on straight string matching techniques. Considering the average precision in finding correct venue canonical names, we observe gains of up to 41.7%.
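
The entry structure described in the abstract (canonical name, acronym, venue type, and a mapping to variant forms) can be pictured with a small data structure. The following is only a minimal sketch: the names `VenueEntry`, `AuthorityFile`, and `normalize` are hypothetical, and the punctuation-stripping normalization is an assumption made for illustration. It is not the Web-based extraction or reconciliation method of the paper, which the abstract does not detail.

```python
import re
from dataclasses import dataclass, field


def normalize(label: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace.

    This normalization is an assumption for the sketch, not the paper's rule.
    """
    return " ".join(re.sub(r"[^a-z0-9]+", " ", label.lower()).split())


@dataclass
class VenueEntry:
    """One authority file entry, with the four parts named in the abstract."""
    canonical_name: str
    acronym: str
    venue_type: str  # "journal", "conference", or "workshop"
    variants: set = field(default_factory=set)  # variant strings seen in citations


class AuthorityFile:
    """Hypothetical container mapping every known label to its entry."""

    def __init__(self):
        self._index = {}  # normalized label -> VenueEntry

    def add(self, entry: VenueEntry) -> None:
        # Index the canonical name, the acronym, and every variant form.
        for label in {entry.canonical_name, entry.acronym, *entry.variants}:
            self._index[normalize(label)] = entry

    def lookup(self, citation_label: str):
        # Returns the matching entry, or None if the label is unknown.
        return self._index.get(normalize(citation_label))


# Illustrative entry; the variant strings are made up for the example.
jcdl = VenueEntry(
    canonical_name="ACM/IEEE-CS Joint Conference on Digital Libraries",
    acronym="JCDL",
    venue_type="conference",
    variants={"Joint Conf. on Digital Libraries"},
)
authority_file = AuthorityFile()
authority_file.add(jcdl)
```

With this sketch, `authority_file.lookup("joint conf. on digital libraries")` and `authority_file.lookup("JCDL")` both resolve to the same entry, while a label that was never registered returns `None`. An exact-match index like this cannot handle misspellings, which is one reason the variant forms have to be discovered in the first place.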

Published In

JCDL '08: Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
June 2008
490 pages
ISBN:9781595939982
DOI:10.1145/1378889
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. authority file
  2. bibliographic citation
  3. canonical name
  4. publication venue

Qualifiers

  • Research-article

Conference

JCDL08: Joint Conference on Digital Libraries
June 16 - 20, 2008
Pittsburgh, PA, USA

Acceptance Rates

JCDL '08 paper acceptance rate: 33 of 117 submissions (28%)
Overall acceptance rate: 415 of 1,482 submissions (28%)

Cited By

  • (2020) PVAF: an environment for disambiguation of scientific publication venues. International Journal on Digital Libraries 21:4 (407-421). DOI: 10.1007/s00799-020-00289-1. Online publication date: 1-Dec-2020
  • (2014) Disambiguating publication venue titles using association rules. Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries (77-85). DOI: 10.5555/2740769.2740783. Online publication date: 8-Sep-2014
  • (2014) Disambiguating publication venue titles using association rules. IEEE/ACM Joint Conference on Digital Libraries (77-86). DOI: 10.1109/JCDL.2014.6970153. Online publication date: Sep-2014
  • (2011) A generic Web-based entity resolution framework. Journal of the American Society for Information Science and Technology 62:5 (919-932). DOI: 10.1002/asi.21518. Online publication date: 1-May-2011
  • (2009) Using web information for author name disambiguation. Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries (49-58). DOI: 10.1145/1555400.1555409. Online publication date: 15-Jun-2009
