skip to main content
10.1145/1835449.1835681acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
tutorial

Extraction of open-domain class attributes from text: building blocks for faceted search

Published: 19 July 2010 Publication History

Abstract

Knowledge automatically extracted from text captures instances, classes of instances and relations among them. In particular, the acquisition of class attributes (e.g., "top speed", "body style" and "number of cylinders" for the class of "sports cars") from text is a particularly appealing task and has received much attention recently, given its natural fit as a building block towards the far-reaching goal of constructing knowledge bases from text. This tutorial provides an overview of extraction methods developed in the area of Web-based information extraction, with the purpose of acquiring attributes of open-domain classes. The attributes are extracted for classes organized either as a flat set or hierarchically. The extraction methods operate over unstructured or semi-structured text available within collections of Web documents, or over relatively more intriguing data sources consisting of anonymized search queries. The methods take advantage of weak supervision provided in the form of seed examples or small amounts of annotated data, or draw upon knowledge already encoded within human-compiled resources (e.g., Wikipedia). The more ambitious methods, aiming at acquiring as many accurate attributes from text as possible for hundreds or thousands of classes covering a wide range of domains of interest, need to be designed to scale to Web collections. This restriction has significant consequences on the overall complexity and choice of underlying tools, in order for the extracted attributes to ultimately aid information retrieval in general and Web search in particular, by producing relevant attributes for open-domain classes, along with other types of relations among instances or among classes.

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
July 2010
944 pages
ISBN:9781450301534
DOI:10.1145/1835449
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 July 2010

Check for updates

Author Tags

  1. attribute extraction
  2. class attributes
  3. information extraction
  4. knowledge acquisition

Qualifiers

  • Tutorial

Conference

SIGIR '10
Sponsor:

Acceptance Rates

SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 326
    Total Downloads
  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)1
Reflects downloads up to 02 Mar 2025

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media