Article

Organizing and searching the world wide web of facts -- step two: harnessing the wisdom of the crowds

Author:

Marius Paşca

WWW '07: Proceedings of the 16th international conference on World Wide Web

Pages 101 - 110

https://doi.org/10.1145/1242572.1242587

Published: 08 May 2007

Abstract

As part of a large effort to acquire large repositories of facts from unstructured text on the Web, a seed-based framework for textual information extraction allows for weakly supervised extraction of class attributes (e.g., side effects and generic equivalent for drugs) from anonymized query logs. The extraction is guided by a small set of seed attributes, without any need for handcrafted extraction patterns or further domain-specific knowledge. The attributes of classes pertaining to various domains of interest to Web search users have accuracy levels significantly exceeding current state of the art. Inherently noisy search queries are shown to be a highly valuable, albeit unexplored, resource for Web-based information extraction, in particular for the task of class attribute extraction.
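
The seed-based idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the function names, the `[I]`/`[A]` slot markers, the sample queries, and the count-based scoring are all assumptions made for illustration. The sketch learns query templates from a few seed attributes, then reuses those templates to propose new attributes, so no extraction pattern is written by hand.

```python
from collections import Counter

SLOT_I = "[I]"  # hypothetical placeholder for a class instance (e.g. a drug name)
SLOT_A = "[A]"  # hypothetical placeholder for an attribute

def slot_instances(queries, instances):
    """Keep queries that mention a known instance; replace it with SLOT_I."""
    slotted = []
    for query in queries:
        padded = f" {query.lower()} "  # pad so only whole words match
        for inst in instances:
            needle = f" {inst.lower()} "
            if needle in padded:
                slotted.append(padded.replace(needle, f" {SLOT_I} ", 1).strip())
                break
    return slotted

def learn_templates(slotted, seeds):
    """Count templates obtained by replacing a seed attribute with SLOT_A."""
    templates = Counter()
    for query in slotted:
        padded = f" {query} "
        for seed in seeds:
            needle = f" {seed.lower()} "
            if needle in padded:
                templates[padded.replace(needle, f" {SLOT_A} ", 1).strip()] += 1
    return templates

def rank_attributes(slotted, templates, seeds):
    """Score non-seed phrases that fill SLOT_A in a learned template."""
    seed_set = {seed.lower() for seed in seeds}
    scores = Counter()
    for query in slotted:
        for template, weight in templates.items():
            prefix, suffix = template.split(SLOT_A, 1)
            fits = (query.startswith(prefix) and query.endswith(suffix)
                    and len(query) > len(prefix) + len(suffix))
            if not fits:
                continue
            candidate = query[len(prefix):len(query) - len(suffix)].strip()
            if candidate and SLOT_I not in candidate and candidate not in seed_set:
                scores[candidate] += weight  # assumed scoring: sum of template counts
    return scores.most_common()

# Made-up sample data, for illustration only.
QUERIES = [
    "side effects of aspirin",
    "generic equivalent of ibuprofen",
    "dosage of aspirin",
    "aspirin dosage",
    "half life of ibuprofen",
    "buy aspirin online",
    "ibuprofen side effects",
]
INSTANCES = ["aspirin", "ibuprofen"]
SEEDS = ["side effects", "generic equivalent"]

slotted_queries = slot_instances(QUERIES, INSTANCES)
learned = learn_templates(slotted_queries, SEEDS)
print(rank_attributes(slotted_queries, learned, SEEDS))
# prints [('dosage', 3), ('half life', 2)]
```

In this toy run the seeds yield the templates `[A] of [I]` and `[I] [A]`, and the noisy query "buy aspirin online" matches neither, so it contributes no candidate. Real query logs would call for a more robust similarity measure than raw template counts.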

Index Terms

Organizing and searching the world wide web of facts -- step two: harnessing the wisdom of the crowds
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
2. Information systems
  1. Information retrieval

Published In

WWW '07: Proceedings of the 16th international conference on World Wide Web

May 2007

1382 pages

ISBN:9781595936547

DOI:10.1145/1242572

General Chairs: Carey Williamson (University of Calgary, Canada), Mary Ellen Zurko (IBM, USA)

Program Chairs: Peter Patel-Schneider (Bell Labs Research, USA), Prashant Shenoy (University of Massachusetts at Amherst, USA)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WWW'07: 16th International World Wide Web Conference

Sponsor: ACM

May 8 - 12, 2007

Banff, Alberta, Canada

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
