DOI: 10.1145/1242572.1242587
Article

Organizing and searching the world wide web of facts -- step two: harnessing the wisdom of the crowds

Published: 08 May 2007

Abstract

As part of a larger effort to acquire large repositories of facts from unstructured text on the Web, a seed-based framework for textual information extraction allows for weakly supervised extraction of class attributes (e.g., side effects and generic equivalent for drugs) from anonymized query logs. The extraction is guided by a small set of seed attributes, without any need for handcrafted extraction patterns or further domain-specific knowledge. The attributes extracted for classes pertaining to various domains of interest to Web search users have accuracy levels significantly exceeding the current state of the art. Inherently noisy search queries are shown to be a highly valuable, albeit unexplored, resource for Web-based information extraction, in particular for the task of class attribute extraction.
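
To make the idea of seed-guided extraction concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not describe how candidates are found or ranked, so the sketch substitutes a simple bootstrapping scheme. It assumes that query templates are induced from queries containing both a known class instance and a seed attribute, and that candidates are ranked by the number of distinct instances supporting them. The function names, the placeholder tokens, and the toy query log are all hypothetical.

```python
import re
from collections import defaultdict

INST, ATTR = "<I>", "<A>"  # placeholder tokens (hypothetical names)


def induce_templates(queries, instances, seeds):
    """Build templates from queries mentioning a known instance and a seed attribute."""
    templates = set()
    for q in queries:
        for inst in instances:
            for seed in seeds:
                # Plain substring matching: a simplifying assumption of this sketch.
                if inst in q and seed in q:
                    t = q.replace(inst, INST).replace(seed, ATTR)
                    # Keep only templates with exactly one slot of each kind.
                    if t.count(INST) == 1 and t.count(ATTR) == 1:
                        templates.add(t)
    return templates


def extract_attributes(queries, instances, seeds):
    """Rank non-seed candidate attributes by the number of distinct supporting instances."""
    inst_alt = "|".join(re.escape(i) for i in sorted(instances, key=len, reverse=True))
    support = defaultdict(set)
    for t in induce_templates(queries, instances, seeds):
        # Turn the template into a regex with one instance slot and one attribute slot.
        pattern = (
            re.escape(t)
            .replace(re.escape(INST), "(?P<inst>" + inst_alt + ")")
            .replace(re.escape(ATTR), "(?P<attr>.+)")
        )
        rx = re.compile(pattern)
        for q in queries:
            m = rx.fullmatch(q)
            if m:
                support[m.group("attr")].add(m.group("inst"))
    ranked = [(a, len(s)) for a, s in support.items() if a not in seeds]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Toy query log for the class "drugs" (invented for illustration).
queries = [
    "side effects of aspirin",
    "aspirin side effects",
    "dosage of ibuprofen",
    "generic equivalent of lipitor",
    "lipitor dosage",
    "buy cheap ibuprofen",
    "pictures of paris",
]
instances = {"aspirin", "ibuprofen", "lipitor"}
seeds = {"side effects"}

ranked = extract_attributes(queries, instances, seeds)
print(ranked)  # [('dosage', 2), ('generic equivalent', 1)]
```

The only supervision in this sketch is the single seed attribute "side effects". The two templates it yields are derived from the queries themselves, which loosely mirrors the abstract's point that no handcrafted extraction patterns are needed.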


        Published In

        WWW '07: Proceedings of the 16th international conference on World Wide Web
        May 2007
        1382 pages
        ISBN: 9781595936547
        DOI: 10.1145/1242572
        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. class attributes
        2. fact extraction
        3. knowledge acquisition
        4. named entities
        5. unstructured text
        6. web search queries

        Qualifiers

        • Article

        Conference

        WWW'07
        WWW'07: 16th International World Wide Web Conference
        May 8 - 12, 2007
        Banff, Alberta, Canada

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
