DOI: 10.1145/1321440.1321499

Clustering for unsupervised relation identification

Published: 06 November 2007

Abstract

Unsupervised relation identification is the task of automatically discovering interesting relations between entities in a large text corpus. Relations are identified by clustering frequently co-occurring pairs of entities so that pairs occurring in similar contexts end up in the same cluster. In this paper we compare several clustering setups, some novel and others previously tried. The setups include feature extraction and selection methods as well as clustering algorithms. To make the comparison, we develop a clustering evaluation metric specifically adapted to the relation identification task. Our experiments demonstrate that single-linkage hierarchical clustering with a novel threshold selection technique is significantly superior to the other clustering algorithms tested. The experiments also indicate that for successful relation identification it is important to use rich, complex features of two kinds: features that test both relation slots together ("relation features") and features that test only one slot each ("entity features"). We have found that using both kinds of features with the best of the algorithms produces very high-precision results, significantly improving over previous work.
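
As a rough illustration of the general idea only, the sketch below clusters entity pairs by the similarity of their context features using single-linkage clustering cut at a similarity threshold. It is not the paper's implementation: the abstract does not describe the threshold selection technique, so the threshold here is a hand-picked hypothetical value, and the feature names (`rel:` for features over both slots, `ent1:`/`ent2:` for features over one slot) and their counts are invented toy data.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse feature-count dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_linkage(vectors, threshold):
    """Cut a single-linkage dendrogram at a similarity threshold.

    This is equivalent to taking the connected components of the graph
    that links any two items whose similarity is at least `threshold`.
    """
    parent = {key: key for key in vectors}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for key in vectors:
        clusters.setdefault(find(key), set()).add(key)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in clusters.values())


# Toy data (assumption): entity pairs mapped to invented context-feature counts.
pairs = {
    ("Google", "YouTube"): {"rel:X acquired Y": 3, "ent1:Inc": 1, "ent2:site": 1},
    ("Oracle", "Sun"): {"rel:X acquired Y": 2, "rel:X bought Y": 1, "ent1:Inc": 1},
    ("Adobe", "Macromedia"): {"rel:X bought Y": 2, "ent1:Inc": 1},
    ("Paris", "France"): {"rel:X , capital of Y": 4, "ent2:country": 1},
    ("Rome", "Italy"): {"rel:X , capital of Y": 2, "ent2:country": 2},
}

for cluster in single_linkage(pairs, threshold=0.5):
    print(cluster)
```

With the hypothetical threshold of 0.5, the three acquisition pairs fall into one cluster even though the first and third are only weakly similar to each other directly; they are chained together through the second pair, which is the characteristic behavior of single linkage. Raising the threshold to 0.8 breaks that chain and leaves the third pair on its own, which is why the choice of threshold matters so much for this algorithm.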

      Published In

      CIKM '07: Proceedings of the sixteenth ACM conference on information and knowledge management
      November 2007
      1048 pages
      ISBN:9781595938039
      DOI:10.1145/1321440

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. clustering
      2. information extraction
      3. relation learning
      4. unsupervised relation identification

      Qualifiers

      • Research-article

      Conference

      CIKM07

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Citations

      Cited By

      • (2022) Knowledge Mining: A Cross-disciplinary Survey. Machine Intelligence Research 19:2 (89-114). DOI: 10.1007/s11633-022-1323-6. Online publication date: 10-Mar-2022
      • (2022) Information Extraction and Knowledge Graphs. Machine Learning for Text (419-463). DOI: 10.1007/978-3-030-96623-2_13. Online publication date: 10-Feb-2022
      • (2021) Relation Representation Learning for Special Cargo Ontology. 2021 IEEE Symposium Series on Computational Intelligence (SSCI) (1-8). DOI: 10.1109/SSCI50451.2021.9660108. Online publication date: 5-Dec-2021
      • (2019) Unsupervised Approaches for Textual Semantic Annotation, A Survey. ACM Computing Surveys 52:4 (1-45). DOI: 10.1145/3324473. Online publication date: 30-Aug-2019
      • (2019) Attentive Dual Embedding for Understanding Medical Concepts in Electronic Health Records. 2019 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN.2019.8852429. Online publication date: Jul-2019
      • (2019) Dilated Convolutional Networks Incorporating Soft Entity Type Constraints for Distant Supervised Relation Extraction. 2019 International Joint Conference on Neural Networks (IJCNN) (1-7). DOI: 10.1109/IJCNN.2019.8852286. Online publication date: Jul-2019
      • (2019) Author Disambiguation through Adversarial Network Representation Learning. 2019 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN.2019.8852233. Online publication date: Jul-2019
      • (2019) Event Relation Identification Based on Dependency and Co-occurrence. Computational Intelligence and Intelligent Systems (292-305). DOI: 10.1007/978-981-13-6473-0_26. Online publication date: 8-Feb-2019
      • (2018) Information Extraction. Machine Learning for Text (381-411). DOI: 10.1007/978-3-319-73531-3_12. Online publication date: 20-Mar-2018
      • (2017) Extracting microRNA-gene relations from biomedical literature using distant supervision. PLOS ONE 12:3 (e0171929). DOI: 10.1371/journal.pone.0171929. Online publication date: 6-Mar-2017
