research-article

Clustering for unsupervised relation identification

Authors:

Benjamin Rosenfeld,

Ronen Feldman

CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management

Pages 411 - 418

https://doi.org/10.1145/1321440.1321499

Published: 06 November 2007

Abstract

Unsupervised Relation Identification is the task of automatically discovering interesting relations between entities in large text corpora. Relations are identified by clustering frequently co-occurring pairs of entities so that pairs occurring in similar contexts end up in the same cluster. In this paper we compare several clustering setups, some of them novel and others previously tried. The setups include feature extraction and selection methods and clustering algorithms. To carry out the comparison, we develop a clustering evaluation metric specifically adapted to the relation identification task. Our experiments demonstrate significant superiority of single-linkage hierarchical clustering with a novel threshold selection technique over the other tested clustering algorithms. The experiments also indicate that successful relation identification requires rich, complex features of two kinds: features that test both relation slots together ("relation features"), and features that test only one slot each ("entity features"). We have found that using both kinds of features with the best of the algorithms produces very high-precision results, significantly improving over previous work.
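The clustering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the novel threshold selection technique, so a fixed similarity threshold stands in for it, and cosine similarity, the toy entity pairs, and the feature names (prefixed `rel:` for relation features and `ent1:` for entity features) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def single_linkage(items: dict, threshold: float) -> list:
    """Cut a single-linkage dendrogram at a fixed similarity threshold.

    This is equivalent to taking the connected components of the graph
    whose edges join items with similarity >= threshold.
    """
    parent = {name: name for name in items}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if cosine(items[a], items[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in items:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical entity pairs with made-up context features (assumptions).
pairs = {
    ("AcmeCorp", "WidgetCo"): Counter({"rel:X acquired Y": 3, "rel:X bid for Y": 1, "ent1:company": 1}),
    ("MegaSoft", "TinyApps"): Counter({"rel:X acquired Y": 2, "ent1:company": 1}),
    ("J. Doe", "AcmeCorp"): Counter({"rel:X , CEO of Y": 4, "ent1:person": 1}),
    ("R. Roe", "MegaSoft"): Counter({"rel:X , CEO of Y": 2, "ent1:person": 1}),
}

# The threshold 0.5 is an arbitrary placeholder, not the paper's technique.
for cluster in single_linkage(pairs, threshold=0.5):
    print(sorted(cluster))
```

With these toy vectors the two acquisition-style pairs share features and fall into one cluster, while the two CEO-style pairs form another. Raising the threshold trades recall for precision, which is why the choice of threshold matters for single-linkage clustering.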

Index Terms

Clustering for unsupervised relation identification
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning

Published In

CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management

November 2007

1048 pages

ISBN: 9781595938039

DOI: 10.1145/1321440

Co-chair: Alberto H. F. Laender

Conference Chairs: André O. Falcão (Universidade de Lisboa, Portugal), Øystein Haug Olsen

General Chair: Mário J. Silva (Universidade de Lisboa, Portugal)

Program Chairs: Ricardo Baeza-Yates, Deborah L. McGuinness, Bjorn Olstad

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM07: Conference on Information and Knowledge Management

November 6 - 10, 2007

Lisbon, Portugal

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
