Large-scale learning of word relatedness with constraints

Authors:
Guy Halawi, Tel Aviv University, Tel Aviv, Israel
Gideon Dror, Yahoo! Research, Haifa, Israel
Evgeniy Gabrilovich, Yahoo! Research, Santa Clara, CA, USA
Yehuda Koren, Yahoo! Research, Haifa, Israel

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2012, Pages 1406–1414. https://doi.org/10.1145/2339530.2339751

Published: 12 August 2012

ABSTRACT

Prior work on computing semantic relatedness of words focused on representing their meaning in isolation, effectively disregarding inter-word affinities. We propose a large-scale data mining approach to learning word-word relatedness, where known pairs of related words impose constraints on the learning process. We learn for each word a low-dimensional representation, which strives to maximize the likelihood of a word given the contexts in which it appears. Our method, called CLEAR, is shown to significantly outperform previously published approaches. The proposed method is based on first principles, and is generic enough to exploit diverse types of text corpora, while having the flexibility to impose constraints on the derived word similarities. We also make publicly available a new labeled dataset for evaluating word relatedness algorithms, which we believe to be the largest such dataset to date.
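The general idea in the abstract (low-dimensional word vectors fit by maximizing the likelihood of a word given its context, with known related pairs acting as constraints) can be illustrated with a small toy sketch. Everything below is an assumption made for illustration: the softmax likelihood, the squared-distance penalty on constrained pairs, the toy data, and the names `train`, `cosine`, `lam`, and `lr` are not taken from the paper and should not be read as the actual CLEAR formulation.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two vectors, used here as a relatedness score."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def train(pairs, constraints, vocab, contexts,
          dim=8, lam=0.5, lr=0.1, epochs=200, seed=0):
    """Toy SGD: softmax log-likelihood of word given context, plus a
    penalty lam * ||W[a] - W[b]||^2 for each known related pair (a, b).
    Hypothetical objective for illustration only."""
    rng = random.Random(seed)
    W = {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}
    C = {c: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for c in contexts}
    history = []  # objective value accumulated during each epoch
    for _ in range(epochs):
        total = 0.0
        for w, c in pairs:
            # P(v | c) via a numerically stable softmax over the vocabulary.
            scores = {v: dot(W[v], C[c]) for v in vocab}
            m = max(scores.values())
            exps = {v: math.exp(s - m) for v, s in scores.items()}
            z = sum(exps.values())
            probs = {v: e / z for v, e in exps.items()}
            total -= math.log(probs[w])
            # Gradient of -log P(w | c): (p_v - 1[v == w]) times the other factor.
            c_old = C[c][:]
            grad_c = [0.0] * dim
            for v in vocab:
                g = probs[v] - (1.0 if v == w else 0.0)
                for k in range(dim):
                    grad_c[k] += g * W[v][k]      # uses W before its update
                    W[v][k] -= lr * g * c_old[k]
            for k in range(dim):
                C[c][k] -= lr * grad_c[k]
        for a, b in constraints:
            # Pull the vectors of known related words toward each other.
            diff = [x - y for x, y in zip(W[a], W[b])]
            total += lam * dot(diff, diff)
            for k in range(dim):
                W[a][k] -= lr * 2.0 * lam * diff[k]
                W[b][k] += lr * 2.0 * lam * diff[k]
        history.append(total)
    return W, history


if __name__ == "__main__":
    vocab = ["cat", "dog", "car", "truck"]
    contexts = ["pet", "road"]
    pairs = [("cat", "pet"), ("dog", "pet"), ("car", "road"), ("truck", "road")]
    constraints = [("cat", "dog"), ("car", "truck")]
    W, history = train(pairs, constraints, vocab, contexts)
    print("objective:", history[0], "->", history[-1])
    print("cat~dog:", cosine(W["cat"], W["dog"]))
    print("cat~car:", cosine(W["cat"], W["car"]))
```

In this toy setup, relatedness between two words is read off as the cosine of their learned vectors; the penalty weight `lam` controls how strongly the known pairs shape the result relative to the corpus likelihood.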

Index Terms

Information systems > Information systems applications > Data mining

Published in
KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012, 1616 pages
ISBN: 9781450314626
DOI: 10.1145/2339530
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags: semantic similarity, word relatedness