ABSTRACT
Prior work on computing semantic relatedness of words focused on representing their meaning in isolation, effectively disregarding inter-word affinities. We propose a large-scale data mining approach to learning word-word relatedness, where known pairs of related words impose constraints on the learning process. We learn for each word a low-dimensional representation, which strives to maximize the likelihood of a word given the contexts in which it appears. Our method, called CLEAR, is shown to significantly outperform previously published approaches. The proposed method is based on first principles, and is generic enough to exploit diverse types of text corpora, while having the flexibility to impose constraints on the derived word similarities. We also make publicly available a new labeled dataset for evaluating word relatedness algorithms, which we believe to be the largest such dataset to date.
Index Terms
- Large-scale learning of word relatedness with constraints