DOI: 10.1145/2339530.2339751
research-article

Large-scale learning of word relatedness with constraints

Published: 12 August 2012

ABSTRACT

Prior work on computing semantic relatedness of words focused on representing their meaning in isolation, effectively disregarding inter-word affinities. We propose a large-scale data mining approach to learning word-word relatedness, where known pairs of related words impose constraints on the learning process. We learn for each word a low-dimensional representation, which strives to maximize the likelihood of a word given the contexts in which it appears. Our method, called CLEAR, is shown to significantly outperform previously published approaches. The proposed method is based on first principles, and is generic enough to exploit diverse types of text corpora, while having the flexibility to impose constraints on the derived word similarities. We also make publicly available a new labeled dataset for evaluating word relatedness algorithms, which we believe to be the largest such dataset to date.
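The abstract gives no formulas, so the following is only a toy sketch of the general idea it describes: score low-dimensional word vectors by the log-likelihood of each word given its context, and subtract a penalty when known related pairs lie far apart. It is not the CLEAR method itself. The averaged context vector, the softmax over the vocabulary, the squared-distance penalty, the weight `lam`, and all function names and data are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def context_vector(context, emb):
    """Average the vectors of the context words (an assumed, simple choice)."""
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for w in context:
        for i, x in enumerate(emb[w]):
            total[i] += x
    return [x / len(context) for x in total]


def log_prob(word, context, emb):
    """log p(word | context) under a softmax over the whole vocabulary."""
    ctx = context_vector(context, emb)
    scores = {w: dot(v, ctx) for w, v in emb.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return scores[word] - log_z


def constraint_penalty(pairs, emb):
    """Sum of squared distances between vectors of known related pairs."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(emb[x], emb[y])) for x, y in pairs
    )


def objective(samples, pairs, emb, lam=1.0):
    """Log-likelihood of (word, context) samples minus the weighted penalty."""
    ll = sum(log_prob(w, ctx, emb) for w, ctx in samples)
    return ll - lam * constraint_penalty(pairs, emb)


# Hypothetical toy data: three words with 2-dimensional vectors.
emb = {"car": [1.0, 0.0], "auto": [1.0, 0.0], "tree": [0.0, 1.0]}
samples = [("car", ["auto"]), ("tree", ["tree"])]
pairs = [("car", "auto")]

print(objective(samples, pairs, emb, lam=1.0))
```

In this toy setup the related pair ("car", "auto") already shares a vector, so the penalty is zero and the objective equals the plain log-likelihood. Declaring an unrelated-looking pair such as ("car", "tree") as related would lower the objective until training moved the two vectors closer together.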

Supplemental Material

306_w_talk_7.mp4 (mp4, 344.4 MB)

    Published in

      KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2012
      1616 pages
      ISBN: 9781450314626
      DOI: 10.1145/2339530

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
