ABSTRACT
Word evolution refers to the changing meanings and associations of words over time, a byproduct of the evolution of human language. By studying word evolution, we can infer social trends and language constructs across different periods of human history. However, traditional techniques such as word representation learning do not adequately capture the evolving language structure and vocabulary. In this paper, we develop a dynamic statistical model to learn time-aware word vector representations. The model simultaneously learns time-aware embeddings and solves the resulting alignment problem, and it is trained on a crawled NYTimes dataset. Additionally, we develop multiple intuitive evaluation strategies for temporal word embeddings. Our qualitative and quantitative tests indicate that our method not only reliably captures this evolution over time, but also consistently outperforms state-of-the-art temporal embedding approaches on both semantic accuracy and alignment quality.
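To make the alignment problem concrete, the following is a minimal sketch and not the model proposed in this paper. It assumes embeddings for two time slices are trained independently, in which case they can differ by an arbitrary rotation even when the underlying word geometry is identical. The sketch then applies orthogonal Procrustes, a common post-hoc alignment step, whereas the model described above learns the embeddings and their alignment jointly. All names (`emb_t1`, `emb_t2`, `procrustes_align`, `random_rotation`) and the toy data are hypothetical.

```python
import numpy as np


def random_rotation(dim, rng):
    # The Q factor of a QR decomposition of a Gaussian matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


def procrustes_align(source, target):
    # Orthogonal Procrustes: the orthogonal R minimizing ||source @ R - target||_F
    # is U @ Vt, where U, Vt come from the SVD of source.T @ target.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


rng = np.random.default_rng(0)

# Toy data (assumption): 6 words with 4-dimensional embeddings at time slice 1.
emb_t1 = rng.standard_normal((6, 4))
# Time slice 2 has the same geometry, expressed in a rotated basis.
emb_t2 = emb_t1 @ random_rotation(4, rng)

# Pairwise inner products agree, so both slices encode the same word similarities.
same_geometry = np.allclose(emb_t1 @ emb_t1.T, emb_t2 @ emb_t2.T)
# The raw vectors themselves do not agree, so they cannot be compared directly.
directly_comparable = np.allclose(emb_t1, emb_t2)

# Post-hoc alignment recovers a rotation that maps slice 2 back onto slice 1.
rotation = procrustes_align(emb_t2, emb_t1)
aligned = np.allclose(emb_t2 @ rotation, emb_t1)

print(same_geometry, directly_comparable, aligned)
```

In this toy setting the rotation is recovered exactly because no word changes meaning between the slices. With real corpora, genuine semantic drift and alignment error are mixed together, which is one motivation for learning the embeddings and the alignment in a single model.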