ABSTRACT
A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections. The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. Variational approximations based on Kalman filters and nonparametric wavelet regression are developed to carry out approximate posterior inference over the latent topics. In addition to giving quantitative, predictive models of a sequential corpus, dynamic topic models provide a qualitative window into the contents of a large document collection. The models are demonstrated by analyzing the OCR'ed archives of the journal Science from 1880 through 2000.
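The core idea of placing a state space model on a topic's natural parameters can be illustrated with a short simulation. This is a minimal sketch, not the authors' implementation. It assumes a Gaussian random walk for the state transition and a softmax map from natural parameters to word probabilities, neither of which is spelled out in the abstract. The function names and the `sigma` parameter are hypothetical.

```python
import numpy as np


def softmax(beta):
    """Map natural parameters to word probabilities on the simplex."""
    shifted = beta - beta.max(axis=-1, keepdims=True)  # for numerical stability
    expd = np.exp(shifted)
    return expd / expd.sum(axis=-1, keepdims=True)


def simulate_topic_drift(num_steps, vocab_size, sigma=0.1, seed=0):
    """Simulate one topic evolving over time slices.

    Assumption (illustrative only): the natural parameters follow a
    Gaussian random walk, beta_t = beta_{t-1} + noise, with isotropic
    noise of standard deviation `sigma`.
    """
    rng = np.random.default_rng(seed)
    betas = np.empty((num_steps, vocab_size))
    betas[0] = rng.normal(0.0, 1.0, size=vocab_size)  # initial state
    for t in range(1, num_steps):
        betas[t] = betas[t - 1] + rng.normal(0.0, sigma, size=vocab_size)
    # Each row of the second return value is the topic's word distribution
    # at one time slice.
    return betas, softmax(betas)


betas, word_probs = simulate_topic_drift(num_steps=5, vocab_size=8)
```

Each row of `word_probs` is a multinomial over the vocabulary for one time slice. A small `sigma` keeps consecutive slices close together, which is what lets a topic change gradually rather than jump.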