DOI: 10.1145/2487575.2487697
Research article

Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation

Published: 11 August 2013

ABSTRACT

There has been an explosion in the amount of digital text information available in recent years, leading to challenges of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on very large-scale corpora, but these methods do not currently take full advantage of the collapsed representation of the model. We propose a stochastic algorithm for collapsed variational Bayesian inference for LDA, which is simpler and more efficient than the state-of-the-art method. In experiments on large-scale text corpora, the algorithm was found to converge faster and often to a better solution than previous methods. Human-subject experiments also demonstrated that the method can learn coherent topics in seconds on small corpora, facilitating the use of topic models in interactive document analysis software.
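To make the idea concrete, here is an illustrative sketch of stochastic CVB0-style updates for LDA. This is not the paper's exact algorithm: the function name, step-size schedule, and initialization are assumptions for illustration. The key elements it shows are the collapsed statistics (expected topic-word counts rather than explicit topic assignments) and per-token stochastic averaging toward the CVB0 variational estimate.

```python
import numpy as np

def scvb0_sketch(docs, n_topics, vocab_size, alpha=0.1, eta=0.01,
                 n_passes=5, seed=0):
    """Illustrative stochastic CVB0-style updates for LDA.

    `docs` is a list of token-id lists. The step-size schedule and
    initialization are assumed for this sketch, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Collapsed statistics: expected topic-word counts and topic totals.
    n_phi = rng.random((vocab_size, n_topics))
    n_z = n_phi.sum(axis=0)
    total_tokens = sum(len(d) for d in docs)
    t = 0
    for _ in range(n_passes):
        for doc in docs:
            # Per-document expected topic counts, initialized uniformly.
            n_theta = np.full(n_topics, len(doc) / n_topics)
            for w in doc:
                # CVB0-style variational distribution for this token.
                gamma = (n_phi[w] + eta) * (n_theta + alpha) \
                        / (n_z + vocab_size * eta)
                gamma /= gamma.sum()
                t += 1
                rho = 1.0 / (t + 10) ** 0.7  # decaying step size (assumed)
                # Stochastic averaging toward scaled per-token estimates.
                n_theta = (1 - rho) * n_theta + rho * len(doc) * gamma
                n_phi[w] = (1 - rho) * n_phi[w] + rho * total_tokens * gamma
                n_z = (1 - rho) * n_z + rho * total_tokens * gamma
    return n_phi, n_z
```

Because the statistics are expectations rather than sampled assignments, each token requires only one multiply-normalize step and a few weighted averages, which is what makes this style of update cheap enough for streaming over large corpora.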


Published in:
KDD '13: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2013, 1534 pages
ISBN: 9781450321747
DOI: 10.1145/2487575
Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

      Acceptance Rates

KDD '13 paper acceptance rate: 125 of 726 submissions (17%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
