DOI: 10.1145/2107736.2107741
research-article

Probabilistic topic models

Published: 21 August 2011

ABSTRACT

Probabilistic topic modeling provides a suite of tools for the unsupervised analysis of large collections of documents. Topic modeling algorithms can uncover the underlying themes of a collection and decompose its documents according to those themes. This analysis can be used for corpus exploration, document search, and a variety of prediction problems.

In this tutorial, I will review the state of the art in probabilistic topic models. I will describe the three components of topic modeling:

(1) Topic modeling assumptions

(2) Algorithms for computing with topic models

(3) Applications of topic models

In (1), I will describe latent Dirichlet allocation (LDA), one of the simplest topic models, and then describe a variety of ways that we can build on it. These include dynamic topic models, correlated topic models, supervised topic models, author-topic models, bursty topic models, Bayesian nonparametric topic models, and others. I will also discuss some of the fundamental statistical ideas that are used in building topic models, such as distributions on the simplex, hierarchical Bayesian modeling, and mixed-membership models.
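As a minimal sketch of the generative assumptions usually attributed to LDA, the following pure-Python example draws topics and per-document topic proportions from Dirichlet distributions (points on the simplex) and then generates words. The function names, the fixed document length, and the hyperparameter defaults `alpha` and `eta` are illustrative assumptions, not taken from the tutorial.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw a point on the simplex by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, eta=0.5, seed=0):
    """Generate a toy corpus under LDA-style assumptions (hypothetical helper)."""
    rng = random.Random(seed)
    # Each topic is a distribution over the vocabulary.
    topics = [sample_dirichlet([eta] * vocab_size, rng)
              for _ in range(num_topics)]
    proportions, docs = [], []
    for _ in range(num_docs):
        # Each document has its own mixed-membership topic proportions.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)      # topic assignment
            w = sample_categorical(topics[z], rng)  # word drawn from that topic
            words.append(w)
        proportions.append(theta)
        docs.append(words)
    return topics, proportions, docs


if __name__ == "__main__":
    topics, proportions, docs = generate_corpus(
        num_docs=3, doc_len=8, num_topics=2, vocab_size=6)
    for theta, words in zip(proportions, docs):
        print([round(p, 2) for p in theta], words)
```

Words are represented as integer vocabulary indices. Fixing the document length is a simplification made here to keep the sketch short.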

In (2), I will review how we compute with topic models. I will describe approximate posterior inference for directed graphical models using both sampling and variational inference, and I will discuss the practical issues and pitfalls in developing these algorithms for topic models. Finally, I will describe some of our most recent work on building algorithms that can scale to millions of documents and to documents arriving in a stream.
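To make the sampling side of approximate posterior inference concrete, here is a minimal sketch of collapsed Gibbs sampling for LDA, one common sampling approach. It is not necessarily the algorithm presented in the tutorial, and it makes no attempt at the scalable or streaming settings mentioned above. The function name, the hyperparameter defaults, and the number of sweeps are assumptions chosen for illustration.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size,
                        alpha=0.1, eta=0.01, num_sweeps=50, seed=0):
    """Resample each word's topic given all other assignments (toy sketch)."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]              # counts per document
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # counts per topic
    topic_total = [0] * num_topics
    assignments = []

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(num_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + eta)
                    / (topic_total[k] + vocab_size * eta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights, k=1)[0]
                # Record the new assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return doc_topic, topic_word, assignments


if __name__ == "__main__":
    toy_docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3], [0, 1, 2, 3]]
    doc_topic, topic_word, _ = collapsed_gibbs_lda(
        toy_docs, num_topics=2, vocab_size=4)
    print(doc_topic)
    print(topic_word)
```

The returned count tables can be smoothed with `alpha` and `eta` and normalized to estimate per-document topic proportions and per-topic word distributions. Variational inference would instead optimize a tractable approximation to the posterior and is not shown here.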

In (3), I will discuss applications of topic models. These include applications to images, music, social networks, and other data in which we hope to uncover hidden patterns. I will describe some of our recent work on adapting topic modeling algorithms to collaborative filtering, legislative modeling, and bibliometrics without citations.

Finally, I will discuss some future directions and open research problems in topic models.

Supplemental Material

tutorial-6-part1.mp4 (mp4, 528.6 MB)

tutorial-6-part2.mp4 (mp4, 463.4 MB)

  • Published in

    KDD '11 Tutorials: Proceedings of the 17th ACM SIGKDD International Conference Tutorials
    August 2011
    5 pages
    ISBN:9781450312011
    DOI:10.1145/2107736

    Copyright © 2011 Author

    Publisher

    Association for Computing Machinery

    New York, NY, United States
