DOI: 10.1145/1458082.1458317
poster

Evaluating topic models for information retrieval

Published: 26 October 2008

Abstract

We explore the utility of different types of topic models, both probabilistic and non-probabilistic, for retrieval purposes. We show that: (1) topic models are effective for document smoothing; (2) more elaborate topic models that capture topic dependencies provide no additional gains; (3) smoothing documents with their similar documents is as effective as smoothing them with topic models; (4) topics discovered on the whole corpus are too coarse-grained to be useful for query expansion. Experiments measuring the topic models' ability to predict held-out likelihood confirm past results on small corpora, but suggest that simple approaches to topic modeling are better for large corpora.
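To make finding (1) concrete, the following is a minimal sketch of what smoothing a document language model with a topic model can look like in a query-likelihood setting. It assumes a linear interpolation between a Dirichlet-smoothed unigram model and a topic-based estimate, the sum over topics z of P(w|z) P(z|D). The function names, the weights lam and mu, and the toy data are illustrative assumptions; they are not taken from the paper, and the abstract does not specify the exact formulation the authors used.

```python
import math
from collections import Counter


def dirichlet_prob(word, doc_counts, doc_len, coll_prob, mu):
    """Dirichlet-smoothed unigram estimate of P(word | document)."""
    return (doc_counts.get(word, 0) + mu * coll_prob[word]) / (doc_len + mu)


def topic_prob(word, doc_topics, topic_words):
    """Topic-based estimate: sum over topics z of P(word | z) * P(z | document)."""
    return sum(p_z * topic_words[z].get(word, 0.0)
               for z, p_z in enumerate(doc_topics))


def query_log_likelihood(query, doc_tokens, coll_prob, doc_topics,
                         topic_words, lam=0.7, mu=1000.0):
    """Score a document by the log-likelihood of the query under the
    interpolated model. lam=1.0 switches topic smoothing off entirely."""
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for word in query:
        p = (lam * dirichlet_prob(word, counts, doc_len, coll_prob, mu)
             + (1.0 - lam) * topic_prob(word, doc_topics, topic_words))
        score += math.log(p)
    return score


# Toy data (hypothetical): a four-word vocabulary and two hand-written topics.
coll_prob = {"car": 0.25, "engine": 0.25, "piano": 0.25, "music": 0.25}
topic_words = [
    {"car": 0.5, "engine": 0.5},   # topic 0: vehicles
    {"piano": 0.5, "music": 0.5},  # topic 1: music
]
doc_a, topics_a = ["car", "car"], [0.9, 0.1]
doc_b, topics_b = ["piano", "piano"], [0.1, 0.9]
query = ["engine"]  # appears in neither document

score_a = query_log_likelihood(query, doc_a, coll_prob, topics_a, topic_words)
score_b = query_log_likelihood(query, doc_b, coll_prob, topics_b, topic_words)
plain_a = query_log_likelihood(query, doc_a, coll_prob, topics_a, topic_words, lam=1.0)
plain_b = query_log_likelihood(query, doc_b, coll_prob, topics_b, topic_words, lam=1.0)
```

In this toy setup neither document contains the query term, so the two documents tie when lam is 1.0. With topic smoothing enabled, the document whose topic mixture favours the vehicle topic receives the higher score, which is the kind of vocabulary-mismatch effect document smoothing is meant to address.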




    Published In

    CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
    October 2008
    1562 pages
    ISBN:9781595939913
    DOI:10.1145/1458082

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. evaluation
    2. retrieval
    3. topic model

    Qualifiers

    • Poster

    Conference

    CIKM08
    CIKM08: Conference on Information and Knowledge Management
    October 26 - 30, 2008
    Napa Valley, California, USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

