poster

PTM: probabilistic topic mapping model for mining parallel document collections

Published: 26 October 2010

Abstract

Many applications generate large volumes of parallel document collections. A parallel document collection consists of two sets of documents in which each document in one set corresponds to a document in the other, forming semantic pairs (e.g., pairs of problem and solution descriptions in a help-desk setting). Although much work has been done on text mining, little previous work has attempted to mine this kind of text data. In this paper, we propose a new probabilistic topic model, called the Probabilistic Topic Mapping (PTM) model, which mines parallel document collections to simultaneously discover latent topics in both sets of documents and the mapping of topics in one set to those in the other. We evaluate the PTM model on a real parallel document collection from the IT service domain. We show that PTM can effectively discover meaningful topics and their mappings, and that it is also useful for improving text matching and retrieval when there is a vocabulary gap.
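
To make the notion of a topic mapping concrete, the sketch below shows one simple way a mapping between two topic spaces could be represented and applied. It is not the PTM inference procedure, which the abstract does not describe. It assumes that per-document topic proportions for both halves of each pair are already available from some topic model, and the function names `estimate_topic_mapping` and `map_topics` are hypothetical. The mapping is taken to be a row-stochastic matrix estimating P(target topic | source topic) from co-occurrence across paired documents.

```python
# Illustrative sketch only: NOT the PTM model's inference procedure.
# Assumption: topic proportions for each document are given as plain lists.


def estimate_topic_mapping(source_topics, target_topics):
    """Estimate P(target topic | source topic) from paired topic proportions.

    source_topics[i] and target_topics[i] are the topic distributions of the
    two documents that form pair i (e.g., a problem and its solution).
    """
    if not source_topics or len(source_topics) != len(target_topics):
        raise ValueError("collections must be non-empty and parallel")
    k_src = len(source_topics[0])
    k_tgt = len(target_topics[0])

    # Accumulate soft co-occurrence counts of (source topic, target topic).
    counts = [[0.0] * k_tgt for _ in range(k_src)]
    for src, tgt in zip(source_topics, target_topics):
        for s, p_s in enumerate(src):
            for t, p_t in enumerate(tgt):
                counts[s][t] += p_s * p_t

    # Normalize each row; fall back to uniform if a source topic never occurs.
    mapping = []
    for row in counts:
        total = sum(row)
        mapping.append([c / total if total > 0 else 1.0 / k_tgt for c in row])
    return mapping


def map_topics(source_dist, mapping):
    """Project a source-side topic distribution into the target topic space."""
    k_tgt = len(mapping[0])
    return [
        sum(source_dist[s] * mapping[s][t] for s in range(len(mapping)))
        for t in range(k_tgt)
    ]


if __name__ == "__main__":
    # Toy data: two problem/solution pairs with two topics on each side.
    problems = [[0.9, 0.1], [0.2, 0.8]]
    solutions = [[0.1, 0.9], [0.7, 0.3]]
    mapping = estimate_topic_mapping(problems, solutions)
    print(mapping)
    print(map_topics([1.0, 0.0], mapping))
```

Projecting a source-side distribution through such a matrix is one way a retrieval system could compare a query with target-side documents in topic space instead of by exact word overlap, which is the vocabulary-gap setting the abstract mentions.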


Cited By

  • (2018) Discovering Correspondence of Sentiment Words and Aspects. Computational Linguistics and Intelligent Text Processing, pp. 233-245. DOI: 10.1007/978-3-319-75487-1_18. Online publication date: 21-Mar-2018
  • (2012) Latent association analysis of document pairs. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1415-1423. DOI: 10.1145/2339530.2339752. Online publication date: 12-Aug-2012


    Published In

    CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
    October 2010
    2036 pages
    ISBN:9781450300995
    DOI:10.1145/1871437

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. mining parallel document collections
    2. probabilistic topic mapping

    Qualifiers

    • Poster

    Conference

    CIKM '10

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
