DOI: 10.1145/2396761.2398527
short-paper

Quality models for microblog retrieval

Published: 29 October 2012

ABSTRACT

Microblog services typically host very short documents (e.g., tweets) that comment on the latest news and events. Many of these documents are uninformative or have very little content because of their personal and ephemeral nature. Effective retrieval in a microblog service therefore requires distinguishing high-quality, informative documents from the rest. Recent work has focused on finding features that indicate the quality of microblog documents, but the impact of these quality features on retrieval is not clear. In this paper, we suggest a low-cost quality model that uses surrogate judgments based on user behavior (i.e., retweets), which can be collected automatically. We analyze the relationship between document informativeness and relevance judgments for microblog retrieval, and then demonstrate that our behavior-based quality metric correlates highly with manual judgments. We also perform experiments to study the impact of the quality model on microblog retrieval. Results on the TREC Microblog track show that the proposed quality model, combined with a variety of retrieval models, can improve retrieval performance and is competitive with a model trained using manual relevance judgments.
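
To make the idea concrete, the following is a minimal sketch of how retweets could serve as surrogate quality labels and how the resulting quality score could be combined with a retrieval score. It is not the paper's model: the single feature (`has_url`), the add-one smoothing, the Dirichlet-smoothed query likelihood, and the log-linear weight `lam` are all illustrative assumptions.

```python
import math
from collections import Counter

def has_url(tokens):
    # Hypothetical quality feature: does the tweet contain a link?
    return any(t.startswith("http") for t in tokens)

def fit_quality(tweets, retweeted, feature=has_url):
    """Estimate P(retweeted | feature value) with add-one smoothing.

    Retweets act as automatically collected surrogate quality labels.
    """
    hits, totals = Counter(), Counter()
    for tokens, label in zip(tweets, retweeted):
        key = feature(tokens)
        totals[key] += 1
        hits[key] += int(label)
    return {key: (hits[key] + 1) / (totals[key] + 2) for key in (True, False)}

def query_log_likelihood(query, tokens, coll_tf, coll_len, mu=10.0):
    """Log query likelihood with Dirichlet smoothing (assumed baseline)."""
    tf = Counter(tokens)
    vocab = len(coll_tf)
    score = 0.0
    for term in query:
        # Add-one smoothed collection probability, so unseen terms avoid log(0).
        p_coll = (coll_tf[term] + 1) / (coll_len + vocab)
        score += math.log((tf[term] + mu * p_coll) / (len(tokens) + mu))
    return score

def rank(query, tweets, quality, lam=1.0, feature=has_url):
    """Rank tweet indices by retrieval score plus a weighted log quality prior."""
    coll_tf = Counter(t for tokens in tweets for t in tokens)
    coll_len = sum(coll_tf.values())
    scored = []
    for i, tokens in enumerate(tweets):
        s = query_log_likelihood(query, tokens, coll_tf, coll_len, mu)  \
            if False else query_log_likelihood(query, tokens, coll_tf, coll_len)
        s += lam * math.log(quality[feature(tokens)])
        scored.append((s, i))
    return [i for _, i in sorted(scored, reverse=True)]

# Toy data, invented for illustration only.
tweets = [
    ["earthquake", "japan", "http://news.example/1"],
    ["earthquake", "japan", "omg"],
    ["lunch", "was", "great"],
]
retweeted = [True, False, False]

quality = fit_quality(tweets, retweeted)
ranking = rank(["earthquake", "japan"], tweets, quality)
```

In this toy setup the first two tweets match the query equally well, so the quality prior alone decides their order. Setting `lam=0.0` disables the prior and recovers the plain retrieval ranking.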

    • Published in

      CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
      October 2012
      2840 pages
      ISBN: 9781450311564
      DOI: 10.1145/2396761

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
