DOI: 10.1145/2808194.2809466
research-article

On Microblog Dimensionality and Informativeness: Exploiting Microblogs' Structure and Dimensions for Ad-Hoc Retrieval

Published: 27 September 2015

ABSTRACT

In recent years, microblog services such as Twitter have gained increasing popularity, leading to active research on how to effectively exploit their content. Microblog documents such as tweets differ in morphology from more traditional documents such as web pages. In particular, tweets are considerably shorter than web documents (at most 140 characters) and contain contextual tags indicating the topic (hashtags) and the intended audience (mentions) of the document, as well as links to external content (URLs).
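To make the idea of a tweet as a multi-dimensional document concrete, the following is a minimal Python sketch that assigns each whitespace-separated token to one of the dimensions named above (hashtags, mentions, URLs) or to plain text, and reports each dimension's relative presence in the tweet. The labels (`HASHTAG`, `MENTION`, `URL`, `TERM`) and the simple prefix-based token rules are illustrative assumptions, not the paper's exact definition of a dimension.

```python
import re
from collections import Counter

# Hypothetical labels for the dimensions discussed above (assumed, not from the paper).
HASHTAG, MENTION, URL, TERM = "HASHTAG", "MENTION", "URL", "TERM"
DIMENSIONS = (HASHTAG, MENTION, URL, TERM)

_URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)


def token_dimension(token: str) -> str:
    """Assign one whitespace-delimited token to a dimension (simplified rules)."""
    if _URL_RE.match(token):
        return URL
    if token.startswith("#") and len(token) > 1:
        return HASHTAG
    if token.startswith("@") and len(token) > 1:
        return MENTION
    return TERM


def dimension_sequence(tweet: str) -> list[str]:
    """Ordered sequence of dimensions as they appear in the tweet."""
    return [token_dimension(tok) for tok in tweet.split()]


def dimension_profile(tweet: str) -> dict[str, float]:
    """Relative presence of each dimension: fraction of tokens belonging to it."""
    seq = dimension_sequence(tweet)
    if not seq:
        return {d: 0.0 for d in DIMENSIONS}
    counts = Counter(seq)
    return {d: counts[d] / len(seq) for d in DIMENSIONS}


if __name__ == "__main__":
    tweet = "@nasa launch delayed again #space http://t.co/abc"
    print(dimension_sequence(tweet))
    # ['MENTION', 'TERM', 'TERM', 'TERM', 'HASHTAG', 'URL']
    print(dimension_profile(tweet))
```

Keeping the ordered sequence as well as the aggregated profile is deliberate: the profile captures how much of each dimension a tweet contains, while the sequence preserves the order in which the dimensions occur.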

Traditional and state-of-the-art retrieval models perform rather poorly at capturing the relevance of tweets, since they were designed under very different conditions. In this work, we first define a microblog document as a high-dimensional entity and study the structural differences between documents deemed relevant and those deemed non-relevant. Second, we attempt to improve the best-performing retrieval model we observed by means of a re-ranking approach that accounts for the relative differences in these dimensions amongst tweets. Additionally, we study the interactions between the different dimensions in terms of their order within the documents by modelling relevant and non-relevant tweets as state machines. These state machines are then used to produce scores, which in turn are used for re-ranking.
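The abstract does not specify the form of the state machines. One plausible reading, sketched below purely as an assumption, is a first-order Markov chain over the dimension labels of each tweet: transition probabilities are estimated separately from relevant and non-relevant training tweets (with add-one smoothing), a tweet's score is the log-odds of its dimension sequence under the two chains, and the final ranking linearly interpolates that score with the baseline retrieval score. The class and function names, the smoothing, the min-max normalisation and the interpolation weight `alpha` are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical state labels; START/END are extra states marking sequence boundaries.
START, END = "<s>", "</s>"
STATES = ("HASHTAG", "MENTION", "URL", "TERM")


class DimensionChain:
    """First-order Markov chain over dimension labels (assumed model), add-one smoothed."""

    def __init__(self, sequences):
        # counts[prev][nxt] = number of observed prev -> nxt transitions
        self.counts = defaultdict(lambda: defaultdict(int))
        for seq in sequences:
            path = [START, *seq, END]
            for prev, nxt in zip(path, path[1:]):
                self.counts[prev][nxt] += 1
        # Any state may be followed by a dimension label or by END.
        self.targets = (*STATES, END)

    def log_prob(self, seq):
        """Log-probability of a dimension sequence under this chain."""
        path = [START, *seq, END]
        total = 0.0
        for prev, nxt in zip(path, path[1:]):
            row = self.counts.get(prev, {})
            denom = sum(row.values()) + len(self.targets)
            total += math.log((row.get(nxt, 0) + 1) / denom)
        return total


def log_odds(seq, relevant, nonrelevant):
    """Positive when the sequence is more likely under the relevant-tweet chain."""
    return relevant.log_prob(seq) - nonrelevant.log_prob(seq)


def _minmax(values):
    # Rescale to [0, 1] so baseline and sequence scores are comparable.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rerank(results, relevant, nonrelevant, alpha=0.7):
    """results: list of (doc_id, baseline_score, dimension_sequence) tuples.

    Returns doc_ids ordered by alpha * baseline + (1 - alpha) * log-odds,
    with both scores min-max normalised over the result list.
    """
    if not results:
        return []
    base = _minmax([score for _, score, _ in results])
    odds = _minmax([log_odds(seq, relevant, nonrelevant) for _, _, seq in results])
    combined = [alpha * b + (1 - alpha) * o for b, o in zip(base, odds)]
    order = sorted(range(len(results)), key=lambda i: combined[i], reverse=True)
    return [results[i][0] for i in order]


if __name__ == "__main__":
    # Toy training data (invented): dimension sequences of judged tweets.
    rel = DimensionChain([["TERM", "TERM", "URL"], ["TERM", "URL"], ["HASHTAG", "TERM", "URL"]])
    nonrel = DimensionChain([["MENTION", "TERM"], ["MENTION", "MENTION", "TERM"], ["TERM", "HASHTAG"]])
    results = [("t1", 2.0, ["MENTION", "TERM"]), ("t2", 1.9, ["TERM", "TERM", "URL"])]
    print(rerank(results, rel, nonrel, alpha=0.4))  # ['t2', 't1']
```

With `alpha=1.0` the baseline order is returned unchanged; lowering `alpha` lets the sequence model promote tweets whose dimension ordering resembles that of the relevant training tweets. In practice `alpha` would need to be tuned on held-out queries.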

Our evaluation results show statistically significant improvements over the baseline in precision at different cut-off points for both approaches. These results confirm that the relative presence of the different dimensions within a document, and their ordering, are related to the relevance of microblogs.
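The evaluation reports precision at several cut-off points (P@k), i.e. the fraction of the top k retrieved tweets that are judged relevant. A minimal sketch, assuming binary relevance judgments given as a set of relevant document IDs per query; the IDs in the example are invented for illustration, and the significance test used in the paper is not reproduced here.

```python
def precision_at_k(ranking, relevant, k):
    """P@k: fraction of the top-k ranked document IDs that are in `relevant`.

    The denominator is always k (as in trec_eval), so if fewer than k
    documents were retrieved, the missing positions count as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_k(runs, qrels, k):
    """Average P@k over queries.

    runs:  query ID -> ranked list of document IDs
    qrels: query ID -> set of relevant document IDs (missing queries have none)
    """
    if not runs:
        return 0.0
    scores = [precision_at_k(ranked, qrels.get(q, set()), k) for q, ranked in runs.items()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    ranking = ["d3", "d1", "d7", "d2"]
    relevant = {"d1", "d2"}
    for k in (2, 4, 10):
        print(k, precision_at_k(ranking, relevant, k))  # 0.5, 0.5, 0.2
```

Comparing a baseline run and a re-ranked run then amounts to computing these per-query values for both and applying a paired significance test over queries.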


Published in

ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval
September 2015
402 pages
ISBN: 9781450338332
DOI: 10.1145/2808194

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICTIR '15 paper acceptance rate: 29 of 57 submissions (51%). Overall acceptance rate: 209 of 482 submissions (43%).
