ABSTRACT
In recent years, microblog services such as Twitter have gained increasing popularity, leading to active research on how to effectively exploit their content. Microblog documents such as tweets differ in morphology from more traditional documents such as web pages. In particular, tweets are considerably shorter than web documents (at most 140 characters) and contain contextual tags indicating the topic (hashtags) and the intended audience (mentions) of the document, as well as links to external content (URLs).
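To make the notion of dimensions concrete, the following Python sketch splits a tweet into an ordered sequence of dimension labels (plain text, hashtags, mentions and URLs) and computes the relative presence of each dimension as a share of the tweet's tokens. The regular expressions, the whitespace tokenisation and the names `tweet_dimensions` and `dimension_profile` are illustrative assumptions. The abstract does not specify how the authors segment tweets.

```python
import re
from collections import Counter

# Assumed patterns for the non-textual dimensions; not taken from the paper.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")

DIMENSIONS = ("TEXT", "HASHTAG", "MENTION", "URL")


def tweet_dimensions(tweet: str) -> list[str]:
    """Label each whitespace-separated token with its dimension, keeping tweet order."""
    labels = []
    for token in tweet.split():
        # re.match anchors at the token start, so trailing punctuation is tolerated.
        if URL_RE.match(token):
            labels.append("URL")
        elif HASHTAG_RE.match(token):
            labels.append("HASHTAG")
        elif MENTION_RE.match(token):
            labels.append("MENTION")
        else:
            labels.append("TEXT")
    return labels


def dimension_profile(labels: list[str]) -> dict[str, float]:
    """Relative presence of each dimension: the share of tokens carrying that label."""
    counts = Counter(labels)
    total = len(labels) or 1  # avoid division by zero for empty tweets
    return {dim: counts[dim] / total for dim in DIMENSIONS}


labels = tweet_dimensions("Reading @alice notes on #TREC microblog track https://t.co/abc")
print(labels)
# ['TEXT', 'MENTION', 'TEXT', 'TEXT', 'HASHTAG', 'TEXT', 'TEXT', 'URL']
print(dimension_profile(labels))
# {'TEXT': 0.625, 'HASHTAG': 0.125, 'MENTION': 0.125, 'URL': 0.125}
```

The ordered label sequence preserves where each dimension occurs in the tweet, while the profile discards order and keeps only relative presence; both views correspond to the two kinds of structural evidence discussed in the abstract.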
Traditional and state-of-the-art retrieval models perform rather poorly at capturing the relevance of tweets, since they were designed under very different conditions. In this work, we define a microblog document as a high-dimensional entity and study the structural differences between documents deemed relevant and those deemed non-relevant. Second, we attempt to enhance the behaviour of the best-performing retrieval model observed by means of a re-ranking approach that accounts for the relative differences in these dimensions amongst tweets. Additionally, we study the interactions between the different dimensions in terms of their order within the documents by modelling relevant and non-relevant tweets as state machines. These state machines are then used to produce scores, which in turn drive the re-ranking.
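One possible way to realise the state-machine idea is a first-order Markov chain over dimension labels, trained separately on relevant and on non-relevant tweets. A candidate tweet's log-likelihood ratio under the two chains can then be blended with its baseline retrieval score. The sketch below assumes this formulation, add-one smoothing, explicit start and end states, and a linear interpolation weight; the class `DimensionChain`, the helpers `relevance_log_odds` and `rerank`, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

STATES = ("TEXT", "HASHTAG", "MENTION", "URL")
START, END = "<s>", "</s>"


class DimensionChain:
    """First-order Markov chain (a probabilistic state machine) over dimension labels."""

    def __init__(self, sequences, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing strength (assumed, not from the paper)
        self.targets = STATES + (END,)
        self.counts = defaultdict(lambda: defaultdict(float))
        for seq in sequences:
            path = [START, *seq, END]
            for prev, nxt in zip(path, path[1:]):
                self.counts[prev][nxt] += 1.0

    def log_prob(self, seq):
        """Smoothed log-probability of a label sequence, including start and end transitions."""
        total = 0.0
        path = [START, *seq, END]
        for prev, nxt in zip(path, path[1:]):
            row = self.counts.get(prev, {})
            denom = sum(row.values()) + self.alpha * len(self.targets)
            total += math.log((row.get(nxt, 0.0) + self.alpha) / denom)
        return total


def relevance_log_odds(seq, relevant_chain, nonrelevant_chain):
    """Positive when the ordering of dimensions looks more like the relevant training tweets."""
    return relevant_chain.log_prob(seq) - nonrelevant_chain.log_prob(seq)


def rerank(candidates, relevant_chain, nonrelevant_chain, weight=0.5):
    """Re-rank (tweet_id, baseline_score, label_sequence) triples by an interpolated score."""
    rescored = [
        (tid, base + weight * relevance_log_odds(seq, relevant_chain, nonrelevant_chain))
        for tid, base, seq in candidates
    ]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return [tid for tid, _ in rescored]


# Toy training data (invented for illustration only).
relevant = DimensionChain([["TEXT", "TEXT", "URL"], ["TEXT", "URL"]])
nonrelevant = DimensionChain([["MENTION", "MENTION", "TEXT"], ["HASHTAG", "MENTION"]])
candidates = [("t1", 1.0, ["MENTION", "MENTION"]), ("t2", 1.0, ["TEXT", "URL"])]
print(rerank(candidates, relevant, nonrelevant))  # ['t2', 't1']
```

With equal baseline scores, the tweet whose text-then-URL ordering resembles the relevant training sequences moves ahead of the mention-heavy one. In practice the weight would need tuning, and the baseline and log-odds scores may need normalising to comparable ranges.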
Our evaluation results show statistically significant improvements over the baseline in terms of precision at different cut-off points for both approaches. These results confirm that the relative presence of the different dimensions within a document and their ordering are connected with the relevance of microblogs.
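Precision at a cut-off k, the measure reported above, is the fraction of the top k retrieved tweets that are judged relevant. A minimal Python version follows; the function name `precision_at_k` is hypothetical, and the sketch assumes the common convention of dividing by k even when fewer than k tweets are retrieved, so that short rankings are penalised.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked tweet ids that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for tweet_id in ranking[:k] if tweet_id in relevant)
    return hits / k  # divide by k even if fewer than k tweets were retrieved


ranking = ["t1", "t2", "t3", "t4", "t5"]
relevant = {"t1", "t4"}
for k in (2, 5, 10):
    print(k, precision_at_k(ranking, relevant, k))
# 2 0.5
# 5 0.4
# 10 0.2
```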
Index Terms
- On Microblog Dimensionality and Informativeness: Exploiting Microblogs' Structure and Dimensions for Ad-Hoc Retrieval