On Microblog Dimensionality and Informativeness: Exploiting Microblogs' Structure and Dimensions for Ad-Hoc Retrieval

Authors:
Jesus Alberto Rodriguez Perez, University of Glasgow, Glasgow, United Kingdom
Joemon M. Jose, University of Glasgow, Glasgow, United Kingdom

ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval, September 2015, Pages 211–220, https://doi.org/10.1145/2808194.2809466

Published: 27 September 2015

ABSTRACT

In recent years, microblog services such as Twitter have gained increasing popularity, leading to active research on how to effectively exploit their content. Microblog documents such as tweets differ in morphology from more traditional documents such as web pages. In particular, tweets are considerably shorter than web documents (at most 140 characters) and contain contextual tags that indicate the topic (hashtags) and intended audience (mentions) of the document, as well as links to external content (URLs).
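
As an illustration of the structure described above, the following minimal sketch labels each whitespace-separated token of a tweet as a hashtag, a mention, a URL, or plain text. The label names, the regular expressions, and the function name `tag_dimensions` are assumptions made for this example; the abstract does not state how the dimensions are extracted.

```python
import re

# Hypothetical, deliberately simple patterns; real tweet tokenization
# has to cope with punctuation, emoji, and shortened links.
URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)
HASHTAG_RE = re.compile(r"^#\w+$")
MENTION_RE = re.compile(r"^@\w+$")


def tag_dimensions(tweet):
    """Return one dimension label per whitespace-separated token."""
    labels = []
    for token in tweet.split():
        if URL_RE.match(token):
            labels.append("URL")
        elif HASHTAG_RE.match(token):
            labels.append("HASHTAG")
        elif MENTION_RE.match(token):
            labels.append("MENTION")
        else:
            labels.append("TEXT")
    return labels
```

The resulting label sequence preserves the order in which the dimensions occur, which is the information an order-aware model would need.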

Traditional and state-of-the-art retrieval models perform rather poorly in capturing the relevance of tweets, since they were designed under very different conditions. In this work, we first define a microblog document as a high-dimensional entity and study the structural differences between documents deemed relevant and those deemed non-relevant. Second, we experiment with enhancing the behaviour of the best-performing retrieval model observed by means of a re-ranking approach that accounts for the relative differences in these dimensions amongst tweets. Additionally, we study the interactions between the different dimensions in terms of their order within the documents by modelling relevant and non-relevant tweets as state machines. These state machines are then used to produce scores, which in turn are used for re-ranking.
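
One plausible reading of the state-machine idea can be sketched as follows. This is not the authors' formulation, which the abstract does not give: the sketch assumes a first-order Markov chain over dimension labels, add-one smoothing, a log-likelihood ratio as the structural score, and a linear combination with the baseline score. The state names, the `weight` parameter, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Assumed state set: boundary markers plus four dimension labels.
STATES = ["START", "TEXT", "HASHTAG", "MENTION", "URL", "END"]


def train_chain(sequences):
    """Estimate add-one smoothed transition probabilities from label sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        path = ["START"] + list(seq) + ["END"]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev in STATES:
        total = sum(counts[prev].values()) + len(STATES)
        probs[prev] = {nxt: (counts[prev][nxt] + 1) / total for nxt in STATES}
    return probs


def log_likelihood(seq, probs):
    """Log-probability of a label sequence under one transition table."""
    path = ["START"] + list(seq) + ["END"]
    return sum(math.log(probs[prev][nxt]) for prev, nxt in zip(path, path[1:]))


def structure_score(seq, rel_chain, nonrel_chain):
    """Positive when the ordering looks more like the relevant examples."""
    return log_likelihood(seq, rel_chain) - log_likelihood(seq, nonrel_chain)


def rerank(candidates, rel_chain, nonrel_chain, weight=0.1):
    """Re-rank (doc_id, baseline_score, label_sequence) tuples by combined score."""
    rescored = [
        (doc_id, base + weight * structure_score(seq, rel_chain, nonrel_chain))
        for doc_id, base, seq in candidates
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

In this sketch, two tweets with equal baseline scores are separated only by how closely their dimension ordering matches the relevant versus the non-relevant training sequences.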

Our evaluation results show statistically significant improvements over the baseline in terms of precision at different cut-off points for both approaches. These results confirm that the relative presence of the different dimensions within a document and their ordering are connected with the relevance of microblogs.

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval models and ranking

Published in
ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval
September 2015
402 pages
ISBN: 9781450338332
DOI: 10.1145/2808194
General Chairs: James Allan (University of Massachusetts Amherst, USA), Bruce Croft (University of Massachusetts Amherst, USA)
Program Chairs: Arjen de Vries (CWI Amsterdam, The Netherlands), Chengxiang Zhai (University of Illinois at Urbana-Champaign, USA)
Copyright © 2015 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ad-hoc retrieval
dimensions
microblog
modelling
ranking
state machine
Qualifiers
- research-article
Acceptance Rates
ICTIR '15 Paper Acceptance Rate: 29 of 57 submissions, 51%
Overall Acceptance Rate: 209 of 482 submissions, 43%