ABSTRACT
Every day, 645 million Twitter users generate approximately 58 million tweets. This raises the question of whether a summary of events can be generated from this rich set of tweets alone. Key challenges in post summarization of microblog posts include circumventing spam and conversational posts. In this study, we present a novel technique called lexi-temporal clustering (LTC), which identifies key events. LTC uses k-means clustering, and we explore three distance measures for clustering: Euclidean distance, cosine similarity, and Manhattan distance. We collected three original data sets of Twitter microblog posts covering sporting events: one cricket match and two football matches. The match summaries generated by LTC were compared against standard summaries taken from the sports sections of various news outlets, yielding up to 81% precision, 58% recall, and 62% F-measure on different data sets. In addition, we report results for all three variants of the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) software, a tool that compares and scores automatically generated summaries against standard summaries.
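The three distance measures named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy two-dimensional vectors, and the choice to turn cosine similarity into a distance as one minus the similarity are all assumptions made for illustration. The sketch shows only the assignment step of k-means, in which each point is attached to its nearest centroid under a chosen measure.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity (an assumed conversion to a distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Treat an all-zero vector as maximally dissimilar (assumption).
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def assign(points, centroids, dist):
    """k-means assignment step: index of the nearest centroid per point."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


# Hypothetical toy vectors standing in for tweet feature vectors.
points = [[1, 0], [0, 1]]
centroids = [[2, 0], [0, 3]]
labels = {
    "euclidean": assign(points, centroids, euclidean),
    "manhattan": assign(points, centroids, manhattan),
    "cosine": assign(points, centroids, cosine_distance),
}
```

On this toy input all three measures produce the same assignment; on real term vectors of differing lengths they can disagree, because cosine distance ignores vector magnitude while the other two do not.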
Index Terms
- Post Summarization of Microblogs of Sporting Events