research-article

Growing Story Forest Online from Massive Breaking News

Authors:
Bang Liu (University of Alberta, Edmonton, AB, Canada)
Di Niu (University of Alberta, Edmonton, AB, Canada)
Kunfeng Lai (Tencent Inc., Shenzhen, China)
Linglong Kong (University of Alberta, Edmonton, AB, Canada)
Yu Xu (Tencent Inc., Shenzhen, China)

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, November 2017, Pages 777–785. https://doi.org/10.1145/3132847.3132852

Published: 06 November 2017

ABSTRACT

We describe our experience of implementing a news content organization system at Tencent that discovers events from vast streams of breaking news and evolves news story structures in an online fashion. Our real-world system has requirements that distinguish it from previous studies on topic detection and tracking (TDT) and on event timeline or graph generation: we 1) need to accurately and quickly extract distinguishable events from massive streams of long text documents that cover diverse topics and contain highly redundant information, and 2) must develop the structures of event stories in an online manner, without repeatedly restructuring previously formed stories, in order to guarantee a consistent user viewing experience. To address these challenges, we propose Story Forest, a set of online schemes that automatically cluster streaming documents into events while connecting related events in growing trees to tell evolving stories. We conducted an extensive evaluation on 60 GB of real-world Chinese news data and through detailed pilot user experience studies; our ideas are not language-dependent and can easily be extended to other languages. The results demonstrate the superior capability of Story Forest, compared to multiple existing algorithm frameworks, to accurately identify events and organize news text into a logical structure that is appealing to human readers.
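
As a rough illustration of the two online steps named in the abstract (clustering incoming documents into events, then attaching new events to growing story trees without moving existing nodes), here is a minimal Python sketch. It is not the authors' implementation: the keyword-set document representation, the Jaccard similarity, the `Event` and `StoryForest` class names, and both threshold values are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Event:
    keywords: set                                  # keywords of the first document (kept fixed here)
    docs: list = field(default_factory=list)       # ids of documents reporting this event
    children: list = field(default_factory=list)   # later events in the same story


class StoryForest:
    """Toy online organizer: documents -> events -> story trees (hypothetical design)."""

    def __init__(self, event_threshold=0.5, story_threshold=0.2):
        # Both thresholds are illustrative assumptions, not values from the paper.
        self.event_threshold = event_threshold
        self.story_threshold = story_threshold
        self.roots = []    # one root event per story tree
        self.events = []   # every event, in arrival order

    def add_document(self, doc_id, keywords):
        keywords = set(keywords)
        # Find the most similar existing event, if any.
        best = max(self.events,
                   key=lambda e: jaccard(e.keywords, keywords),
                   default=None)
        score = jaccard(best.keywords, keywords) if best is not None else 0.0

        # Step 1: a near-duplicate report joins the existing event.
        if score >= self.event_threshold:
            best.docs.append(doc_id)
            return best

        # Step 2: otherwise open a new event. A related event is appended as a
        # child of its closest match; an unrelated one starts a new story tree.
        # Existing nodes are never moved, so earlier structure stays stable.
        event = Event(keywords=keywords, docs=[doc_id])
        if score >= self.story_threshold:
            best.children.append(event)
        else:
            self.roots.append(event)
        self.events.append(event)
        return event
```

In this sketch a document only ever appends to an event or adds a new leaf or root, which mirrors the abstract's requirement of growing stories without restructuring them; a real system would need far richer features than keyword overlap.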

CCS Concepts: Information systems → Information systems applications → Data mining

Published in
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN:9781450349185
DOI:10.1145/3132847
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
information retrieval
online story tree
text clustering
Acceptance Rates
CIKM '17 Paper Acceptance Rate: 171 of 855 submissions, 20%. Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%.