DOI: 10.1145/3018661.3018728
research-article

Modeling Event Importance for Ranking Daily News Events

Published: 02 February 2017

Abstract

We address the problem of ranking news events on a daily basis for large news corpora, an essential building block for news aggregation. News ranking has been addressed in the literature before, but with individual news articles as the unit of ranking. However, estimating event importance accurately requires models that quantify both an event's importance on the current day and its significance in the historical context. Consequently, in this paper we show that a cluster of news articles representing an event is a better unit of ranking, as it provides improved estimates of popularity, source diversity, and authority cues. In addition, events make it possible to quantify historical significance by linking them with long-running topics and recent chains of events. Our main contribution in this paper is to provide effective models for improved news event ranking.
To this end, we propose novel event mining and feature generation approaches for improving estimates of event importance. Finally, we conduct an extensive evaluation of our approaches on two large real-world news corpora, each of which spans more than a year and contains up to tens of thousands of news articles per day. Our evaluations are large-scale and based on a clean, human-curated ground truth from the Wikipedia Current Events Portal. Experimental comparison with a state-of-the-art news ranking technique based on language models demonstrates the effectiveness of our approach.
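
To make the idea of ranking clusters rather than single articles more concrete, here is a minimal sketch of how cluster-level cues such as popularity and source diversity might be aggregated. It is not the paper's feature set or model: the `Article` fields, the entropy-based diversity measure, and the toy sort in `rank_events` are all assumptions made for illustration, and the clustering step that assigns `event_id` is taken as given.

```python
from collections import Counter
from dataclasses import dataclass
from math import log


@dataclass(frozen=True)
class Article:
    source: str    # publisher name (hypothetical field)
    event_id: str  # cluster label, assumed to come from an earlier clustering step


def event_features(articles):
    """Aggregate per-event cues from the articles in each cluster."""
    sources_by_event = {}
    for article in articles:
        sources_by_event.setdefault(article.event_id, []).append(article.source)

    features = {}
    for event_id, sources in sources_by_event.items():
        counts = Counter(sources)
        total = len(sources)
        # Shannon entropy of the source distribution: one possible
        # (assumed) way to express source diversity as a single number.
        entropy = sum((c / total) * log(total / c) for c in counts.values())
        features[event_id] = {
            "popularity": total,             # number of articles in the cluster
            "distinct_sources": len(counts),
            "source_entropy": entropy,
        }
    return features


def rank_events(features):
    """Toy ordering by popularity, then diversity; a stand-in for a learned ranker."""
    return sorted(
        features,
        key=lambda e: (features[e]["popularity"], features[e]["source_entropy"]),
        reverse=True,
    )
```

An event covered by many articles from several distinct sources would sort ahead of one reported by a single outlet. A learned ranking model would combine such features with others rather than rely on a fixed sort key.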

Published In

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
February 2017
868 pages
ISBN:9781450346757
DOI:10.1145/3018661

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. daily news ranking
  2. event canonicalization
  3. event linking
  4. learning to rank
  5. news clustering
  6. news event mining
  7. news event ranking

Qualifiers

  • Research-article

Funding Sources

  • ERC Advanced Grant ALEXANDRIA

Conference

WSDM 2017

Acceptance Rates

WSDM '17 Paper Acceptance Rate 80 of 505 submissions, 16%;
Overall Acceptance Rate 498 of 2,863 submissions, 17%

Citations

Cited By

  • (2023) Cross-collection latent Beta-Liouville allocation model training with privacy protection and applications. Applied Intelligence 53:14 (17824-17848). DOI: 10.1007/s10489-022-04378-3. Online publication date: 13-Jan-2023
  • (2020) Probabilistic Topic Modeling for Comparative Analysis of Document Collections. ACM Transactions on Knowledge Discovery from Data 14:2 (1-27). DOI: 10.1145/3369873. Online publication date: 4-Mar-2020
  • (2020) An Empirical Study on Utilizing Neural Network for Event Information Retrieval. Journal of Physics: Conference Series 1621 (012051). DOI: 10.1088/1742-6596/1621/1/012051. Online publication date: 5-Sep-2020
  • (2020) A Collaborative Filtering Based Ranking Algorithm for Classifying and Ranking NEWS TOPICS Using Factors of Social Media. Advances in Computational and Bio-Engineering (299-318). DOI: 10.1007/978-3-030-46939-9_26. Online publication date: 20-Jul-2020
  • (2020) Identifying Notable News Stories. Advances in Information Retrieval (352-358). DOI: 10.1007/978-3-030-45442-5_44. Online publication date: 8-Apr-2020
  • (2019) EventKG – the hub of event knowledge on the web – and biographical timeline generation. Semantic Web 10:6 (1039-1070). DOI: 10.3233/SW-190355. Online publication date: 1-Jan-2019
  • (2019) HapPenIng: Happen, Predict, Infer—Event Series Completion in a Knowledge Graph. The Semantic Web – ISWC 2019 (200-218). DOI: 10.1007/978-3-030-30793-6_12. Online publication date: 17-Oct-2019
  • (2018) Open-Schema Event Profiling for Massive News Corpora. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (587-596). DOI: 10.1145/3269206.3271674. Online publication date: 17-Oct-2018
  • (2018) Distributed and Dynamic Clustering For News Events. Proceedings of the 12th ACM International Conference on Distributed and Event-based Systems (254-257). DOI: 10.1145/3210284.3219774. Online publication date: 25-Jun-2018
  • (2018) Event2Vec. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (1013-1016). DOI: 10.1145/3209978.3210136. Online publication date: 27-Jun-2018
