DOI: 10.1145/2806416.2806486
research-article

Balancing Novelty and Salience: Adaptive Learning to Rank Entities for Timeline Summarization of High-impact Events

Published: 17 October 2015

Abstract

Long-running, high-impact events such as the Boston Marathon bombing often develop through many stages and involve a large number of entities in their unfolding. Timeline summarization of an event by key sentences eases story digestion, but does not distinguish between what a user remembers and what she might want to re-check. In this work, we present a novel approach for timeline summarization of high-impact events, which uses entities instead of sentences for summarizing the event at each individual point in time. Such entity summaries can serve as both (1) important memory cues in a retrospective event consideration and (2) pointers for personalized event exploration. In order to automatically create such summaries, it is crucial to identify the "right" entities for inclusion. We propose to learn a ranking function for entities, with a dynamically adapted trade-off between the in-document salience of entities and the informativeness of entities across documents, i.e., the level of new information associated with an entity for a time point under consideration. Furthermore, for capturing collective attention for an entity we use an innovative soft labeling approach based on Wikipedia. Our experiments on a large real-world news dataset confirm the effectiveness of the proposed methods.
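The abstract describes ranking entities at each time point with a function whose balance between in-document salience and cross-document informativeness shifts over time. The Python sketch below illustrates only that trade-off, under stated assumptions: both signals are precomputed and scaled to [0, 1], an entity's score at time t is the convex mix score_t(e) = (1 - λ_t) · salience(e) + λ_t · informativeness(e, t), and λ_t is set by a hypothetical rule (the mean informativeness of all candidates at t). The names `Candidate`, `novelty_weight`, and `rank_entities` are illustrative only. The paper learns its ranking function rather than fixing the mix by hand, so this is not the authors' method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Candidate:
    """An entity observed at one time point (hypothetical container)."""

    name: str
    salience: float         # in-document salience, assumed scaled to [0, 1]
    informativeness: float  # new information vs. earlier time points, assumed in [0, 1]


def novelty_weight(candidates: list[Candidate]) -> float:
    """Hypothetical adaptation rule for lambda_t: the more new information
    the time point carries on average, the more weight novelty receives."""
    if not candidates:
        return 0.0
    avg = mean(c.informativeness for c in candidates)
    return min(1.0, max(0.0, avg))  # clip to [0, 1] in case inputs are unscaled


def rank_entities(candidates: list[Candidate], k: int) -> list[tuple[str, float]]:
    """Score each entity as (1 - lambda_t) * salience + lambda_t * informativeness,
    with one lambda_t shared by all candidates at this time point; return the top k."""
    lam = novelty_weight(candidates)
    scored = [
        (c.name, (1.0 - lam) * c.salience + lam * c.informativeness)
        for c in candidates
    ]
    # Highest score first; ties broken by name so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    # Toy values made up for illustration; not data from the paper.
    snapshot = [
        Candidate("entity_a", salience=0.9, informativeness=0.1),
        Candidate("entity_b", salience=0.5, informativeness=0.7),
        Candidate("entity_c", salience=0.4, informativeness=0.9),
    ]
    for name, score in rank_entities(snapshot, k=2):
        print(f"{name}\t{score:.3f}")
```

With these toy values, λ_t ≈ 0.567, so the two most novel entities, entity_c (≈ 0.683) and entity_b (≈ 0.613), outrank the highly salient but stale entity_a (≈ 0.447). A learned model would replace the hand-written `novelty_weight` rule with a trade-off fitted to labeled data, such as the Wikipedia-based soft labels the abstract mentions.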

Cited By

  • (2023) Joint Learning-based Heterogeneous Graph Attention Network for Timeline Summarization. Journal of Natural Language Processing, 30:1, pp. 184-214. DOI: 10.5715/jnlp.30.184. Online publication date: 2023.
  • (2020) Context-Guided Learning to Rank Entities. Advances in Information Retrieval, pp. 83-96. DOI: 10.1007/978-3-030-45439-5_6. Online publication date: 8-Apr-2020.
  • (2019) It all starts with entities: A Salient entity topic model. Natural Language Engineering, 26:5, pp. 531-549. DOI: 10.1017/S1351324919000585. Online publication date: 22-Nov-2019.

    Information

    Published In

    CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
    October 2015
    1998 pages
    ISBN:9781450337946
    DOI:10.1145/2806416
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. entity retrieval
    2. learning to rank
    3. news
    4. temporal ranking
    5. timeline summarization
    6. wikipedia

    Qualifiers

    • Research-article

    Funding Sources

    • ForgetIT EU FP7 Project
    • ERC Advanced Grant ALEXANDRIA project

    Conference

    CIKM'15

    Acceptance Rates

    CIKM '15 Paper Acceptance Rate 165 of 646 submissions, 26%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 6
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 19 Feb 2025

    Citations

    • (2019) Discovering Latent Threads in Entity Histories. Data Science and Engineering, 4:4, pp. 336-351. DOI: 10.1007/s41019-019-00108-x. Online publication date: 15-Nov-2019.
    • (2019) Incorporating word attention with convolutional neural networks for abstractive summarization. World Wide Web, 23:1, pp. 267-287. DOI: 10.1007/s11280-019-00709-6. Online publication date: 6-Aug-2019.
    • (2019) Metro maps for efficient knowledge learning by summarizing massive electronic textbooks. International Journal on Document Analysis and Recognition, 22:2, pp. 99-111. DOI: 10.1007/s10032-019-00319-y. Online publication date: 1-Jun-2019.
    • (2018) What to write and why. Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pp. 1321-1330. DOI: 10.1145/3167132.3167274. Online publication date: 9-Apr-2018.
    • (2018) Explicit Diversification of Event Aspects for Temporal Summarization. ACM Transactions on Information Systems, 36:3, pp. 1-31. DOI: 10.1145/3158671. Online publication date: 2-Feb-2018.
    • (2018) Tracking the history and evolution of entities: entity-centric temporal analysis of large social media archives. International Journal on Digital Libraries, 21:1, pp. 5-17. DOI: 10.1007/s00799-018-0257-7. Online publication date: 26-Oct-2018.
    • (2017) Discovering Typical Histories of Entities by Multi-Timeline Summarization. Proceedings of the 28th ACM Conference on Hypertext and Social Media, pp. 105-114. DOI: 10.1145/3078714.3078725. Online publication date: 4-Jul-2017.
