DOI:10.1145/1835804.1835884
research-article

Connecting the dots between news articles

Published: 25 July 2010

Abstract

The process of extracting useful knowledge from large datasets has become one of the most pressing problems in today's society. The problem spans entire sectors, from scientists to intelligence analysts and web users, all of whom are constantly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture.
In this paper, we investigate methods for automatically connecting the dots -- providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events starting with the decline of home prices (January 2007), and ending with the ongoing health-care debate.
We formalize the characteristics of a good chain and provide an efficient algorithm (with theoretical guarantees) to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. Finally, we evaluate our algorithm over real news data. Our user studies demonstrate the algorithm's effectiveness in helping users understand the news.
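
To make the task of connecting two fixed endpoints concrete, the following is a minimal toy sketch and not the algorithm from the paper, whose formalization of a good chain is not given in this abstract. It assumes that each article is a bag of words, that articles are listed in chronological order, and that a chain is only as good as its weakest transition, measured here by Jaccard word overlap. The function names (`jaccard`, `weakest_link`, `connect`) and the sample articles are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-overlap similarity between two collections of words."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weakest_link(chain, docs):
    """Score a chain by its least similar pair of consecutive articles."""
    return min(jaccard(docs[x], docs[y]) for x, y in zip(chain, chain[1:]))


def connect(docs, start, end, length):
    """Brute-force search for the best chain of `length` articles.

    Assumption: `docs` maps article ids to word lists and is given in
    chronological order, with `start` appearing before `end`.
    Returns (None, -1.0) when no chain of that length exists.
    """
    order = list(docs)
    lo, hi = order.index(start), order.index(end)
    middle = order[lo + 1:hi]  # candidate articles between the endpoints
    best, best_score = None, -1.0
    # combinations() keeps input order, so every chain stays chronological.
    for picks in combinations(middle, length - 2):
        chain = [start, *picks, end]
        score = weakest_link(chain, docs)
        if score > best_score:
            best, best_score = chain, score
    return best, best_score


# Hypothetical miniature corpus, oldest article first.
docs = {
    "housing": ["home", "prices", "decline", "mortgage"],
    "sports": ["game", "score", "team"],
    "mortgage": ["mortgage", "default", "banks", "prices"],
    "bailout": ["banks", "bailout", "default", "government"],
    "budget": ["government", "budget", "bailout", "health"],
    "health": ["health", "care", "debate", "budget", "government"],
}

chain, score = connect(docs, "housing", "health", 5)
print(chain, round(score, 2))
```

Scoring by the weakest transition penalizes chains that contain even one abrupt jump, which is one plausible reading of "coherent"; the exhaustive search is exponential in the chain length and is meant only for illustration, not as the efficient algorithm the abstract refers to.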

Supplementary Material

JPG File (kdd2010_shahaf_cdb_01.jpg)
MOV File (kdd2010_shahaf_cdb_01.mov)

Published In

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
July 2010
1240 pages
ISBN:9781450300551
DOI:10.1145/1835804
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. coherence
  2. news

Qualifiers

  • Research-article

Conference

KDD '10

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • (2024) Supporting the End-User Curation of Cultural Heritage Knowledge Graphs. Proceedings of the 35th ACM Conference on Hypertext and Social Media (35-44). DOI:10.1145/3648188.3675132. Online publication date: 10-Sep-2024
  • (2024) DifStoryGen: Diffusion-Based Storytelling Algorithm with Distributed Attention. 2024 IEEE International Conference on Big Data (BigData) (761-768). DOI:10.1109/BigData62323.2024.10826071. Online publication date: 15-Dec-2024
  • (2024) Towards Scalable Topic Detection on Web via Simulating Lévy Walks Nature of Topics in Similarity Space. Information Sciences (121544). DOI:10.1016/j.ins.2024.121544. Online publication date: Oct-2024
  • (2024) Query-attentive video summarization: a comprehensive review. Multimedia Tools and Applications. DOI:10.1007/s11042-024-19977-0. Online publication date: 6-Aug-2024
  • (2024) CoRBS: a dynamic storytelling algorithm using a novel contextualization approach for documents utilizing BERT features. Knowledge and Information Systems 67:2 (1213-1248). DOI:10.1007/s10115-024-02263-8. Online publication date: 14-Oct-2024
  • (2023) FABULA: Intelligence Report Generation Using Retrieval-Augmented Narrative Construction. Proceedings of the International Conference on Advances in Social Networks Analysis and Mining (603-610). DOI:10.1145/3625007.3627505. Online publication date: 6-Nov-2023
  • (2023) A Survey on Event-Based News Narrative Extraction. ACM Computing Surveys 55:14s (1-39). DOI:10.1145/3584741. Online publication date: 17-Jul-2023
  • (2023) Mixed Multi-Model Semantic Interaction for Graph-based Narrative Visualizations. Proceedings of the 28th International Conference on Intelligent User Interfaces (866-888). DOI:10.1145/3581641.3584076. Online publication date: 27-Mar-2023
  • (2023) Designing and Evaluating Interfaces that Highlight News Coverage Diversity Using Discord Questions. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (1-21). DOI:10.1145/3544548.3581569. Online publication date: 19-Apr-2023
  • (2023) Entity graphs for exploring online discourse. Knowledge and Information Systems 65:9 (3591-3609). DOI:10.1007/s10115-023-01877-8. Online publication date: 24-Apr-2023
