skip to main content
10.1145/2020408.2020483acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Refining causality: who copied from whom?

Published: 21 August 2011 Publication History

Abstract

Inferring causal networks behind observed data is an active area of research with wide applicability to areas such as epidemiology, microbiology and social science. In particular recent research has focused on identifying how information propagates through the Internet. This research has so far only used temporal features of observations, and while reasonable results have been achieved, there is often further information which can be used.
In this paper we show that additional features of the observed data can be used very effectively to improve an existing method. Our particular example is one of inferring an underlying network for how text is reused in the Internet, although the general approach is applicable to other inference methods and information sources.
We develop a method to identify how a piece of text evolves as it moves through an underlying network and how substring information can be used to narrow down where in the evolutionary process a particular observation at a node lies. Hence we narrow down the number of ways the node could have acquired the infection. Text reuse is detected using a suffix tree which is also used to identify the substring relations between chunks of reused text. We then use a modification of the NetCover method to infer the underlying network.
Experimental results -- on both synthetic and real life data -- show that using more information than just timing leads to greater accuracy in the inferred networks.

References

[1]
Spinn3r API. http://www.spinn3r.com/, 2008.
[2]
Supporting website, https://patterns.enm.bris.ac.uk/projects/refining_causality, 2011.
[3]
E. Adar and L. Adamic. Tracking information epidemics in blogspace. In Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, pages 207--214. Citeseer, 2005.
[4]
B. Bollobás, C. Borgs, J. Chayes, and O. Riordan. Directed scale-free graphs. In Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms, pages 132--139. Society for Industrial and Applied Mathematics, 2003.
[5]
V. Chvatal. A greedy heuristic for the set-covering problem. Mathematics of operations research, 4(3):233--235, 1979.
[6]
P. Erdös and A. Rényi. On the evolution of random graphs. In Publication of the Mathematical Institute of the Hungarian Academy of Sciences, pages 17--61. 1960.
[7]
I. Flaounas, M. Turchi, T. De Bie, and N. Cristianini. Inference and validation of networks. In Machine Learning and Knowledge Discovery in Databases, volume 5781 of Lecture Notes in Computer Science, pages 344--358. Springer Berlin / Heidelberg, 2009.
[8]
N. Fyson, T. De Bie, and N. Cristianini. Reconstruction of Causal Networks by Set Covering. In ICANNGA'11: International Conference on Adaptive and Natural Computing Algorithms, Ljubljana, Slovenia, 2011.
[9]
M. Gomez-Rodriguez, J. Leskovec, and A. Krause. Inferring Networks of Diffusion and Influence. In KDD2010: The 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2010.
[10]
D. Gusfield. Algorithms on strings, trees and sequences. Cambridge University Press, 1997.
[11]
J. Leskovec, L. Backstrom, and J. Kleinberg. Meme-tracking and the Dynamics of the News Cycle. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 497--506. ACM New York, NY, USA, 2009.
[12]
E. M. McCreight. A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM, 23(2):262--272, Apr. 1976.
[13]
B. Schieber and U. Vishkin. On finding lowest common ancestors: simplification and parallelization. SIAM Journal on Computing, 17(6):1253--1262, 1988.
[14]
P. Slavík. Improved performance of the greedy algorithm for partial cover. Information Processing Letters, 64(5):251--254, Dec. 1997.
[15]
T. Snowsill and F. Nicart. A hard-disk based suffix tree implementation. Technical Report 133088, University of Bristol, Bristol, UK, 2011.
[16]
E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249--260, 1995.
[17]
P. Weiner. Linear pattern matching algorithms. In IEEE Conference Record of 14th Annual Symposium on Switching and Automata Theory, 1973. SWAT'08, pages 1--11, 1973.

Cited By

View all
  • (2020)Recommender System-Based Diffusion Inferring for Open Social NetworksIEEE Transactions on Computational Social Systems10.1109/TCSS.2019.29501397:1(24-34)Online publication date: Feb-2020
  • (2019)Learning Clusters through Information DiffusionThe World Wide Web Conference10.1145/3308558.3313560(3151-3157)Online publication date: 13-May-2019
  • (2018)Who Spread to Whom? Inferring Online Social Networks with User Features2018 IEEE International Conference on Communications (ICC)10.1109/ICC.2018.8422246(1-6)Online publication date: May-2018
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2011
1446 pages
ISBN:9781450308137
DOI:10.1145/2020408
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 August 2011

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. network inference
  2. network reconstruction
  3. suffix tree

Qualifiers

  • Research-article

Conference

KDD '11
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 18 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2020)Recommender System-Based Diffusion Inferring for Open Social NetworksIEEE Transactions on Computational Social Systems10.1109/TCSS.2019.29501397:1(24-34)Online publication date: Feb-2020
  • (2019)Learning Clusters through Information DiffusionThe World Wide Web Conference10.1145/3308558.3313560(3151-3157)Online publication date: 13-May-2019
  • (2018)Who Spread to Whom? Inferring Online Social Networks with User Features2018 IEEE International Conference on Communications (ICC)10.1109/ICC.2018.8422246(1-6)Online publication date: May-2018
  • (2018)Complex contagions with timersChaos: An Interdisciplinary Journal of Nonlinear Science10.1063/1.499003828:3Online publication date: 1-Mar-2018
  • (2017)Inferring Traffic Cascading PatternsProceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems10.1145/3139958.3139960(1-10)Online publication date: 7-Nov-2017
  • (2016)A Model-Free Approach to Infer the Diffusion Network from Event CascadeProceedings of the 25th ACM International on Conference on Information and Knowledge Management10.1145/2983323.2983718(1653-1662)Online publication date: 24-Oct-2016
  • (2016)Inferring Dynamic Diffusion Networks in Online MediaACM Transactions on Knowledge Discovery from Data10.1145/288296810:4(1-22)Online publication date: 14-Jun-2016
  • (2016)A Gaussian Bayesian model to identify spatio-temporal causalities for air pollution based on urban big data2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)10.1109/INFCOMW.2016.7562036(3-8)Online publication date: Apr-2016
  • (2015)Clustering Embedded Approaches for Efficient Information Network InferenceData Science and Engineering10.1007/s41019-015-0003-81:1(29-40)Online publication date: 22-Sep-2015
  • (2015)Information diffusion network inferring and pathway tracking基于多维特征融合的信息传播网络推断及传播路径跟踪算法Science China Information Sciences10.1007/s11432-015-5288-858:9(1-15)Online publication date: 1-May-2015
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media