DOI: 10.1145/2736277.2741677
research-article

Beyond Models: Forecasting Complex Network Processes Directly from Data

Published: 18 May 2015

Abstract

Complex network phenomena, such as information cascades in online social networks, are hard to fully observe, model, and forecast. In forecasting, a recent trend has been to forgo parsimonious models in favor of models with increasingly many degrees of freedom that are trained to learn the behavior of a process from historical data. Extrapolating this trend into the future, we would eventually renounce models altogether. But is it possible to forecast the evolution of a complex stochastic process directly from the data, without a model? In this work we show that model-free forecasting is possible. We present SED, an algorithm that forecasts process statistics based on relationships of statistical equivalence, using two general axioms and historical data. To the best of our knowledge, SED is the first method that can perform axiomatic, model-free forecasts of complex stochastic processes. Our simulations using simple and complex evolving processes, and tests performed on a large real-world dataset, show promising results.

Cited By

  • (2017) Detecting Large Reshare Cascades in Social Networks. Proceedings of the 26th International Conference on World Wide Web, pp. 597-605. DOI: 10.1145/3038912.3052718. Online publication date: 3-Apr-2017.
  • (2015) Challenges of Forecasting and Measuring a Complex Networked World. Proceedings of the 24th International Conference on World Wide Web, p. 1067. DOI: 10.1145/2740908.2744720. Online publication date: 18-May-2015.

    Published In

    WWW '15: Proceedings of the 24th International Conference on World Wide Web
    May 2015
    1460 pages
    ISBN: 9781450334693

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. cascade forecast
    2. model-free forecasting

    Qualifiers

    • Research-article

    Funding Sources

    • Army Research Office
    • National Science Foundation

    Acceptance Rates

    WWW '15 Paper Acceptance Rate 131 of 929 submissions, 14%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
