skip to main content
10.1145/1028788.1028800acmconferencesArticle/Chapter ViewAbstractPublication PagesimcConference Proceedingsconference-collections
Article

A pragmatic approach to dealing with high-variability in network measurements

Published:25 October 2004Publication History

ABSTRACT

The Internet is teeming with high variability phenomena, from measured IP flow sizes to aspects of inferred router-level connectivity, but there still exists considerable debate about how best to deal with this encountered high variability and model it. While one popular approach favors modeling highly variable event sizes with conventional, finite variance distributions such as lognormal or Weibull distributions, Mandelbrot has argued for the last 40 years that there are compelling mathematical, statistical, and practical reasons for why infinite variance distributions are natural candidates for capturing the essence behind high variability phenomena. In this paper, we elaborate on Mandelbrot's arguments and present a methodology that often allows for a clear distinction between the two approaches. In particular, by requiring the resulting models to be resilient to ambiguities (i.e., robust to real-world deficiencies in the underlying network measurements) and internally self-consistent (i.e., insensitive with respect the duration, location, or time of the data collection), we provide a rigorous framework for a qualitative assessment of the observed high variability. We apply the proposed framework to assess previously reported findings about measured Internet traffic and inferred router- and AS-level connectivity. In the process, we also discuss what our approach has to say about recent discussions concerning network traffic being Poisson or self-similar and router-level or AS-level connectivity graphs of the Internet being scale-free or not.

References

  1. R. Albert, H. Jeong, and A.-L. Barabási. Attack and error tolerance of complex networks, Nature 406, pp. 378--382, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  2. D. Alderson. Technological and Economic Drivers and Constraints in the Internet's "Last Mile", Technical Report CIT-CDS-04-004, California Institute of Technology, 2004.Google ScholarGoogle Scholar
  3. A. Bookstein Informetric distributions, Part II: Resilience to ambiguity Journal of the Amer. Soc. for Information Science, Vol. 41, No. 5, pp. 376--386, 1990.Google ScholarGoogle ScholarCross RefCross Ref
  4. A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the Web, in: Proc. of 9th Intl. WWW Conference, pp. 309--320, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. Cao, W. S. Cleveland, D. Lin, and D. X. Sun. Internet traffic tends toward Poisson and independent as the load increases. in: Nonlinear Estimation and Classification, C. Holmes, D. Denison, M. Hansen, B. Yu, and B. Mallick (Eds.), Springer-Verlag, New York, 2002.Google ScholarGoogle Scholar
  6. H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger Towards Capturing Representative AS-Level Internet Topologies Proc. of ACM SIGMETRICS '02, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Q. Chen, H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger. The Origin of power laws in Internet topologies revisited. Proc. IEEE INFOCOM '02, New York, NY, 2002.Google ScholarGoogle Scholar
  8. K. Claffy, H.-W. Braun, and G. Polyzos. A parameterizable methodology for Internet traffic flow profiling, IEEE Journal on Selected Areas in Communications 13, pp. 1481--1494, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Cooperative Association for Internet Data Analysis (CAIDA), Skitter. Available from www.caida.org/tools/measurement/skitter/.Google ScholarGoogle Scholar
  10. A. B. Downey. The structural cause for file size distribution. www.allendowney.com/research/filesize/cds-tr25 2000.ps.gz, 2000.Google ScholarGoogle Scholar
  11. A. B. Downey. Evidence for long-tailed distributions in the Internet. Proc. 1st ACM SIGCOMM Internet Measurement Workshop '01, San Francisco, CA, pp. 229--241, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. B. Downey. Lognormal and Pareto distributions in the Internet. www.allendowney.com/research/longtail/downey03lognormal.pdf, 2003.Google ScholarGoogle Scholar
  13. D. Draper, J. S. Hodges, C. L. Mallows and D. Pregibon. Exchangeability and data analysis, J. R. Statist. Soc. A 156, pp. 9--37, 1993.Google ScholarGoogle ScholarCross RefCross Ref
  14. N. Duffield, C. Lund, and M. Thorup. Estimating flow distributions from sampled flow statistics, Proc. ACM SIGCOMM '03, Karlruhe, Germany, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. Proc. ACM SIGCOMM '99, Cambridge, MA, pp. 251--262, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Feldmann, Characteristics of TCP connection arrivals, in: Self-Similar Network Traffic and Performance Evaluation, K. Park and W. Willinger (Eds.), J. Wiley & Sons, New York, pp. 367--400, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  17. W. Feller. An Introduction to Probability Theory and its Applications, Volume II, Second Edition. Wiley & Sons, New York, 1966.Google ScholarGoogle Scholar
  18. D.R. Figueiredo, A. Feldmann, B. Liu, V. Misra, D. Towsley, and W. Willinger. On TCP and self-similar traffic. Performance Evaluation, 2004 (to appear). Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Floyd and V. Paxson. Difficulties in simulating the Internet. IEEE/ACM Trans. Networking, Vol. 9, pp. 392--403, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. M. Goldie and C. Klüppelberg. Subexponential distributions. in: A Practical Guide to Heavy Tails, R. J. Adler, R. E. Feldman, and M. S. Taqqu (Eds.), Birkhäuser, Boston, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. Govindan and H. Tangmunarunkit. Heuristics for Internet Map Discovery, Proceeding of IEEE INFOCOM '00, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  22. N.L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions, Volume1, Second Edition J. Wiley & Sons, New York, 1994.Google ScholarGoogle Scholar
  23. T. Karagiannis, M. Faloutsos, M. Molle, and A. Broido. A nonstationary Poisson view of Internet traffic. Proc. IEEE INFOCOM '04, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  24. T. G. Kurtz. Limit Theorems for Workload Input Models, in: Stochastic Networks: Theory and Applications, F.P. Kelly, S. Zachary, and I. Ziedins (Eds.), pp. 119--140, Oxford University Press, 1996.Google ScholarGoogle Scholar
  25. A. Lakhina, J. W. Byers, M. Crovella, and P. Xie. Sampling biases in IP topology measurements. Proc. IEEE INFOCOM '03, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  26. L. Li, D. Alderson, W. Willinger, and J. Doyle. A first-principles approach to understanding the Internet's router-level topology. Proc. ACM SIGCOMM '04, Portland, Oregon, August 30 - September 2, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. B. B. Mandelbrot. New methods in statistical economics, Journal of Political Economics 71, pp. 421--440, 1963.Google ScholarGoogle ScholarCross RefCross Ref
  28. B. B. Mandelbrot. Fractals and Scaling in Finance. Springer-Verlag, New York, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, Vol. 1, No. 2, pp. 226--251, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  30. P. Newman, T. Lyons, and G. Minshall. Flow labelled IP: A connectionless approach to ATM, Proc. IEEE INFOCOM '96, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J.-J. Pansiot and D. Grad. On routers and multicast trees in the Internet. ACM Computer Communication Review, 28(1):41-50, January 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Passive Measurement and Analysis (PMA) Project, http://pma.nlanr.net/Google ScholarGoogle Scholar
  33. V. Paxson and S. Floyd, Wide-area traffic: The failure of Poisson modeling, IEEE/ACM Trans. on Networking 3, pp. 226--244, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. S. I. Resnick. Heavy tail modeling and teletraffic data. Annals of Statistics 25, pp. 1805--1869, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  35. Route Views, University of Oregon Route Views Project, Available at http://www.antc.uoregon.edu/route-views/.Google ScholarGoogle Scholar
  36. N. Spring, R. Mahajan, and D. Wetherall. Measuring ISP topologies with Rocketfuel. Proc. ACM SIGCOMM '02, Pittsburgh, PA, pp. 133--145, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. G. Samorodnitsky and M. S. Taqqu. Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, New York, 1994.Google ScholarGoogle Scholar
  38. J. W. Stewart III. BGP4: Inter-Domain ROuting in the Internet. Addison-Wesley, Boston, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. M.S. Taqqu, W. Willinger, and R. Sherman. Proof of a fundamental result in self-similar traffic modeling, Computer Communication Review 27, pp. 5--23, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. R. Teixeira, K. Marzullo, S. Savage, and G.M. Voelker. In Search of Path Diversity in ISP Networks, In Proc. IMC'03, Miami Beach, Florida. October 27-29, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. J. W. Tukey. Data analysis and behavioral science or learning to bear the quantitative's man burden by shunning badmandments, in: The Collected Works of John W. Tukey, L. W. Jones (Ed.), Vol. III, Wadsworth, Monterey, CA, 1986.Google ScholarGoogle Scholar
  42. A. Veres and M. Boda. The chaotic nature of TCP congestion control. Proc. IEEE INFOCOM '00, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  43. W. Willinger, R. Govindan, S. Jamin, V. Paxson and S. Shenker. Scaling phenomena in the Internet: Critically examining criticality. Proc. Nat. Acad. Sci. 99, suppl. 1, pp. 2573--2580, 2002.Google ScholarGoogle Scholar
  44. W. Willinger and V. Paxson, Where Mathematics meets the Internet, Notices of the AMS 45, pp. 961--970, 1998.Google ScholarGoogle Scholar
  45. S.-H. Yook, H. Jeong, and A.-L. Barabási. Modeling the Internet's large-scale topology. Proc. Nat. Acad. Sci. 99, pp. 13382--13386, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  46. X. Zhu, J. Yu, and J. C. Doyle. Heavy tails, generalized coding, and optimal Web layout. Proc. IEEE INFOCOM '01, 2001.Google ScholarGoogle Scholar

Index Terms

  1. A pragmatic approach to dealing with high-variability in network measurements

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader