skip to main content
article

Data streaming algorithms for estimating entropy of network traffic

Published:26 June 2006Publication History
Skip Abstract Section

Abstract

Using entropy of traffic distributions has been shown to aid a wide variety of network monitoring applications such as anomaly detection, clustering to reveal interesting patterns, and traffic classification. However, realizing this potential benefit in practice requires accurate algorithms that can operate on high-speed links, with low CPU and memory requirements. In this paper, we investigate the problem of estimating the entropy in a streaming computation model. We give lower bounds for this problem, showing that neither approximation nor randomization alone will let us compute the entropy efficiently. We present two algorithms for randomly approximating the entropy in a time and space efficient manner, applicable for use on very high speed (greater than OC-48) links. The first algorithm for entropy estimation is inspired by the structural similarity with the seminal work of Alon et al. for estimating frequency moments, and we provide strong theoretical guarantees on the error and resource usage. Our second algorithm utilizes the observation that the performance of the streaming algorithm can be enhanced by separating the high-frequency items (or elephants) from the low-frequency items (or mice). We evaluate our algorithms on traffic traces from different deployment scenarios.

References

  1. A. Chakrabarti, K. Do Ba, and S. Muthukrishnan. Estimating entropy and entropy norm on data streams. In Proceedings of the 23rd International Symposium on Theoretical Aspects of Computer Science (STACS), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proceedings of ACM Symposium on Theory of Computing (STOC), 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Barford, J. Kline, D. Plonka, and A. Ron. A Signal Analysis of Network Traffic Anomalies. In Proceedings of ACM SIGCOMM Internet Measurement Workshop (IMW), 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. D. Brutlag. Aberrant behavior detection in time series for network monitoring. In Proceedings of USENIX Large Installation System Administration Conference (LISA), 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. N. Duffield, C. Lund, and M. Thorup. Estimating flow distributions from sampled flow statistics. In Proceedings of ACM SIGCOMM, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Estan and G. Varghese. New directions in traffic measurement and accounting. In Proceedings of ACM SIGCOMM, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. L. Feinstein, D. Schnackenberg, R. Balupari, and D. Kindred. Statistical approaches to DDoS attack detection and response. In Proceedings of the DARPA Information Survivability Conference and Exposition, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  8. S. Guha, A. McGregor, and S. Venkatasubramanian. Streaming and sublinear approximation of entropy and information distances. In Proceedings of ACM Symposium on Discrete Algorithms (SODA), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. F. Hao, M. Kodialam, and T. V. Lakshman. ACCEL-RATE: a faster mechanism for memory efficient per-flow traffic estimation. In Proceedings of ACM SIGMETRICS, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. N. Hohn and D. Veitch. Inverting sampled traffic. In Proceedings of ACM/USENIX Internet Measurement Conference (IMC), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. B. Kalyanasundaram and G. Schnitger. The probabilistic communication complexity of set intersection. SIAM Journal on Discrete Mathematics, 5(4):545--557, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. V. Karamcheti, D. Geiger, Z. Kedem, and S. Muthukrishnan. Detecting malicious network traffic using inverse distributions of packet contents. In Proceedings of ACM SIGCOMM Workshop on Mining Network Data (MineNet), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Kumar, M. Sung, J. Xu, and J. Wang. Data streaming algorithms for efficient and accurate estimation of flow distribution. In Proceedings of ACM SIGMETRICS/IFIP WG 7.3 Performance, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Kumar, M. Sung, J. Xu, and E. Zegura. A data streaming algorithm for estimating subpopulation flow size distribution. In Proceedings of ACM SIGMETRICS, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. E. Kushilevitz and N. Nisan. Communication complexity. Cambridge University Press, New York, NY, USA, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Lakhina, M. Crovella, and C. Diot. Diagnosing network-wide traffic anomalies. In Proceedings of ACM SIGCOMM, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distributions. In Proceedings of ACM SIGCOMM, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. K. Levchenko, R. Paturi, and G. Varghese. On the difficulty of scalably detecting network attacks. In Proceedings of ACM Conference on Computer and Communications Security (CCS), 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Muthukrishnan. Data streams: algorithms and applications. http://athos.rutgers.edu/~muthu/stream-1-1.ps.Google ScholarGoogle Scholar
  20. Cisco Netflow. http://www.cisco.com/warp/public/732/Tech/nmp/netflow/index.shtml.Google ScholarGoogle Scholar
  21. M. Roughan, A. Greenberg, C. Kalmanek, M. Rumsewicz, J. Yates, and Y. Zhang. Experience in measuring internet backbone traffic variability: Models, metrics, measurements and meaning. In Proceedings of International Teletraffic Congress (ITC), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. S. Venkataraman, D. Song, P. B. Gibbons, and A. Blum. New Streaming Algorithms for Fast Detection of Superspreaders . In Proceedings of Network and Distributed System Security Symposium (NDSS), 2005.Google ScholarGoogle Scholar
  23. A. Wagner and B. Plattner. Entropy Based Worm and Anomaly Detection in Fast IP Networks. In Proceedings of IEEE International Workshop on Enabling Technologies, Infrastructures for Collaborative Enterprises, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. K. Xu, Z.-L. Zhang, and S. Bhattacharya. Profiling internet backbone traffic: Behavior models and applications. In Proceedings of ACM SIGCOMM, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Y. Zhang, Z. Ge, M. Roughan, and A. Greenberg. Network anomography. In Proceedings of ACM/USENIX Internet Measurement Conference (IMC), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund. Online identification of hierarchical heavy hitters: algorithms, evaluations, and applications. In Proceedings of ACM/USENIX Internet Measurement Conference (IMC), 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Data streaming algorithms for estimating entropy of network traffic

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM SIGMETRICS Performance Evaluation Review
      ACM SIGMETRICS Performance Evaluation Review  Volume 34, Issue 1
      Performance evaluation review
      June 2006
      388 pages
      ISSN:0163-5999
      DOI:10.1145/1140103
      Issue’s Table of Contents
      • cover image ACM Conferences
        SIGMETRICS '06/Performance '06: Proceedings of the joint international conference on Measurement and modeling of computer systems
        June 2006
        404 pages
        ISBN:1595933190
        DOI:10.1145/1140277

      Copyright © 2006 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 26 June 2006

      Check for updates

      Qualifiers

      • article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader