Abstract
Using entropy of traffic distributions has been shown to aid a wide variety of network monitoring applications such as anomaly detection, clustering to reveal interesting patterns, and traffic classification. However, realizing this potential benefit in practice requires accurate algorithms that can operate on high-speed links, with low CPU and memory requirements. In this paper, we investigate the problem of estimating the entropy in a streaming computation model. We give lower bounds for this problem, showing that neither approximation nor randomization alone will let us compute the entropy efficiently. We present two algorithms for randomly approximating the entropy in a time and space efficient manner, applicable for use on very high speed (greater than OC-48) links. The first algorithm for entropy estimation is inspired by the structural similarity with the seminal work of Alon et al. for estimating frequency moments, and we provide strong theoretical guarantees on the error and resource usage. Our second algorithm utilizes the observation that the performance of the streaming algorithm can be enhanced by separating the high-frequency items (or elephants) from the low-frequency items (or mice). We evaluate our algorithms on traffic traces from different deployment scenarios.
- A. Chakrabarti, K. Do Ba, and S. Muthukrishnan. Estimating entropy and entropy norm on data streams. In Proceedings of the 23rd International Symposium on Theoretical Aspects of Computer Science (STACS), 2006. Google ScholarDigital Library
- N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proceedings of ACM Symposium on Theory of Computing (STOC), 1996. Google ScholarDigital Library
- P. Barford, J. Kline, D. Plonka, and A. Ron. A Signal Analysis of Network Traffic Anomalies. In Proceedings of ACM SIGCOMM Internet Measurement Workshop (IMW), 2002. Google ScholarDigital Library
- J. D. Brutlag. Aberrant behavior detection in time series for network monitoring. In Proceedings of USENIX Large Installation System Administration Conference (LISA), 2000. Google ScholarDigital Library
- N. Duffield, C. Lund, and M. Thorup. Estimating flow distributions from sampled flow statistics. In Proceedings of ACM SIGCOMM, 2003. Google ScholarDigital Library
- C. Estan and G. Varghese. New directions in traffic measurement and accounting. In Proceedings of ACM SIGCOMM, 2002. Google ScholarDigital Library
- L. Feinstein, D. Schnackenberg, R. Balupari, and D. Kindred. Statistical approaches to DDoS attack detection and response. In Proceedings of the DARPA Information Survivability Conference and Exposition, 2003.Google ScholarCross Ref
- S. Guha, A. McGregor, and S. Venkatasubramanian. Streaming and sublinear approximation of entropy and information distances. In Proceedings of ACM Symposium on Discrete Algorithms (SODA), 2006. Google ScholarDigital Library
- F. Hao, M. Kodialam, and T. V. Lakshman. ACCEL-RATE: a faster mechanism for memory efficient per-flow traffic estimation. In Proceedings of ACM SIGMETRICS, 2004. Google ScholarDigital Library
- N. Hohn and D. Veitch. Inverting sampled traffic. In Proceedings of ACM/USENIX Internet Measurement Conference (IMC), 2003. Google ScholarDigital Library
- B. Kalyanasundaram and G. Schnitger. The probabilistic communication complexity of set intersection. SIAM Journal on Discrete Mathematics, 5(4):545--557, 1992. Google ScholarDigital Library
- V. Karamcheti, D. Geiger, Z. Kedem, and S. Muthukrishnan. Detecting malicious network traffic using inverse distributions of packet contents. In Proceedings of ACM SIGCOMM Workshop on Mining Network Data (MineNet), 2005. Google ScholarDigital Library
- A. Kumar, M. Sung, J. Xu, and J. Wang. Data streaming algorithms for efficient and accurate estimation of flow distribution. In Proceedings of ACM SIGMETRICS/IFIP WG 7.3 Performance, 2004. Google ScholarDigital Library
- A. Kumar, M. Sung, J. Xu, and E. Zegura. A data streaming algorithm for estimating subpopulation flow size distribution. In Proceedings of ACM SIGMETRICS, 2005. Google ScholarDigital Library
- E. Kushilevitz and N. Nisan. Communication complexity. Cambridge University Press, New York, NY, USA, 1997. Google ScholarDigital Library
- A. Lakhina, M. Crovella, and C. Diot. Diagnosing network-wide traffic anomalies. In Proceedings of ACM SIGCOMM, 2004. Google ScholarDigital Library
- A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distributions. In Proceedings of ACM SIGCOMM, 2005. Google ScholarDigital Library
- K. Levchenko, R. Paturi, and G. Varghese. On the difficulty of scalably detecting network attacks. In Proceedings of ACM Conference on Computer and Communications Security (CCS), 2004. Google ScholarDigital Library
- S. Muthukrishnan. Data streams: algorithms and applications. http://athos.rutgers.edu/~muthu/stream-1-1.ps.Google Scholar
- Cisco Netflow. http://www.cisco.com/warp/public/732/Tech/nmp/netflow/index.shtml.Google Scholar
- M. Roughan, A. Greenberg, C. Kalmanek, M. Rumsewicz, J. Yates, and Y. Zhang. Experience in measuring internet backbone traffic variability: Models, metrics, measurements and meaning. In Proceedings of International Teletraffic Congress (ITC), 2003. Google ScholarDigital Library
- S. Venkataraman, D. Song, P. B. Gibbons, and A. Blum. New Streaming Algorithms for Fast Detection of Superspreaders . In Proceedings of Network and Distributed System Security Symposium (NDSS), 2005.Google Scholar
- A. Wagner and B. Plattner. Entropy Based Worm and Anomaly Detection in Fast IP Networks. In Proceedings of IEEE International Workshop on Enabling Technologies, Infrastructures for Collaborative Enterprises, 2005. Google ScholarDigital Library
- K. Xu, Z.-L. Zhang, and S. Bhattacharya. Profiling internet backbone traffic: Behavior models and applications. In Proceedings of ACM SIGCOMM, 2005. Google ScholarDigital Library
- Y. Zhang, Z. Ge, M. Roughan, and A. Greenberg. Network anomography. In Proceedings of ACM/USENIX Internet Measurement Conference (IMC), 2005. Google ScholarDigital Library
- Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund. Online identification of hierarchical heavy hitters: algorithms, evaluations, and applications. In Proceedings of ACM/USENIX Internet Measurement Conference (IMC), 2004. Google ScholarDigital Library
Index Terms
- Data streaming algorithms for estimating entropy of network traffic
Recommendations
Data streaming algorithms for efficient and accurate estimation of flow size distribution
Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better ...
Data streaming algorithms for estimating entropy of network traffic
SIGMETRICS '06/Performance '06: Proceedings of the joint international conference on Measurement and modeling of computer systemsUsing entropy of traffic distributions has been shown to aid a wide variety of network monitoring applications such as anomaly detection, clustering to reveal interesting patterns, and traffic classification. However, realizing this potential benefit in ...
A data streaming algorithm for estimating entropies of od flows
IMC '07: Proceedings of the 7th ACM SIGCOMM conference on Internet measurementEntropy has recently gained considerable significance as an important metric for network measurement. Previous research has shown its utility in clustering traffic and detecting traffic anomalies. While measuring the entropy of the traffic observed at a ...
Comments