ABSTRACT
Classification of network traffic using port-based or payload-based analysis is becoming increasingly difficult, as many peer-to-peer (P2P) applications use dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiting the distinctive characteristics of applications when they communicate on a network. We pursue this latter approach and demonstrate how cluster analysis can be used to effectively identify groups of similar traffic using only transport layer statistics. Our work considers two unsupervised clustering algorithms, namely K-Means and DBSCAN, that have not previously been used for network traffic classification. We evaluate these two algorithms and compare them to the previously used AutoClass algorithm, using empirical Internet traces. The experimental results show that both K-Means and DBSCAN work very well and run much more quickly than AutoClass. Our results indicate that although DBSCAN has lower accuracy than K-Means and AutoClass, it produces better clusters.
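As a rough illustration of the clustering idea described above, the following is a minimal sketch of K-Means applied to per-flow transport layer statistics. It is not the authors' implementation. The two features (mean packet size and flow duration), the toy flows, and the application labels are all hypothetical, and the step that labels each cluster by its majority application is an assumption made here so that an accuracy figure can be computed.

```python
import random
from collections import Counter


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain Lloyd's K-Means over a list of equal-length feature tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen flows
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so the clustering converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


def label_clusters(assignment, labels):
    """Assumed evaluation step: name each cluster after its majority label."""
    votes = {}
    for cluster, label in zip(assignment, labels):
        votes.setdefault(cluster, Counter())[label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


# Hypothetical flows: (mean packet size in bytes, duration in seconds), true application.
flows = [
    ((1400.0, 120.0), "P2P"),
    ((1350.0, 150.0), "P2P"),
    ((1420.0, 110.0), "P2P"),
    ((300.0, 2.0), "HTTP"),
    ((280.0, 3.0), "HTTP"),
    ((320.0, 1.0), "HTTP"),
]
points = [features for features, _ in flows]
labels = [label for _, label in flows]

assignment, centroids = kmeans(points, k=2)
mapping = label_clusters(assignment, labels)

# Fraction of flows whose cluster's majority label matches their true label.
accuracy = sum(mapping[a] == l for a, l in zip(assignment, labels)) / len(labels)
print(mapping, accuracy)
```

The clustering itself never sees the application labels; they are used only afterwards to name the clusters and score the result. On real traces the features would likely need scaling, since a large-valued feature such as packet size otherwise dominates the distance.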
Index Terms
- Traffic classification using clustering algorithms