DOI: 10.1145/1162678.1162679
Article

Traffic classification using clustering algorithms

Published: 11 September 2006

ABSTRACT

Classification of network traffic using port-based or payload-based analysis is becoming increasingly difficult, as many peer-to-peer (P2P) applications use dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiting the distinctive characteristics of applications when they communicate on a network. We pursue this latter approach and demonstrate how cluster analysis can be used to effectively identify groups of similar traffic using only transport layer statistics. Our work considers two unsupervised clustering algorithms, namely K-Means and DBSCAN, that have not previously been used for network traffic classification. We evaluate these two algorithms and compare them to the previously used AutoClass algorithm, using empirical Internet traces. The experimental results show that both K-Means and DBSCAN work very well and much more quickly than AutoClass. Our results indicate that although DBSCAN has lower accuracy compared to K-Means and AutoClass, DBSCAN produces better clusters.
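The abstract names two clustering algorithms applied to per-flow transport layer statistics. The following is a minimal pure-Python sketch of both, assuming each flow has already been reduced to a numeric feature vector. The two features, the toy data, the `eps`/`min_pts` values, and the deterministic farthest-first seeding of K-Means are hypothetical choices made for illustration; they are not the paper's feature set, parameters, or implementation.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]
NOISE = -1  # DBSCAN label for points that belong to no cluster


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two flow feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's K-Means; returns a cluster index (0..k-1) for every point."""
    # Deterministic farthest-first seeding (an assumption, for reproducibility).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """DBSCAN; returns a cluster index per point, or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited

    def region(i: int) -> List[int]:
        # Indices within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is also core: keep expanding
                queue.extend(neighbours)
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]


# Hypothetical per-flow features, e.g. (mean packet size in KB, log10 duration).
flows = [
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2),  # one application's flows
    (8.0, 8.0), (8.1, 7.9), (7.9, 8.1), (8.0, 8.2),  # another application's flows
    (4.5, 15.0),                                      # an isolated outlier flow
]
print(kmeans(flows, k=2))                  # → [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(dbscan(flows, eps=0.5, min_pts=3))   # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With these toy flows, K-Means must place the isolated flow in one of its two clusters, whereas DBSCAN leaves it labelled as noise (-1). In either case, a classifier still has to map each cluster to an application label (for instance, by the majority application among a few labelled flows in that cluster); that step is not shown here.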

    Published in

      ACM Other conferences
      MineNet '06: Proceedings of the 2006 SIGCOMM workshop on Mining network data
      September 2006
      66 pages
      ISBN:159593569X
      DOI:10.1145/1162678

      Copyright © 2006 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
