Traffic classification using clustering algorithms

Authors:
Jeffrey Erman, University of Calgary, Calgary, AB, Canada
Martin Arlitt, University of Calgary, Calgary, AB, Canada
Anirban Mahanti, University of Calgary, Calgary, AB, Canada
MineNet '06: Proceedings of the 2006 SIGCOMM workshop on Mining network data, September 2006, Pages 281–286
https://doi.org/10.1145/1162678.1162679

Published: 11 September 2006

ABSTRACT

Classification of network traffic using port-based or payload-based analysis is becoming increasingly difficult, with many peer-to-peer (P2P) applications using dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiting the distinctive characteristics of applications when they communicate on a network. We pursue this latter approach and demonstrate how cluster analysis can be used to effectively identify groups of traffic that are similar using only transport layer statistics. Our work considers two unsupervised clustering algorithms, namely K-Means and DBSCAN, that have not previously been used for network traffic classification. We evaluate these two algorithms and compare them to the previously used AutoClass algorithm, using empirical Internet traces. The experimental results show that both K-Means and DBSCAN work very well and much more quickly than AutoClass. Our results indicate that although DBSCAN has lower accuracy compared to K-Means and AutoClass, DBSCAN produces better clusters.
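
As a rough illustration of the two unsupervised algorithms named in the abstract, the following is a minimal sketch in pure Python, not the authors' implementation. The flow records, the two normalized features (standing in for transport layer statistics such as mean packet size and duration), and the parameter values `k`, `eps`, and `min_pts` are all hypothetical and chosen only to keep the example small.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-Means: returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # recompute the center as the mean of its members
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

def dbscan(points, eps, min_pts):
    """DBSCAN: returns one label per point, with -1 marking noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:  # not a core point; may become a border point later
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:  # noise reachable from a core point is a border point
                labels[j] = cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_nbrs)
    return labels

# Hypothetical flows: (normalized mean packet size, normalized duration).
flows = [
    (0.10, 0.20), (0.12, 0.22), (0.11, 0.18), (0.09, 0.21),  # one application
    (0.90, 0.80), (0.88, 0.82), (0.91, 0.79), (0.92, 0.81),  # another application
    (0.50, 0.05),                                            # isolated flow
]

km = kmeans(flows, k=2)
db = dbscan(flows, eps=0.1, min_pts=3)
print("K-Means labels:", km)
print("DBSCAN labels: ", db)  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

On this toy data, K-Means has to place the isolated flow in one of its `k` clusters, whereas DBSCAN labels it as noise (`-1`) and leaves it out of every cluster. Turning clusters into application labels would need a separate mapping step that is not shown here.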

Index Terms

Computing methodologies
  Machine learning

Published in

MineNet '06: Proceedings of the 2006 SIGCOMM workshop on Mining network data
September 2006
66 pages
ISBN:159593569X
DOI:10.1145/1162678

Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
classification
machine learning
unsupervised clustering