research-article

PageRank in malware categorization

Authors:
BooJoong Kang
Queen's University Belfast, Belfast, Northern Ireland, United Kingdom

Suleiman Yerima
Queen's University Belfast, Belfast, Northern Ireland, United Kingdom

Kieran McLaughlin
Queen's University Belfast, Belfast, Northern Ireland, United Kingdom

Sakir Sezer
Queen's University Belfast, Belfast, Northern Ireland, United Kingdom

RACS '15: Proceedings of the 2015 Conference on research in adaptive and convergent systems
October 2015
Pages 291–295
https://doi.org/10.1145/2811411.2811514

Published: 09 October 2015

ABSTRACT

In this paper, we propose a malware categorization method that uses PageRank to model malware behavior at the instruction level. PageRank ranks web pages according to structural information; applied to malware analysis, it can likewise rank instructions, so that each rank reflects the structural information of that instruction. Our method uses the computed ranks as features for machine learning algorithms. In the evaluation, we compare the effectiveness of different PageRank algorithms and investigate bagging and boosting algorithms to improve categorization accuracy.
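
The idea of ranking instructions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the instruction graph is built, so the sketch assumes a directed graph whose edges count how often one mnemonic immediately follows another in a trace, and it uses the standard PageRank power iteration with a damping factor of 0.85. The names `transition_graph`, `pagerank`, and `rank_features`, as well as the sample trace and vocabulary, are hypothetical.

```python
from collections import defaultdict


def transition_graph(trace):
    """Count how often each mnemonic is immediately followed by another."""
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(trace, trace[1:]):
        graph[src][dst] += 1
    return graph


def pagerank(graph, nodes, damping=0.85, max_iter=100, tol=1e-10):
    """Weighted PageRank by power iteration; returns {node: rank}."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(graph.get(v, {}).values()) for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for src in nodes:
            for dst, weight in graph.get(src, {}).items():
                new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def rank_features(trace, vocabulary):
    """Map a trace to a fixed-length vector of ranks, one per vocabulary entry."""
    nodes = sorted(set(trace))
    ranks = pagerank(transition_graph(trace), nodes)
    # Mnemonics that never occur in the trace get a rank of 0.0 (an assumption).
    return [ranks.get(mnemonic, 0.0) for mnemonic in vocabulary]


# Hypothetical trace and vocabulary, for illustration only.
trace = ["push", "mov", "call", "mov", "add", "mov", "ret"]
vocabulary = ["add", "call", "mov", "push", "ret", "xor"]
features = rank_features(trace, vocabulary)
print(dict(zip(vocabulary, features)))
```

Because every sample is mapped onto the same vocabulary, the resulting vectors have a fixed length and could be passed to any classifier or to a bagging or boosting ensemble; that training step is outside this sketch.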

Index Terms

PageRank in malware categorization
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation

Published in
RACS '15: Proceedings of the 2015 Conference on research in adaptive and convergent systems
October 2015
540 pages
ISBN: 9781450337380
DOI: 10.1145/2811411
Conference Chairs:
Esmaeil S. Nadimi, University of Southern Denmark, Denmark
Tomas Cerny, Czech Technical University, Czech Republic
Program Chairs:
Sung-Ryul Kim, Konkuk University, Korea
Wei Wang, San Diego State University
Copyright © 2015 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
dynamic analysis
malware categorization
malware classification
pagerank
Acceptance Rates
RACS '15 Paper Acceptance Rate: 75 of 309 submissions, 24%
Overall Acceptance Rate: 393 of 1,581 submissions, 25%