DOI: 10.1145/2811411.2811514
research-article

PageRank in malware categorization

Published: 09 October 2015

ABSTRACT

In this paper, we propose a malware categorization method that uses PageRank to model malware behavior at the instruction level. PageRank ranks web pages based on the link structure among them; in the same way, it can rank the instructions of a program based on the structural relationships among those instructions. Our method uses the computed ranks as features for machine learning algorithms. In the evaluation, we compare the effectiveness of different PageRank algorithms and investigate bagging and boosting algorithms as ways to improve categorization accuracy.
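The abstract does not say how the instruction graph is built or which PageRank variants are compared, so the following Python sketch rests on assumptions rather than on the authors' method. It assumes a directed graph in which an edge links each instruction mnemonic to the mnemonic that follows it in a trace, runs a plain power-iteration PageRank with a damping factor of 0.85, and turns the ranks into a fixed-order feature vector. The names `build_transition_graph`, `pagerank`, and `feature_vector`, as well as the sample trace, are hypothetical.

```python
def build_transition_graph(mnemonics):
    """Map each mnemonic to the set of mnemonics that directly follow it.

    Assumption: consecutive instructions in a trace form a directed edge.
    The paper's actual graph construction is not given in the abstract.
    """
    graph = {m: set() for m in mnemonics}
    for src, dst in zip(mnemonics, mnemonics[1:]):
        graph[src].add(dst)
    return graph


def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Plain power-iteration PageRank over a dict of node -> successor set."""
    nodes = sorted(graph)
    n = len(nodes)
    if n == 0:
        return {}
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread over all nodes.
        dangling = sum(ranks[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {v: base for v in nodes}
        for src in nodes:
            successors = graph[src]
            if successors:
                share = damping * ranks[src] / len(successors)
                for dst in successors:
                    new_ranks[dst] += share
        delta = sum(abs(new_ranks[v] - ranks[v]) for v in nodes)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


def feature_vector(ranks, vocabulary):
    """Order ranks by a fixed vocabulary; unseen mnemonics get 0.0."""
    return [ranks.get(m, 0.0) for m in vocabulary]


if __name__ == "__main__":
    # Hypothetical mnemonic trace, for illustration only.
    trace = ["push", "mov", "call", "mov", "pop", "ret"]
    ranks = pagerank(build_transition_graph(trace))
    print(feature_vector(ranks, ["call", "mov", "pop", "push", "ret", "xor"]))
```

Using a fixed vocabulary keeps every sample's feature vector the same length, which is what a standard classifier, with or without bagging or boosting, would expect as input.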


    • Published in

      RACS '15: Proceedings of the 2015 Conference on Research in Adaptive and Convergent Systems
      October 2015
      540 pages
      ISBN: 9781450337380
      DOI: 10.1145/2811411

      Copyright © 2015 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      RACS '15 Paper Acceptance Rate: 75 of 309 submissions, 24%
      Overall Acceptance Rate: 393 of 1,581 submissions, 25%
