Lightweight, Obfuscation-Resilient Detection and Family Identification of Android Malware

Published: 12 January 2018

Abstract

The number of malicious Android apps is increasing rapidly. Android malware can damage or alter other files or settings, install additional applications, and so on. To determine such behaviors, a security analyst can benefit significantly from identifying the family to which a piece of Android malware belongs, rather than only detecting whether an app is malicious. Existing techniques for detecting Android malware and determining its family lack the ability to handle certain obfuscations that aim to thwart detection. Moreover, some prior techniques face scalability issues, preventing them from detecting malware in a timely manner.

To address these challenges, we present a novel machine-learning-based Android malware detection and family identification approach, RevealDroid, that operates without the need to perform complex program analyses or to extract large sets of features. Specifically, our selected features leverage categorized Android API usage, reflection-based features, and features from native binaries of apps. We assess RevealDroid for accuracy, efficiency, and obfuscation resilience using a large dataset consisting of more than 54,000 malicious and benign apps. Our experiments show that RevealDroid achieves an accuracy of 98% in detection of malware and an accuracy of 95% in determination of their families. We further demonstrate RevealDroid’s superiority against state-of-the-art approaches.
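
As a rough illustration of the kind of lightweight feature described above, the following sketch counts API calls per coarse category, counts reflection calls, and counts calls found in native binaries, then assembles them into a fixed-order vector that a classifier could consume. It is a minimal sketch under stated assumptions, not RevealDroid's implementation: the category map `API_CATEGORIES`, the feature names, and the helpers `categorize` and `feature_vector` are all hypothetical, and the input lists stand in for whatever an actual extractor would recover from an app.

```python
from collections import Counter

# Hypothetical mapping from API package prefixes to coarse categories.
# The prefixes and category names are illustrative assumptions only.
API_CATEGORIES = {
    "android.telephony.": "TELEPHONY",
    "android.location.": "LOCATION",
    "java.net.": "NETWORK",
    "android.content.": "CONTENT",
}
REFLECTION_PREFIX = "java.lang.reflect."

# Fixed feature order so every app yields a vector of the same shape.
FEATURE_ORDER = sorted(set(API_CATEGORIES.values())) + ["REFLECTION", "NATIVE"]


def categorize(call):
    """Return the category of a fully qualified API call, or None if unknown."""
    if call.startswith(REFLECTION_PREFIX):
        return "REFLECTION"
    for prefix, category in API_CATEGORIES.items():
        if call.startswith(prefix):
            return category
    return None


def feature_vector(api_calls, native_calls):
    """Count calls per category and return the counts in FEATURE_ORDER."""
    counts = Counter()
    for call in api_calls:
        category = categorize(call)
        if category is not None:
            counts[category] += 1
    # Assumption: native-binary features are reduced to a single call count.
    counts["NATIVE"] = len(native_calls)
    return [counts[name] for name in FEATURE_ORDER]


# Hypothetical calls extracted from one app.
example_api_calls = [
    "android.telephony.SmsManager.sendTextMessage",
    "java.net.URL.openConnection",
    "java.net.Socket.connect",
    "java.lang.reflect.Method.invoke",
    "com.example.Foo.bar",  # app-internal call, ignored by the category map
]
example_native_calls = ["fork", "execve"]

print(dict(zip(FEATURE_ORDER, feature_vector(example_api_calls, example_native_calls))))
```

Because the counts depend only on which framework APIs are invoked, renaming an app's own classes and methods would leave this particular vector unchanged; whether that holds for other obfuscations depends on the extractor and is not shown here.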

      • Published in

        ACM Transactions on Software Engineering and Methodology  Volume 26, Issue 3
        July 2017
        111 pages
        ISSN: 1049-331X
        EISSN: 1557-7392
        DOI: 10.1145/3177743

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 January 2018
        • Accepted: 1 October 2017
        • Revised: 1 August 2017
        • Received: 1 June 2016