Lightweight, Obfuscation-Resilient Detection and Family Identification of Android Malware

Authors:
Joshua Garcia, Department of Informatics, University of California, Irvine, CA
Mahmoud Hammad, Department of Informatics, University of California, Irvine, CA
Sam Malek, Department of Informatics, University of California, Irvine, CA

ACM Transactions on Software Engineering and Methodology, Volume 26, Issue 3, Article No. 11, pp. 1–29. https://doi.org/10.1145/3162625

Published: 12 January 2018

Abstract

The number of malicious Android apps is increasing rapidly. Android malware can damage or alter other files or settings, install additional applications, and so on. To determine such behaviors, a security analyst benefits significantly from identifying the family to which a malicious app belongs, rather than only detecting whether an app is malicious. Existing techniques for detecting Android malware and determining its family lack the ability to handle certain obfuscations designed to thwart detection. Moreover, some prior techniques face scalability issues that prevent them from detecting malware in a timely manner.

To address these challenges, we present RevealDroid, a novel machine-learning-based approach to Android malware detection and family identification that operates without the need to perform complex program analyses or to extract large sets of features. Specifically, the selected features are derived from categorized Android API usage, from reflection, and from apps' native binaries. We assess RevealDroid for accuracy, efficiency, and obfuscation resilience using a large dataset of more than 54,000 malicious and benign apps. Our experiments show that RevealDroid achieves 98% accuracy in malware detection and 95% accuracy in family identification. We further demonstrate RevealDroid’s superiority over state-of-the-art approaches.
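
The abstract describes the three feature sources only at a high level. As a rough illustration of what a lightweight, count-based feature vector over those sources might look like, the following Python sketch turns an app's observed API calls, reflective call targets, and native function calls into named counts. Everything here is an assumption for illustration: the package-level API categorization, the feature names, and the helpers `api_category`, `extract_features`, and `to_vector` are hypothetical and are not RevealDroid's actual implementation. Extracting these lists from an APK (from Dalvik bytecode and native ELF binaries) is outside the sketch.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def api_category(api_signature: str) -> str:
    """Hypothetical categorization: map a fully qualified API method to its
    Java package, e.g. 'android.telephony.SmsManager.sendTextMessage'
    -> 'android.telephony'."""
    return api_signature.rsplit(".", 2)[0]


def extract_features(
    api_calls: Sequence[str],
    reflective_targets: Sequence[Optional[str]],
    native_calls: Sequence[str],
) -> Dict[str, int]:
    """Build a sparse, count-based feature map for one app (illustrative only)."""
    features: Counter = Counter()
    # Categorized API usage: one count per (assumed) API category.
    for call in api_calls:
        features["api:" + api_category(call)] += 1
    # Reflection: total reflective calls, and how many have a statically known target.
    features["reflection:total"] = len(reflective_targets)
    features["reflection:resolved"] = sum(t is not None for t in reflective_targets)
    # Native code: one count per function name invoked from native binaries.
    for fn in native_calls:
        features["native:" + fn] += 1
    return dict(features)


def to_vector(features: Dict[str, int], vocabulary: List[str]) -> List[int]:
    """Project a feature map onto a fixed vocabulary, filling absent features with 0."""
    return [features.get(name, 0) for name in vocabulary]


if __name__ == "__main__":
    apis = [
        "android.telephony.SmsManager.sendTextMessage",
        "android.telephony.TelephonyManager.getDeviceId",
        "java.net.URL.openConnection",
    ]
    # None marks a reflective call whose target could not be resolved statically.
    reflective = ["android.telephony.SmsManager.sendTextMessage", None]
    native = ["fork", "execve"]
    feats = extract_features(apis, reflective, native)
    print(to_vector(feats, ["api:android.telephony", "native:execve", "api:android.location"]))
    # prints [2, 1, 0]
```

Because every feature is a simple count, vectors like these can be computed without whole-program analysis and passed to an off-the-shelf classifier; which classifier and exact feature set the authors use is not stated in this abstract.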

Index Terms

1. Security and privacy
  1. Software and application security
    1. Software security engineering
2. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software reliability

Published in
ACM Transactions on Software Engineering and Methodology, Volume 26, Issue 3 (July 2017), 111 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3177743
Editor: David S. Rosenblum, National University of Singapore, Singapore
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 January 2018
- Accepted: 1 October 2017
- Revised: 1 August 2017
- Received: 1 June 2016
Author Tags
Android malware
lightweight
machine learning
native code
obfuscation
reflection
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- 116 total citations
- 2,348 total downloads
- 404 downloads (last 12 months)
- 58 downloads (last 6 weeks)