Abstract
The number of malicious Android apps is increasing rapidly. Android malware can damage or alter other files or settings, install additional applications, and so on. To determine such behaviors, a security analyst can benefit significantly from identifying the family to which a malicious app belongs, rather than only detecting whether an app is malicious. Existing techniques for detecting Android malware and determining its family lack the ability to handle certain obfuscations that aim to thwart detection. Moreover, some prior techniques face scalability issues that prevent them from detecting malware in a timely manner.
To address these challenges, we present RevealDroid, a novel machine-learning-based approach to Android malware detection and family identification that operates without the need to perform complex program analyses or to extract large sets of features. Specifically, our selected features leverage categorized Android API usage, reflection-based features, and features from the native binaries of apps. We assess RevealDroid for accuracy, efficiency, and obfuscation resilience using a large dataset of more than 54,000 malicious and benign apps. Our experiments show that RevealDroid achieves an accuracy of 98% in detecting malware and an accuracy of 95% in determining malware families. We further demonstrate RevealDroid’s superiority over state-of-the-art approaches.
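To make the idea of a small, categorized feature set concrete, the following is a minimal sketch of how per-app features of this kind could be assembled. It is an illustration under stated assumptions, not RevealDroid's implementation: the category map `API_CATEGORIES`, the feature order, and the helper names `categorize` and `feature_vector` are all hypothetical, and treating reflection as one more API category is a simplification.

```python
from collections import Counter

# Hypothetical mapping from API package prefixes to coarse categories.
# The real categorization used by RevealDroid is not given in the abstract.
API_CATEGORIES = {
    "android.telephony": "telephony",
    "android.location": "location",
    "java.net": "network",
    "java.lang.reflect": "reflection",
}

# Fixed feature order so that every app yields a vector of the same shape.
FEATURE_ORDER = ["telephony", "location", "network", "reflection", "native_libs"]


def categorize(api_call):
    """Return the category of a fully qualified API call, or None if uncategorized."""
    for prefix, category in API_CATEGORIES.items():
        if api_call.startswith(prefix):
            return category
    return None


def feature_vector(api_calls, native_libs):
    """Count API calls per category and append the number of native libraries."""
    counts = Counter()
    for call in api_calls:
        category = categorize(call)
        if category is not None:
            counts[category] += 1
    counts["native_libs"] = len(native_libs)
    return [counts[name] for name in FEATURE_ORDER]


# Toy input: the API calls and native libraries found in one (made-up) app.
example = feature_vector(
    [
        "android.telephony.SmsManager.sendTextMessage",
        "java.net.URL.openConnection",
        "java.lang.reflect.Method.invoke",
        "java.lang.reflect.Method.invoke",
        "android.widget.Toast.show",
    ],
    ["libnative.so"],
)
print(example)  # [1, 0, 1, 2, 1]
```

Vectors of this shape, one per app, could then be passed to any standard classifier, once with malicious/benign labels for detection and once with family labels for family identification. Counting calls per category rather than per method keeps the vector short, which is consistent with the abstract's stated goal of avoiding large feature sets.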