ABSTRACT
With the increasing popularity of mobile devices, mobile apps raise severe security and privacy concerns. On Google Play, user reviews offer a unique view of the security and privacy issues of mobile apps from the users' perspective, and they are valuable feedback because they reflect users' expectations. To best assist end users, in this paper we automatically learn the security- and privacy-related behaviors inferred from the analysis of user reviews, a notion we call review-to-behavior fidelity. We design AUTOREB, a system that automatically assesses the review-to-behavior fidelity of mobile apps. AUTOREB employs state-of-the-art machine learning techniques to infer the relations between user reviews and four categories of security-related behaviors. Moreover, it uses a crowdsourcing approach to automatically aggregate security issues from the review level to the app level. To our knowledge, AUTOREB is the first work that explores user review information and uses review semantics to predict risky behaviors at both the review level and the app level.
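As a rough illustration of review-level prediction, the sketch below trains one binary classifier per behavior category over bag-of-words features, so a single review can map to zero, one, or several categories. It assumes a multinomial Naive Bayes model with Laplace smoothing, a deliberately simple stand-in rather than the state-of-the-art techniques AUTOREB actually uses. The category names and the six training reviews are hypothetical and made up for illustration; the abstract does not list the four categories.

```python
import math
import re
from collections import Counter

# Hypothetical category names, for illustration only.
CATEGORIES = ["spam", "financial", "permission", "data_leak"]


def tokenize(text: str) -> list[str]:
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class BinaryNaiveBayes:
    """Multinomial Naive Bayes for one category: related (True) vs. not (False)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs: list[list[str]], labels: list[bool]) -> "BinaryNaiveBayes":
        self.counts = {True: Counter(), False: Counter()}  # word counts per class
        self.doc_counts = Counter(labels)  # number of reviews per class (class prior)
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def log_score(self, tokens: list[str], y: bool) -> float:
        denom = sum(self.counts[y].values()) + self.alpha * len(self.vocab)
        score = math.log(self.doc_counts[y] / sum(self.doc_counts.values()))
        for t in tokens:
            if t in self.vocab:  # words never seen in training are ignored
                score += math.log((self.counts[y][t] + self.alpha) / denom)
        return score

    def predict(self, tokens: list[str]) -> bool:
        return self.log_score(tokens, True) > self.log_score(tokens, False)


def train_review_classifier(reviews: list[str], label_sets: list[set[str]]) -> dict[str, BinaryNaiveBayes]:
    """Train one one-vs-rest model per category on the same tokenized reviews."""
    docs = [tokenize(r) for r in reviews]
    return {c: BinaryNaiveBayes().fit(docs, [c in ls for ls in label_sets]) for c in CATEGORIES}


def predict_behaviors(models: dict[str, BinaryNaiveBayes], review: str) -> set[str]:
    """Return every category whose model says the review is related to it."""
    tokens = tokenize(review)
    return {c for c, m in models.items() if m.predict(tokens)}


# Hypothetical toy training data (not from the paper's dataset).
REVIEWS = [
    "this app keeps showing spam ads and popup notifications",
    "it charged my credit card without asking",
    "why does a flashlight need my contacts permission",
    "it uploads my location and personal data to unknown servers",
    "great game, fun levels and nice graphics",
    "works well, simple and fast",
]
LABELS = [{"spam"}, {"financial"}, {"permission"}, {"data_leak"}, set(), set()]

if __name__ == "__main__":
    models = train_review_classifier(REVIEWS, LABELS)
    print(predict_behaviors(models, "charged my card twice without asking"))
```

One-vs-rest training keeps the categories independent, which matches the idea that a review may describe more than one risky behavior at once.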
We crawled a real-world dataset of 2,614,186 users, 12,783 apps, and 13,129,783 reviews from Google Play and used it to comprehensively evaluate AUTOREB. The experimental results show that our method can predict mobile app behaviors at the user-review level with accuracy as high as 94.05%, and that it can also predict security issues at the app level by aggregating the review-level predictions. Our research offers insight into mobile app security concerns from users' perspective and helps bridge the gap between security issues and users' perception of them.
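The abstract does not spell out the crowdsourcing model used to go from review-level to app-level predictions. As a minimal sketch of that step, assuming each review yields a per-category probability and each reviewer has a reliability weight w in [0, 1], a noisy-OR combines the evidence: the app-level score for a category is 1 minus the product of (1 - w·p) over the app's reviews. This noisy-OR rule, the reliability values, the default weight of 0.5, and the 0.5 flagging threshold are all hypothetical choices, not AUTOREB's method.

```python
from collections import defaultdict

# Hypothetical input format: for one app, a list of (user_id, {category: probability})
# pairs, where each probability comes from a review-level classifier.
ReviewPreds = list[tuple[str, dict[str, float]]]


def aggregate_app_level(
    review_preds: ReviewPreds,
    reliability: dict[str, float],
    default_reliability: float = 0.5,  # hypothetical weight for reviewers with no history
) -> dict[str, float]:
    """Noisy-OR aggregation of review-level predictions into app-level scores."""
    # miss[c] = probability that no review is a correct signal for category c
    miss: dict[str, float] = defaultdict(lambda: 1.0)
    for user, probs in review_preds:
        w = reliability.get(user, default_reliability)
        for category, p in probs.items():
            miss[category] *= 1.0 - w * p
    return {category: 1.0 - m for category, m in miss.items()}


def flag_app(app_scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return the categories whose app-level score reaches the (hypothetical) threshold."""
    return sorted(c for c, s in app_scores.items() if s >= threshold)


if __name__ == "__main__":
    preds = [
        ("u1", {"financial": 0.9}),
        ("u2", {"financial": 0.6, "spam": 0.2}),
        ("u3", {"spam": 0.3}),
    ]
    reliability = {"u1": 0.8, "u2": 0.5}  # u3 falls back to the default weight
    scores = aggregate_app_level(preds, reliability)
    print(scores)
    print(flag_app(scores))
```

In this toy example the financial score is 1 - (1 - 0.72)(1 - 0.3) = 0.804 and the spam score is 1 - (1 - 0.1)(1 - 0.15) = 0.235, so only the financial category is flagged. A fuller crowdsourcing model would estimate reviewer reliability from the data instead of fixing it by hand.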
AUTOREB: Automatically Understanding the Review-to-Behavior Fidelity in Android Applications