ABSTRACT
Speech recognition (SR) systems such as Siri or Google Now have become an increasingly popular human-computer interaction method, and have turned various systems into voice controllable systems (VCS). Prior work on attacking VCS shows that hidden voice commands, which are incomprehensible to people, can control these systems. Such commands, though "hidden", are nonetheless audible. In this work, we design a completely inaudible attack, DolphinAttack, that modulates voice commands on ultrasonic carriers (e.g., f > 20 kHz) to achieve inaudibility. By leveraging the nonlinearity of the microphone circuits, the modulated low-frequency audio commands can be successfully demodulated, recovered, and, more importantly, interpreted by the speech recognition systems. We validated DolphinAttack on popular speech recognition systems, including Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana, and Alexa. By injecting a sequence of inaudible voice commands, we demonstrate several proof-of-concept attacks, which include activating Siri to initiate a FaceTime call on an iPhone, activating Google Now to switch the phone to airplane mode, and even manipulating the navigation system in an Audi automobile. We propose hardware and software defense solutions, and suggest redesigning voice controllable systems to be resilient to inaudible voice command attacks.
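The demodulation effect described above can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes amplitude modulation, a single 1 kHz tone standing in for a voice command, a 30 kHz carrier, and a microphone modeled as a quadratic nonlinearity with made-up coefficients `a1` and `a2`. All names and values are hypothetical.

```python
import cmath
import math

FS = 192_000        # sample rate in Hz (assumed)
F_CARRIER = 30_000  # ultrasonic carrier above 20 kHz (assumed)
F_VOICE = 1_000     # single tone standing in for a voice command (assumed)
N = 1_920           # 10 ms of samples, so every tone completes whole cycles


def am_modulate(n):
    """Sample n of the baseband tone amplitude-modulated onto the carrier."""
    t = n / FS
    baseband = 0.5 * math.cos(2 * math.pi * F_VOICE * t)
    return (1.0 + baseband) * math.cos(2 * math.pi * F_CARRIER * t)


def nonlinear_mic(s, a1=1.0, a2=0.1):
    """Hypothetical microphone model: linear gain plus a quadratic term."""
    return a1 * s + a2 * s * s


def tone_amplitude(signal, freq):
    """Amplitude of the component at freq, via a single-bin DFT."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, x in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)


transmitted = [am_modulate(n) for n in range(N)]
received = [nonlinear_mic(s) for s in transmitted]

# The transmitted signal has energy only near the carrier, none at 1 kHz.
print("1 kHz before microphone:", tone_amplitude(transmitted, F_VOICE))
# The quadratic term shifts a copy of the baseband tone back to 1 kHz.
print("1 kHz after microphone: ", tone_amplitude(received, F_VOICE))
```

In this toy model the squared term expands to (1 + m)^2 (1 + cos 2ω_c t) / 2, whose low-frequency part contains the original tone m scaled by `a2`. A low-pass filter after the microphone would therefore pass the recovered command while rejecting the ultrasonic components.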
Index Terms
- DolphinAttack: Inaudible Voice Commands