DOI: 10.1145/3133956.3134052
research-article
Public Access
Best Paper

DolphinAttack: Inaudible Voice Commands

Published: 30 October 2017

ABSTRACT

Speech recognition (SR) systems such as Siri or Google Now have become an increasingly popular human-computer interaction method, and have turned various systems into voice controllable systems (VCS). Prior work on attacking VCS shows that hidden voice commands that are incomprehensible to people can control these systems. Hidden voice commands, though "hidden", are nonetheless audible. In this work, we design a completely inaudible attack, DolphinAttack, that modulates voice commands on ultrasonic carriers (e.g., f > 20 kHz) to achieve inaudibility. By leveraging the nonlinearity of the microphone circuits, the modulated low-frequency audio commands can be successfully demodulated, recovered, and, more importantly, interpreted by the speech recognition systems. We validated DolphinAttack on popular speech recognition systems, including Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana, and Alexa. By injecting a sequence of inaudible voice commands, we demonstrate several proof-of-concept attacks, including activating Siri to initiate a FaceTime call on an iPhone, activating Google Now to switch the phone to airplane mode, and even manipulating the navigation system in an Audi automobile. We propose hardware and software defense solutions and suggest redesigning voice controllable systems to be resilient to inaudible voice command attacks.
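
The mechanism the abstract describes, amplitude modulation onto an ultrasonic carrier followed by demodulation through microphone nonlinearity, can be illustrated with a small numerical sketch. This is a toy model and not the authors' implementation. It assumes the microphone behaves as a memoryless quadratic system (output = a1*s + a2*s^2), and the 30 kHz carrier, the 400 Hz test tone standing in for a voice command, the coefficients a1 and a2, and all function names are illustrative choices that do not come from the paper.

```python
import numpy as np

SAMPLE_RATE = 192_000          # Hz, high enough to represent the ultrasonic carrier
NUM_SAMPLES = SAMPLE_RATE // 10  # 0.1 s of signal


def am_modulate(baseband, carrier_hz, sample_rate):
    """Amplitude-modulate a baseband signal onto a carrier (carrier kept)."""
    t = np.arange(len(baseband)) / sample_rate
    return (1.0 + baseband) * np.cos(2 * np.pi * carrier_hz * t)


def nonlinear_mic(signal, a1=1.0, a2=0.5):
    """Assumed toy microphone model: a linear term plus a quadratic term."""
    return a1 * signal + a2 * signal ** 2


def band_power(signal, sample_rate, lo_hz, hi_hz):
    """Sum of squared FFT magnitudes between lo_hz and hi_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= lo_hz) & (freqs <= hi_hz)
    return spectrum[mask].sum()


def dominant_frequency(signal, sample_rate, lo_hz, hi_hz):
    """Frequency of the strongest FFT bin between lo_hz and hi_hz."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= lo_hz) & (freqs <= hi_hz)
    return freqs[mask][np.argmax(spectrum[mask])]


t = np.arange(NUM_SAMPLES) / SAMPLE_RATE
baseband = 0.5 * np.cos(2 * np.pi * 400.0 * t)   # stand-in for a voice command

transmitted = am_modulate(baseband, 30_000.0, SAMPLE_RATE)
received = nonlinear_mic(transmitted)

# Energy in a rough "audible" band before and after the nonlinearity.
audible_before = band_power(transmitted, SAMPLE_RATE, 50.0, 8_000.0)
audible_after = band_power(received, SAMPLE_RATE, 50.0, 8_000.0)
recovered_hz = dominant_frequency(received, SAMPLE_RATE, 50.0, 8_000.0)

print("audible-band power before microphone:", audible_before)
print("audible-band power after microphone: ", audible_after)
print("strongest audible component (Hz):    ", recovered_hz)
```

In this model the transmitted signal carries energy only around the carrier, so nothing falls in the audible band. Squaring the signal multiplies the carrier by itself, which shifts a copy of the baseband tone back down to low frequencies. Setting a2 to zero removes that copy, which is why the attack depends on the nonlinearity.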

Published in

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
October 2017
2682 pages
ISBN: 9781450349468
DOI: 10.1145/3133956

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

CCS '17 Paper Acceptance Rate: 151 of 836 submissions, 18%
Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
