Research Article
DOI: 10.1145/3225153.3225166

Effects of virtual acoustics on target-word identification performance in multi-talker environments

Published: 10 August 2018

ABSTRACT

Many virtual reality applications let multiple users communicate in a multi-talker environment, recreating the classic cocktail-party effect. While there is a vast body of research on the perception and intelligibility of human speech in real-world cocktail-party scenarios, little work has accurately modeled and evaluated the effect in virtual environments. To evaluate the impact of virtual acoustic simulation on the cocktail-party effect, we conducted experiments to establish the signal-to-noise ratio (SNR) thresholds for target-word identification performance. Our evaluation used sentences from the coordinate response measure corpus presented in the presence of multi-talker babble. The thresholds were established under varying sound propagation and spatialization conditions. We used a state-of-the-art geometric acoustic system integrated into the Unity game engine to simulate varying conditions of reverberance (direct sound; direct sound and early reflections; direct sound, early reflections, and late reverberation) and spatialization (mono, stereo, and binaural). Our results show that spatialization has the largest effect on listeners' ability to discern the target words in multi-talker virtual environments. Reverberance, on the other hand, has a slight negative effect on target-word identification.
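A common way to establish SNR thresholds like those described above is a transformed up-down (1-up/2-down) adaptive staircase, which converges near the 70.7%-correct point on the psychometric function. The sketch below is illustrative only, not the paper's actual procedure, which the abstract does not specify; `trial_correct` is a hypothetical callback standing in for a listener's response at a given SNR.

```python
def run_staircase(trial_correct, start_snr_db=10.0, step_db=2.0, max_reversals=8):
    """Estimate an SNR threshold (dB) by averaging staircase reversal points.

    Assumes a 1-up/2-down rule: two consecutive correct responses lower
    the SNR (harder); one incorrect response raises it (easier).
    """
    snr = start_snr_db
    consecutive_correct = 0
    direction = 0          # -1 = descending (harder), +1 = ascending (easier)
    reversals = []
    while len(reversals) < max_reversals:
        if trial_correct(snr):
            consecutive_correct += 1
            if consecutive_correct == 2:      # two correct in a row -> lower SNR
                consecutive_correct = 0
                if direction == +1:           # direction changed: record reversal
                    reversals.append(snr)
                direction = -1
                snr -= step_db
        else:                                 # one incorrect -> raise SNR
            consecutive_correct = 0
            if direction == -1:               # direction changed: record reversal
                reversals.append(snr)
            direction = +1
            snr += step_db
    # Threshold estimate: mean SNR at the recorded reversal points
    return sum(reversals) / len(reversals)
```

With a deterministic simulated listener who is correct whenever the SNR is at or above 0 dB, the staircase oscillates around that point and the reversal average lands near it.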

Supplemental Material: a16-rungta.avi (AVI, 16.8 MB)


Published in:
SAP '18: Proceedings of the 15th ACM Symposium on Applied Perception
August 2018, 162 pages
ISBN: 9781450358941
DOI: 10.1145/3225153
Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall Acceptance Rate: 43 of 94 submissions, 46%
