ABSTRACT
Many virtual reality applications let multiple users communicate in a multi-talker environment, recreating the classic cocktail-party effect. While there is a vast body of research on the perception and intelligibility of human speech in real-world cocktail-party scenarios, little work has accurately modeled and evaluated the effect in virtual environments. To evaluate the impact of virtual acoustic simulation on the cocktail-party effect, we conducted experiments to establish signal-to-noise ratio (SNR) thresholds for target-word identification performance. Our evaluation used sentences from the coordinate response measure corpus presented in the presence of multi-talker babble, with thresholds established under varying sound-propagation and spatialization conditions. We used a state-of-the-art geometric acoustic system integrated into the Unity game engine to simulate three levels of reverberance (direct sound only; direct sound with early reflections; direct sound with early reflections and late reverberation) and three spatialization modes (mono, stereo, and binaural). Our results show that spatialization has the largest effect on listeners' ability to discern target words in multi-talker virtual environments, whereas reverberance slightly degrades target-word identification.
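As a side note, the SNR conditions described above are typically realized by scaling the babble masker relative to the target speech before mixing. The following is a minimal illustrative sketch of that step (not the authors' implementation); the function name `mix_at_snr` and the white-noise stand-ins for speech signals are assumptions for the example.

```python
import numpy as np

def mix_at_snr(target, babble, snr_db):
    """Scale the babble masker so that the target-to-babble power
    ratio equals the requested SNR (in dB), then mix the signals."""
    p_target = np.mean(target ** 2)
    p_babble = np.mean(babble ** 2)
    # Gain applied to the masker so that p_target / (gain^2 * p_babble)
    # equals 10^(snr_db / 10).
    gain = np.sqrt(p_target / (p_babble * 10 ** (snr_db / 10)))
    return target + gain * babble

# Example with white-noise stand-ins for target speech and babble.
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
babble = rng.standard_normal(16000)
mix = mix_at_snr(target, babble, -6.0)  # target 6 dB below the masker
```

In an adaptive threshold procedure (e.g., a transformed up-down method), `snr_db` would be raised or lowered trial by trial based on the listener's identification responses.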
Index Terms
- Effects of virtual acoustics on target-word identification performance in multi-talker environments