Abstract
Sound synthesis is the process of generating artificial sounds through some form of simulation or modelling. This article aims to identify which sound synthesis methods can produce a believable audio sample that could replace a recorded sound sample. A perceptual evaluation experiment compared five sound synthesis techniques: additive synthesis, statistical modelling synthesis with two different feature sets, physically inspired synthesis, concatenative synthesis, and sinusoidal modelling synthesis. The evaluation used stimuli from eight different sound classes and 66 different samples. The additive synthesizer was the only synthesis method not considered significantly different from the reference sample across all sound classes. The results demonstrate that synthesized sound can be considered as realistic as a recorded sample, and the article makes recommendations for the use of synthesis methods in different sound class contexts.