DOI:10.1145/1291233.1291299

Singing voice detection using perceptually-motivated features

Published: 29 September 2007

Abstract

Perceptual features are motivated by human perception of sound. In this paper, several perceptually motivated features, such as harmonic content, vibrato, and timbre, are studied for detecting singing voice segments in a song. The singing formant and the attack-decay envelope of the sound are also studied for acoustic feature formulation. Cepstral coefficients, which reflect timbre characteristics, are formulated by combining information from harmonic content, vibrato, the singing formant, and the attack-decay envelope. Bandpass filters spaced according to the octave frequency scale are used to extract vibrato and harmonic information. Experiments are conducted on a database of 84 popular songs from commercially available CD recordings. The experiments show that the proposed feature formulation methods are effective.
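
The abstract does not give the filter design or the exact feature computation, so the following is only a minimal sketch of the two generic ideas it names: band edges laid out on an octave scale, and cepstral coefficients obtained as a discrete cosine transform of log subband energies. The function names, the 110 Hz to 7040 Hz range, and the use of a plain DCT-II are assumptions made for illustration and are not taken from the paper.

```python
import math


def octave_band_edges(f_low, f_high):
    """Return (lower, upper) band edges in Hz, each band spanning one octave.

    Hypothetical helper: the paper's actual band limits are not stated
    in the abstract.
    """
    edges = []
    lo = f_low
    while lo < f_high:
        hi = min(lo * 2.0, f_high)  # one octave up, clipped at f_high
        edges.append((lo, hi))
        lo = hi
    return edges


def cepstral_coefficients(band_energies, n_coeffs):
    """Unnormalized DCT-II of the log band energies.

    This is the textbook route from subband energies to cepstral
    coefficients; how the paper weights harmonic, vibrato, or formant
    information is an open assumption here.
    """
    n = len(band_energies)
    logs = [math.log(e) for e in band_energies]
    return [
        sum(logs[k] * math.cos(math.pi * i * (k + 0.5) / n) for k in range(n))
        for i in range(n_coeffs)
    ]


bands = octave_band_edges(110.0, 7040.0)
# Placeholder energies; in practice these would come from filtering a frame
# of audio through each band and summing the squared output.
flat_energies = [math.e] * len(bands)
coeffs = cepstral_coefficients(flat_energies, 3)
```

With a flat set of energies, only the zeroth coefficient is non-zero, which reflects the fact that higher cepstral coefficients describe the shape of the spectral envelope rather than its overall level.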


Published In

MM '07: Proceedings of the 15th ACM international conference on Multimedia
September 2007
1115 pages
ISBN:9781595937025
DOI:10.1145/1291233
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. harmonic
  2. singing formant
  3. singing voice
  4. timbre
  5. vibrato

Qualifiers

  • Article

Conference

MM07

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
