DOI: 10.1145/2502081.2502224

Recent developments in openSMILE, the Munich open-source multimedia feature extractor

Published: 21 October 2013

ABSTRACT

We present recent developments in the openSMILE feature extraction toolkit. Version 2.0 unites feature extraction paradigms from speech, music, and general sound events with basic video features for multi-modal processing. Descriptors from audio and video can be processed jointly in a single framework, which allows for time synchronization of parameters, on-line incremental processing as well as off-line and batch processing, and the extraction of statistical functionals (feature summaries) such as moments, peaks, and regression parameters. Postprocessing of the features includes statistical classifiers such as support vector machine models, as well as file export for popular toolkits such as Weka or HTK. Available low-level descriptors include popular speech, music, and video features, including Mel-frequency and similar cepstral and spectral coefficients, Chroma, CENS, auditory-model-based loudness, voice quality, and local binary pattern, color, and optical flow histograms. In addition, voice activity detection, pitch tracking, and face detection are supported. openSMILE is implemented in C++ and uses standard open-source libraries for on-line audio and video input. It is fast, runs on Unix and Windows platforms, and has a modular, component-based architecture that makes extensions via plug-ins easy. openSMILE 2.0 is distributed under a research license and can be downloaded from http://opensmile.sourceforge.net/.
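
To make the idea of statistical functionals concrete, the following is a minimal sketch in plain Python. It is not openSMILE code and does not use the openSMILE API or configuration format; the function name `functionals` and the returned keys are hypothetical and chosen only for illustration. It maps a variable-length contour of one low-level descriptor (one value per frame) to a fixed-length summary consisting of two moments, a peak count, and linear regression parameters.

```python
from statistics import fmean, pvariance


def functionals(contour):
    """Summarize a frame-level descriptor contour as fixed-length statistics.

    Hypothetical helper for illustration only; not part of openSMILE.
    """
    n = len(contour)
    if n < 2:
        raise ValueError("need at least two frames")

    mean = fmean(contour)

    # Peaks: interior frames strictly greater than both neighbours.
    num_peaks = sum(
        1
        for i in range(1, n - 1)
        if contour[i - 1] < contour[i] > contour[i + 1]
    )

    # Least-squares line fit of the values against the frame index 0..n-1.
    t_mean = (n - 1) / 2
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (x - mean) for t, x in enumerate(contour))
    slope = sxy / sxx

    return {
        "mean": mean,
        "variance": pvariance(contour),  # population variance (second moment)
        "num_peaks": num_peaks,
        "slope": slope,
        "offset": mean - slope * t_mean,  # fitted value at frame 0
    }
```

Applied to every descriptor contour of a recording, a summary of this kind yields one feature vector of constant size regardless of the recording's duration, which is the form a static classifier such as a support vector machine expects.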

