ABSTRACT
This paper investigates the classification of short user-generated videos (UGVs) using their accompanying audio, since short UGVs account for a large proportion of the UGVs on the Internet and many of them are accompanied by single-category soundtracks. We define seven types of UGVs, each corresponding to one of seven audio categories. We investigate three modeling approaches for audio feature representation: single Gaussian (1G), Gaussian mixture model (GMM), and Bag-of-Audio-Word (BoAW). Support Vector Machine (SVM) classifiers are then trained to categorize the UGVs, using a distance measure matched to each of the three feature representations. The evaluation shows that these approaches are effective for categorizing short UGVs based on their audio tracks. The GMM representation with the approximated Bhattacharyya distance (ABD) produces the best performance, and the BoAW representation with a chi-square kernel achieves comparable results.
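To make the distance measures named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes diagonal covariances and uses the closed-form Bhattacharyya distance between two single Gaussians (the 1G case; the approximated variant used for GMMs is not specified in the abstract), plus a chi-square kernel of the common form exp(-D/scale) on BoAW histograms. The function names and the `scale` parameter are hypothetical.

```python
import math


def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Closed-form Bhattacharyya distance between two Gaussians.

    Assumption: both Gaussians have diagonal covariance, given as
    per-dimension variance lists var1 and var2.
    """
    dist = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance in this dimension
        dist += (m1 - m2) ** 2 / (8.0 * v)  # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))  # variance-mismatch term
    return dist


def chi_square_distance(h1, h2):
    """Chi-square distance between two non-negative histograms."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def chi_square_kernel(h1, h2, scale=1.0):
    """Kernel value exp(-D / scale); `scale` is an assumed normalizer."""
    return math.exp(-chi_square_distance(h1, h2) / scale)


if __name__ == "__main__":
    # Two 1-D Gaussians with unit variance and means 0 and 2.
    print(bhattacharyya_diag([0.0], [1.0], [2.0], [1.0]))  # 0.5
    # Identical histograms give the maximum kernel value.
    print(chi_square_kernel([0.5, 0.5], [0.5, 0.5]))  # 1.0
```

A kernel matrix built from such pairwise values could then be passed to an SVM that accepts precomputed kernels. Whether the paper does exactly this is not stated in the abstract.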
Index Terms
- Short user-generated videos classification using accompanied audio categories