research-article

Short user-generated videos classification using accompanied audio categories

Authors:
Jinlin Guo
CLARITY and School of Computing, Dublin City University, Dublin, Ireland

Cathal Gurrin
CLARITY and School of Computing, Dublin City University, Dublin, Ireland

AMVA '12: Proceedings of the 2012 ACM international workshop on Audio and multimedia methods for large-scale video analysis, November 2012, Pages 15–20. https://doi.org/10.1145/2390214.2390220

Published: 02 November 2012

ABSTRACT

This paper investigates the classification of short user-generated videos (UGVs) using their accompanying audio, since short UGVs account for a large proportion of the UGVs on the Internet and many of them have single-category soundtracks. We define seven types of UGVs, each corresponding to one of seven audio categories. We investigate three modeling approaches for audio feature representation: single Gaussian (1G), Gaussian mixture (GMM), and Bag-of-Audio-Word (BoAW) models. Support Vector Machine (SVM) classifiers are then trained to categorize the UGVs, using three distance measures, one for each feature representation. The evaluation shows that these approaches are effective for categorizing short UGVs based on their audio tracks. The GMM representation with the approximated Bhattacharyya distance (ABD) gives the best performance, and the BoAW representation with a chi-square kernel gives comparable results.
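The distance measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes diagonal covariances for the single-Gaussian case, a nearest-codeword histogram for BoAW, and the common practice of turning a distance into an SVM kernel via exp(-d / scale). All function names and the `scale` parameter are hypothetical. The abstract does not say how the Bhattacharyya distance is approximated for GMMs, so only the closed form for two single Gaussians is shown.

```python
import math

def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Closed-form Bhattacharyya distance between two Gaussians.

    Assumption: both Gaussians have diagonal covariance, given as
    per-dimension variances.
    """
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)                        # averaged variance
        dist += 0.125 * (m1 - m2) ** 2 / v         # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))  # variance-mismatch term
    return dist

def boaw_histogram(frames, codebook):
    """L1-normalised histogram of nearest-codeword assignments.

    `frames` are per-frame feature vectors (e.g. MFCCs); `codebook` is a
    list of codeword vectors, assumed to have been learned beforehand.
    """
    counts = [0] * len(codebook)
    for frame in frames:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, codebook[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]

def chi_square_distance(h, g):
    """Chi-square distance between two histograms; empty bins are skipped."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h, g) if a + b > 0)

def distance_kernel(distance, scale=1.0):
    """Map a distance to a kernel value; `scale` is a hypothetical tuning knob."""
    return math.exp(-distance / scale)
```

In use, a matrix of `distance_kernel` values between all pairs of training clips could be passed to an SVM that accepts precomputed kernels. Whether the paper sets the scale this way is not stated in the abstract.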

Index Terms

1. Computing methodologies
  1. Machine learning

Published in
AMVA '12: Proceedings of the 2012 ACM international workshop on Audio and multimedia methods for large-scale video analysis
November 2012
42 pages
ISBN: 9781450315852
DOI: 10.1145/2390214
General Chairs:
Gerald Friedland, International Computer Science Institute and UC Berkeley, USA
Dan Ellis, Columbia University, USA
Florian Metze, Carnegie Mellon University, USA
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
mfcc
user-generated video
video classification