DOI: 10.1145/1835449.1835555
research-article

Effective music tagging through advanced statistical modeling

Published: 19 July 2010

Abstract

Music information retrieval (MIR) holds great promise as a technology for managing large music archives. One key component of MIR that has been actively researched is music tagging. While significant progress has been achieved, most existing systems still adopt a simple classification approach and apply machine learning classifiers directly to low-level acoustic features. Consequently, they suffer from (1) poor accuracy, (2) a lack of comprehensive evaluation results and associated analysis on large-scale datasets, and (3) incomplete content representation, arising from the lack of multimodal and temporal information integration.
In this paper, we introduce a novel system called MMTagger that effectively integrates both multimodal and temporal information in the representation of the music signal. The carefully designed multilayer architecture of the proposed classification framework combines multiple Gaussian Mixture Models (GMMs) and a Support Vector Machine (SVM) into a single framework. This structure preserves more discriminative information, leading to more accurate and robust tagging. Experimental results obtained with two large music collections highlight the advantages of our multilayer framework over state-of-the-art techniques.
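
The abstract does not specify MMTagger's features, GMM configuration, or fusion rule, so the following is only a minimal sketch of the general idea of layering several GMMs under an SVM, not the authors' implementation. Everything in it is an assumption made for illustration: the two modality names (`timbre`, `rhythm`), the synthetic one-dimensional frame data, the use of a tag model and a background model per modality, and a linear SVM trained by subgradient descent in place of whatever SVM variant the paper uses.

```python
import math
import random

MODALITIES = ("timbre", "rhythm")  # hypothetical modality names, not from the paper

def gauss(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, k=2, iters=30, seed=0):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    rng = random.Random(seed)
    means = rng.sample(xs, k)
    mean_all = sum(xs) / len(xs)
    var_all = sum((x - mean_all) ** 2 for x in xs) / len(xs) + 1e-6
    variances, weights = [var_all] * k, [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each frame
        resp = []
        for x in xs:
            p = [w * gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and (floored) variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-3)
    return weights, means, variances

def avg_loglik(xs, gmm):
    """Average per-frame log-likelihood of a clip under a fitted mixture."""
    weights, means, variances = gmm
    total = 0.0
    for x in xs:
        p = sum(w * gauss(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))
    return total / len(xs)

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=100, seed=0):
    """Linear SVM: subgradient descent on L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - eta * lam) for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def make_clip(rng, tagged, n_frames=30):
    """Synthetic clip: one list of scalar frame features per modality."""
    sign = 1.0 if tagged else -1.0
    return {
        "timbre": [rng.gauss(sign * rng.choice([1.0, 3.0]), 0.7) for _ in range(n_frames)],
        "rhythm": [rng.gauss(sign * rng.choice([0.5, 2.0]), 0.7) for _ in range(n_frames)],
    }

def mid_level(clip, models):
    """Lower layer output: one log-likelihood ratio per modality."""
    return [avg_loglik(clip[m], models[m][0]) - avg_loglik(clip[m], models[m][1])
            for m in MODALITIES]

rng = random.Random(42)
train = [(make_clip(rng, i % 2 == 0), 1 if i % 2 == 0 else -1) for i in range(40)]
test = [(make_clip(rng, i % 2 == 0), 1 if i % 2 == 0 else -1) for i in range(20)]

# Lower layer: a tag GMM and a background GMM for each modality.
models = {}
for m in MODALITIES:
    pos = [x for clip, y in train if y == 1 for x in clip[m]]
    neg = [x for clip, y in train if y == -1 for x in clip[m]]
    models[m] = (fit_gmm_1d(pos), fit_gmm_1d(neg))

# Upper layer: an SVM over the per-modality GMM scores.
w, b = train_linear_svm([mid_level(c, models) for c, _ in train], [y for _, y in train])

def predict(clip):
    score = sum(wj * xj for wj, xj in zip(w, mid_level(clip, models))) + b
    return 1 if score >= 0 else -1

accuracy = sum(predict(c) == y for c, y in test) / len(test)
print(f"held-out tagging accuracy: {accuracy:.2f}")
```

In this sketch the GMMs summarize frame-level evidence within each modality, and the SVM decides how much weight each modality's score receives for the tag. A real system would use multidimensional acoustic features and one such classifier per tag.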

Published In

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
July 2010
944 pages
ISBN:9781450301534
DOI:10.1145/1835449
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. browsing
  2. music
  3. recommendation
  4. search
  5. tagging

Qualifiers

  • Research-article

Conference

SIGIR '10

Acceptance Rates

SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 1
Reflects downloads up to 16 Feb 2025

Cited By

  • (2021) A method of music autotagging based on audio and lyrics. Multimedia Tools and Applications 80:10 (15511-15539). DOI: 10.1007/s11042-020-10381-y. Online publication date: 1-Apr-2021
  • (2021) MIMVOGUE: modeling Indian music using a variable order gapped HMM. Multimedia Tools and Applications 80:10 (14853-14866). DOI: 10.1007/s11042-020-10303-y. Online publication date: 1-Apr-2021
  • (2020) Learning Semantic Representations from Directed Social Links to Tag Microblog Users at Scale. ACM Transactions on Information Systems 38:2 (1-30). DOI: 10.1145/3377550. Online publication date: 7-Mar-2020
  • (2019) SATIN. Multimedia Tools and Applications 78:3 (2703-2718). DOI: 10.1007/s11042-018-5797-8. Online publication date: 1-Feb-2019
  • (2018) Social image tag enrichment based on textual similarity modeling. Multimedia Tools and Applications 77:3 (3659-3676). DOI: 10.1007/s11042-017-5184-x. Online publication date: 1-Feb-2018
  • (2017) Exploiting music play sequence for music recommendation. Proceedings of the 26th International Joint Conference on Artificial Intelligence (3654-3660). DOI: 10.5555/3172077.3172400. Online publication date: 19-Aug-2017
  • (2017) Exploring User-Specific Information in Music Retrieval. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (655-664). DOI: 10.1145/3077136.3080772. Online publication date: 7-Aug-2017
  • (2016) Importance of audio feature reduction in automatic music genre classification. Multimedia Tools and Applications 75:6 (3013-3026). DOI: 10.1007/s11042-014-2418-z. Online publication date: 1-Mar-2016
  • (2016) Accurate online video tagging via probabilistic hybrid modeling. Multimedia Systems 22:1 (99-113). DOI: 10.1007/s00530-014-0399-4. Online publication date: 1-Feb-2016
  • (2014) Personalized Recommendation Combining User Interest and Social Circle. IEEE Transactions on Knowledge and Data Engineering 26:7 (1763-1777). DOI: 10.1109/TKDE.2013.168. Online publication date: Jul-2014
