research-article

Effective music tagging through advanced statistical modeling

Authors:

Xiansheng Hua

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

Pages 635 - 642

https://doi.org/10.1145/1835449.1835555

Published: 19 July 2010

Abstract

Music information retrieval (MIR) holds great promise as a technology for managing large music archives. One of the key components of MIR that has been actively researched is music tagging. While significant progress has been achieved, most existing systems still adopt a simple classification approach and apply machine learning classifiers directly to low-level acoustic features. Consequently, they suffer from (1) poor accuracy, (2) a lack of comprehensive evaluation results and associated analysis on large-scale datasets, and (3) incomplete content representation, arising from the lack of multimodal and temporal information integration.

In this paper, we introduce a novel system called MMTagger that effectively integrates both multimodal and temporal information in the representation of the music signal. The carefully designed multilayer architecture of the proposed classification framework seamlessly combines multiple Gaussian Mixture Models (GMMs) and a Support Vector Machine (SVM) into a single framework. The structure preserves more discriminative information, leading to more accurate and robust tagging. Experimental results obtained with two large music collections highlight the advantages of our multilayer framework over state-of-the-art techniques.
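The abstract gives no implementation details, so the following is only a minimal sketch of one way a GMM layer can feed an SVM layer, not MMTagger itself. Everything in it is an assumption made for illustration: the two modality names (`timbre`, `rhythm`), the synthetic frame data, the single binary tag, the use of mean frame log-likelihoods as mid-level features, and the scikit-learn estimators. It also does not model temporal structure, which the paper says it integrates.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)
MODALITIES = ["timbre", "rhythm"]  # hypothetical feature types, not from the paper


def make_track(tagged, n_frames=40):
    """Synthetic track: one (frames x dims) matrix per assumed modality."""
    shift = 2.0 if tagged else 0.0
    return {
        "timbre": rng.normal(shift, 1.0, size=(n_frames, 4)),
        "rhythm": rng.normal(-shift, 1.0, size=(n_frames, 3)),
    }


def fit_gmm_layer(tracks, labels):
    """Layer 1: per modality, one GMM on tagged frames and one on untagged frames."""
    models = {}
    for m in MODALITIES:
        for y in (0, 1):
            frames = np.vstack([t[m] for t, lab in zip(tracks, labels) if lab == y])
            gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
            models[(m, y)] = gmm.fit(frames)
    return models


def mid_level_features(track, models):
    """Mean frame log-likelihood of the track under every GMM, as one vector."""
    return np.array([models[(m, y)].score(track[m]) for m in MODALITIES for y in (0, 1)])


# Toy data for a single binary tag: 40 tracks, alternating tagged / untagged.
labels = np.array([1, 0] * 20)
tracks = [make_track(bool(y)) for y in labels]
train, test = slice(0, 30), slice(30, 40)

# Layer 1 is fitted on training tracks only, then applied to all tracks.
models = fit_gmm_layer(tracks[train], labels[train])
Z = np.array([mid_level_features(t, models) for t in tracks])

# Layer 2: an SVM over the stacked GMM scores decides whether the tag applies.
svm = SVC(kernel="rbf").fit(Z[train], labels[train])
scores = svm.decision_function(Z[test])  # signed distance to the margin
accuracy = float((svm.predict(Z[test]) == labels[test]).mean())
print(Z.shape, accuracy)
```

The design choice illustrated here is that the SVM never sees raw acoustic frames. It sees only a short vector of likelihood scores, one per modality-specific GMM, which is one plausible reading of combining several generative models with a discriminative classifier in layers.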

Index Terms

1. Applied computing
  1. Arts and humanities
    1. Sound and music computing
2. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        1. Music retrieval

Published In

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

July 2010, 944 pages

ISBN: 9781450301534

DOI: 10.1145/1835449

General Chairs: Fabio Crestani (University of Lugano, CH), Stéphane Marchand-Maillet (University of Geneva, CH)

Program Chairs: Hsin-Hsi Chen (National Taiwan University, TW), Efthimis N. Efthimiadis (University of Washington, USA), Jacques Savoy (University of Neuchatel, CH)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR '10

Sponsor:

SIGIR

SIGIR '10: The 33rd International ACM SIGIR conference on research and development in Information Retrieval

July 19 - 23, 2010

Geneva, Switzerland

Acceptance Rates

SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

Wang H, Syu S, Wongchaisuwat P (2021). A method of music autotagging based on audio and lyrics. Multimedia Tools and Applications 80:10 (15511-15539). Online publication date: 1-Apr-2021.
https://dl.acm.org/doi/10.1007/s11042-020-10381-y
Mor B, Garhwal S, Kumar A (2021). MIMVOGUE: modeling Indian music using a variable order gapped HMM. Multimedia Tools and Applications 80:10 (14853-14866). Online publication date: 1-Apr-2021.
https://dl.acm.org/doi/10.1007/s11042-020-10303-y
Zhao W, Hou Y, Chen J, Zhu J, Yin E, Su H, Wen J (2020). Learning Semantic Representations from Directed Social Links to Tag Microblog Users at Scale. ACM Transactions on Information Systems 38:2 (1-30). Online publication date: 7-Mar-2020.
https://dl.acm.org/doi/10.1145/3377550
Bayle Y, Robine M, Hanna P (2019). SATIN. Multimedia Tools and Applications 78:3 (2703-2718). Online publication date: 1-Feb-2019.
https://dl.acm.org/doi/10.1007/s11042-018-5797-8
Shen M (2018). Social image tag enrichment based on textual similarity modeling. Multimedia Tools and Applications 77:3 (3659-3676). Online publication date: 1-Feb-2018.
https://dl.acm.org/doi/10.1007/s11042-017-5184-x
Cheng Z, Shen J, Zhu L, Kankanhalli M, Nie L (2017). Exploiting music play sequence for music recommendation. Proceedings of the 26th International Joint Conference on Artificial Intelligence (3654-3660). Online publication date: 19-Aug-2017.
https://dl.acm.org/doi/10.5555/3172077.3172400
Cheng Z, Shen J, Nie L, Chua T, Kankanhalli M, Kando N, Sakai T, Joho H, Li H, de Vries A, White R (2017). Exploring User-Specific Information in Music Retrieval. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (655-664). Online publication date: 7-Aug-2017.
https://dl.acm.org/doi/10.1145/3077136.3080772
Baniya B, Lee J (2016). Importance of audio feature reduction in automatic music genre classification. Multimedia Tools and Applications 75:6 (3013-3026). Online publication date: 1-Mar-2016.
https://dl.acm.org/doi/10.1007/s11042-014-2418-z
Shen J, Wang M, Chua T (2016). Accurate online video tagging via probabilistic hybrid modeling. Multimedia Systems 22:1 (99-113). Online publication date: 1-Feb-2016.
https://dl.acm.org/doi/10.1007/s00530-014-0399-4
Qian X, Feng H, Zhao G, Mei T (2014). Personalized Recommendation Combining User Interest and Social Circle. IEEE Transactions on Knowledge and Data Engineering 26:7 (1763-1777). Online publication date: Jul-2014.
https://doi.org/10.1109/TKDE.2013.168
