Article

Multimodal content-based structure analysis of karaoke music

Authors:

Qibin Sun

MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia

Pages 638 - 647

https://doi.org/10.1145/1101149.1101293

Published: 06 November 2005

Abstract

This paper presents a novel approach for content-based analysis of karaoke music that exploits multimodal content: the synchronized lyrics text shown in the video channel, and the original singing audio and accompaniment audio carried in the two audio channels. We propose a novel video text extraction technique that accurately segments the bitmaps of lyrics text from the video frames and tracks the timing of their color changes, which are synchronized to the music. We also propose a technique that characterizes the original singing voice by analyzing the volume balance between the two audio channels. Finally, we propose a novel music structure analysis method that uses both the lyrics text and the audio content to precisely identify the verses and choruses of a song and to segment the lyrics into singing phrases. Experimental results on 20 karaoke music titles in different languages show that the proposed video text extraction technique detects and segments the lyrics text with an accuracy above 90%, and that the proposed multimodal approach to music structure analysis outperforms previous methods based only on audio content analysis.
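The abstract names two signal-level ideas without giving their formulas: detecting the original singing from the volume balance between the two audio channels, and timing the color change of the lyrics text. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes one channel carries singing plus accompaniment and the other carries accompaniment only, and compares frame-wise RMS levels in decibels. The frame length, hop size, 1 dB threshold, the per-frame "highlight masks", and the helper names `frame_rms`, `volume_balance_db`, and `wipe_times` are all hypothetical choices made for illustration.

```python
import numpy as np


def frame_rms(x: np.ndarray, frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """RMS level of each full-length frame of a 1-D signal."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.sqrt(np.mean(x[idx] ** 2, axis=1))


def volume_balance_db(mix_ch, accomp_ch, frame_len=1024, hop=512, eps=1e-10):
    """Per-frame level difference (dB) between the two audio channels.

    Assumption: mix_ch carries singing plus accompaniment and accomp_ch
    carries accompaniment only, so values well above 0 dB suggest singing.
    """
    rms_mix = frame_rms(np.asarray(mix_ch, dtype=float), frame_len, hop)
    rms_acc = frame_rms(np.asarray(accomp_ch, dtype=float), frame_len, hop)
    return 20.0 * np.log10((rms_mix + eps) / (rms_acc + eps))


def wipe_times(highlight_masks: np.ndarray, fps: float) -> np.ndarray:
    """Time (s) at which each column of a lyric line first shows the sung color.

    Hypothetical input: bool array (n_frames, height, width) where True marks
    text pixels already repainted in the highlight color. Columns that are
    never repainted get NaN.
    """
    col_hit = highlight_masks.any(axis=1)            # (n_frames, width)
    first = np.argmax(col_hit, axis=0).astype(float)  # first True frame per column
    first[~col_hit.any(axis=0)] = np.nan             # never highlighted
    return first / fps


# Synthetic example: the voice enters at t = 1 s in the mixed channel only.
sr = 8000
t = np.arange(2 * sr) / sr
accomp = 0.5 * np.sin(2 * np.pi * 220 * t)
voice = np.where(t >= 1.0, 0.5 * np.sin(2 * np.pi * 440 * t), 0.0)
balance = volume_balance_db(accomp + voice, accomp)
singing = balance > 1.0  # hypothetical decision threshold in dB

# Synthetic lyric line whose color wipe advances one column per video frame.
masks = np.zeros((5, 2, 4), dtype=bool)
for f in range(5):
    masks[f, :, :f] = True
times = wipe_times(masks, fps=2.0)
```

Comparing levels in decibels makes the singing test insensitive to overall loudness: identical accompaniment in both channels gives 0 dB, and adding an uncorrelated voice at the same level as the accompaniment raises the mixed channel by about 3 dB.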



Index Terms

Multimodal content-based structure analysis of karaoke music
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Abstraction



Information

Published In


MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia

November 2005

1110 pages

ISBN:1595930442

DOI:10.1145/1101149

General Chairs:
Hongjiang Zhang, Microsoft Research Asia, China
Tat-Seng Chua, National University of Singapore, Singapore

Program Chairs:
Ralf Steinmetz, Technische Universität Darmstadt, Germany
Mohan Kankanhalli, National University of Singapore, Singapore
Lynn Wilcox, FXPAL

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Article

Conference

MM05: 2005 13th Annual ACM International Conference on Multimedia

November 6 - 11, 2005

Hilton, Singapore

Acceptance Rates

MULTIMEDIA '05 Paper Acceptance Rate 49 of 312 submissions, 16%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 11

Total Downloads: 463

Downloads (last 12 months): 3

Downloads (last 6 weeks): 1

Reflects downloads up to 20 Feb 2025


Citations

Cited By

Simonetta F, Ntalampiras S, Avanzini F (2019). Multimodal Music Information Processing and Retrieval: Survey and Future Challenges. 2019 International Workshop on Multilayer Music Representation and Processing (MMRP), pp. 10-18. DOI: 10.1109/MMRP.2019.00012. Online publication date: Jan-2019.
https://doi.org/10.1109/MMRP.2019.00012

Lu T, Jin Y, Su F, Shivakumara P, Tan C (2015). Content-oriented multimedia document understanding through cross-media correlation. Multimedia Tools and Applications 74(18), pp. 8105-8135. DOI: 10.1007/s11042-014-2044-9. Online publication date: 1-Sep-2015.
https://dl.acm.org/doi/10.1007/s11042-014-2044-9

Sharma A, Lakhtaria K, Panwar A, Vishwakarma S (2014). An efficient approach using LPFT for the karaoke formation of musical song. 2014 IEEE International Advance Computing Conference (IACC), pp. 601-605. DOI: 10.1109/IAdCC.2014.6779393. Online publication date: Feb-2014.
https://doi.org/10.1109/IAdCC.2014.6779393

Lu T, Palaiahnakote S, Tan C, Liu W (2014). Text Detection in Multimodal Video Analysis. In: Video Text Detection, pp. 221-246. DOI: 10.1007/978-1-4471-6515-6_9. Online publication date: 30-Jun-2014.
https://doi.org/10.1007/978-1-4471-6515-6_9

Lu T, Palaiahnakote S, Tan C, Liu W (2014). Video Text Detection Systems. In: Video Text Detection, pp. 169-193. DOI: 10.1007/978-1-4471-6515-6_7. Online publication date: 30-Jun-2014.
https://doi.org/10.1007/978-1-4471-6515-6_7

Wang J, Pan J, Feng S, Deng D (2012). Enabling multiparty karaoke over Internet based on low-level computers: practice and experiment. International Journal of Communication Systems 25(8), pp. 1015-1033. DOI: 10.1002/dac.1302. Online publication date: 1-Aug-2012.
https://dl.acm.org/doi/10.1002/dac.1302

Mayer R, Rauber A (2010). Multimodal Aspects of Music Retrieval: Audio, Song Lyrics – and Beyond? In: Advances in Music Information Retrieval, pp. 333-363. DOI: 10.1007/978-3-642-11674-2_15. Online publication date: 2010.
https://doi.org/10.1007/978-3-642-11674-2_15

Mayer R, Neumayer R, Rauber A (2008). Combination of audio and lyrics features for genre classification in digital audio collections. In: El Saddik A, Vuong S, Griwodz C, Del Bimbo A, Candan K, Jaimes A (eds.), Proceedings of the 16th ACM International Conference on Multimedia, pp. 159-168. DOI: 10.1145/1459359.1459382. Online publication date: 26-Oct-2008.
https://dl.acm.org/doi/10.1145/1459359.1459382

Kan M, Wang Y, Iskandar D, New T, Shenoy A (2008). LyricAlly. IEEE Transactions on Audio, Speech, and Language Processing 16(2), pp. 338-349. DOI: 10.1109/TASL.2007.911559. Online publication date: 1-Feb-2008.
https://dl.acm.org/doi/10.1109/TASL.2007.911559

Kim N, Choi M, Hwang J, Hwang M, Ko S (2007). Efficient Compression Scheme For Time Codes in Karaoke. IEEE Transactions on Consumer Electronics 53(1), pp. 235-238. DOI: 10.1109/TCE.2007.339530. Online publication date: 1-Feb-2007.
https://dl.acm.org/doi/10.1109/TCE.2007.339530