DOI: 10.1145/1101149.1101293
Article

Multimodal content-based structure analysis of karaoke music

Published: 06 November 2005

Abstract

This paper presents a novel approach for content-based analysis of karaoke music that exploits multimodal content: the lyrics text in the video channel, which is synchronized to the music, and the original singing and accompaniment audio carried in the two audio channels. We propose a video text extraction technique that accurately segments the bitmaps of lyrics text from the video frames and tracks the times of their color changes, which are synchronized to the music. We also propose a technique that characterizes the original singing voice by analyzing the volume balance between the two audio channels. Building on these, we propose a music structure analysis method that uses both lyrics text and audio content to precisely identify the verses and choruses of a song and to segment the lyrics into singing phrases. Experimental results on 20 karaoke music titles in different languages show that the proposed video text extraction technique detects and segments the lyrics text with accuracy higher than 90%, and that the proposed multimodal music structure analysis method performs better than previous methods based only on audio content analysis.
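The channel volume-balance idea mentioned above can be illustrated with a small sketch. It is not the paper's published method: it assumes, hypothetically, that one channel carries the original singing plus accompaniment and the other carries the accompaniment alone, and it flags frames where the first channel's RMS level exceeds the second's by a fixed number of decibels. The function names, the frame length, and the 3 dB threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: frame-wise volume balance between the two karaoke audio
# channels. Assumes vocal_ch = singing + accompaniment and accomp_ch =
# accompaniment only; frame_len and threshold_db are illustrative values.


def frame_rms(signal: Sequence[float], frame_len: int) -> List[float]:
    """RMS level of consecutive, non-overlapping frames (a trailing partial frame is dropped)."""
    n_frames = len(signal) // frame_len
    levels = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels


def volume_balance_db(vocal_ch: Sequence[float], accomp_ch: Sequence[float],
                      frame_len: int = 1024, eps: float = 1e-10) -> List[float]:
    """Per-frame level difference in dB: vocal channel relative to accompaniment channel."""
    rv = frame_rms(vocal_ch, frame_len)
    ra = frame_rms(accomp_ch, frame_len)
    # eps keeps the ratio and the logarithm finite for silent frames
    return [20.0 * math.log10((v + eps) / (a + eps)) for v, a in zip(rv, ra)]


def singing_frames(vocal_ch: Sequence[float], accomp_ch: Sequence[float],
                   frame_len: int = 1024, threshold_db: float = 3.0) -> List[bool]:
    """True where the vocal channel is louder than the accompaniment by more than threshold_db."""
    return [d > threshold_db for d in volume_balance_db(vocal_ch, accomp_ch, frame_len)]


if __name__ == "__main__":
    # Synthetic example: 4 frames of 100 samples; a louder "voice" tone starts at frame 2.
    n = 400
    accomp = [0.1 * math.sin(2 * math.pi * 5 * t / 100) for t in range(n)]
    voice = [0.3 * math.sin(2 * math.pi * 11 * t / 100) if t >= 200 else 0.0 for t in range(n)]
    vocal = [a + v for a, v in zip(accomp, voice)]
    print(singing_frames(vocal, accomp, frame_len=100))  # [False, False, True, True]
```

In the synthetic example the first two frames are identical in both channels (a 0 dB balance), while the added tone lifts the vocal channel about 10 dB above the accompaniment, so only the last two frames are flagged. On real recordings the threshold and frame length would presumably need tuning to each title's mixing levels.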



Information

Published In

MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia
November 2005
1110 pages
ISBN:1595930442
DOI:10.1145/1101149
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. karaoke
  2. multimodality
  3. music information retrieval
  4. music structure analysis
  5. video text detection

Qualifiers

  • Article

Conference

MM05

Acceptance Rates

MULTIMEDIA '05 Paper Acceptance Rate 49 of 312 submissions, 16%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 1

Reflects downloads up to 20 Feb 2025.

Cited By

  • (2019) Multimodal Music Information Processing and Retrieval: Survey and Future Challenges. 2019 International Workshop on Multilayer Music Representation and Processing (MMRP), pp. 10-18. DOI: 10.1109/MMRP.2019.00012. Online publication date: Jan-2019.
  • (2015) Content-oriented multimedia document understanding through cross-media correlation. Multimedia Tools and Applications 74(18), pp. 8105-8135. DOI: 10.1007/s11042-014-2044-9. Online publication date: 1-Sep-2015.
  • (2014) An efficient approach using LPFT for the karaoke formation of musical song. 2014 IEEE International Advance Computing Conference (IACC), pp. 601-605. DOI: 10.1109/IAdCC.2014.6779393. Online publication date: Feb-2014.
  • (2014) Text Detection in Multimodal Video Analysis. Video Text Detection, pp. 221-246. DOI: 10.1007/978-1-4471-6515-6_9. Online publication date: 30-Jun-2014.
  • (2014) Video Text Detection Systems. Video Text Detection, pp. 169-193. DOI: 10.1007/978-1-4471-6515-6_7. Online publication date: 30-Jun-2014.
  • (2012) Enabling multiparty karaoke over Internet based on low-level computers: practice and experiment. International Journal of Communication Systems 25(8), pp. 1015-1033. DOI: 10.1002/dac.1302. Online publication date: 1-Aug-2012.
  • (2010) Multimodal Aspects of Music Retrieval: Audio, Song Lyrics – and Beyond? Advances in Music Information Retrieval, pp. 333-363. DOI: 10.1007/978-3-642-11674-2_15. Online publication date: 2010.
  • (2008) Combination of audio and lyrics features for genre classification in digital audio collections. Proceedings of the 16th ACM international conference on Multimedia, pp. 159-168. DOI: 10.1145/1459359.1459382. Online publication date: 26-Oct-2008.
  • (2008) LyricAlly. IEEE Transactions on Audio, Speech, and Language Processing 16(2), pp. 338-349. DOI: 10.1109/TASL.2007.911559. Online publication date: 1-Feb-2008.
  • (2007) Efficient Compression Scheme For Time Codes in Karaoke. IEEE Transactions on Consumer Electronics 53(1), pp. 235-238. DOI: 10.1109/TCE.2007.339530. Online publication date: 1-Feb-2007.
