DOI: 10.1145/1180639.1180777
Article

Syllabic level automatic synchronization of music signals and text lyrics

Published: 23 October 2006

Abstract

We present a framework for synchronizing pop music with its corresponding text lyrics. Using a dynamic programming process, we refine the line-level alignment achievable by existing work to the syllabic level. Our main contribution is the use of music knowledge to constrain the dynamic programming search. This is done by modeling (1) a non-uniform note length distribution and (2) a separate note length distribution for each section type (for example, intro, chorus, and bridge). These constraints reduce alignment error by 6.4% and improve time efficiency by a factor of 2.2.
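The idea of constraining a dynamic programming search with a note length distribution can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `align_syllables`, the unit grid, the optional `onset_score` callback, and all probability values below are assumptions made for illustration. The sketch splits one lyric line of known length into syllable durations, scoring each candidate duration by the log of its prior probability and never exploring durations that the prior does not contain.

```python
import math

def align_syllables(n_syllables, total_units, length_prior, onset_score=None):
    """Place n_syllables inside a line that spans total_units grid units.

    length_prior: dict mapping a note length (in grid units) to its probability
                  (hypothetical values; a per-section prior can be passed instead).
    onset_score:  optional callable (syllable_index, start_unit) -> log-score,
                  standing in for any acoustic evidence (an assumption here).
    Returns the start unit of each syllable, or None if no segmentation fits.
    """
    neg_inf = float("-inf")
    # best[i][t]: best log-score for the first i syllables filling exactly t units
    best = [[neg_inf] * (total_units + 1) for _ in range(n_syllables + 1)]
    back = [[None] * (total_units + 1) for _ in range(n_syllables + 1)]
    best[0][0] = 0.0

    for i in range(1, n_syllables + 1):
        for t in range(1, total_units + 1):
            # Only note lengths present in the prior are tried; this is the
            # constraint that prunes the search.
            for length, prob in length_prior.items():
                start = t - length
                if prob <= 0 or start < 0 or best[i - 1][start] == neg_inf:
                    continue
                score = best[i - 1][start] + math.log(prob)
                if onset_score is not None:
                    score += onset_score(i - 1, start)
                if score > best[i][t]:
                    best[i][t] = score
                    back[i][t] = start

    if best[n_syllables][total_units] == neg_inf:
        return None

    # Walk the back-pointers from the end of the line to recover start units.
    starts = []
    t = total_units
    for i in range(n_syllables, 0, -1):
        t = back[i][t]
        starts.append(t)
    starts.reverse()
    return starts

# Hypothetical non-uniform prior: lengths of 1, 2, and 4 units.
chorus_prior = {1: 0.2, 2: 0.5, 4: 0.3}
print(align_syllables(4, 8, chorus_prior))  # [0, 2, 4, 6]
```

Modeling a separate distribution per section type would, under the same assumptions, amount to selecting a different `length_prior` dictionary depending on whether the line belongs to an intro, chorus, or bridge.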

      Published In

      MM '06: Proceedings of the 14th ACM international conference on Multimedia
      October 2006
      1072 pages
      ISBN:1595934472
      DOI:10.1145/1180639

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. dynamic programming
      2. hidden Markov model
      3. music structure
      4. voice alignment

      Qualifiers

      • Article

      Conference

      MM06: The 14th ACM International Conference on Multimedia 2006
      October 23 - 27, 2006
      Santa Barbara, CA, USA

      Acceptance Rates

      Overall Acceptance Rate 1,639 of 6,626 submissions, 25%

      Article Metrics

      • Downloads (last 12 months): 3
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 22 Feb 2025

      Cited By

      • (2018) Analysis of Speech and Singing Signals for Temporal Alignment. 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1893-1898. DOI: 10.23919/APSIPA.2018.8659615. Online publication date: Nov-2018
      • (2018) Retrieval of Song Lyrics from Sung Queries. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 111-115. DOI: 10.1109/ICASSP.2018.8461908. Online publication date: Apr-2018
      • (2017) Word level lyrics-audio synchronization using separated vocals. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 646-650. DOI: 10.1109/ICASSP.2017.7952235. Online publication date: Mar-2017
      • (2017) A dual alignment scheme for improved speech-to-singing voice conversion. 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1547-1555. DOI: 10.1109/APSIPA.2017.8282289. Online publication date: Dec-2017
      • (2016) Alignment of Lyrics With Accompanied Singing Audio Based on Acoustic-Phonetic Vowel Likelihood Modeling. IEEE/ACM Transactions on Audio, Speech and Language Processing 24:11, pp. 1998-2008. DOI: 10.1109/TASLP.2016.2594282. Online publication date: 1-Nov-2016
      • (2014) Singing information processing. 2014 12th International Conference on Signal Processing (ICSP), pp. 2431-2438. DOI: 10.1109/ICOSP.2014.7015431. Online publication date: Oct-2014
      • (2012) SAP HANA database. ACM SIGMOD Record 40:4, pp. 45-51. DOI: 10.1145/2094114.2094126. Online publication date: 11-Jan-2012
      • (2012) The database architectures research group at CWI. ACM SIGMOD Record 40:4, pp. 39-44. DOI: 10.1145/2094114.2094124. Online publication date: 11-Jan-2012
      • (2012) Parallel data processing with MapReduce. ACM SIGMOD Record 40:4, pp. 11-20. DOI: 10.1145/2094114.2094118. Online publication date: 11-Jan-2012
      • (2012) Optimizing index scans on flash memory SSDs. ACM SIGMOD Record 40:4, pp. 5-10. DOI: 10.1145/2094114.2094116. Online publication date: 11-Jan-2012