
Harnessing Music-Related Visual Stereotypes for Music Information Retrieval

Published: 25 October 2016

Abstract

Over decades, music labels have shaped easily identifiable genres to improve recognition value and, subsequently, market sales of new music acts. With print magazines and later music television serving as important distribution channels, visual representation has played, and still plays, a significant role in music marketing. Visual stereotypes have developed over decades that enable us to quickly identify the referenced music by sight alone, without listening. Despite the richness of music-related visual information provided by music videos, album covers, T-shirts, advertisements, and magazines, research on harnessing this information to advance existing or approach new problems of music retrieval or recommendation is scarce or missing. In this article, we present our research on visual music computing, which aims to extract stereotypical music-related visual information from music videos. To provide comprehensive and reproducible results, we present the Music Video Dataset, a thoroughly assembled suite of datasets with dedicated evaluation tasks aligned with current Music Information Retrieval tasks. Based on this dataset, we evaluate conventional low-level image processing and affect-related features to provide an overview of the expressiveness of fundamental visual properties such as color, illumination, and contrast. Further, we introduce a high-level approach based on visual concept detection to capture visual stereotypes. This approach decomposes the semantic content of music video frames into concrete concepts, such as vehicles and tools, defined in a wide visual vocabulary. Concepts are detected using convolutional neural networks, and their frequency distributions serve as semantic descriptions of a music video. Evaluations show that these descriptions perform well in predicting the music genre of a video and even outperform audio-content descriptors on cross-genre thematic tags. Further, highly significant performance improvements were observed when audio-based approaches were augmented with the introduced visual approach.
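To make the concept-frequency idea concrete, the following is a minimal Python sketch. It assumes a ResNet-50 pretrained on ImageNet as a stand-in concept detector and pre-extracted video frames on disk; the frame sampling rate, the vocabulary (the 1,000 ImageNet classes), the top-5 threshold, and all paths are illustrative assumptions rather than the configuration used in the article.

    # Sketch: video-level "concept frequency" descriptor from sampled frames.
    # Assumptions (not the article's exact setup): a torchvision ResNet-50
    # pretrained on ImageNet stands in for the concept detector, frames are
    # pre-extracted JPEGs (e.g., one per second via ffmpeg), and the top-5
    # classes per frame are counted into a normalized histogram that serves
    # as the video descriptor.
    from pathlib import Path

    import torch
    from PIL import Image
    from torchvision import models


    def concept_histogram(frame_dir: str, top_k: int = 5) -> torch.Tensor:
        weights = models.ResNet50_Weights.DEFAULT        # ImageNet-1k concepts
        model = models.resnet50(weights=weights).eval()
        preprocess = weights.transforms()                # resize/crop/normalize

        counts = torch.zeros(1000)                       # one bin per concept
        frames = sorted(Path(frame_dir).glob("*.jpg"))
        with torch.no_grad():
            for frame_path in frames:
                img = preprocess(Image.open(frame_path).convert("RGB"))
                logits = model(img.unsqueeze(0))          # shape (1, 1000)
                top = logits.topk(top_k, dim=1).indices[0]  # top-k concept ids
                counts[top] += 1.0

        # Relative concept frequencies act as the semantic video descriptor.
        return counts / counts.sum().clamp(min=1.0)


    if __name__ == "__main__":
        # Hypothetical frame directory for one video.
        descriptor = concept_histogram("frames/some_music_video")
        print(descriptor.shape, descriptor.sum())

The resulting histogram can then be passed to any standard classifier for genre or thematic tag prediction, and concatenated with audio descriptors to realize the kind of audio-visual augmentation described above.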




• Published in

  ACM Transactions on Intelligent Systems and Technology, Volume 8, Issue 2
  Survey Paper, Special Issue: Intelligent Music Systems and Applications and Regular Papers
  March 2017
  407 pages
  ISSN: 2157-6904
  EISSN: 2157-6912
  DOI: 10.1145/3004291
  • Editor: Yu Zheng

              Copyright © 2016 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 25 October 2016
              • Accepted: 1 April 2016
              • Revised: 1 February 2016
              • Received: 1 October 2015
Published in TIST Volume 8, Issue 2


              Qualifiers

              • research-article
              • Research
              • Refereed
