
Scalable multimedia content analysis on parallel platforms using Python

Published: 14 February 2014

Abstract

In this new era dominated by consumer-produced media, there is a high demand for web-scalable solutions to multimedia content analysis. A compelling approach to making applications scalable is to explicitly map their computation onto parallel platforms. However, developing efficient parallel implementations and fully utilizing the available resources remains a challenge due to increased code complexity, limited portability, and the low-level knowledge of the underlying hardware that is required. In this article, we present PyCASP, a Python-based framework that automatically maps computation from Python application code onto a variety of parallel platforms. PyCASP is designed using a systematic, pattern-oriented approach to offer a single software development environment for multimedia content analysis applications. Using PyCASP, applications can be prototyped in a couple of hundred lines of Python code and automatically scale to modern parallel processors. Applications written with PyCASP are portable to a variety of parallel platforms and scale efficiently from a single desktop Graphics Processing Unit (GPU) to an entire cluster with a small change to application code. To illustrate our approach, we present three multimedia content analysis applications that use our framework: a state-of-the-art speaker diarization application, a content-based music recommendation system based on the Million Song Dataset, and a video event detection system for consumer-produced videos. We show that across this wide range of applications, our approach achieves the goal of automatic portability and scalability while at the same time allowing easy prototyping in a high-level language and delivering the efficient performance of low-level optimized code.

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 10, Issue 2
February 2014
142 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2579228
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 February 2014
Accepted: 01 August 2013
Revised: 01 June 2013
Received: 01 January 2013
Published in TOMM Volume 10, Issue 2

Author Tags

  1. GPU
  2. Multimedia content analysis
  3. parallelism
  4. rapid prototyping

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2022) Application-Oriented Content Quality Analysis of Data Using Python. Data Engineering and Intelligent Computing, 25-32. DOI: 10.1007/978-981-19-1559-8_4. Online publication date: 6-Jul-2022
  • (2020) A Survey of Profit Optimization Techniques for Cloud Providers. ACM Computing Surveys 53:2, 1-35. DOI: 10.1145/3376917. Online publication date: 20-Mar-2020
  • (2019) Probabilistic Worst-Case Timing Analysis. ACM Computing Surveys 52:1, 1-35. DOI: 10.1145/3301283. Online publication date: 13-Feb-2019
  • (2016) Automatic extraction of future references from news using morphosemantic patterns with application to future trend prediction. AI Matters 2:4, 13-15. DOI: 10.1145/3008665.3008671. Online publication date: 8-Dec-2016
  • (2016) Nonverbal communication in socially assistive human-robot interaction. AI Matters 2:4, 9-10. DOI: 10.1145/3008665.3008669. Online publication date: 8-Dec-2016
  • (2016) Detecting Events in Streaming Multimedia with Big Data Techniques. 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 345-349. DOI: 10.1109/PDP.2016.45. Online publication date: Feb-2016
  • (2015) Probabilistic bisimulation. ACM SIGLOG News 2:3, 72-84. DOI: 10.1145/2815493.2815501. Online publication date: 17-Aug-2015
  • (2015) Location privacy via geo-indistinguishability. ACM SIGLOG News 2:3, 46-69. DOI: 10.1145/2815493.2815499. Online publication date: 17-Aug-2015
  • (2015) A pattern oriented approach for designing scalable analytics applications (invited talk). Proceedings of the 2nd Workshop on Parallel Programming for Analytics Applications, 4-8. DOI: 10.1145/2726935.2726939. Online publication date: 8-Feb-2015
  • (2015) Multi-Camera Coordination and Control in Surveillance Systems. ACM Transactions on Multimedia Computing, Communications, and Applications 11:4, 1-30. DOI: 10.1145/2710128. Online publication date: 2-Jun-2015
