ABSTRACT
Composed of several hundred processors, the Graphics Processing Unit (GPU) has become a very attractive platform for computationally demanding tasks on massive data. Its special hierarchy of processors and fast memory units allows very powerful and efficient parallelization but also demands novel parallel algorithms. Expectation Maximization (EM) is a widely used technique for maximum likelihood estimation. In this paper, we propose an innovative EM clustering algorithm particularly suited to the GPU platform of NVIDIA's Fermi architecture. The central idea of our algorithm is to let the parallel threads exchange their local information asynchronously and thus update their cluster representatives on demand, a technique we call Asynchronous Model Updates (Async-EM). Async-EM enables our algorithm not only to accelerate convergence but also to reduce the overhead induced by memory bandwidth limitations and synchronization requirements. We demonstrate (1) how to reformulate the EM algorithm so that information can be exchanged using Async-EM and (2) how to exploit the special memory and processor architecture of a modern GPU in order to share this information among threads in an optimal way. As a perspective, Async-EM is not limited to EM but can be applied to a variety of algorithms.
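The reformulation in point (1) rests on a known idea: keep running sufficient statistics for each cluster, and when a point's responsibilities are recomputed, swap its old contribution for the new one so the model can be refreshed immediately rather than once per pass. A minimal sequential sketch of this incremental update, for a one-dimensional Gaussian mixture, is given below. This is an illustration only: the paper's actual algorithm is an asynchronous, multithreaded CUDA implementation, and the function `incremental_em` with its simple range-based initialization is our own assumption, not the authors' code.

```python
import math

def incremental_em(data, k=2, sweeps=25):
    """Toy incremental EM for a 1-D Gaussian mixture.

    Each point's E-step is followed immediately by an M-step refresh
    from running sufficient statistics -- the sequential analogue of
    updating cluster representatives "on demand".
    """
    n = len(data)
    mean = sum(data) / n
    var0 = max(sum((x - mean) ** 2 for x in data) / n, 1e-6)
    # toy init: spread the means over the data range, shared variance
    lo, hi = min(data), max(data)
    mus = [lo + (hi - lo) * j / (k - 1) for j in range(k)] if k > 1 else [mean]
    ws = [1.0 / k] * k
    vs = [var0] * k

    def pdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    def posterior(x):
        p = [ws[j] * pdf(x, mus[j], vs[j]) for j in range(k)]
        s = sum(p) or 1e-300
        return [pj / s for pj in p]

    # initial E-step: per-point responsibilities and global sufficient stats
    resp = [posterior(x) for x in data]
    N = [sum(resp[i][j] for i in range(n)) for j in range(k)]
    S1 = [sum(resp[i][j] * data[i] for i in range(n)) for j in range(k)]
    S2 = [sum(resp[i][j] * data[i] ** 2 for i in range(n)) for j in range(k)]

    def refresh():
        # M-step: recover weight, mean, variance from sufficient statistics
        for j in range(k):
            ws[j] = N[j] / n
            mus[j] = S1[j] / N[j]
            vs[j] = max(S2[j] / N[j] - mus[j] ** 2, 1e-6)

    refresh()
    for _ in range(sweeps):
        for i, x in enumerate(data):
            new = posterior(x)          # E-step for this single point
            for j in range(k):          # swap its old contribution for the new
                d = new[j] - resp[i][j]
                N[j] += d
                S1[j] += d * x
                S2[j] += d * x * x
            resp[i] = new
            refresh()                   # model is updated after every point
    return list(zip(ws, mus, vs))
```

In the GPU setting described by the abstract, the per-point E-steps run in parallel threads, and the swap-in of updated contributions to the shared statistics is what happens asynchronously.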
Index Terms: Massively parallel expectation maximization using graphics processing units