Multi-core parallelization in Clojure: a case study

Authors:
Johann M. Kraus (University of Ulm, Ulm)
Hans A. Kestler (University of Ulm, Ulm and University Hospital Ulm, Ulm)

ELW '09: Proceedings of the 6th European Lisp Workshop, July 2009, Pages 8–17
https://doi.org/10.1145/1562868.1562870

Published: 06 July 2009

ABSTRACT

In recent years, the demand for computational power in data mining applications has increased due to rapidly growing data sets. As a consequence, standard algorithms need to be parallelized for fast processing of the generated data sets. Unfortunately, most approaches for parallelizing algorithms require a careful software design and a deep knowledge about thread-safe programming. As a consequence they are hardly applicable for rapid prototyping of new algorithms. We outline the process of multi-core parallelization using Clojure, a new functional programming language utilizing the Java Virtual Machine (JVM) that does not require knowledge of thread-safe programming. We provide some benchmark results for our multi-core algorithm to demonstrate its computational power. The rationale behind Clojure is combining the industry-standard JVM with functional programming, immutable data structures, and a built-in concurrency support via software transactional memory. This makes it a suitable tool for parallelization and rapid prototyping in many areas. In this case study we present a multi-core parallel implementation of the k-means cluster algorithm. The multi-core algorithm shows an increase in computation speed up to a factor of 10 compared to R or network based parallelization.
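
As a rough illustration of the kind of multi-core k-means the abstract describes, the Python sketch below splits the assignment step into chunks and maps them over a worker pool. It is not the authors' Clojure implementation: the function names (`nearest`, `assign_chunk`, `update`, `kmeans`), the chunking scheme, the convergence test, and the use of Python are all assumptions made here for illustration. The one idea it borrows from the abstract is that the parallel step is a pure function over immutable inputs, so no locking is needed.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))


def assign_chunk(args):
    """Assignment step for one chunk: a pure function with no shared mutable state."""
    chunk, centroids = args
    return [nearest(p, centroids) for p in chunk]


def update(points, labels, centroids):
    """Update step: each centroid becomes the mean of the points assigned to it."""
    new = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new.append(old)  # assumption: an empty cluster keeps its old centroid
    return new


def kmeans(points, centroids, executor, n_chunks=4, max_iter=100):
    """Lloyd-style k-means; only the assignment step is mapped over chunks."""
    size = max(1, -(-len(points) // n_chunks))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    labels = []
    for _ in range(max_iter):
        parts = executor.map(assign_chunk, [(c, centroids) for c in chunks])
        labels = [label for part in parts for label in part]
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:  # stop once the centroids no longer move
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
with ThreadPoolExecutor(max_workers=2) as pool:
    labels, centroids = kmeans(points, [points[0], points[3]], pool, n_chunks=2)
print(labels)
```

A thread pool keeps the sketch short, but in CPython it will not speed up CPU-bound work because of the global interpreter lock. Passing a `concurrent.futures.ProcessPoolExecutor` instead (created under an `if __name__ == "__main__":` guard) would spread the chunks across cores, at the cost of copying each chunk to a worker process.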

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types

Reviews

Reviewer: Arthur Gittleman

Presented at the 2009 European Lisp Workshop (ELW), this paper shows, via a case study, how Clojure, a new language in the Lisp family, can be used for the rapid prototyping of multicore data mining applications. Clojure is a functional Java Virtual Machine (JVM) language with concurrency support that uses software transactional memory. It is easier to implement multicore parallelization in Clojure than in languages that require "a deep knowledge about thread-safe programming." This makes Clojure a good candidate for the "rapid prototyping of new algorithms" and for researchers who are more interested in applications than in the complexities of concurrent programming.

Section 1 introduces parallel programming concepts, and Section 2 briefly introduces Clojure, without any details. Section 3 describes a k-means clustering case study. Data can be grouped into clusters, where each cluster contains data items that are more similar to each other than to data in other clusters. Although there are actually several algorithms used for k-means clustering, "The k-means Clustering Algorithm" is a subsection of Section 3.

In Section 4, "Experiments and Results," the first experiments used simulated data without clustering structure, with two datasets: a smaller set, with 10,000 cases and 100 dimensions, and a larger one, with 1,000,000 cases and 200 dimensions. Both used 20 clusters. The test hardware was a dual quad-core machine. The authors compared Clojure to ParaKMeans (a Web application) and R (a software environment for statistical computing that has a k-means function). ParaKMeans could not handle the larger dataset; considering it is a Web application, it is not a good comparison for Clojure. The results show that Clojure performs worse than R for the smaller dataset, but better than R for the larger dataset. As Kraus and Kestler conclude, for the smaller dataset, "the computational overhead of the parallelization negatively effects [sic] the runtime."

Unfortunately, the authors do not include the algorithms used, although the R documentation mentions that it uses a better algorithm than the one commonly used. Appendix A includes the 113-line Clojure source code that runs if a source file and a sampling function are provided. While this paper may present Clojure's value, the benchmark results require more analysis.

Online Computing Reviews Service
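
The review's informal definition of a cluster (items more similar to each other than to items in other clusters) can be made concrete with a small Python sketch. The helper name `mean_pairwise_distances`, the use of Euclidean distance as the dissimilarity measure, and the toy data are assumptions for illustration; none of them come from the paper or the review.

```python
from itertools import combinations
from math import dist


def mean_pairwise_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Assumes at least one same-cluster pair and one cross-cluster pair exist.
    """
    within, between = [], []
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        (within if label_p == label_q else between).append(dist(p, q))
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical toy data: two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels = [0, 0, 1, 1]
within, between = mean_pairwise_distances(points, labels)
print(within < between)
```

For a sensible clustering the within-cluster mean is much smaller than the between-cluster mean. On data without clustering structure, such as the simulated benchmark sets the review mentions, the two values would be expected to lie much closer together.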

Published in
ELW '09: Proceedings of the 6th European Lisp Workshop
July 2009
35 pages
ISBN: 9781605585390
DOI: 10.1145/1562868
Conference Chair: Didier Verna
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ACM proceedings
Lisp
clustering
parallel programming
Qualifiers
- research-article