The SPLASH-2 programs: characterization and methodological considerations

Authors:
Steven Cameron Woo (Computer Systems Laboratory, Stanford University, Stanford, CA)
Moriyoshi Ohara (Computer Systems Laboratory, Stanford University, Stanford, CA)
Evan Torrie (Computer Systems Laboratory, Stanford University, Stanford, CA)
Jaswinder Pal Singh (Department of Computer Science, Princeton University, Princeton, NJ)
Anoop Gupta (Computer Systems Laboratory, Stanford University, Stanford, CA)

ACM SIGARCH Computer Architecture News, Volume 23, Issue 2, May 1995, pp. 24–36
https://doi.org/10.1145/225830.223990

Published: 01 May 1995

Abstract

The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well. The properties we study include the computational load balance, communication-to-computation ratio and traffic needs, important working set sizes, and issues related to spatial locality, as well as how these properties scale with problem size and the number of processors. The other, related goal is methodological: to assist people who will use the programs in architectural evaluations to prune the space of application and machine parameters in an informed and meaningful way. For example, by characterizing the working sets of the applications, we describe which operating points in terms of cache size and problem size are representative of realistic situations, which are not, and which are redundant. Using SPLASH-2 as an example, we hope to convey the importance of understanding the interplay of problem size, number of processors, and working sets in designing experiments and interpreting their results.

Published in
ACM SIGARCH Computer Architecture News Volume 23, Issue 2
Special Issue: Proceedings of the 22nd annual international symposium on Computer architecture (ISCA '95)
May 1995
412 pages
ISSN:0163-5964
DOI:10.1145/225830
Chairman:
David A. Patterson
Univ. of California, Berkeley
ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture
July 1995
426 pages
ISBN:0897916980
DOI:10.1145/223982
Chairman:
David A. Patterson
Univ. of California, Berkeley
Copyright © 1995 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States