Abstract
The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important for understanding them well. The properties we study include the computational load balance, communication-to-computation ratio and traffic needs, important working set sizes, and issues related to spatial locality, as well as how these properties scale with problem size and the number of processors. The other, related goal is methodological: to assist people who will use the programs in architectural evaluations to prune the space of application and machine parameters in an informed and meaningful way. For example, by characterizing the working sets of the applications, we describe which operating points in terms of cache size and problem size are representative of realistic situations, which are not, and which are redundant. Using SPLASH-2 as an example, we hope to convey the importance of understanding the interplay of problem size, number of processors, and working sets in designing experiments and interpreting their results.
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture