DOI:
10.1145/377792
ICS '01: Proceedings of the 15th international conference on Supercomputing
ACM 2001 Proceeding
Publisher:
Association for Computing Machinery, New York, NY, United States
Conference:
ICS01: 15th International Conference on Supercomputing, Sorrento, Italy
ISBN:
978-1-58113-410-0
Published:
17 June 2001
Abstract

No abstract available.

Article
Analytical cache models with applications to cache partitioning

An accurate, tractable, analytic cache model for time-shared systems is presented, which estimates the overall cache miss-rate of a multiprocessing system with any cache size and time quanta. The input to the model consists of the isolated miss-rate ...

Article
A synthesis of memory mechanisms for distributed architectures

Producing efficient parallel programs for distributed memory multiprocessors is a difficult task. Hand-coding efficient parallel programs for these systems can be extremely difficult, time consuming and error-prone, so people have turned to the shared ...

Article
The trade-off between implicit and explicit data distribution in shared-memory programming paradigms

This paper explores previously established and novel methods for scaling the performance of OpenMP on NUMA architectures. The spectrum of methods under investigation includes OS-level automatic page placement algorithms, dynamic page migration, and manual ...

Article
Fractal symbolic analysis

Modern compilers perform wholesale restructuring of programs to improve their efficiency. Dependence analysis is the most widely used technique for proving the correctness of such transformations, but it suffers from the limitation that it considers ...

Article
Data locality enhancement by memory reduction

In this paper, we propose memory reduction as a new approach to data locality enhancement. Under this approach, we use the compiler to reduce the size of the data repeatedly referenced in a collection of nested loops. Between their reuses, the data will ...
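One familiar form of memory reduction is array contraction: a temporary array whose elements are produced and consumed within the same iteration is shrunk to a scalar once the producing and consuming loops are fused. The Python sketch below illustrates only that general idea; the function names and the two-loop computation are hypothetical and are not taken from the paper, whose compiler approach may differ.

```python
def before_contraction(a, b):
    """Two loops communicating through a full-length temporary array."""
    n = len(a)
    tmp = [0.0] * n              # O(n) temporary storage
    for i in range(n):
        tmp[i] = a[i] * 2.0
    out = [0.0] * n
    for i in range(n):
        out[i] = tmp[i] + b[i]
    return out


def after_contraction(a, b):
    """Loops fused so each temporary value is consumed right after it is
    produced, letting the temporary array shrink to a single scalar."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        t = a[i] * 2.0           # O(1) temporary: the contracted array
        out[i] = t + b[i]
    return out


a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
print(after_contraction(a, b))   # [12.0, 24.0, 36.0]
```

Fusion is legal here because iteration i of the second loop reads only tmp[i], which iteration i of the first loop writes; that is the kind of condition a compiler has to establish before contracting an array.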

Article
Eliminating redundancies in sum-of-product array computations

Array programming languages such as Fortran 90, High Performance Fortran and ZPL are well-suited to scientific computing because they free the scientist from the responsibility of managing burdensome low-level details that complicate programming in ...

Article
Monotonic evolution: an alternative to induction variable substitution for dependence analysis

We present a new approach to dependence testing in the presence of induction variables. Instead of looking for closed form expressions, our method computes monotonic evolution which captures the direction in which the value of a variable changes. This ...
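To make "the direction in which the value of a variable changes" concrete, here is a toy classifier, assuming each loop iteration takes one of several control-flow paths and each path adds a known constant to the variable. The state names and `classify_evolution` are hypothetical simplifications, not the lattice or analysis defined in the paper.

```python
from enum import Enum


class Evolution(Enum):
    CONSTANT = "constant"
    STRICTLY_INCREASING = "strictly increasing"
    NONDECREASING = "nondecreasing"
    STRICTLY_DECREASING = "strictly decreasing"
    NONINCREASING = "nonincreasing"
    UNKNOWN = "unknown"


def classify_evolution(path_increments):
    """Classify how a variable v evolves across loop iterations.

    Each entry is the constant net amount added to v along one control-flow
    path through the loop body. Any path may be taken on any iteration, so v
    generally has no closed form, but its direction of change may be known.
    """
    if not path_increments:
        raise ValueError("need at least one path")
    if all(c == 0 for c in path_increments):
        return Evolution.CONSTANT
    if all(c > 0 for c in path_increments):
        return Evolution.STRICTLY_INCREASING
    if all(c >= 0 for c in path_increments):
        return Evolution.NONDECREASING
    if all(c < 0 for c in path_increments):
        return Evolution.STRICTLY_DECREASING
    if all(c <= 0 for c in path_increments):
        return Evolution.NONINCREASING
    return Evolution.UNKNOWN


# Loop body `if cond: k += 1 else: k += 2`: no closed form, but k strictly increases.
print(classify_evolution([1, 2]).name)   # STRICTLY_INCREASING
print(classify_evolution([0, 3]).name)   # NONDECREASING
print(classify_evolution([1, -1]).name)  # UNKNOWN
```

If a subscript such as a[k] is evaluated once per iteration and k is strictly increasing, no two iterations touch the same element through it, which is a fact a dependence test can use without ever computing a closed form for k.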

Article
Optimizing strategies for telescoping languages: procedure strength reduction and procedure vectorization

At Rice University, we have undertaken a project to construct a framework for generating high-level problem solving languages that can achieve high performance on a variety of platforms. The underlying strategy, called telescoping languages, builds ...

Article
Loop optimization for a class of memory-constrained computations

Compute-intensive multi-dimensional summations that involve products of several arrays arise in the modeling of electronic structure of materials. Sometimes several alternative formulations of a computation, representing different space-time trade-offs, ...
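As a hypothetical illustration of such alternative formulations (the arrays and sizes are made up, not taken from the paper), take N × N arrays A, B, C and the summation below. Evaluated directly it costs O(N^4) operations and needs no temporary; factored through an intermediate T it costs O(N^3) operations but needs N^2 extra storage for T:

```latex
S(a,b) = \sum_{c=1}^{N} \sum_{d=1}^{N} A(a,c)\, B(c,d)\, C(d,b)
\quad\Longrightarrow\quad
T(a,d) = \sum_{c=1}^{N} A(a,c)\, B(c,d), \qquad
S(a,b) = \sum_{d=1}^{N} T(a,d)\, C(d,b)
```

Fusing the loops that produce and consume T over their shared index a shrinks T to a single length-N vector with no extra arithmetic; shrinking an intermediate further generally requires recomputing parts of it, which is where the space-time trade-off arises.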

Article
Fast parallel in-memory 64-bit sorting

Parallel in-memory 64-bit sorting is an important problem in Database Management Systems and other applications such as Internet Search Engines and Data Mining Tools.

We propose a new algorithm that we call Parallel Counting Split Radix sort, PCS-Radix ...
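PCS-Radix itself is only named in this excerpt. For orientation, the sketch below is a plain sequential least-significant-digit (LSD) radix sort for unsigned 64-bit keys, built from one stable counting sort per digit. It is the textbook baseline, not the paper's parallel algorithm, and the 8-bit default digit width is an arbitrary choice.

```python
def radix_sort_u64(keys, digit_bits=8):
    """Sort unsigned 64-bit integers with LSD radix sort.

    Each pass is a stable counting sort on one digit, from least to most
    significant; stability is what makes the final order correct.
    """
    radix = 1 << digit_bits
    mask = radix - 1
    out = list(keys)
    for shift in range(0, 64, digit_bits):
        # 1. Count how many keys fall into each bucket for this digit.
        counts = [0] * radix
        for k in out:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum turns counts into starting offsets.
        total = 0
        for d in range(radix):
            counts[d], total = total, total + counts[d]
        # 3. Scatter keys to their buckets, preserving input order.
        buf = [0] * len(out)
        for k in out:
            d = (k >> shift) & mask
            buf[counts[d]] = k
            counts[d] += 1
        out = buf
    return out


print(radix_sort_u64([2**63 + 5, 7, 0, 2**40, 7]))
```

With 8-bit digits a 64-bit key needs eight passes over the data. Parallel radix sorts commonly split the counting step across processors and merge the per-processor counts with a prefix sum; how PCS-Radix divides that work is not stated in the excerpt.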

Article
Optimizing locality for ODE solvers

Runge-Kutta methods are popular methods for the solution of systems of ordinary differential equations and are provided by many scientific libraries. The performance of Runge-Kutta methods depends not only on the specific application problem to be ...
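For reference, one step of the classical fourth-order Runge-Kutta method is sketched below for a system y' = f(t, y) stored as a Python list. It is a generic textbook method, not one of the implementations studied in the paper, but it shows where locality matters: every stage sweeps the full state vector, and the four stage vectors all stay live until the final combination, so for a large system the working set is several vectors of length n.

```python
def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one classical RK4 step of size h.

    y is a list of floats; f(t, y) returns a list of the same length.
    """
    n = len(y)
    k1 = f(t, y)
    k2 = f(t + h / 2, [y[i] + h / 2 * k1[i] for i in range(n)])
    k3 = f(t + h / 2, [y[i] + h / 2 * k2[i] for i in range(n)])
    k4 = f(t + h, [y[i] + h * k3[i] for i in range(n)])
    # All four stage vectors (each of length n) are needed in this final sweep.
    return [y[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
            for i in range(n)]


# Example: y' = y with y(0) = 1; one step of h = 0.1 approximates e^0.1.
print(rk4_step(lambda t, y: list(y), 0.0, [1.0], 0.1))
```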

Article
Array language support for parallel sparse computation

This paper describes an array-based language-level approach to parallel sparse computation. Our approach is unique due to its separation of sparse index sets from arrays, both syntactically and in the implementation. This design allows users to express ...

Article
A parallel algorithm for sparse symbolic LU factorization without pivoting on out-of-core matrices

Finding the nonzero structures of the lower and upper triangular factors of an unsymmetric sparse matrix A is an important problem in the field of sparse matrix computations. Complementing previous research on sequential algorithms, we develop a ...
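The problem statement can be made concrete with the straightforward structural simulation of Gaussian elimination without pivoting: at step k, every row i > k with a nonzero in column k picks up the nonzeros of row k to the right of the diagonal. The sketch assumes a structurally nonzero diagonal and no numerical cancellation; it is a sequential, in-core O(n^3) illustration rather than the parallel out-of-core algorithm the paper develops, and `symbolic_lu` is a hypothetical name.

```python
def symbolic_lu(n, nonzeros):
    """Return the nonzero structures of L and U for an n x n sparse matrix.

    `nonzeros` is an iterable of (row, col) pairs. Assumes every diagonal
    entry is structurally nonzero (so no pivoting is needed) and that no
    numerical cancellation occurs.
    """
    pattern = set(nonzeros) | {(i, i) for i in range(n)}
    for k in range(n):
        # Rows below k that have a nonzero in column k ...
        rows = [i for i in range(k + 1, n) if (i, k) in pattern]
        # ... combine with row k's nonzeros right of the diagonal.
        cols = [j for j in range(k + 1, n) if (k, j) in pattern]
        for i in rows:
            for j in cols:
                pattern.add((i, j))  # fill-in in the trailing submatrix
    lower = {(i, j) for (i, j) in pattern if i >= j}
    upper = {(i, j) for (i, j) in pattern if i <= j}
    return lower, upper


# Arrow matrix with a dense first row and column: elimination fills everything.
arrow = [(0, j) for j in range(4)] + [(i, 0) for i in range(4)]
L, U = symbolic_lu(4, arrow)
print(len(L), len(U))  # 10 10
```

Reordering so that the dense row and column come last produces no fill at all, which is why fill-reducing orderings are normally chosen before the symbolic phase.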

Article
Tools for application-oriented performance tuning

Application performance tuning is a complex process that requires assembling various types of information and correlating it with source code to pinpoint the causes of performance bottlenecks. Existing performance tools don't adequately support this ...

Article
Global optimization techniques for automatic parallelization of hybrid applications

This paper presents a novel technique to perform global optimization of communication and preprocessing calls in the presence of array accesses with arbitrary subscripts. Our scheme is presented in the context of automatic parallelization of sequential ...

Article
Tuning high-performance scientific codes: the use of performance models to control resource usage during data migration and I/O

Large-scale parallel simulations are a popular tool for investigating phenomena ranging from nuclear explosions to protein folding. These codes produce copious output that must be moved to the workstation where it will be visualized. Scientists have a ...

Article
Computer aided hand tuning (CAHT): “applying case-based reasoning to performance tuning”

For most parallel and high performance systems, tuning guides provide users with advice on optimizing the execution time of their programs. Execution time may be very sensitive to small program changes. Such modifications may be local (on a loop) or ...

Article
Cache performance for multimedia applications

The caching behavior of multimedia applications has been described as having high instruction reference locality within small loops, very large working sets, and poor data cache performance due to non-locality of data references. Despite this, there is ...

Article
On the potential of tolerant region reuse for multimedia applications

Recent years have shown an interesting evolution in the mid-end to low-end embedded domain. Portable systems are growing in importance as they improve in storage capacity and in interaction capabilities with general purpose systems. Furthermore, ...

Article
Evaluation of processor code efficiency for embedded systems

This paper evaluates the code efficiency of the ARM, Java, and x86 instruction sets by compiling the SPEC CPU95/CPU2000/JVM98 and CaffeineMark benchmarks, in terms of code sizes, basic block sizes, instruction distributions, and average instruction ...

Article
Improving 3D geometry transformations on a simultaneous multithreaded SIMD processor

In this paper we evaluate the performance of an SMT processor used as the geometry processor for a 3D polygonal rendering engine. To evaluate this approach, we consider PMesa (a parallel version of Mesa) which parallelizes the geometry stage of the 3D ...

Article
Bringing together automatic differentiation and OpenMP

Derivatives of almost arbitrary functions can be evaluated efficiently by automatic differentiation whenever the functions are given in the form of computer programs in a high-level programming language such as Fortran, C, or C++. Furthermore, in ...

Article
Automatic code generation for a turbulence scheme

In this paper we describe how to extend CTADEL, a Problem Solving Environment, in order to generate code for a turbulence scheme, in our case, within a numerical weather prediction model (NWP). Common to these schemes is the presence of implicit ...

Article
Towards the effective parallel computation of matrix pseudospectra

Given a matrix A, the computation of its pseudospectrum Λ_ε(A) is a far more expensive task than the computation of characteristics such as the condition number and the matrix spectrum. As research of the last 15 years has shown, however, the matrix ...
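The cost gap follows from the standard definition of the ε-pseudospectrum (2-norm version, with σ_min the smallest singular value). The notation below is the usual textbook convention and may differ from the paper's:

```latex
\Lambda_\varepsilon(A)
  = \{\, z \in \mathbb{C} : \|(zI - A)^{-1}\|_2 > \varepsilon^{-1} \,\}
  = \{\, z \in \mathbb{C} : \sigma_{\min}(zI - A) < \varepsilon \,\}
```

Evaluating this on a grid of m points in the complex plane needs one σ_min computation per point, O(n^3) each for a dense n × n matrix via the SVD, so O(m n^3) in total, whereas the spectrum or the condition number requires a single O(n^3) computation. Path-following methods reduce the work by tracing one level curve σ_min(zI − A) = ε instead of sampling a whole grid.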

Article
A graphical tool for driving the parallel computation of pseudospectra

This paper presents the programming environment of a new tool for the parallel computation of pseudospectra. The tool is based on the PPAT algorithm described in [16, 17], which offers total reliability and can handle singularities along the level curve ...

Article
Register-sensitive selection, duplication, and sequencing of instructions

In this paper, we present a new framework for selecting, duplicating and sequencing instructions so as to decrease register pressure. The motivation for this work is to target current and future high-performance processors where reductions in register ...

Article
Load and store reuse using register file contents

The detection of opportunities for value reuse optimizations in memory operations requires both the addresses and values associated with these operations to be available. Although the values are typically available in the physical register file, their ...

Article
Improving Gang Scheduling through job performance analysis and malleability

The OpenMP programming model provides parallel applications with a very important feature: job malleability. Job malleability is the capacity of an application to dynamically adapt its parallelism to the number of processors allocated to it. We believe that ...

Article
Reducing the complexity of the issue logic

The issue logic of dynamically scheduled superscalar processors is one of their most complex and power-consuming parts. In this paper we present alternative issue-logic designs that are much simpler than the traditional scheme while they retain most of ...

Article
Slice-processors: an implementation of operation-based prediction

We describe the Slice Processor micro-architecture that implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operations, or the computation slice that can be used to calculate forthcoming ...

Contributors
  • National Research Council of Italy (CNR), Institute of Applied Sciences and Intelligent Systems “Eduardo Caianiello”
  • University of Patras

Acceptance Rates

ICS '01 paper acceptance rate: 45 of 133 submissions, 34%
Overall acceptance rate: 584 of 2,055 submissions, 28%

Year    Submitted  Accepted  Rate
ICS '21       157        39   25%
ICS '15       160        40   25%
ICS '14       160        34   21%
ICS '13       202        43   21%
ICS '06       141        37   26%
ICS '03       171        36   21%
ICS '02       144        31   22%
ICS '01       133        45   34%
ICS '00       122        33   27%
ICS '99       180        57   32%
ICS '97       135        45   33%
ICS '96       116        50   43%
ICS '95       120        49   41%
ICS '94       114        45   39%
Overall     2,055       584   28%
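The Rate column is the accepted count divided by the submitted count, rounded to a whole percent. The short Python check below recomputes each row from the transcribed numbers and confirms that the yearly rows add up exactly to the overall totals (2,055 submitted, 584 accepted).

```python
# (year, submitted, accepted), transcribed from the acceptance-rate table.
ROWS = [
    ("ICS '21", 157, 39),
    ("ICS '15", 160, 40),
    ("ICS '14", 160, 34),
    ("ICS '13", 202, 43),
    ("ICS '06", 141, 37),
    ("ICS '03", 171, 36),
    ("ICS '02", 144, 31),
    ("ICS '01", 133, 45),
    ("ICS '00", 122, 33),
    ("ICS '99", 180, 57),
    ("ICS '97", 135, 45),
    ("ICS '96", 116, 50),
    ("ICS '95", 120, 49),
    ("ICS '94", 114, 45),
]


def rate_pct(accepted, submitted):
    """Acceptance rate rounded to a whole percentage."""
    return round(100 * accepted / submitted)


total_submitted = sum(s for _, s, _ in ROWS)
total_accepted = sum(a for _, _, a in ROWS)

for year, s, a in ROWS:
    print(f"{year}: {a}/{s} = {rate_pct(a, s)}%")

# Pooled rate: total accepted over total submitted.
print(f"Overall: {total_accepted}/{total_submitted} = "
      f"{rate_pct(total_accepted, total_submitted)}%")
```

The overall figure is a pooled rate. Averaging the fourteen yearly percentages instead would give about 29%, because the pooled figure weights each year by its number of submissions.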