ABSTRACT
To exploit the benefits of throughput-optimized processors such as GPUs, applications need to be redesigned to achieve performance and efficiency. In this work, we present techniques to speed up statistical timing analysis on throughput processors. We draw upon advancements in improving the efficiency of Monte Carlo based statistical static timing analysis (MC SSTA) using techniques to reduce the sample size or smart sampling techniques. An efficient smart sampling technique, Stratification + Hybrid Quasi Monte Carlo (SH-QMC), is implemented on a GPU based on NVIDIA CUDA architecture. We show that although this application is based on MC analysis with straightforward parallelism available, achieving performance and efficiency on the GPU requires exposing more parallelism and finding locality in computations. This is in contrast with random sampling based algorithms which are inefficient in terms of sample size but can keep resources utilized on a GPU. We show that SH-QMC implemented on a Multi GPU is twice as fast as a single STA on a CPU for benchmark circuits considered. In terms of an efficiency metric, which measures the ability to convert a reduction in sample size to a corresponding reduction in runtime w.r.t a random sampling approach, we achieve 73.9% efficiency with the proposed approaches compared to 4.3% for an implementation involving performing computations on smart samples in parallel. Another contribution of the paper is a critical graph analysis technique to improve the efficiency of Monte Carlo based SSTA, leading to 2--9X further speedup.
- C. Visweswariah, et al., "First-Order Incremental Block-Based Statistical Timing Analysis," Proc. Design Automation Conference, pp 331--336, 2004. Google ScholarDigital Library
- H. Chang, and S. S. Sapatnekar, "Statistical Timing Analysis Considering Spatial Correlations using a Single Pert-Like Traversal," Proc. International Conference on Computer-Aided Design, pp. 621--625, 2003. Google ScholarDigital Library
- K. Chopra, et al., "Parametric yield maximization using gate sizing based on efficient statistical power and delay gradient computation", Proc. International Conference on Computer Aided Design, pp 1023--1028, 2005. Google ScholarDigital Library
- F. N. Najm, N. Menezes, "Statistical timing analysis based on a timing yield model," Proc. Design Automation Conference, pp. 460--465, 2004. Google ScholarDigital Library
- R. Gandikota, D. Blaauw, D. Sylvester, "Modeling crosstalk in statistical static timing analysis", Proc. Design Automation Conference, pp. 974--979, 2008. Google ScholarDigital Library
- R. Y. Rubinstein, Simulation and the Monte Carlo Method, John Wiley & Sons, Inc., 1981. Google ScholarDigital Library
- R. Kanj, R. Joshi, and S. Nassif, "Mixture Importance Sampling and Its Application to the Analysis of SRAM Designs in the Presence of Rare Failure Events," Proc. Design Automation Conference, pp. 69--72, 2006. Google ScholarDigital Library
- A. Singhee and R. A. Rutenbar, "From Finance to Flip Flops: A Study of Fast Quasi-Monte Carlo Methods from Computational Finance Applied to Statistical Circuit Analysis", Proc. ISQED, pp. 685--692, 2007. Google ScholarDigital Library
- A. Singhee, S. Singhal and R. A. Rutenbar, "Practical, fast Monte Carlo statistical static timing analysis: why and how," Proc. International Conference on Computer-Aided Design, pp. 190--195, 2008. Google ScholarDigital Library
- A. Singhee, S. Singhal and R. A. Rutenbar, "Exploiting Correlation Kernels for Efficient Handling of Intra-Die Spatial Correlation, with Application to Statistical Timing," Proc. Design, Automation and Test in Europe, pp. 856--861, 2008. Google ScholarDigital Library
- V. Veetil, D. Sylvester and D. Blaauw, "Efficient Monte Carlo based Incremental Statistical Timing Analysis," Proc. Design Automation Conference, pp. 676--681, 2008. Google ScholarDigital Library
- J. Jaffari, and M. Anis, "On efficient Monte Carlo-based statistical static timing analysis of digital circuits," Proc. International Conference on Computer-Aided Design, pp. 196--203, 2008. Google ScholarDigital Library
- V. Veetil, D. Sylvester, D. Blaauw, S. Shah, and S. Rochel, "Efficient Smart Sampling-Based Full-Chip Leakage Analysis for Intra-Die Variation Considering State Dependence," Design Automation Conference, 2009. Google ScholarDigital Library
- K. Gulati, S. P. Khatri, "Accelerating Statistical Timing Analysis Using Graphics Processing Units", Proc. ASP-DAC, pp. 260--265, 2009. Google ScholarDigital Library
- J. D. Owens, et al., "GPU Computing" Proceedings of the IEEE, vol. 96(5), pp. 879--899, 2008.Google ScholarCross Ref
- http://www.nvidia.com/object/cuda_home.html#Google Scholar
- J. Xiong et al, "Criticality computation in parameterized statistical timing", Proc. Design Automation Conference, pp. 63--68, 2006. Google ScholarDigital Library
- http://www.nvidia.com/object/product_tesla_s1070_us.htmlGoogle Scholar
Index Terms
- Efficient smart monte carlo based SSTA on graphics processing units with improved resource utilization
Recommendations
Algorithmic performance studies on graphics processing units
We report on our experience with integrating and using graphics processing units (GPUs) as fast parallel floating-point co-processors to accelerate two fundamental computational scientific kernels on the GPU: sparse direct factorization and nonlinear ...
An Improved Magma Gemm For Fermi Graphics Processing Units
We present an improved matrixâ matrix multiplication routine (General Matrix Multiply [GEMM]) in the MAGMA BLAS library that targets the NVIDIA Fermi graphics processing units (GPUs) using Compute Unified Data Architecture (CUDA). We show how to modify ...
Towards acceleration of fault simulation using graphics processing units
DAC '08: Proceedings of the 45th annual Design Automation ConferenceIn this paper, we explore the implementation of fault simulation on a Graphics Processing Unit (GPU). In particular, we implement a fault simulator that exploits thread level parallelism. Fault simulation is inherently parallelizable, and the large ...
Comments