skip to main content
research-article
Open access

XARK: An extensible framework for automatic recognition of computational kernels

Published: 30 October 2008 Publication History

Abstract

The recognition of program constructs that are frequently used by software developers is a powerful mechanism for optimizing and parallelizing compilers to improve the performance of the object code. The development of techniques for automatic recognition of computational kernels such as inductions, reductions and array recurrences has been an intensive research area in the scope of compiler technology during the 90's. This article presents a new compiler framework that, unlike previous techniques that focus on specific and isolated kernels, recognizes a comprehensive collection of computational kernels that appear frequently in full-scale real applications. The XARK compiler operates on top of the Gated Single Assignment (GSA) form of a high-level intermediate representation (IR) of the source code. Recognition is carried out through a demand-driven analysis of this high-level IR at two different levels. First, the dependences between the statements that compose the strongly connected components (SCCs) of the data-dependence graph of the GSA form are analyzed. As a result of this intra-SCC analysis, the computational kernels corresponding to the execution of the statements of the SCCs are recognized. Second, the dependences between statements of different SCCs are examined in order to recognize more complex kernels that result from combining simpler kernels in the same code. Overall, the XARK compiler builds a hierarchical representation of the source code as kernels and dependence relationships between those kernels. This article describes in detail the collection of computational kernels recognized by the XARK compiler. Besides, the internals of the recognition algorithms are presented. The design of the algorithms enables to extend the recognition capabilities of XARK to cope with new kernels, and provides an advanced symbolic analysis framework to run other compiler techniques on demand. Finally, extensive experiments showing the effectiveness of XARK for a collection of benchmarks from different application domains are presented. In particular, the SparsKit-II library for the manipulation of sparse matrices, the Perfect benchmarks, the SPEC CPU2000 collection and the PLTMG package for solving elliptic partial differential equations are analyzed in detail.

References

[1]
Aho, A. V., Lam, M. S., Sethi, R., and Ullman, J. D. 2006. Compilers: Principles, Techniques, and Tools, 2nd ed. Addison-Wesley, Reading, MA.
[2]
Allen, R. and Kennedy, K. 2002. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan-Kaufmann, San Francisco, CA.
[3]
Ammarguellat, Z. and Harrison, W. L. 1990. Automatic recognition of induction and recurrence relations by abstract interpretation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 283--295.
[4]
Andrade, D., Arenaz, M., Fraguela, B. B., Touriño, J., and Doallo, R. 2007. Automated and accurate cache behavior analysis for codes with irregular access patterns. Concur. Comput. Pract. Exper. 19, 18 (Dec.), 2407--2423.
[5]
Arenaz, M. 2003. Compiler framework for the automatic detection of loop-level parallelism. Ph.D. dissertation, Department of Electronics and Systems, University of A Coruña. (Available at http://www.des.udc.es/~arenaz/papers/phdthesis_arenaz.pdf.
[6]
Arenaz, M., Touriño, J., and Doallo, R. 2003. A GSA-based compiler infrastructure to extract parallelism from complex loops. In Proceedings of the 17th International Conference on Supercomputing. (San Francisco, CA). ACM, New York, 193--204.
[7]
Arenaz, M., Touriño, J., and Doallo, R. 2004. Compiler support for parallel code generation through kernel recognition. In Proceedings of the 18th International Parallel and Distributed Processing Symposium (Santa Fe, NM). IEEE Computer Society Press, Los Alamitos, CA.
[8]
Ballance, R. A., MacCabe, A. B., and Ottenstein, K. J. 1990. The program dependence web: A representation supporting control, data, and demand-driven interpretation of imperative languages. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 257--271.
[9]
Bank, R. E. 2007. PLTMG package. Available at http://cam.ucsd.edu/~reb/software.html.
[10]
Berry, M., Chen, D., Koss, P., Kuck, D., Pointer, L., Lo, S., Pang, Y., Roloff, R., Sameh, A., Clementi, E., Chin, S., Schneider, D., Fox, G., Messina, P., Walker, D., Hsiung, C., Schwarzmeier, J., Lue, K., Orzag, S., Seidl, F., Johnson, O., Swanson, G., Goodrum, R., and Martin, J. 1989. The Perfect Club benchmarks: Effective performance evaluation of supercomputers. Int. J. Supercomput. Apps. 3, 3, 5--40.
[11]
Bhansali, S. and Hagemeister, J. R. 1995. A pattern matching approach for reusing software libraries in parallel systems. In Proceedings of Workshop on Knowledge Based Systems for the Reuse of Program Libraries (Sophia Anthipolis, France).
[12]
Blume, W., Doallo, R., Eigenmann, R., Grout, J., Hoeflinger, J., Lawrence, T., Lee, J., Padua, D. A., Paek, Y., Pottenger, W. M., Rauchwerger, L., and Tu, P. 1996. Parallel programming with Polaris. IEEE Computer 29, 12 (Dec.), 78--82.
[13]
Callahan, D. 1991. Recognizing and parallelizing bounded recurrences. In Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing (Santa Clara, CA). Lecture Notes in Computer Science, vol. 589. Springer-Verlag, New York, 169--185.
[14]
Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., and Zadeck, F. K. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13, 4 (Oct.), 451--490.
[15]
di Martino, B. and Iannello, G. 1996. PAP recognizer: A tool for automatic recognition of parallelizable patterns. In Proceedings of the 4th International Workshop on Program Comprehension (Berlin, Germany). IEEE Computer Society Press, Los Alamitos, CA, 164--174.
[16]
Fahringer, T. and Scholz, B. 2000. A unified symbolic evaluation framework for parallelizing compilers. IEEE Trans. Parallel Dist. Syst. 11, 11 (Nov.), 1105--1125.
[17]
Faigin, K. A., Weatherford, S. A., Hoeflinger, J. P., Padua, D. A., and Petersen, P. M. 1994. The Polaris internal representation. Int. J. Parall. Prog. 22, 5 (Oct.), 553--586.
[18]
Fisher, A. L. and Ghuloum, A. M. 1994. Parallelizing complex scans and reductions. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (Orlando, FL). ACM, New York, 135--146.
[19]
GCC Internals. GNU Compiler Collection Internals (GCC). Available at http://gcc.gnu. org/onlinedocs/gccint.pdf.
[20]
Gerlek, M. P., Stoltz, E., and Wolfe, M. 1995. Beyond induction variables: Detecting and classifying sequences using a demand-driven SSA. ACM Trans. Program. Lang. Syst. 17, 1 (Jan.), 85--122.
[21]
Haghighat, M. R. and Polychronopoulos, C. D. 1996. Symbolic analysis for parallelizing compilers. ACM Trans. Program. Lang. Syst. 18, 4 (July), 477--518.
[22]
Harandi, M. T. and Ning, J. Q. 1990. Knowledge-based program analysis. IEEE Softw. 7, 1, 74--81.
[23]
Jouvelot, P. and Dehbonei, B. 1989. A unified semantic approach for the vectorization and parallelization of generalized reductions. In Proceedings of the 3rd International Conference on Supercomputing (Heraklion, Crete). ACM, New York, 186--194.
[24]
Keßler, C. W. 1996. Pattern-driven automatic parallelization. Scient. Progr. 5, 3, 251--274.
[25]
Keßler, C. W. and Smith, C. 1999. The SPARAMAT approach to automatic comprehension of sparse matrix computations. In Proceedings of the 7th International Workshop on Program Comprehension (Pittsburgh, PA). IEEE Computer Society Press, Los Alamitos, CA, 200--207.
[26]
Knobe, K. and Sarkar, V. 1998. Array SSA form and its use in parallelization. In Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (San Diego, CA). ACM, New York, 107--120.
[27]
Knobe, K. and Sarkar, V. 2000. Enhanced parallelization via analyses and transformations on Array SSA form. In Proceedings of the 8th International Workshop on Compilers for Parallel Computers (Aussois, France).
[28]
Kozaczynski, W., Ning, J. Q., and Engberts, A. 1992. Program concept recognition and transformation. IEEE Trans. Softw. Eng. 18, 12, 1065--1075.
[29]
Lin, Y. and Padua, D. A. 1998. On the automatic parallelization of sparse and irregular Fortran programs. In Proceedings of the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers (Pittsburgh, PA). Lecture Notes in Computer Science, vol. 1511. Springer-Verlag, New York, 41--56.
[30]
Merrill, J. 2003. GENERIC and GIMPLE: A new tree representation for entire functions. In Proceedings of the 2003 GCC Developers Summit. 171--180. (Available at http://www.gccsummit.org/2003.
[31]
Metzger, R. 1995. Automated recognition of parallel algorithms in scientific applications. In IJCAI-95 Workshop Program Working Notes: “The Next Generation of Plan Recognition Systems”.
[32]
Muchnick, S. S. 1997. Advanced Compiler Design and Implementation. Morgan-Kaufmann, San Francisco, CA.
[33]
Paul, S. and Prakash, A. 1994. A framework for source code search using program patterns. IEEE Trans. Softw. Eng. 20, 6, 463--475.
[34]
Pinter, S. S. and Pinter, R. Y. 1994. Program optimization and parallelization using idioms. ACM Trans. Program. Lang. Syst. 16, 3 (May), 305--327.
[35]
Pottenger, W. M. and Eigenmann, R. 1995. Idiom recognition in the Polaris parallelizing compiler. In Proceedings of the 9th International Conference on Supercomputing (Barcelona, Spain). ACM, New York, 444--448.
[36]
Redon, X. and Feautrier, P. 1993. Detection of recurrences in sequential programs with loops. In Proceedings of the 5th International Parallel Architectures and Languages Europe Conference (Munich, Germany). Lecture Notes in Computer Science, vol. 694. Springer-Verlag, New York, 132--145.
[37]
Saad, Y. 1994. SPARSKIT: A Basic Tool Kit for Sparse Matrix Computations (Version 2). (Available at http://www.cs.umn.edu/~saad/software/SPARSKIT/sparskit.html.
[38]
Sabot, G. and Wholey, S. 1993. CMAX: A Fortran translator for the connection machine system. In Proceedings of the 7th International Conference on Supercomputing (Tokyo, Japan). ACM, New York, 147--156.
[39]
SPEC. SPEC CPU2000. Standard Performance Evaluation Corporation. Available at http://www.spec.org/cpu2000/.
[40]
Suganuma, T., Komatsu, H., and Nakatani, T. 1996. Detection and global optimization of reduction operations for distributed parallel machines. In Proceedings of the 10th International Conference on Supercomputing (Philadelphia, PA). ACM, New York, 18--25.
[41]
Tarjan, R. E. 1972. Depth-first search and linear graph algorithms. SIAM J. Comput. 1, 2 (June), 146--160.
[42]
Tu, P. and Padua, D. A. 1995. Gated SSA-based demand-driven symbolic analysis for parallelizing compilers. In Proceedings of the 9th International Conference on Supercomputing (Barcelona, Spain). ACM, New York, 414--423.
[43]
van Engelen, R. 2001. Efficient symbolic analysis for optimizing compilers. In Proceedings of the 10th International Conference on Compiler Construction (Genova, Italy). Lecture Notes in Computer Science, vol. 2027. Springer-Verlag, New York, 118--132.
[44]
van Engelen, R., Birch, J., Shou, Y., Walsh, B., and Gallivan, K. 2004. A unified framework for nonlinear dependence testing and symbolic analysis. In Proceedings of the 18th International Conference on Supercomputing (Saint Malo, France). ACM, New York, 106--115.
[45]
Wills, L. M. 1990. Automated program recognition: A feasibility demonstration. Artif. Intell. 45, 1-2, 113--171.
[46]
Wolfe, M. 1996. High Performance Compilers for Parallel Computing. Addison-Wesley, Reading, MA.
[47]
Wu, P., Cohen, A., Hoeflinger, J., and Padua, D. A. 2001. Monotonic evolution: An alternative to induction variable substitution for dependence analysis. In Proceedings of the 15th International Conference on Supercomputing (Sorrento, Italy). ACM, New York, 78--91.
[48]
Zhang, F. and D'Hollander, E. H. 1994. Enhancing parallelism by removing cyclic data dependencies. In Proceedings of the 6th International Parallel Architectures and Languages Europe Conference (Athens, Greece). Lecture Notes in Computer Science, vol. 817. Springer-Verlag, New York, 387--397.
[49]
Zima, E. V. 1986. Automatic construction of systems of recurrence relations. USSR Comput. Math. Math. Phys. 24, 11-12, 193--197.
[50]
Zima, E. V. 1995. Simplification and optimization transformations of chains of recurrences. In Proceedings of the 8th International Symposium on Symbolic and Algebraic Computation (Montreal, Ont., Canada). ACM New York, 42--50.

Cited By

View all
  • (2023)Source Matching and Rewriting for MLIR Using String-Based AutomataACM Transactions on Architecture and Code Optimization10.1145/357128320:2(1-26)Online publication date: 1-Mar-2023
  • (2021)Progressive raising in multi-level IRProceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO51591.2021.9370332(15-26)Online publication date: 27-Feb-2021
  • (2020)APMT: an automatic hardware counter-based performance modeling tool for HPC applicationsCCF Transactions on High Performance Computing10.1007/s42514-020-00035-8Online publication date: 24-Jun-2020
  • Show More Cited By

Index Terms

  1. XARK: An extensible framework for automatic recognition of computational kernels

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Programming Languages and Systems
    ACM Transactions on Programming Languages and Systems  Volume 30, Issue 6
    October 2008
    245 pages
    ISSN:0164-0925
    EISSN:1558-4593
    DOI:10.1145/1391956
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 October 2008
    Accepted: 01 January 2008
    Revised: 01 July 2007
    Received: 01 December 2006
    Published in TOPLAS Volume 30, Issue 6

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Automatic kernel recognition
    2. demand-driven algorithms
    3. gated single assignment
    4. strongly connected component
    5. symbolic analysis
    6. use-def chains

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Galician Government
    • FEDER funds of the European Union

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)82
    • Downloads (Last 6 weeks)14
    Reflects downloads up to 22 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Source Matching and Rewriting for MLIR Using String-Based AutomataACM Transactions on Architecture and Code Optimization10.1145/357128320:2(1-26)Online publication date: 1-Mar-2023
    • (2021)Progressive raising in multi-level IRProceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO51591.2021.9370332(15-26)Online publication date: 27-Feb-2021
    • (2020)APMT: an automatic hardware counter-based performance modeling tool for HPC applicationsCCF Transactions on High Performance Computing10.1007/s42514-020-00035-8Online publication date: 24-Jun-2020
    • (2019)Declarative Loop Tactics for Domain-specific OptimizationACM Transactions on Architecture and Code Optimization10.1145/337226616:4(1-25)Online publication date: 26-Dec-2019
    • (2019)Parallelware Tools: An Experimental Evaluation on POWER SystemsHigh Performance Computing10.1007/978-3-030-34356-9_27(352-360)Online publication date: 16-Jun-2019
    • (2018)Towards a compiler analysis for parallel algorithmic skeletonsProceedings of the 27th International Conference on Compiler Construction10.1145/3178372.3179513(174-184)Online publication date: 24-Feb-2018
    • (2018)PInT: Pattern Instrumentation Tool for Analyzing and Classifying HPC Applications2018 IEEE/ACM 5th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)10.1109/LLVM-HPC.2018.8639275(71-80)Online publication date: Nov-2018
    • (2017)The Technological Roadmap of Parallware and Its Alignment with the OpenPOWER EcosystemHigh Performance Computing10.1007/978-3-319-67630-2_19(237-253)Online publication date: 20-Oct-2017
    • (2016)Automatic recognition of computational kernels for platform-dependent code optimizations2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS)10.1109/SAMOS.2016.7818326(11-20)Online publication date: Jul-2016
    • (2016)A computational method for modeling arbitrary junctions employing different surface integral equation formulations for three-dimensional scattering and radiation problemsJournal of Electromagnetic Waves and Applications10.1080/09205071.2016.114059630:6(689-713)Online publication date: 16-Mar-2016
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media