skip to main content
research-article

Optimizing data structures in high-level programs: new directions for extensible compilers based on staging

Published: 23 January 2013 Publication History

Abstract

High level data structures are a cornerstone of modern programming and at the same time stand in the way of compiler optimizations. In order to reason about user- or library-defined data structures compilers need to be extensible. Common mechanisms to extend compilers fall into two categories. Frontend macros, staging or partial evaluation systems can be used to programmatically remove abstraction and specialize programs before they enter the compiler. Alternatively, some compilers allow extending the internal workings by adding new transformation passes at different points in the compile chain or adding new intermediate representation (IR) types. None of these mechanisms alone is sufficient to handle the challenges posed by high level data structures. This paper shows a novel way to combine them to yield benefits that are greater than the sum of the parts.
Instead of using staging merely as a front end, we implement internal compiler passes using staging as well. These internal passes delegate back to program execution to construct the transformed IR. Staging is known to simplify program generation, and in the same way it can simplify program transformation. Defining a transformation as a staged IR interpreter is simpler than implementing a low-level IR to IR transformer. With custom IR nodes, many optimizations that are expressed as rewritings from IR nodes to staged program fragments can be combined into a single pass, mitigating phase ordering problems. Speculative rewriting can preserve optimistic assumptions around loops.
We demonstrate several powerful program optimizations using this architecture that are particularly geared towards data structures: a novel loop fusion and deforestation algorithm, array of struct to struct of array conversion, object flattening and code generation for heterogeneous parallel devices. We validate our approach using several non trivial case studies that exhibit order of magnitude speedups in experiments.

Supplementary Material

JPG File (r1d3_talk10.jpg)
MP4 File (r1d3_talk10.mp4)

References

[1]
S. Ackermann, V. Jovanovic, T. Rompf, and M. Odersky. Jet: An embedded dsl for high performance big data processing. BigData, 2012. http://infoscience.epfl.ch/record/181673/files/paper.pdf.
[2]
M. S. Ager, O. Danvy, and H. K. Rohde. Fast partial evaluation of pattern matching in strings. ACM Trans. Program. Lang. Syst., 28 (4): 696--714, 2006.
[3]
J. Auerbach, D. F. Bacon, P. Cheng, and R. Rabbah. Lime: a java-compatible and synthesizable language for heterogeneous architectures. OOPSLA, 2010.
[4]
M. Bravenboer, A. van Dam, K. Olmos, and E. Visser. Program transformation with scoped dynamic rewrite rules. Fundam. Inf., 69: 123--178, July 2005.
[5]
K. J. Brown, A. K. Sujeeth, H. Lee, T. Rompf, H. Chafi, M. Odersky, and K. Olukotun. A heterogeneous parallel framework for domain-specific languages. PACT, 2011.
[6]
J. A. Brzozowski. Derivatives of regular expressions. J. ACM, 11 (4): 481--494, 1964.
[7]
C. Calcagno, W. Taha, L. Huang, and X. Leroy. Implementing multi-stage languages using asts, gensym, and reflection. In GPCE, 2003.
[8]
J. Carette, O. Kiselyov, and C. chieh Shan. Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages. J. Funct. Program., 19 (5): 509--543, 2009.
[9]
C. Click and K. D. Cooper. Combining analyses, combining optimizations. ACM Trans. Program. Lang. Syst., 17: 181--196, March 1995.
[10]
C. Consel and O. Danvy. Partial evaluation of pattern matching in strings. Inf. Process. Lett., 30 (2): 79--86, 1989.
[11]
W. R. Cook, B. Delaware, T. Finsterbusch, A. Ibrahim, and B. Wiedermann. Model transformation by partial evaluation of model interpreters. Technical Report TR-09-09, UT Austin Department of Computer Science, 2008.
[12]
D. Coutts, R. Leshchinskiy, and D. Stewart. Stream fusion: from lists to streams to nothing at all. In ICFP, 2007.
[13]
T. Ekman and G. Hedin. The jastadd system - modular extensible compiler construction. Sci. Comput. Program., 69 (1--3): 14--26, 2007.
[14]
C. Elliott, S. Finne, and O. de Moor. Compiling embedded languages. In W. Taha, editor, phSemantics, Applications, and Implementation of Program Generation, volume 1924 of Lecture Notes in Computer Science, pages 9--26. Springer Berlin / Heidelberg, 2000.
[15]
Y. Futamura. Partial evaluation of computation process - an approach to a compiler-compiler. Higher-Order and Symbolic Computation, 12 (4): 381--391, 1999.
[16]
C. Grelck, K. Hinckfuß, and S.-B. Scholz. With-loop fusion for data locality and parallelism. IFL, 2006.
[17]
D. M. Groenewegen, Z. Hemel, L. C. L. Kats, and E. Visser. WebDSL: a domain-specific language for dynamic web applications. In OOPSLA Companion, 2008.
[18]
C. Hofer, K. Ostermann, T. Rendel, and A. Moors. Polymorphic embedding of DSLs. GPCE, 2008.
[19]
JetBrains. Meta Programming System, 2009. URL http://www.jetbrains.com/mps/.
[20]
N. D. Jones, C. K. Gomard, and P. Sestoft. Partial evaluation and automatic program generation. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993.
[21]
S. L. P. Jones, R. Leshchinskiy, G. Keller, and M. M. T. Chakravarty. Harnessing the Multicores: Nested Data Parallelism in Haskell. In FSTTCS, 2008.
[22]
S. P. Jones, A. Tolmach, and T. Hoare. Playing by the rules: rewriting as a practical optimisation technique in ghc. Haskell, 2001.
[23]
S. Karmesin, J. Crotinger, J. Cummings, S. Haney, W. Humphrey, J. Reynders, S. Smith, and T. J. Williams. Array design and expression evaluation in pooma ii. In ISCOPE, 1998.
[24]
L. C. L. Kats and E. Visser. The Spoofax language workbench. rules for declarative specification of languages and IDEs. In SPLASH/OOPSLA Companion, 2010.
[25]
R. Kelsey and P. Hudak. Realistic compilation by program transformation. In POPL, 1989.
[26]
K. Kennedy, B. Broom, A. Chauhan, R. Fowler, J. Garvin, C. Koelbel, C. McCosh, and J. Mellor-Crummey. Telescoping languages: A system for automatic generation of domain languages. Proceedings of the IEEE, 93 (3): 387--408, 2005.
[27]
G. Kossakowski, N. Amin, T. Rompf, and M. Odersky. Javascript as an embedded dsl. In ECOOP, 2012.
[28]
H. Lee, K. J. Brown, A. K. Sujeeth, H. Chafi, T. Rompf, M. Odersky, and K. Olukotun. Implementing domain-specific languages for heterogeneous parallel computing. IEEE Micro, 31 (5): 42--53, 2011.
[29]
S. Lerner, D. Grove, and C. Chambers. Composing dataflow analyses and transformations. SIGPLAN Not., 37: 270--282, January 2002.
[30]
S. Lerner, T. D. Millstein, and C. Chambers. Automatically proving the correctness of compiler optimizations. In PLDI, 2003.
[31]
A. Møller. dk.brics.automaton -- finite-state automata and regular expressions for Java, 2010.texttthttp://www.brics.dk/automaton/.
[32]
A. Moors, T. Rompf, P. Haller, and M. Odersky. Scala-virtualized. PEPM, 2012.
[33]
N. Nystrom, M. R. Clarkson, and A. C. Myers. Polyglot: An extensible compiler framework for java. In CC, 2003.
[34]
N. Nystrom, D. White, and K. Das. Firepile: run-time compilation for gpus in scala. GPCE, 2011.
[35]
S. Owens, J. Reppy, and A. Turon. Regular-expression derivatives re-examined. J. Funct. Program., 19 (2): 173--190, Mar. 2009.
[36]
D. J. Quinlan, M. Schordan, Q. Yi, and A. Sæbjørnsen. Classification and utilization of abstractions for optimization. In ISoLA (Preliminary proceedings), 2004.
[37]
T. Rompf. phLightweight Modular Staging and Embedded Compilers: Abstraction Without Regret for High-Level High-Performance Programming. PhD thesis, EPFL, 2012.
[38]
T. Rompf and M. Odersky. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled dsls. GPCE, 2010.
[39]
T. Rompf and M. Odersky. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled dsls. Commun. ACM, 55 (6): 121--130, 2012.
[40]
T. Rompf, I. Maier, and M. Odersky. Implementing first-class polymorphic delimited continuations by a type-directed selective cps-transform. In ICFP, 2009.
[41]
T. Rompf, A. K. Sujeeth, H. Lee, K. J. Brown, H. Chafi, M. Odersky, and K. Olukotun. Building-blocks for performance oriented dsls. DSL, 2011.
[42]
A. Shali and W. R. Cook. Hybrid partial evaluation. OOPSLA, 2011.
[43]
M. Sperber and P. Thiemann. Realistic compilation by partial evaluation. In PLDI, 1996.
[44]
A. K. Sujeeth, H. Lee, K. J. Brown, T. Rompf, M. Wu, A. R. Atreya, M. Odersky, and K. Olukotun. OptiML: an implicitly parallel domain-specific language for machine learning. ICML, 2011.
[45]
E. Sumii and N. Kobayashi. A hybrid approach to online and offline partial evaluation. Higher-Order and Symbolic Computation, 14 (2--3): 101--142, 2001.
[46]
W. Taha and T. Sheard. Metaml and multi-stage programming with explicit annotations. Theor. Comput. Sci., 248 (1--2): 211--242, 2000.
[47]
R. Tate, M. Stepp, Z. Tatlock, and S. Lerner. Equality saturation: a new approach to optimization. In POPL, 2009.
[48]
R. Tate, M. Stepp, and S. Lerner. Generating compiler optimizations from proofs. In POPL, 2010.
[49]
P. Thiemann and D. Dussart. Partial evaluation for higher-order languages with state. Technical report, 1999. URL http://www.informatik.uni-freiburg.de/ thiemann/papers/mlpe.ps.gz.
[50]
S. Tobin-Hochstadt, V. St-Amour, R. Culpepper, M. Flatt, and M. Felleisen. Languages as libraries. PLDI'11, 2011.
[51]
T. L. Veldhuizen. Expression templates, C++gems. SIGS Publications, Inc., New York, NY, 1996.
[52]
T. L. Veldhuizen. Arrays in blitz. In ISCOPE, 1998.
[53]
T. L. Veldhuizen and J. G. Siek. Combining optimizations, combining theories. Technical report, Indiana University, 2008.
[54]
P. Wadler. Deforestation: Transforming programs to eliminate trees. Theor. Comput. Sci., 73 (2): 231--248, 1990.
[55]
P. Wadler and S. Blott. How to make ad-hoc polymorphism less ad-hoc. In POPL, 1989.

Cited By

View all
  • (2024)Flan: An Expressive and Efficient Datalog Compiler for Program AnalysisProceedings of the ACM on Programming Languages10.1145/36329288:POPL(2577-2609)Online publication date: 5-Jan-2024
  • (2021)Reachability types: tracking aliasing and separation in higher-order functional programsProceedings of the ACM on Programming Languages10.1145/34855165:OOPSLA(1-32)Online publication date: 15-Oct-2021
  • (2017)Quoted staged rewriting: a practical approach to library-defined optimizationsACM SIGPLAN Notices10.1145/3170492.313604352:12(131-145)Online publication date: 23-Oct-2017
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 48, Issue 1
POPL '13
January 2013
561 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2480359
Issue’s Table of Contents
  • cover image ACM Conferences
    POPL '13: Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
    January 2013
    586 pages
    ISBN:9781450318327
    DOI:10.1145/2429069
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 January 2013
Published in SIGPLAN Volume 48, Issue 1

Check for updates

Author Tags

  1. code generation
  2. data structures
  3. extensible compilers
  4. staging

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)45
  • Downloads (Last 6 weeks)6
Reflects downloads up to 20 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Flan: An Expressive and Efficient Datalog Compiler for Program AnalysisProceedings of the ACM on Programming Languages10.1145/36329288:POPL(2577-2609)Online publication date: 5-Jan-2024
  • (2021)Reachability types: tracking aliasing and separation in higher-order functional programsProceedings of the ACM on Programming Languages10.1145/34855165:OOPSLA(1-32)Online publication date: 15-Oct-2021
  • (2017)Quoted staged rewriting: a practical approach to library-defined optimizationsACM SIGPLAN Notices10.1145/3170492.313604352:12(131-145)Online publication date: 23-Oct-2017
  • (2017)No!Proceedings of the 16th Workshop on Hot Topics in Operating Systems10.1145/3102980.3102995(88-93)Online publication date: 7-May-2017
  • (2016)Autotuning Runtime Specialization for Sparse Matrix-Vector MultiplicationACM Transactions on Architecture and Code Optimization10.1145/285150013:1(1-26)Online publication date: 28-Mar-2016
  • (2016)JiTTree: A Just-in-Time Compiled Sparse GPU Volume Data StructureIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2015.246733122:1(1025-1034)Online publication date: 31-Jan-2016
  • (2015)A Refactoring Library for Scala Compiler ExtensionsCompiler Construction10.1007/978-3-662-46663-6_2(31-48)Online publication date: 2015
  • (2014)TrioletACM SIGPLAN Notices10.1145/2692916.255526849:8(247-258)Online publication date: 6-Feb-2014
  • (2014)Fusing filters with integer linear programmingProceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing10.1145/2636228.2636235(53-62)Online publication date: 3-Sep-2014
  • (2014)Group communication patterns for high performance computing in scalaProceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing10.1145/2636228.2636229(75-85)Online publication date: 3-Sep-2014
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media