skip to main content
article
Open access

Forma: A framework for safe automatic array reshaping

Published: 01 November 2007 Publication History

Abstract

This article presents Forma, a practical, safe, and automatic data reshaping framework that reorganizes arrays to improve data locality. Forma splits large aggregated data-types into smaller ones to improve data locality. Arrays of these large data types are then replaced by multiple arrays of the smaller types. These new arrays form natural data streams that have smaller memory footprints, better locality, and are more suitable for hardware stream prefetching. Forma consists of a field-sensitive alias analyzer, a data type checker, a portable structure reshaping planner, and an array reshaper. An extensive experimental study compares different data reshaping strategies in two dimensions: (1) how the data structure is split into smaller ones (maximal partition × frequency-based partition × affinity-based partition); and (2) how partitioned arrays are linked to preserve program semantics (address arithmetic-based reshaping × pointer-based reshaping). This study exposes important characteristics of array reshaping. First, a practical data reshaper needs not only an inter-procedural analysis but also a data-type checker to make sure that array reshaping is safe. Second, the performance improvement due to array reshaping can be dramatic: standard benchmarks can run up to 2.1 times faster after array reshaping. Array reshaping may also result in some performance degradation for certain benchmarks. An extensive micro-architecture-level performance study identifies the causes for this degradation. Third, the seemingly naive maximal partition achieves best or close-to-best performance in the benchmarks studied. This article presents an analysis that explains this surprising result. Finally, address-arithmetic-based reshaping always performs better than its pointer-based counterpart.

References

[1]
Al-Sukhni, H., Bratt, I., and Connors, D. A. 2003. Compiler-directed content-aware prefetching for dynamic data structures. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT) (New Orleans, LA). 91--100.
[2]
Allen, J. R. and Kennedy, K. 1984. Automatic loop interchange. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Montreal, QC, Canada). ACM, New York, 233--246.
[3]
Anderson, J. M., Amarasinghe, S. P., and Lam, M. S. 1995. Data and computation transformations for multiprocessors. In Proceedings of the Symposium of Principles and Practice of Parallel Programming (PPoPP) (Santa Barbara, CA). 166--178.
[4]
Badawy, A.-H., Aggarwal, A., Yeung, D., and Tseng, C.-W. 2001. Evaluating the impact of memory system performance on software prefetching and locality optimizations. In Proceedings of the 2001 International Conference on Supercomputing (ICS'01) (Sorrento, Italy). 486--500.
[5]
Chase, D. R., Wegman, M., and Zadeck, F. K. 1990. Analysis of pointers and structures. In Proceedings of the Programming Language Design and Implementation (PLDI) (White Plains, NY). ACM, New York, 296--310.
[6]
Chilimbi, T. M., Davidson, B., and Laurus, J. R. 1999. Cache-conscious structure definition. In Proceedings of the Programming Language Design and Implementation (PLDI) (Atlanta, GA). ACM, New York, 13--24.
[7]
Condit, J., Harren, M., McPeak, S., Necula, G. C., and Weimer, W. 2003. Cured in the real world. In Proceedings of the Programming Language Design and Implementation (PLDI) (San Diego, CA). ACM, New York, 232--244.
[8]
Ding, C. and Kennedy, K. 1999. Improving cache performance in dynamic applications through data and computation reorganization at run time. In Proceedings of the Programming Language Design and Implementation (PLDI) (Atlanta, GA). ACM, New York, 229--241.
[9]
Franz, M. and Kistler, T. 1998. Splitting data objects to increase cache utilization. Tech. Report ICS-TR-98-34, Dept. of Information and Computer Science, Univ. of California, Irvine, Irvine, CA, Oct.
[10]
Hind, M. and Pioli, A. 2000. Which pointer analysis should I use? In Proceedings of the International Symposium on Software Testing and Analysis (Portland, OR). 113--123.
[11]
Holte, R. C., Mkadmi, T., Zimmer, R. M., and MacDonald, A. J. 1996. Speeding up problem solving by abstraction: A graph oriented approach. Artif. Intell. 85, 1--2, 321--361.
[12]
Hsu, C. and Kremer, U. 2000. A stable and efficient loop tiling algorithm. In Proceedings of the Mid-Atlantic Student Workshop on Programming Languages and Systems (Newark, DE).
[13]
IBM. 2001. The Power4® Processor Introduction and Tuning Guide. IBM Corp. International Technical Support Organization, http://www.redbooks.ibm.com/.
[14]
Intel. 2002. Intel® Itanium® Architecture Software Developer's Manual. Intel Corporation.
[15]
Ishizaka, K., Obata, M., and Kasahara, H. 2003. Cache optimization for coarse grain task parallel processing using inter-array padding. In Proceedings of the Workshop on Languages and Compilers for Parallel Computing (LCPC) (College Station, TX). 64--76.
[16]
ISO/IEC. 1990. Programming Languages - C. 1st Edition. International Standard ISO/IEC 9899.
[17]
Karlsson, M., Dahlgren, F., and Stenstrom, P. 2000. A prefetching technique for irregular accesses to linked data structures. In Proceedings of the 6th International Symposium on High-Performance Computer Architecture (Toulouse, France). 206--217.
[18]
Kennedy, K. 2000. Fast greedy weighted fusion. In Proceedings of the 14th International Conference on Supercomputing. Santa Fe, NM, 131--140.
[19]
Kodukula, I., Ahmed, N., and Pingali, K. 1997. Data-centric multi-level blocking. In Proceedings of the Programming Language Design and Implementation (PLDI) (Las Vegas, NV). ACM, New York, 346--357.
[20]
Lattner, C., and Adve, V. 2002. Automatic pool allocation for disjoint data structures. In ACM SIGPLAN Workshop on Memory System Performance (Berlin, Germany). ACM, New York, 13--24.
[21]
Luk, C. K. 2000. Optimizing the cache performance of non-numeric applications. Ph.D. dissertation, Dept. of Computer Science, Univ. of Toronto, Toronto, Ont., Canada.
[22]
Luk, C.-K. and Mowry, T. C. 1996. Compiler-based prefetching for recursive data structures. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (Cambridge, MA). ACM, New York, 222--233.
[23]
McKinley, K. S., Carr, S., and Tseng, C.-W. 1996. Improving data locality with loop transformations. ACM Trans. Prog. Lang. Syst. 18, 4 (July), 424--453.
[24]
Necula, G. C., McPeak, S., and Weimer, W. 2002. CCured: Type-safe retrofitting of legacy code. In Proceedings of the Principles of Programming Languages (POPL) (Portland, OR). ACM, New York, 128--139.
[25]
Niewiadomski, R., Amaral, J. N., and Holte, R. 2003. Crafting data structures: A study of reference locality in refinement-based path finding. In Proceedings of the International Conference on High Performance Computing (Hyderabad, India). Springer-Verlag, New York, 438--448.
[26]
Niewiadomski, R., Amaral, J. N., and Holte, R. C. 2004. A performance study of data layout techniques for improving data locality in refinement-based pathfinding. ACM J. Exper. Algor. 9, 1.4, 1--28.
[27]
Palem, S., Rabbah, R., Mooney V. J., Korkmaz, P., and Puttaswamy, K. 2000. Design space optimization of embedded memory systems via data remapping. In Proceedings of the 2002 Joint Conference on Languages, Compilers, and Tools for Embedded Systems & Software and Compilers for Embedded Systems (LCTES'02-SCOPES'02) (Berlin, Germany). ACM, New York, 28--37.
[28]
Rabbah, R., and Palem, S. 2003. Data remapping for design space optimization of embedded memory systems. ACM Trans. Embed. Comput. Syst. 2, 2, 186--218.
[29]
Rivera, G. and Tseng, C.-W. 1998. Data transformations for eliminating conflict misses. In Proceedings of the Programming Language Design and Implementation (PLDI) (Montreal, Que., Canada). ACM, New York, 38--49.
[30]
Rivera, G. and Tseng, C.-W. 1999. A comparison of compiler tiling algorithms. In Proceedings of the 8th International Conference on Compiler Construction (Amsterdam, Netherlands). Springer-Verlag, New York, 168--182.
[31]
Roth, A., Moshovos, A., and Sohi, G. S. 1998. Dependence based prefetching for linked data structures. ACM SIG-PLAN Notices 33, 11, 115--126.
[32]
Ryder, B. G. 2003. Dimensions of precision in reference analysis of object-oriented programming languages. In Proceedings of the International Conference on Compiler Construction (Warsaw, Poland). Springer-Verlag, New York, 168--179.
[33]
Singhai, S. and McKinley, K. 1996. Loop fusion for data locality and parallelism. In Proceedings of the Mid-Atlantic Student Workshop on Programming Languages and Systems (New Paltz, NY).
[34]
Singhai, S., and McKinley, K. S. 1997. A parameterized loop fusion algorithm for improving parallelism and cache locality. Compute. J. 40, 6, 340--355.
[35]
Steensgaard, B. 1996a. Points-to analysis by type inference of programs with structures and unions. In Proceedings of the 6th International Conference on Compiler Construction (Linkoping, Sweden). Springer-Verlag, New York, 136--150.
[36]
Steensgaard, B. 1996b. Points-to analysis in almost linear time. In Proceedings of the Principles of Programming Languages (POPL) (St. Petersburg, FL). ACM, New York, 32--41.
[37]
Stoutchinin, A., Amaral, J. N., Gao, G. R., Dehnert, J., Jain, S., and Douillet, A. 2001. Speculative prefetching of induction pointers. In Proceedings of the International Conference on Compiler Construction 2001 (Genova, Italy). Springer-Verlag, New York, 289--303.
[38]
Strout, M. M., Carter, L., and Ferrante, J. 2003. Compile-time composition of run-time data and iteration reorderings. In Proceedings of the Symposium Programming Language Design and Implementation (PLDI) (San Diego, CA). ACM, New York, 91--102.
[39]
VanderWiel, S. P. and Lilja, D. J. 2000. Data prefetch mechanisms. ACM Comput. Sur. 32, 2, 174--199.
[40]
Wolfe, M. 1987. Iteration space tiling for memory hierarchies. In Proceedings of the 3rd Annual SIAM Conference on Parallel Processing for Scientific Computing (Reno, NV). SIAM, 357--361.
[41]
Wolfe, M. J. 1996. High Performance Compilers for Parallel Computing. Addison-Wesley, Reading, MA.
[42]
Yong, S. H., Horwitz, S., and Reps, T. 1999. Pointer analysis for programs with structures and casting. In Proceedings of the Symposium on Programming Language Design and Implementation (PLDI) (Atlanta, GA). ACM, New York, 91--103.
[43]
Zhong, Y., Orlovich, M., Shen, X., and Ding, C. 2004. Array regrouping and structure splitting using whole-program reference affinity. In Proceedings of the Symposium on Programming Language Design and Implementation (PLDI) (Washington, DC). ACM, New York, 255--266.

Cited By

View all
  • (2024)Region-Based Data Layout via Data Reuse AnalysisProceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction10.1145/3640537.3641571(49-59)Online publication date: 17-Feb-2024
  • (2024)Structure layout optimization based on GCCThird International Conference on Electronic Information Engineering, Big Data, and Computer Technology (EIBDCT 2024)10.1117/12.3031287(261)Online publication date: 19-Jul-2024
  • (2022)Data layout optimization based on the spatio-temporal model of field access2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE)10.1109/AEMCSE55572.2022.00055(238-244)Online publication date: Apr-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Programming Languages and Systems
ACM Transactions on Programming Languages and Systems  Volume 30, Issue 1
November 2007
241 pages
ISSN:0164-0925
EISSN:1558-4593
DOI:10.1145/1290520
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 November 2007
Published in TOPLAS Volume 30, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Arrays
  2. data structure
  3. reference analysis

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)147
  • Downloads (Last 6 weeks)22
Reflects downloads up to 02 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Region-Based Data Layout via Data Reuse AnalysisProceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction10.1145/3640537.3641571(49-59)Online publication date: 17-Feb-2024
  • (2024)Structure layout optimization based on GCCThird International Conference on Electronic Information Engineering, Big Data, and Computer Technology (EIBDCT 2024)10.1117/12.3031287(261)Online publication date: 19-Jul-2024
  • (2022)Data layout optimization based on the spatio-temporal model of field access2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE)10.1109/AEMCSE55572.2022.00055(238-244)Online publication date: Apr-2022
  • (2019)A unifying abstraction for data structure splicingProceedings of the International Symposium on Memory Systems10.1145/3357526.3357548(173-183)Online publication date: 30-Sep-2019
  • (2018)Building Efficient Query Engines in a High-Level LanguageACM Transactions on Database Systems10.1145/318365343:1(1-45)Online publication date: 11-Apr-2018
  • (2018)Performance Optimization of Multithreaded 2D FFT on Multicore Processors: Challenges and Solution Approaches2018 IEEE 25th International Conference on High Performance Computing Workshops (HiPCW)10.1109/HiPCW.2018.8634318(8-17)Online publication date: Dec-2018
  • (2018)Performance Optimization of Multithreaded 2D Fast Fourier Transform on Multicore Processors Using Load Imbalancing Parallel Computing MethodIEEE Access10.1109/ACCESS.2018.28782716(64202-64224)Online publication date: 2018
  • (2018)Modular design of a factor-graph-based inference engine on a System-On-Chip (SoC)Microprocessors and Microsystems10.1016/j.micpro.2018.04.00260(53-64)Online publication date: Jul-2018
  • (2016)The hardness of data packingACM SIGPLAN Notices10.1145/2914770.283766951:1(232-242)Online publication date: 11-Jan-2016
  • (2016)StructSlim: a lightweight profiler to guide structure splittingProceedings of the 2016 International Symposium on Code Generation and Optimization10.1145/2854038.2854053(36-46)Online publication date: 29-Feb-2016
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media