SmartApps: middle-ware for adaptive applications on reconfigurable platforms

Published: 01 April 2006

Abstract

One general avenue to obtaining optimized performance on large and complex systems is to approach optimization from a global perspective of the complete system, customized for each application, i.e., application-centric optimization. Lately, there have been encouraging developments in reconfigurable operating systems and hardware that will enable such customized optimization. For example, machines built with PIMs and FPGAs can be quickly reconfigured to better fit a certain application, and operating systems, such as IBM's K42, can have their services customized to fit the needs and characteristics of an application. While progress in operating systems and hardware has made reconfiguration possible, we still need strategies and techniques to exploit it for improved application performance.

In this paper, we describe the approach we are using in our smart application (SMARTAPPS) project. In the SMARTAPPS executable, the compiler embeds most run-time system services and a feedback loop to monitor performance and trigger run-time adaptations. At run time, after incorporating the code's input and determining the system's state, the SMARTAPPS executable performs an instance-specific optimization. During execution, the application continually monitors its performance and the available resources to determine whether restructuring should occur. The framework includes mechanisms for performing the actual restructuring at various levels, including algorithmic adaptation, tuning of reconfigurable OS services (scheduling policy, page size, etc.), and system configuration (e.g., number of processors). This paper concentrates on techniques for providing customized system services for communication, thread scheduling, memory management, and performance monitoring and modeling.
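The monitor-and-adapt feedback loop described above can be sketched in miniature. The code below is an illustrative approximation only, not the SMARTAPPS implementation: it assumes a hypothetical application that runs in phases, times each phase, and switches between two candidate algorithm variants when throughput degrades past a threshold (standing in for the paper's algorithmic-adaptation level; the names `variant_a`, `variant_b`, and `run_adaptive` are invented for this sketch).

```python
import time

# Two interchangeable algorithm variants for the same computation
# (hypothetical stand-ins for the adaptable algorithms in the text).
def variant_a(n):
    return sum(i * i for i in range(n))

def variant_b(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

VARIANTS = {"a": variant_a, "b": variant_b}

def run_adaptive(phases, work=200_000, slowdown_threshold=1.25):
    """Run `phases` computation phases, monitoring each one and
    switching variants when a phase runs much slower than the best
    observed time -- a toy version of the feedback loop."""
    current = "a"
    best_time = float("inf")
    history = []
    for _ in range(phases):
        start = time.perf_counter()
        VARIANTS[current](work)
        elapsed = time.perf_counter() - start
        history.append((current, elapsed))
        if elapsed < best_time:
            best_time = elapsed
        elif elapsed > slowdown_threshold * best_time:
            # Performance degraded: restructure by switching variants.
            current = "b" if current == "a" else "a"
    return history
```

In the real system the "restructuring" step would also cover OS-service tuning and reconfiguring the number of processors, and the monitor would draw on hardware performance counters rather than wall-clock timing alone.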


Published In

ACM SIGOPS Operating Systems Review, Volume 40, Issue 2, April 2006, 107 pages
ISSN: 0163-5980
DOI: 10.1145/1131322

Publisher

Association for Computing Machinery, New York, NY, United States
