Introducing kernel-level page reuse for high performance computing

Authors:

Sébastien Valat,

William Jalby

MSPC '13: Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness

Article No.: 3, Pages 1 - 9

https://doi.org/10.1145/2492408.2492414

Published: 16 June 2013

Abstract

As computer architectures evolve, more and more HPC applications must adopt thread-based parallelism and keep memory consumption under control. This shift demands closer attention to the full memory management chain, which is particularly stressed in multi-threaded contexts. Several memory allocators provide better scalability on the user-space side, but with the steadily increasing number of cores, the impact of the operating system can no longer be neglected. We measured that the OS memory subsystem accounts for up to one third of the total execution time of a real application on 128 cores. On modern architectures, we also measured that up to 40% of the page fault time is spent in page zeroing. In this paper, we detail a proposal to improve paging performance by removing the need for this unproductive page zeroing through an extension of the mmap semantics. To this end, we added a per-process, kernel-level memory page pool that locally reuses free pages without resetting their content. Our experiments show significant performance improvements, especially for huge pages.
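
The reuse idea in the abstract can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not the authors' kernel patch: the class `ProcessPagePool`, its `need_zero` flag, and the counters are hypothetical names invented for illustration. The model assumes that a page obtained fresh from the system must always be zero-filled, since it may have belonged to another process, while a page recycled from the process's own pool may skip zeroing when the caller does not need cleared memory.

```python
class ProcessPagePool:
    """Toy model of a per-process pool of free pages (hypothetical, user-space only)."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.free_pages = []  # pages released by this process, content left untouched
        self.zeroed = 0       # number of zero-fill operations performed
        self.reused = 0       # number of allocations served from the local pool

    def alloc(self, need_zero=True):
        """Return one page; zero it only when required."""
        if self.free_pages:
            page = self.free_pages.pop()
            self.reused += 1
            if need_zero:
                # Caller asked for standard semantics: clear the old content.
                page[:] = bytes(self.page_size)
                self.zeroed += 1
            return page
        # Assumption: a fresh page may come from another process,
        # so it is always handed out zero-filled.
        self.zeroed += 1
        return bytearray(self.page_size)

    def free(self, page):
        """Keep the page in the local pool without resetting its content."""
        self.free_pages.append(page)


pool = ProcessPagePool()
first = pool.alloc(need_zero=False)   # fresh page: zeroed regardless of the flag
first[0] = 42
pool.free(first)
second = pool.alloc(need_zero=False)  # recycled locally: old content is still visible
```

In this model only the process's own stale data can ever be observed through a reused page, which is why skipping the reset is acceptable for callers that overwrite the memory anyway.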

Index Terms

Introducing kernel-level page reuse for high performance computing
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Virtual memory

Published In

MSPC '13: Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness

June 2013

60 pages

ISBN:9781450321037

DOI:10.1145/2492408

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PLDI '13

Sponsor:

SIGPLAN

PLDI '13: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 16 - 19, 2013

Seattle, Washington

Acceptance Rates

Overall Acceptance Rate 6 of 20 submissions, 30%
