Research article, DOI: 10.1145/2492408.2492414

Introducing kernel-level page reuse for high performance computing

Published: 16 June 2013

Abstract

As computer architectures evolve, more and more HPC applications must adopt thread-based parallelism while keeping memory consumption under control. This places greater demands on the whole memory management chain, which is particularly stressed in multi-threaded contexts. Several memory allocators improve scalability on the user-space side, but with the steadily increasing number of cores, the impact of the operating system can no longer be neglected. We measured that the OS memory subsystem accounts for up to one third of the total execution time of a real application on 128 cores. On modern architectures, we measured that up to 40% of the page fault time is spent zeroing pages. In this paper, we detail a proposal to improve paging performance by removing the need for this unproductive page zeroing through an extension of the mmap semantics. To this end, we added a per-process, kernel-level memory page pool that locally reuses free pages without resetting their content. Our experiments show significant performance improvements, especially for huge pages.
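
The cost the abstract targets can be observed from user space. The following is a minimal sketch, not the paper's kernel mechanism: it maps anonymous memory, checks that fresh pages read as zero, and times a first pass (which typically takes page faults) against a second pass over the same pages. The function names and the default page count are illustrative assumptions, and the timings vary with the platform and system load.

```python
import mmap
import time

PAGE_SIZE = mmap.PAGESIZE  # system page size in bytes


def touch_every_page(buf, n_pages):
    """Write one byte into each page of buf and return the elapsed seconds."""
    start = time.perf_counter()
    for i in range(n_pages):
        buf[i * PAGE_SIZE] = 1
    return time.perf_counter() - start


def first_vs_second_touch(n_pages=2048):
    """Time a first and a second pass over a fresh anonymous mapping."""
    size = n_pages * PAGE_SIZE
    buf = mmap.mmap(-1, size)  # anonymous mapping, not backed by a file
    try:
        # Fresh anonymous memory is handed out zero-filled by the OS.
        starts_zeroed = buf[0] == 0 and buf[size - 1] == 0
        first = touch_every_page(buf, n_pages)   # typically takes page faults
        second = touch_every_page(buf, n_pages)  # pages are normally resident now
        return starts_zeroed, first, second
    finally:
        buf.close()


if __name__ == "__main__":
    zeroed, first, second = first_vs_second_touch()
    print(f"fresh pages read as zero: {zeroed}")
    print(f"first touch:  {first * 1e6:.0f} us")
    print(f"second touch: {second * 1e6:.0f} us")
```

The gap between the two passes includes the full fault path, not only zeroing, so it gives an upper bound on what skipping the content reset could save rather than a measurement of the 40% figure.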

Published In

MSPC '13: Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
June 2013
60 pages
ISBN: 9781450321037
DOI: 10.1145/2492408
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Linux
  2. NUMA
  3. kernel
  4. many-core
  5. memory allocator
  6. memory pool
  7. page fault
  8. parallel
  9. process
  10. zero page

Qualifiers

  • Research-article

Conference

PLDI '13

Acceptance Rates

Overall Acceptance Rate 6 of 20 submissions, 30%

Cited By

  • (2022) Optimizing the EDP of OpenMP applications via concurrency throttling and frequency boosting. Journal of Systems Architecture: the EUROMICRO Journal, 123:C. DOI: 10.1016/j.sysarc.2021.102379. Online publication date: 1-Feb-2022
  • (2021) Mitigating the processor aging through dynamic concurrency throttling. Journal of Parallel and Distributed Computing, 156 (86-100). DOI: 10.1016/j.jpdc.2021.05.006. Online publication date: Oct-2021
  • (2020) Preliminary Experience with OpenMP Memory Management Implementation. OpenMP: Portable Multi-Level Parallelism on Modern Systems (313-327). DOI: 10.1007/978-3-030-58144-2_20. Online publication date: 1-Sep-2020
  • (2017) MALT: a Malloc tracker. Proceedings of the 4th ACM SIGPLAN International Workshop on Software Engineering for Parallel Systems (1-10). DOI: 10.1145/3141865.3141867. Online publication date: 23-Oct-2017
  • (2016) mmapcopy. Proceedings of the 31st Annual ACM Symposium on Applied Computing (1832-1837). DOI: 10.1145/2851613.2851736. Online publication date: 4-Apr-2016
  • (2014) Dynamic page sharing optimization for the R language. ACM SIGPLAN Notices, 50:2 (79-90). DOI: 10.1145/2775052.2661094. Online publication date: 14-Oct-2014
  • (2014) Dynamic page sharing optimization for the R language. Proceedings of the 10th ACM Symposium on Dynamic languages (79-90). DOI: 10.1145/2661088.2661094. Online publication date: 20-Oct-2014
