DOI: 10.1145/2491661.2481431
research-article

Transparently consistent asynchronous shared memory

Published: 10 June 2013

ABSTRACT

The advent of many-core processors is imposing many changes on the operating system. The resources under contention have changed: previously, CPU cycles were the resource in demand and required fair and precise sharing. Now compute cycles are plentiful, but the memory available per core is decreasing. In the past, scientific applications used all the CPU cores to finish as fast as possible, with visualization and analysis of the data performed after the simulation finished. With less memory available per core, and the higher price (in power and time) of storing data on disk or sending it over the network, it now makes sense to run visualization and analytics applications in situ, while the simulation is running. Visualization and analytics applications then need to sample the simulation's memory with as little interference, and as few changes to the simulation code, as possible.

We propose an asynchronous memory sharing facility that allows consistent states of the memory to be shared between processes without any implicit or explicit synchronization. We distinguish two types of processes: a single producer and one or more observers. The producer modifies the state of the data, making consistent versions of the state available to any observer. The observers, working at different sampling rates, can access the latest available consistent state.
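The abstract does not show the kernel mechanism itself, but the core idea — an observer reading a consistent version of the producer's state while the producer keeps mutating it, with no explicit synchronization between them — can be illustrated with an ordinary user-space analogy: `fork()` gives the child a copy-on-write snapshot of the parent's memory. This sketch is purely illustrative and is not the authors' implementation:

```python
import os

# Producer-owned state. After fork(), the child (observer) sees a
# copy-on-write snapshot of this state as it was at fork() time.
state = list(range(8))

def observe(snapshot):
    # Runs in the observer: any computation over the consistent snapshot.
    return sum(snapshot)

pid = os.fork()
if pid == 0:
    # Observer process: its view of `state` is the version at fork() time,
    # regardless of what the producer does afterwards.
    ok = observe(state) == sum(range(8))
    os._exit(0 if ok else 1)
else:
    # Producer keeps mutating; the observer's snapshot is unaffected
    # because modified pages are copied on write.
    state[0] = 999
    _, status = os.waitpid(pid, 0)
```

The analogy is imperfect — `fork()` snapshots the entire address space at one instant and spawns a new process per observation — whereas the paper's facility lets long-running observers repeatedly sample the latest consistent version.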

Some applications that would benefit from this type of facility include checkpointing, process monitoring, unobtrusive process debugging, and the sharing of data for visualization or analytics. To evaluate our ideas we developed two kernel-level implementations for sharing data asynchronously, and we compared these implementations to a traditional user-space synchronized multi-buffer method.
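The abstract does not detail the multi-buffer baseline, but a "user-space synchronized multi-buffer" scheme is conventionally structured as below: the producer copies each published version into the next of N buffers under a lock, and observers read the most recently completed one. The class and method names here are hypothetical, sketched only to show what the kernel-level approach avoids (locking and full copies on every publish):

```python
import copy
import threading

class MultiBuffer:
    """Hypothetical sketch of a synchronized multi-buffer share."""

    def __init__(self, nbufs=3):
        self.bufs = [None] * nbufs   # ring of published versions
        self.latest = -1             # index of newest complete version
        self.lock = threading.Lock()

    def publish(self, state):
        # Producer: deep-copy the state into the next slot, then flip
        # `latest` so observers see only complete versions.
        with self.lock:
            slot = (self.latest + 1) % len(self.bufs)
            self.bufs[slot] = copy.deepcopy(state)
            self.latest = slot

    def latest_state(self):
        # Observer: take the lock briefly to pick up the newest version.
        with self.lock:
            return None if self.latest < 0 else self.bufs[self.latest]

mb = MultiBuffer()
state = {"step": 0, "data": [1, 2, 3]}
mb.publish(state)
state["step"] = 1          # producer keeps mutating its working copy
snap = mb.latest_state()   # observer still sees the published version
```

Every `publish` pays for a full copy and lock traffic even if observers never look, which is the overhead the asynchronous kernel-level approach is designed to eliminate.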

In our tests we have seen improvements of up to 3.5x over the traditional multi-buffer method, with 20% of the data pages touched.


Published in

ROSS '13: Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
June 2013, 75 pages
ISBN: 9781450321464
DOI: 10.1145/2491661
Copyright © 2013 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ROSS '13 paper acceptance rate: 9 of 18 submissions, 50%. Overall acceptance rate: 58 of 169 submissions, 34%.
