ABSTRACT
The advent of many-core processors is forcing changes on the operating system. The resources under contention have shifted: previously, CPU cycles were the scarce resource and required fair, precise sharing. Now compute cycles are plentiful, but memory per core is shrinking. In the past, scientific applications used all the CPU cores to finish as fast as possible, with visualization and analysis of the data performed after the simulation completed. With less memory available per core, and with the rising price (in power and time) of storing data on disk or sending it over the network, it now makes sense to run visualization and analytics applications in situ, while the simulation is running. These applications then need to sample the simulation's memory with as little interference, and with as few changes to the simulation code, as possible.
We propose an asynchronous memory sharing facility that allows consistent states of memory to be shared between processes without any implicit or explicit synchronization. We distinguish two types of processes: a single producer and one or more observers. The producer modifies the state of the data and makes consistent versions of that state available to any observer. The observers, each working at its own sampling rate, can access the latest available consistent state.
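As a rough user-space analogy of the snapshot idea, assuming a POSIX system: fork() hands a child a copy-on-write snapshot of the parent's memory, so an "observer" child can read a consistent state while the producer parent keeps mutating its own copy. This is only an illustration, not the paper's kernel-level mechanism.

    /* Illustrative only: COW snapshots via fork(), not the paper's code. */
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define N 1024

    static double state[N];

    static void simulate_step(int step)
    {
        for (int i = 0; i < N; i++)
            state[i] = step + i;          /* producer mutates the state */
    }

    int main(void)
    {
        for (int step = 0; step < 3; step++) {
            simulate_step(step);

            pid_t pid = fork();           /* snapshot via copy-on-write */
            if (pid == 0) {
                /* Observer: sees the state as of the fork, even though
                   the parent continues to modify its own copy. */
                double sum = 0;
                for (int i = 0; i < N; i++)
                    sum += state[i];
                printf("observer saw step %d, sum=%g\n", step, sum);
                fflush(stdout);
                _exit(0);
            }
            /* Producer continues immediately; no locks, no barriers. */
        }
        while (wait(NULL) > 0)
            ;                             /* reap observer children */
        return 0;
    }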
Some applications that would benefit from this type of facility include checkpointing, process monitoring, unobtrusive process debugging, and the sharing of data for visualization or analytics. To evaluate our ideas, we developed two kernel-level implementations for sharing data asynchronously and compared them against a traditional user-space synchronized multi-buffer method (sketched below).
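A minimal sketch of such a synchronized multi-buffer baseline, assuming pthreads; the buffer count and names are illustrative, not the paper's benchmark code. Note the explicit copies and locking on both sides, which are exactly the costs the asynchronous facility avoids.

    /* Illustrative multi-buffer baseline: single producer, many observers. */
    #include <pthread.h>
    #include <string.h>

    #define NBUF 2
    #define N    1024

    static double buffers[NBUF][N];
    static int latest = -1;               /* index of newest complete buffer */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Producer: copy the state into a spare buffer, publish under the lock. */
    void publish(const double *state)
    {
        int next = (latest + 1) % NBUF;   /* only the producer writes latest */
        memcpy(buffers[next], state, sizeof(buffers[next]));
        pthread_mutex_lock(&lock);
        latest = next;                    /* observers now see this version  */
        pthread_mutex_unlock(&lock);
    }

    /* Observer: copy out the newest buffer under the lock. */
    int sample(double *out)
    {
        pthread_mutex_lock(&lock);
        int idx = latest;
        if (idx >= 0)
            memcpy(out, buffers[idx], sizeof(buffers[idx]));
        pthread_mutex_unlock(&lock);
        return idx;                       /* -1 if nothing published yet */
    }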
In our tests we have seen improvements of up to 3.5x over the traditional multi-buffer method when 20% of the data pages are touched.
Index Terms
- Transparently consistent asynchronous shared memory