skip to main content
10.1145/1950365.1950376acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

RCDC: a relaxed consistency deterministic computer

Published: 05 March 2011 Publication History

Abstract

Providing deterministic execution significantly simplifies the debugging, testing, replication, and deployment of multithreaded programs. Recent work has developed deterministic multiprocessor architectures as well as compiler and runtime systems that enforce determinism in current hardware. Such work has incidentally imposed strong memory-ordering properties. Historically, memory ordering has been relaxed in favor of higher performance in shared memory multiprocessors and, interestingly, determinism exacerbates the cost of strong memory ordering. Consequently, we argue that relaxed memory ordering is vital to achieving faster deterministic execution.
This paper introduces RCDC, a deterministic multiprocessor architecture that takes advantage of relaxed memory orderings to provide high-performance deterministic execution with low hardware complexity. RCDC has two key innovations: a hybrid HW/SW approach to enforcing determinism; and a new deterministic execution strategy that leverages data-race-free-based memory models (e.g., the models for Java and C++) to improve performance and scalability without sacrificing determinism, even in the presence of races. In our hybrid HW/SW approach, the only hardware mechanisms required are software-controlled store buffering and support for precise instruction counting; we do not require speculation. A runtime system uses these mechanisms to enforce determinism for arbitrary programs.
We evaluate RCDC using PARSEC benchmarks and show that relaxing memory ordering leads to performance and scalability close to nondeterministic execution without requiring any form of speculation. We also compare our new execution strategy to one based on TSO (total-store-ordering) and show that some applications benefit significantly from the extra relaxation. We also evaluate a software-only implementation of our new deterministic execution strategy.

References

[1]
S. Adve and M. Hill. Weak Ordering -- A New Definition. In ISCA, 1990.
[2]
G. Altekar and I. Stoica. ODR: Output-Deterministic Replay for Multicore Debugging. In SOSP, 2009.
[3]
A. Aviram, S.-C. Weng, S. Hu, and B. Ford. Efficient System-Enforced Deterministic Parallelism. In OSDI, 2010.
[4]
T. Bergan, O. Anderson, J. Devietti, L. Ceze, and D. Grossman. CoreDet: a Compiler and Runtime System for Deterministic Multithreaded Execution. In ASPLOS, 2010.
[5]
T. Bergan, N. Hunt, L. Ceze, and S. Gribble. Deterministic Process Groups in dOS. In OSDI, 2010.
[6]
E. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe and Efficient Concurrent Programming. In OOPSLA, 2009.
[7]
C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In PACT, 2008.
[8]
G. Blelloch. NESL: A Nested Data-Parallel Language (Version 3.1). Technical report, Carnegie Mellon University, Pittsburgh, PA, 2007.
[9]
R. Bocchino, V. Adve, D. Dig, S. Adve, S. Heumann, R. Komuravelli, J. Overbey, P. Simmons, H. Sung, and M. Vakilian. A Type and Effect System for Deterministic Parallel Java. In OOPSLA, 2009.
[10]
H.-J. Boehm and S. Adve. Foundations of the C++ Concurrency Memory Model. In PLDI, 2008.
[11]
L. Ceze, J. Tuck, P. Montesinos, and J. Torrellas. BulkSC: Bulk Enforcement of Sequential Consistency. In ISCA, 2007.
[12]
M. M. T. Chakravarty, R. Leshchinskiy, S. P. Jones, G. Keller, and S. Marlow. Data Parallel Haskell: A Status Report. In Workshop on Declarative Aspects of Multicore Programming (DAMP), 2007.
[13]
H. Cui, J. Wu, C. che Tsai, and J. Yang. Stable Deterministic Multithreading through Schedule Memoization. In OSDI, 2010.
[14]
J. Devietti, B. Lucia, L. Ceze, and M. Oskin. DMP: Deterministic Shared Memory Multiprocessing. In ASPLOS, 2009.
[15]
M. Dubois, J. Wang, L. Barroso, K. Lee, and Y. Chen. Delayed Consistency and Its Effects on the Miss Rate of Parallel Programs. In Supercomputing, 1991.
[16]
S. A. Edwards and O. Tardieu. SHIM: A Deterministic Model for Heterogeneous Embedded Systems. In EMSOFT, 2005.
[17]
K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors. In ISCA, 1990.
[18]
C. Gniady, B. Falsafi, and T. N. Vijaykumar. Is SC ILP = RC? In ISCA, 1999.
[19]
D. Hower, P. Dudnik, D. Wood, and M. Hill. Calvin: Deterministic or Not? Free Will to Choose. In HPCA, 2011.
[20]
D. Hower and M. Hill. ReRun: Exploiting Episodes for Lightweight Memory Race Recording. In ISCA, 2008.
[21]
C. K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. PIN: Building Customized Program Analysis Tools with Dynamic Instrumentation. In PLDI, 2005.
[22]
J. Manson, W. Pugh, and S. Adve. The Java Memory Model. In POPL, 2005.
[23]
P. Montesinos, L. Ceze, and J. Torrellas. DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently. In ISCA, 2008.
[24]
S. Narayanasamy, C. Pereira, and B. Calder. Recording Shared Memory Dependencies Using Strata. In ASPLOS, 2006.
[25]
R. H. B. Netzer. Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs. In PADD, 1993.
[26]
M. Olszewski, J. Ansel, and S. Amarasinghe. Kendo: Efficient Deterministic Multithreading in Software. In ASPLOS, 2009.
[27]
J. Ousterhout. Scheduling Techniques for Concurrent Systems. In ICDCS, 1982.
[28]
S. Parka, W. Xiong, Z. Yin, R. Kaushik, K. Lee, S. Lu, and Y. Zhou. Do You Have to Reproduce the Bug at the First Replay Attempt? -- PRES: Probabilistic Replay with Execution Sketching on Multiprocessors. In SOSP, 2009.
[29]
M. Rinard and M. Lam. The Design, Implementation, and Evaluation of Jade. ACM TOPLAS, 20(3), 1988.
[30]
K. Russell and D. Detlefs. Eliminating synchronization-related atomic operations with biased locking and bulk rebiasing. In OOPSLA, 2006.
[31]
W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In CC, 2002.
[32]
E. Vallejo, M. Galluzzi, A. Cristal, F. Vallejo, R. Beivide, P. Stenstrom, J. E. Smith, and M. Valero. Implementing Kilo-Instruction Multiprocessors. In ICPS, July 2005.
[33]
C. von Praun, H. Cain, J. Choi, and K. Ryu. Conditional Memory Ordering. In ISCA, 2006.
[34]
T. F. Wenisch, A. Ailamaki, B. Falsafi, and A. Moshovos. Mechanisms for Store-wait-free Multiprocessors. In ISCA, 2007.
[35]
M. Xu, R. Bodik, and M. Hill. A "Flight Data Recorder" for Enabling Full-System Multiprocessor Deterministic Replay. In ISCA, 2003.
[36]
M. Xu, M. Hill, and R. Bodik. A Regulated Transitive Reduction for Longer Memory Race Recording. In ASPLOS, 2006.

Cited By

View all
  • (2022)Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization DeterminismProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569669(223-238)Online publication date: 8-Oct-2022
  • (2021)STRABProceedings of the 36th Annual ACM Symposium on Applied Computing10.1145/3412841.3442028(1532-1541)Online publication date: 22-Mar-2021
  • (2020)Generating Robust Parallel Programs via Model Driven Prediction of Compiler Optimizations for Non-determinismProceedings of the 49th International Conference on Parallel Processing10.1145/3404397.3404464(1-12)Online publication date: 17-Aug-2020
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ASPLOS XVI: Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
March 2011
432 pages
ISBN:9781450302661
DOI:10.1145/1950365
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 46, Issue 3
    ASPLOS '11
    March 2011
    407 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1961296
    Issue’s Table of Contents
  • cover image ACM SIGARCH Computer Architecture News
    ACM SIGARCH Computer Architecture News  Volume 39, Issue 1
    ASPLOS '11
    March 2011
    407 pages
    ISSN:0163-5964
    DOI:10.1145/1961295
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 March 2011

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. determinism
  2. multicore
  3. parallel programming
  4. relaxed consistency

Qualifiers

  • Research-article

Conference

ASPLOS'11

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)9
  • Downloads (Last 6 weeks)2
Reflects downloads up to 03 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2022)Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization DeterminismProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569669(223-238)Online publication date: 8-Oct-2022
  • (2021)STRABProceedings of the 36th Annual ACM Symposium on Applied Computing10.1145/3412841.3442028(1532-1541)Online publication date: 22-Mar-2021
  • (2020)Generating Robust Parallel Programs via Model Driven Prediction of Compiler Optimizations for Non-determinismProceedings of the 49th International Conference on Parallel Processing10.1145/3404397.3404464(1-12)Online publication date: 17-Aug-2020
  • (2020)Reproducible ContainersProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3373376.3378519(167-182)Online publication date: 9-Mar-2020
  • (2020)Deterministic Atomic Buffering2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO50266.2020.00083(981-995)Online publication date: Oct-2020
  • (2019)Extracting SIMD Parallelism from Recursive Task-Parallel ProgramsACM Transactions on Parallel Computing10.1145/33656636:4(1-37)Online publication date: 26-Dec-2019
  • (2019)HyperqueuesACM Transactions on Parallel Computing10.1145/33656606:4(1-35)Online publication date: 19-Nov-2019
  • (2019)Processor-Oblivious Record and ReplayACM Transactions on Parallel Computing10.1145/33656596:4(1-28)Online publication date: 17-Dec-2019
  • (2019)TapirACM Transactions on Parallel Computing10.1145/33656556:4(1-33)Online publication date: 17-Dec-2019
  • (2019)Introduction to the Special Issue on Qest 2017ACM Transactions on Modeling and Computer Simulation10.1145/336378429:4(1-2)Online publication date: 18-Nov-2019
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media