ABSTRACT
What fundamental opportunities for scalability are latent in interfaces, such as system call APIs? Can scalability opportunities be identified even before any implementation exists, simply by considering interface specifications? To answer these questions this paper introduces the following rule: Whenever interface operations commute, they can be implemented in a way that scales. This rule aids developers in building more scalable software starting from interface design and carrying on through implementation, testing, and evaluation.
To help developers apply the rule, a new tool named Commuter accepts high-level interface models and generates tests of operations that commute and hence could scale. Using these tests, Commuter can evaluate the scalability of an implementation. We apply Commuter to 18 POSIX calls and use the results to guide the implementation of a new research operating system kernel called sv6. Linux scales for 68% of the 13,664 tests generated by Commuter for these calls, and Commuter finds many problems that have been observed to limit application scalability. sv6 scales for 99% of the tests.
Supplemental Material
- H. Attiya, E. Hillel, and A. Milani. Inherent limitations on disjoint-access parallel implementations of transactional memory. In Proceedings of the 21st Annual ACM Symposium on Parallelism in Algorithms and Architectures, Calgary, Canada, August 2009. Google ScholarDigital Library
- H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. M. Michael, and M. Vechev. Laws of order: Expensive synchronization in concurrent algorithms cannot be eliminated. In Proceedings of the 38th ACM Symposium on Principles of Programming Languages, Austin, TX, January 2011. Google ScholarDigital Library
- A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. The Multikernel: A new OS architecture for scalable multicore systems. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), Big Sky, MT, October 2009. Google ScholarDigital Library
- F. Bellard et al. QEMU. http://www.qemu.org/.Google Scholar
- D. J. Bernstein. Some thoughts on security after ten years of qmail 1.0. In Proceedings of the ACM Workshop on Computer Security Architecture, Fairfax, VA, November 2007. Google ScholarDigital Library
- P. A. Bernstein and N. Goodman. Concurrency control in distributed database systems. ACM Computing Surveys, 13(2):185--221, June 1981. Google ScholarDigital Library
- S. Boyd-Wickizer. Optimizing Communication Bottlenecks in Multiprocessor Operating System Kernels. PhD thesis, Massachusetts Institute of Technology, February 2014.Google Scholar
- S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, M. F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. Corey: An operating system for many cores. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI), San Diego, CA, December 2008. Google ScholarDigital Library
- S. Boyd-Wickizer, A. Clements, Y. Mao, A. Pesterev, M. F. Kaashoek, R. Morris, and N. Zeldovich. An analysis of Linux scalability to many cores. In Proceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI), Vancouver, Canada, October 2010. Google ScholarDigital Library
- C. Cadar, V. Ganesh, P. M. Pawlowski, D. L. Dill, and D. R. Engler. EXE: Automatically generating inputs of death. In Proceedings of the 13th ACM Conference on Computer and Communications Security, 2006. Google ScholarDigital Library
- C. Cadar, D. Dunbar, and D. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI), San Diego, CA, December 2008. Google ScholarDigital Library
- B. Cantrill and J. Bonwick. Real-world concurrency. Communications of the ACM, 51(11):34--39, 2008. Google ScholarDigital Library
- K. Claessen and J. Hughes. QuickCheck: A lightweight tool for random testing of Haskell programs. In Proceedings of the 5th ACM SIGPLAN International Conference on Functional Programming, Montreal, Canada, September 2000. Google ScholarDigital Library
- A. T. Clements, M. F. Kaashoek, and N. Zeldovich. Concurrent address spaces using RCU balanced trees. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), London, UK, March 2012. Google ScholarDigital Library
- A. T. Clements, M. F. Kaashoek, and N. Zeldovich. RadixVM: Scalable address spaces for multithreaded applications. In Proceedings of the ACM EuroSys Conference, Prague, Czech Republic, April 2013. Google ScholarDigital Library
- J. Corbet. The search for fast, scalable counters, May 2010. http://lwn.net/Articles/170003/.Google Scholar
- J. Corbet. Dcache scalability and RCU-walk, April 23, 2012. http://lwn.net/Articles/419811/.Google Scholar
- R. Cox, M. F. Kaashoek, and R. T. Morris. Xv6, a simple Unix-like teaching operating system. http://pdos.csail.mit.edu/6.828/2012/xv6.html.Google Scholar
- L. de Moura and N. Bjørner. Z3: An efficient SMT solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, Budapest, Hungary, March--April 2008. Google ScholarDigital Library
- DWARF Debugging Information Format Committee. DWARF debugging information format, version 4, June 2010.Google Scholar
- F. Ellen, Y. Lev, V. Luchango, and M. Moir. SNZI: Scalable nonzero indicators. In Proceedings of the 26th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, Portland, OR, August 2007. Google ScholarDigital Library
- P. Godefroid, N. Klarlund, and K. Sen. DART: Directed automated random testing. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, Chicago, IL, June 2005. Google ScholarDigital Library
- M. Herlihy and E. Koskinen. Transactional boosting: A methodology for highly-concurrent transactional objects. In Proceedings of the 13th ACM Symposium on Principles and Practice of Parallel Programming, Salt Lake City, UT, February 2008. Google ScholarDigital Library
- M. P. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages Systems, 12(3):463--492, 1990. Google ScholarDigital Library
- D. Howells. Extended file stat functions, Linux patch. https://lkml.org/lkml/2010/7/14/539.Google Scholar
- A. Israeli and L. Rappoport. Disjoint-access-parallel implementations of strong shared memory primitives. In Proceedings of the 13th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, Los Angeles, CA, August 1994. Google ScholarDigital Library
- P. Koopman, A. Alimarine, J. Tretmans, and R. Plasmeijer. Gast: Generic automated software testing. In Proceedings of the 14th International Workshop on the Implementation of Functional Languages, Madrid, Spain, September 2002. Google ScholarDigital Library
- C. Lameter. Effective synchronization on Linux/NUMA systems. In Gelato Conference, May 2005. http://www.lameter.com/gelato2005.pdf.Google Scholar
- P. E. McKenney. Differential profilng. Software: Practice and Experience, 29(3):219--234, 1999. Google ScholarDigital Library
- P. E. McKenney. Concurrent code and expensive instructions. https://lwn.net/Articles/423994/, January 2011.Google Scholar
- P. E. McKenney, D. Sarma, A. Arcangeli, A. Kleen, O. Krieger, and R. Russell. Read-copy update. In Proceedings of the Linux Symposium, Ottawa, Canada, June 2002.Google Scholar
- J. M. Mellor-Crummey and M. L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems, 9(1):21--65, 1991. Google ScholarDigital Library
- M. S. Papamarcos and J. H. Patel. A low-overhead coherence solution for multiprocessors with private cache memories. In Proceedings of the 11th Annual International Symposium on Computer Architecture, Ann Arbor, MI, June 1984. Google ScholarDigital Library
- P. Prabhu, S. Ghosh, Y. Zhang, N. P. Johnson, and D. I. August. Commutative set: A language extension for implicit parallel programming. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation, San Jose, CA, June 2011. Google ScholarDigital Library
- M. C. Rinard and P. C. Diniz. Commutativity analysis: A new analysis technique for parallelizing compilers. ACM Transactions on Programming Languages and Systems, 19(6):942--991, November 1997. Google ScholarDigital Library
- A. Roy, S. Hand, and T. Harris. Exploring the limits of disjoint access parallelism. In Proceedings of the 1st USENIX Workshop on Hot Topics in Parallelism, Berkeley, CA, March 2009. Google ScholarDigital Library
- K. Sen, D. Marinov, and G. Agha. CUTE: A concolic unit testing engine for C. In Proceedings of the 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Lisbon, Portugal, September 2005. Google ScholarDigital Library
- M. Shapiro, N. Preguica, C. Baquero, and M. Zawirski. Conflict-free replicated data types. In Proceedings of the 13th International Conference on Stabilization, Safety, and Security of Distributed Systems, Grenoble, France, October 2011. Google ScholarDigital Library
- M. Shapiro, N. Preguica, C. Baquero, and M. Zawirski. Convergent and commutative replicated data types. Bulletin of the EATCS, 104:67--88, June 2011.Google Scholar
- G. L. Steele, Jr. Making asynchronous parallelism safe for the world. In Proceedings of the 17th ACM Symposium on Principles of Programming Languages, San Francisco, CA, January 1990. Google ScholarDigital Library
- G. Tene, B. Iyengar, and M. Wolf. C4: The continuously concurrent compacting collector. SIGPLAN Notices, 46(11):79--88, June 2011. Google ScholarDigital Library
- W. E. Weihl. Commutativity-based concurrency control for abstract data types. IEEE Transactions on Computers, 37(12):1488--1505, December 1988. Google ScholarDigital Library
- D. Wentzlaff and A. Agarwal. Factored operating systems (fos): The case for a scalable operating system for multicores. ACM SIGOPS Operating System Review, 43(2):76--85, 2009. Google ScholarDigital Library
Index Terms
- The scalable commutativity rule: designing scalable software for multicore processors
Recommendations
The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors
What opportunities for multicore scalability are latent in software interfaces, such as system call APIs? Can scalability challenges and opportunities be identified even before any implementation exists, simply by considering interface specifications? ...
The scalable commutativity rule: designing scalable software for multicore processors
Developing software that scales on multicore processors is an inexact science dominated by guesswork, measurement, and expensive cycles of redesign and reimplementation. Current approaches are workload-driven and, hence, can reveal scalability ...
Highly scalable and robust rule learner: performance evaluation and comparison
Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all ...
Comments