ABSTRACT
Motivated by the significantly higher cost of writing than reading in emerging memory technologies, we consider parallel algorithm design under such asymmetric read-write costs, with the goal of reducing the number of writes while preserving work-efficiency and low span. We present a nested-parallel model of computation that combines (i) small per-task stack-allocated memories with symmetric read-write costs and (ii) an unbounded heap-allocated shared memory with asymmetric read-write costs, and show how the costs in the model map efficiently onto a more concrete machine model under a work-stealing scheduler. We use the new model to design reduced write, work-efficient, low span parallel algorithms for a number of fundamental problems such as reduce, list contraction, tree contraction, breadth-first search, ordered filter, and planar convex hull. For the latter two problems, our algorithms are output-sensitive in that the work and number of writes decrease with the output size. We also present a reduced write, low span minimum spanning tree algorithm that is nearly work-efficient (off by the inverse Ackermann function). Our algorithms reveal several interesting techniques for significantly reducing shared memory writes in parallel algorithms without asymptotically increasing the number of shared memory reads.
- HP, SanDisk partner on memristor, ReRAM technology. http://www.bit-tech.net/news/hardware/2015/10/09/hp-sandisk-reram-memristor, Oct. 2015.Google Scholar
- Intel and Micron produce breakthrough memory technology. http://newsroom.intel.com/community/intel_newsroom/blog/2015/07/28/intel-and-micron-produce-breakthrough-memory-technology, July 2015.Google Scholar
- A. Akel, A. M. Caulfield, T. I. Mollov, R. K. Gupta, and S. Swanson. Onyx: A prototype phase change memory storage array. In HotStorage, 2011. Google ScholarDigital Library
- N. Alon and B. Schieber. Optimal preprocessing for answering on-line product queries. Technical Report, Tel Aviv University, 1987.Google Scholar
- R. J. Anderson and G. L. Miller. A simple randomized parallel algorithm for list-ranking. Inf. Proc. Letters, 1990. Google ScholarDigital Library
- A. M. Andrew. Another efficient algorithm for convex hulls in two dimensions. Inf. Proc. Letters, 1979.Google Scholar
- N. S. Arora, R. D. Blumofe, and C. G. Plaxton. Thread scheduling for multiprogrammed multiprocessors. In SPAA, 1998. Google ScholarDigital Library
- M. Athanassoulis, B. Bhattacharjee, M. Canim, and K. A. Ross. Path processing using solid state storage. In ADMS, 2012.Google Scholar
- A. Ben-Aroya and S. Toledo. Competitive analysis of flash-memory algorithms. In ESA, 2006. Google ScholarDigital Library
- G. E. Blelloch. Programming parallel algorithms. Commun. ACM, 39, 1996. Google ScholarDigital Library
- G. E. Blelloch, J. T. Fineman, P. B. Gibbons, Y. Gu, and J. Shun. Efficient algorithms under asymmetric read and write costs. arXiv preprint arXiv:1511.01038, 2015.Google Scholar
- G. E. Blelloch, J. T. Fineman, P. B. Gibbons, Y. Gu, and J. Shun. Sorting with asymmetric read and write costs. In SPAA, 2015. Google ScholarDigital Library
- R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. JACM, 46(5), 1999. Google ScholarDigital Library
- R. P. Brent. The parallel evaluation of general arithmetic expressions. JACM, 21(2), 1974. Google ScholarDigital Library
- E. Carson, J. Demmel, L. Grigori, N. Knight, P. Koanantakool, O. Schwartz, and H. V. Simahdri. Write-avoiding algorithms. In IPDPS, 2016.Google ScholarCross Ref
- T. M. Chan. Optimal output-sensitive convex hull algorithms in two and three dimensions. Discrete & Computational Geometry, 16(4), 1996.Google Scholar
- T. M. Chan. Backwards analysis of the Karger-Klein-Tarjan algorithm for minimum spanning trees. Inf. Proc. Letters, 67(6), 1998. Google ScholarDigital Library
- S. Chen, P. B. Gibbons, and S. Nath. Rethinking database algorithms for phase change memory. In CIDR, 2011.Google Scholar
- S. Cho and H. Lee. Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance. In MICRO, 2009. Google ScholarDigital Library
- R. Cole, P. N. Klein, and R. E. Tarjan. Finding minimum spanning forests in logarithmic time and linear work using random sampling. In SPAA, 1996. Google ScholarDigital Library
- B. Dixon, M. Rauch, and R. E. Tarjan. Verification and sensitivity analysis of minimum spanning trees in linear time. SIAM J. on Computing, 21(6), 1992. Google ScholarDigital Library
- X. Dong, N. P. Jouupi, and Y. Xie. PCRAMsim: System-level performance, energy, and area modeling for phase-change RAM. In ICCAD, 2009. Google ScholarDigital Library
- X. Dong, X. Wu, G. Sun, Y. Xie, H. Li, and Y. Chen. Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement. In DAC, 2008. Google ScholarDigital Library
- D. Eppstein, M. T. Goodrich, M. Mitzenmacher, and P. Pszona. Wear minimization for cuckoo hashing: How not to throw a lot of eggs into one basket. In SEA, 2014. Google ScholarDigital Library
- E. Gal and S. Toledo. Algorithms and data structures for flash memories. ACM Computing Surveys, 37(2), 2005. Google ScholarDigital Library
- H. Gazit, G. L. Miller, and S.-H. Teng. Optimal tree contraction in the EREW model. Springer, 1988.Google ScholarCross Ref
- M. R. Ghouse and M. T. Goodrich. Fast randomized parallel methods for planar convex hull construction. Computational Geometry, 7(4), 1997. Google ScholarDigital Library
- J. Gil, Y. Matias, and U. Vishkin. Towards a theory of nearly constant time parallel algorithms. In FOCS, 1991. Google ScholarDigital Library
- M. T. Goodrich. Finding the convex hull of a sorted point set in parallel. Inf. Proc. Letters, 26(4), 1987. Google ScholarDigital Library
- R. L. Graham. An efficient algorithm for determining the convex hull of a finite planar set. Inf. Proc. Letters, 1972.Google Scholar
- T. Hagerup. An even simpler linear-time algorithm for verifying minimum spanning trees. In Graph-Theoretic Concepts in Computer Science. Springer, 2010. Google ScholarDigital Library
- J. Hu, Q. Zhuge, C. J. Xue, W.-C. Tseng, S. Gu, and E. Sha. Scheduling to optimize cache utilization for non-volatile main memories. IEEE Transactions on Computers, 63(8), 2014. Google ScholarDigital Library
- www.slideshare.net/IBMZRL/theseus-pss-nvmw2014, 2014.Google Scholar
- J. Jaja. Introduction to Parallel Algorithms. Addison-Wesley Professional, 1992. Google ScholarDigital Library
- D. R. Karger, P. N. Klein, and R. E. Tarjan. A randomized linear-time algorithm to find minimum spanning trees. JACM, 42(2), 1995. Google ScholarDigital Library
- H. Kim, S. Seshadri, C. L. Dickey, and L. Chu. Evaluating phase change memory for enterprise storage systems: A study of caching and tiering approaches. In FAST, 2014. Google ScholarDigital Library
- V. King. A simpler minimum spanning tree verification algorithm. Algorithmica, 18(2), 1997.Google Scholar
- D. G. Kirkpatrick and R. Seidel. The ultimate planar convex hull algorithm? SIAM J. on Computing, 15(1), 1986. Google ScholarDigital Library
- B. C. Lee, E. Ipek, O. Mutlu, and D. Burger. Architecting phase change memory as a scalable DRAM alternative. In ISCA, 2009. Google ScholarDigital Library
- J. S. Meena, S. M. Sze, U. Chand, and T.-Y. Tseng. Overview of emerging nonvolatile memory technologies. Nanoscale Research Letters, 2014.Google ScholarCross Ref
- N. Megiddo. Linear programming in linear time when the dimension is fixed. JACM, 31(1), 1984. Google ScholarDigital Library
- G. Miller and J. Reif. Parallel tree contraction and its application. In FOCS, 1985. Google ScholarDigital Library
- G. Miller and J. Reif. Parallel tree contraction part 2: Further applications. SIAM J. on Computing, 20(6), 1991. Google ScholarDigital Library
- S. Nath and P. B. Gibbons. Online maintenance of very large random samples on flash storage. VLDB J., 19(1), 2010. Google ScholarDigital Library
- H. Park and K. Shim. FAST: flash-aware external sorting for mobile database systems. J. of Systems and Software, 82(8), 2009. Google ScholarDigital Library
- M. K. Qureshi, S. Gurumurthi, and B. Rajendran. Phase Change Memory: From Devices to Systems. Morgan & Claypool, 2012. Google ScholarDigital Library
- R. Seidel. Small-dimensional linear programming and convex hulls made easy. Discrete & Computational Geometry, 6(3), 1991. Google ScholarDigital Library
- J. Shun, Y. Gu, G. E. Blelloch, J. T. Fineman, and P. B. Gibbons. Sequential random permutation, list contraction and tree contraction are highly parallel. In SODA, 2015. Google ScholarDigital Library
- R. E. Tarjan and U. Vishkin. Finding biconnected componemts and computing tree functions in logarithmic parallel time. In FOCS, 1984. Google ScholarDigital Library
- S. D. Viglas. Adapting the B+-tree for asymmetric I/O. In ADBIS, 2012. Google ScholarDigital Library
- S. D. Viglas. Write-limited sorts and joins for persistent memory. Proc. VLDB Endowment, 7(5), 2014. Google ScholarDigital Library
- C. Xu, X. Dong, N. P. Jouppi, and Y. Xie. Design implications of memristor-based RRAM cross-point structures. In DATE, 2011.Google Scholar
- B.-D. Yang, J.-E. Lee, J.-S. Kim, J. Cho, S.-Y. Lee, and B.-G. Yu. A low power phase-change random access memory using a data-comparison write scheme. In ISCAS, 2007.Google ScholarCross Ref
- Yole Developpement. Emerging non-volatile memory technologies, 2013.Google Scholar
- P. Zhou, B. Zhao, J. Yang, and Y. Zhang. A durable and energy efficient main memory using phase change memory technology. In ISCA, 2009. Google ScholarDigital Library
- O. Zilberberg, S. Weiss, and S. Toledo. Phase-change memory: An architectural perspective. ACM Computing Surveys, 45(3), 2013. Google ScholarDigital Library
Index Terms
Parallel Algorithms for Asymmetric Read-Write Costs
Recommendations
Sorting with Asymmetric Read and Write Costs
SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and ArchitecturesEmerging memory technologies have a significant gap between the cost, both in time and in energy, of writing to memory versus reading from memory. In this paper we present models and algorithms that account for this difference, with a focus on write-...
Parallel Write-Efficient Algorithms and Data Structures for Computational Geometry
SPAA '18: Proceedings of the 30th on Symposium on Parallelism in Algorithms and ArchitecturesIn this paper, we design parallel write-efficient geometric algorithms that perform asymptotically fewer writes than standard algorithms for the same problem. This is motivated by emerging non-volatile memory technologies with read performance being ...
Software-Managed Read and Write Wear-Leveling for Non-Volatile Main Memory
In-memory wear-leveling has become an important research field for emerging non-volatile main memories over the past years. Many approaches in the literature perform wear-leveling by making use of special hardware. Since most non-volatile memories only ...
Comments