skip to main content
10.1145/2935764.2935767acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
research-article
Public Access

Parallel Algorithms for Asymmetric Read-Write Costs

Published:11 July 2016Publication History

ABSTRACT

Motivated by the significantly higher cost of writing than reading in emerging memory technologies, we consider parallel algorithm design under such asymmetric read-write costs, with the goal of reducing the number of writes while preserving work-efficiency and low span. We present a nested-parallel model of computation that combines (i) small per-task stack-allocated memories with symmetric read-write costs and (ii) an unbounded heap-allocated shared memory with asymmetric read-write costs, and show how the costs in the model map efficiently onto a more concrete machine model under a work-stealing scheduler. We use the new model to design reduced write, work-efficient, low span parallel algorithms for a number of fundamental problems such as reduce, list contraction, tree contraction, breadth-first search, ordered filter, and planar convex hull. For the latter two problems, our algorithms are output-sensitive in that the work and number of writes decrease with the output size. We also present a reduced write, low span minimum spanning tree algorithm that is nearly work-efficient (off by the inverse Ackermann function). Our algorithms reveal several interesting techniques for significantly reducing shared memory writes in parallel algorithms without asymptotically increasing the number of shared memory reads.

References

  1. HP, SanDisk partner on memristor, ReRAM technology. http://www.bit-tech.net/news/hardware/2015/10/09/hp-sandisk-reram-memristor, Oct. 2015.Google ScholarGoogle Scholar
  2. Intel and Micron produce breakthrough memory technology. http://newsroom.intel.com/community/intel_newsroom/blog/2015/07/28/intel-and-micron-produce-breakthrough-memory-technology, July 2015.Google ScholarGoogle Scholar
  3. A. Akel, A. M. Caulfield, T. I. Mollov, R. K. Gupta, and S. Swanson. Onyx: A prototype phase change memory storage array. In HotStorage, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. N. Alon and B. Schieber. Optimal preprocessing for answering on-line product queries. Technical Report, Tel Aviv University, 1987.Google ScholarGoogle Scholar
  5. R. J. Anderson and G. L. Miller. A simple randomized parallel algorithm for list-ranking. Inf. Proc. Letters, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. M. Andrew. Another efficient algorithm for convex hulls in two dimensions. Inf. Proc. Letters, 1979.Google ScholarGoogle Scholar
  7. N. S. Arora, R. D. Blumofe, and C. G. Plaxton. Thread scheduling for multiprogrammed multiprocessors. In SPAA, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Athanassoulis, B. Bhattacharjee, M. Canim, and K. A. Ross. Path processing using solid state storage. In ADMS, 2012.Google ScholarGoogle Scholar
  9. A. Ben-Aroya and S. Toledo. Competitive analysis of flash-memory algorithms. In ESA, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. E. Blelloch. Programming parallel algorithms. Commun. ACM, 39, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. G. E. Blelloch, J. T. Fineman, P. B. Gibbons, Y. Gu, and J. Shun. Efficient algorithms under asymmetric read and write costs. arXiv preprint arXiv:1511.01038, 2015.Google ScholarGoogle Scholar
  12. G. E. Blelloch, J. T. Fineman, P. B. Gibbons, Y. Gu, and J. Shun. Sorting with asymmetric read and write costs. In SPAA, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. JACM, 46(5), 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. R. P. Brent. The parallel evaluation of general arithmetic expressions. JACM, 21(2), 1974. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. E. Carson, J. Demmel, L. Grigori, N. Knight, P. Koanantakool, O. Schwartz, and H. V. Simahdri. Write-avoiding algorithms. In IPDPS, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  16. T. M. Chan. Optimal output-sensitive convex hull algorithms in two and three dimensions. Discrete & Computational Geometry, 16(4), 1996.Google ScholarGoogle Scholar
  17. T. M. Chan. Backwards analysis of the Karger-Klein-Tarjan algorithm for minimum spanning trees. Inf. Proc. Letters, 67(6), 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Chen, P. B. Gibbons, and S. Nath. Rethinking database algorithms for phase change memory. In CIDR, 2011.Google ScholarGoogle Scholar
  19. S. Cho and H. Lee. Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance. In MICRO, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. R. Cole, P. N. Klein, and R. E. Tarjan. Finding minimum spanning forests in logarithmic time and linear work using random sampling. In SPAA, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. B. Dixon, M. Rauch, and R. E. Tarjan. Verification and sensitivity analysis of minimum spanning trees in linear time. SIAM J. on Computing, 21(6), 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. X. Dong, N. P. Jouupi, and Y. Xie. PCRAMsim: System-level performance, energy, and area modeling for phase-change RAM. In ICCAD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. X. Dong, X. Wu, G. Sun, Y. Xie, H. Li, and Y. Chen. Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement. In DAC, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. Eppstein, M. T. Goodrich, M. Mitzenmacher, and P. Pszona. Wear minimization for cuckoo hashing: How not to throw a lot of eggs into one basket. In SEA, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. E. Gal and S. Toledo. Algorithms and data structures for flash memories. ACM Computing Surveys, 37(2), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. H. Gazit, G. L. Miller, and S.-H. Teng. Optimal tree contraction in the EREW model. Springer, 1988.Google ScholarGoogle ScholarCross RefCross Ref
  27. M. R. Ghouse and M. T. Goodrich. Fast randomized parallel methods for planar convex hull construction. Computational Geometry, 7(4), 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. Gil, Y. Matias, and U. Vishkin. Towards a theory of nearly constant time parallel algorithms. In FOCS, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. T. Goodrich. Finding the convex hull of a sorted point set in parallel. Inf. Proc. Letters, 26(4), 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. L. Graham. An efficient algorithm for determining the convex hull of a finite planar set. Inf. Proc. Letters, 1972.Google ScholarGoogle Scholar
  31. T. Hagerup. An even simpler linear-time algorithm for verifying minimum spanning trees. In Graph-Theoretic Concepts in Computer Science. Springer, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J. Hu, Q. Zhuge, C. J. Xue, W.-C. Tseng, S. Gu, and E. Sha. Scheduling to optimize cache utilization for non-volatile main memories. IEEE Transactions on Computers, 63(8), 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. www.slideshare.net/IBMZRL/theseus-pss-nvmw2014, 2014.Google ScholarGoogle Scholar
  34. J. Jaja. Introduction to Parallel Algorithms. Addison-Wesley Professional, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. D. R. Karger, P. N. Klein, and R. E. Tarjan. A randomized linear-time algorithm to find minimum spanning trees. JACM, 42(2), 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. H. Kim, S. Seshadri, C. L. Dickey, and L. Chu. Evaluating phase change memory for enterprise storage systems: A study of caching and tiering approaches. In FAST, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. V. King. A simpler minimum spanning tree verification algorithm. Algorithmica, 18(2), 1997.Google ScholarGoogle Scholar
  38. D. G. Kirkpatrick and R. Seidel. The ultimate planar convex hull algorithm? SIAM J. on Computing, 15(1), 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. B. C. Lee, E. Ipek, O. Mutlu, and D. Burger. Architecting phase change memory as a scalable DRAM alternative. In ISCA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. J. S. Meena, S. M. Sze, U. Chand, and T.-Y. Tseng. Overview of emerging nonvolatile memory technologies. Nanoscale Research Letters, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  41. N. Megiddo. Linear programming in linear time when the dimension is fixed. JACM, 31(1), 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. G. Miller and J. Reif. Parallel tree contraction and its application. In FOCS, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. G. Miller and J. Reif. Parallel tree contraction part 2: Further applications. SIAM J. on Computing, 20(6), 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. S. Nath and P. B. Gibbons. Online maintenance of very large random samples on flash storage. VLDB J., 19(1), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. H. Park and K. Shim. FAST: flash-aware external sorting for mobile database systems. J. of Systems and Software, 82(8), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. M. K. Qureshi, S. Gurumurthi, and B. Rajendran. Phase Change Memory: From Devices to Systems. Morgan & Claypool, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. R. Seidel. Small-dimensional linear programming and convex hulls made easy. Discrete & Computational Geometry, 6(3), 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. J. Shun, Y. Gu, G. E. Blelloch, J. T. Fineman, and P. B. Gibbons. Sequential random permutation, list contraction and tree contraction are highly parallel. In SODA, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. R. E. Tarjan and U. Vishkin. Finding biconnected componemts and computing tree functions in logarithmic parallel time. In FOCS, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. S. D. Viglas. Adapting the B+-tree for asymmetric I/O. In ADBIS, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. S. D. Viglas. Write-limited sorts and joins for persistent memory. Proc. VLDB Endowment, 7(5), 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. C. Xu, X. Dong, N. P. Jouppi, and Y. Xie. Design implications of memristor-based RRAM cross-point structures. In DATE, 2011.Google ScholarGoogle Scholar
  53. B.-D. Yang, J.-E. Lee, J.-S. Kim, J. Cho, S.-Y. Lee, and B.-G. Yu. A low power phase-change random access memory using a data-comparison write scheme. In ISCAS, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  54. Yole Developpement. Emerging non-volatile memory technologies, 2013.Google ScholarGoogle Scholar
  55. P. Zhou, B. Zhao, J. Yang, and Y. Zhang. A durable and energy efficient main memory using phase change memory technology. In ISCA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. O. Zilberberg, S. Weiss, and S. Toledo. Phase-change memory: An architectural perspective. ACM Computing Surveys, 45(3), 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Parallel Algorithms for Asymmetric Read-Write Costs

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader