DOI: 10.1145/2989081.2989087
research-article

Data-Centric Computing Frontiers: A Survey On Processing-In-Memory

Published: 03 October 2016

ABSTRACT

A major shift from compute-centric to data-centric computing systems is under way, as novel big data workloads such as cognitive computing and machine learning demand embarrassingly parallel and highly efficient processor architectures. With Moore's law having run its course, innovative architectural concepts and technologies are urgently required to open a path toward exascale and beyond, even though current computing systems face the inevitable instruction-level parallelism, power, memory, and bandwidth walls.

As part of any computing system, memories are generally perceived as unreliable, power-hungry, and slow, which makes them a prospective future bottleneck. This bottleneck stems from a pin limitation imposed by packaging constraints: a tremendous row bandwidth is available inside the memory but remains unexploited, because off-chip it diminishes to a bare minimum. Building on the shift towards data-centric computing systems, the near-memory processing concept seems most promising, since power efficiency and computing performance increase when tasks are co-located on bandwidth-rich in-memory processing units, while data movement is reduced by bypassing entire memory hierarchies. Treating the umbrella of near-data processing as the urgently required breakthrough for future computing systems, this survey presents its derivations with a special emphasis on Processing-In-Memory (PIM), highlighting historical achievements in technology as well as architecture and depicting its advantages and obstacles.

  • Published in

    MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems
    October 2016
    463 pages
    ISBN:9781450343053
    DOI:10.1145/2989081

    Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited
