research-article

Algorithm/Architecture Co-Design for Near-Memory Processing

Published: 28 August 2018

Abstract

With mainstream technologies that couple logic tightly with memory on the horizon, near-memory processing has re-emerged as a promising approach to improving performance and energy efficiency for data-centric computing. DRAM, however, is primarily designed for density and low cost, with a rigid internal organization that favors coarse-grain streaming rather than byte-level random access. This paper makes the case that treating DRAM as a block-oriented streaming device yields significant efficiency and performance benefits, which motivate algorithm/architecture co-design that favors streaming access patterns, even at the price of higher-order algorithmic complexity. We present the Mondrian Data Engine, which drastically improves the runtime and energy efficiency of basic in-memory analytic operators despite doing more work than traditional CPU-optimized algorithms, which rely heavily on random accesses and deep cache hierarchies.

    • Published in

      ACM SIGOPS Operating Systems Review  Volume 52, Issue 1
      Special Topics
      July 2018
      133 pages
      ISSN:0163-5980
      DOI:10.1145/3273982

      Copyright © 2018 Authors

      Publisher

      Association for Computing Machinery

      New York, NY, United States
