research-article

Algorithm/Architecture Co-Design for Near-Memory Processing

Authors:
Mario Drumond (EcoCloud, EPFL)
Alexandros Daglis (EcoCloud, EPFL)
Nooshin Mirzadeh (EcoCloud, EPFL)
Dmitrii Ustiugov (EcoCloud, EPFL)
Javier Picorel (Huawei)
Babak Falsafi (EcoCloud, EPFL)
Boris Grot (University of Edinburgh)
Dionisios Pnevmatikatos (FORTH-ICS & ECE-TUC)

ACM SIGOPS Operating Systems Review, Volume 52, Issue 1, July 2018, pp. 109–122. https://doi.org/10.1145/3273982.3273992

Published: 28 August 2018

Abstract

With mainstream technologies to couple logic tightly with memory on the horizon, near-memory processing has re-emerged as a promising approach to improving performance and energy for data-centric computing. DRAM, however, is primarily designed for density and low cost, with a rigid internal organization that favors coarse-grain streaming rather than byte-level random access. This paper makes the case that treating DRAM as a block-oriented streaming device yields significant efficiency and performance benefits, which motivate algorithm/architecture co-design that favors streaming access patterns, even at the price of higher-order algorithmic complexity. We present the Mondrian Data Engine, which drastically improves the runtime and energy efficiency of basic in-memory analytic operators despite doing more work than traditional CPU-optimized algorithms, which rely heavily on random accesses and deep cache hierarchies.
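
The trade-off described in the abstract can be illustrated with a small sketch. The abstract does not name a specific operator, so the choice of an equi-join here is an assumption made purely for illustration, and the function names `hash_join` and `sort_merge_join` are hypothetical. The hash-based variant does expected linear work but probes its table at data-dependent locations, while the sort-based variant does asymptotically more work (O(n log n)) yet finishes with a single sequential pass over both inputs. Python only shows the shape of the access patterns; it says nothing about DRAM-level performance, and this is not the Mondrian Data Engine's implementation.

```python
from collections import defaultdict


def hash_join(r, s):
    """Equi-join of (key, payload) tuples using a hash table.

    Expected O(|R| + |S|) work, but every build and probe touches the
    table at a data-dependent (effectively random) location.
    """
    table = defaultdict(list)
    for key, payload in r:
        table[key].append(payload)
    out = []
    for key, payload in s:
        # .get avoids creating empty entries for keys missing from R.
        for r_payload in table.get(key, ()):
            out.append((key, r_payload, payload))
    return out


def sort_merge_join(r, s):
    """Equi-join of (key, payload) tuples by sorting, then merging.

    O(n log n) work for the sorts, followed by one forward-only pass
    over both sorted inputs (a streaming access pattern).
    """
    r_sorted = sorted(r, key=lambda t: t[0])
    s_sorted = sorted(s, key=lambda t: t[0])
    out = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        rk, sk = r_sorted[i][0], s_sorted[j][0]
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            # Locate the run of equal keys on each side.
            i_end = i
            while i_end < len(r_sorted) and r_sorted[i_end][0] == rk:
                i_end += 1
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][0] == rk:
                j_end += 1
            # Emit the cross product of the two runs.
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((rk, r_sorted[a][1], s_sorted[b][1]))
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    r = [(3, "c"), (1, "a"), (2, "b"), (1, "aa")]
    s = [(1, "x"), (3, "z"), (4, "w"), (1, "y")]
    # Both variants produce the same set of joined tuples.
    print(sorted(hash_join(r, s)) == sorted(sort_merge_join(r, s)))
```

Both functions return the same joined tuples (possibly in a different order); the difference lies only in how memory is traversed, which is the property the abstract argues matters for DRAM.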

Index Terms

Software and its engineering

Published in
ACM SIGOPS Operating Systems Review, Volume 52, Issue 1 (Special Topics)
July 2018
133 pages
ISSN: 0163-5980
DOI: 10.1145/3273982
Editors: Mark Silberstein (Technion, Haifa, Israel), Christopher J. Rossbach (Stop D9500, Austin, TX)
Copyright © 2018 Authors
Publisher: Association for Computing Machinery, New York, NY, United States