research-article

PicoServer: Using 3D stacking technology to build energy efficient servers

Published: 07 November 2008

Abstract

This article extends our prior work to show that a straightforward use of 3D stacking technology enables the design of compact, energy-efficient servers. Our proposed architecture, called PicoServer, employs 3D technology to bond one die containing several simple, slow processing cores to multiple DRAM dies that together provide enough capacity for primary memory. This use of 3D stacking readily accommodates wide, low-latency buses between processors and memory. These buses remove the need for an L2 cache, allowing its area to be reallocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. A lower clock frequency in turn means that thermal constraints, a concern with 3D stacking, are easily satisfied. We extend our original analysis of PicoServer to include: (1) a wider set of server workloads, (2) the impact of multithreading, and (3) the on-chip DRAM architecture and system memory usage. PicoServer is intentionally simple, requiring only the simplest form of 3D technology, in which dies are stacked on top of one another. Our intent is to minimize the risk of introducing a new technology (3D) to implement a class of low-cost, low-power, compact server architectures.
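The trade-off the abstract describes (more cores at a lower clock, same throughput, less power) can be illustrated with a toy model. The sketch below is not taken from the article. It assumes the textbook CMOS dynamic-power relation P = C·V²·f, a perfectly parallel workload whose throughput scales as cores × frequency, and that a lower clock permits a lower supply voltage. The function names and every number in it are hypothetical and chosen only for illustration.

```python
def dynamic_power(cores, freq_ghz, volts, cap=1.0):
    """Textbook CMOS dynamic-power estimate, P = cores * C * V^2 * f (arbitrary units)."""
    return cores * cap * volts ** 2 * freq_ghz


def throughput(cores, freq_ghz):
    """Idealized throughput for a perfectly parallel workload: cores * f."""
    return cores * freq_ghz


# Hypothetical configurations; none of these values come from the article.
few_fast = {"cores": 4, "freq_ghz": 2.0, "volts": 1.2}
many_slow = {"cores": 16, "freq_ghz": 0.5, "volts": 0.9}

for name, cfg in (("few fast cores", few_fast), ("many slow cores", many_slow)):
    t = throughput(cfg["cores"], cfg["freq_ghz"])
    p = dynamic_power(cfg["cores"], cfg["freq_ghz"], cfg["volts"])
    print(f"{name}: throughput={t:.1f}, power={p:.2f}, throughput/power={t / p:.2f}")
```

Under these assumptions both configurations deliver the same idealized throughput, while the many-slow-core configuration draws less dynamic power because the supply voltage enters the power estimate quadratically. Real server workloads are not perfectly parallel, and the model ignores leakage and memory power, so it shows only the direction of the effect.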


Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 4, Issue 4
October 2008
123 pages
ISSN:1550-4832
EISSN:1550-4840
DOI:10.1145/1412587

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 November 2008
Accepted: 01 June 2008
Revised: 01 May 2008
Received: 01 November 2007
Published in JETC Volume 4, Issue 4

Author Tags

  1. 3D stacking technology
  2. Low power
  3. Tier-1/2/3 server
  4. chip multiprocessor
  5. full-system simulation

Qualifiers

  • Research-article
  • Research
  • Refereed
