DOI: 10.1145/3106989.3106993
research-article

Memory Efficient Loss Recovery for Hardware-based Transport in Datacenter

Published: 03 August 2017

ABSTRACT

Limited by small on-chip memory, hardware-based transport typically implements a go-back-N loss recovery mechanism, which needs very little memory but is well known to perform poorly even under a small packet loss ratio. We present MELO, an efficient selective retransmission mechanism for hardware-based transport that consumes only a small, constant amount of memory regardless of the number of concurrent connections. Specifically, MELO employs an architectural separation between data and metadata storage and uses a shared bits pool allocation mechanism to reduce the on-chip memory footprint of the metadata. By adding on average only 23B of extra on-chip state per connection, MELO achieves up to 14.02x throughput while reducing 99% tail FCT by 3.11x compared with go-back-N under certain loss ratios.
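
To make the cost of go-back-N concrete, the following is a minimal sketch that counts how many packets each recovery scheme puts on the wire for the same loss pattern. It is not MELO's design and is not taken from the paper: the function names, the round-based window model, and the assumption that a packet is dropped only on its first transmission are all hypothetical simplifications chosen for illustration.

```python
def go_back_n_sends(num_packets, window, lost_first_try):
    """Count transmissions under a simplified, round-based go-back-N model.

    Assumptions (illustrative only): the sender emits one full window per
    round, the receiver discards everything after the first gap, and a
    packet can be dropped only on its first transmission.
    """
    pending = set(lost_first_try)  # packets whose first transmission is dropped
    sent = 0
    base = 0
    while base < num_packets:
        end = min(base + window, num_packets)
        sent += end - base  # the whole window goes on the wire
        window_losses = [s for s in range(base, end) if s in pending]
        pending.difference_update(window_losses)  # later attempts succeed
        # On a loss, go back to the first missing packet; otherwise slide on.
        base = window_losses[0] if window_losses else end
    return sent


def selective_sends(num_packets, lost_first_try):
    """Count transmissions when only the lost packets are resent once."""
    return num_packets + len(set(lost_first_try))


if __name__ == "__main__":
    # One dropped packet out of ten, with a window of four packets.
    print(go_back_n_sends(10, 4, {2}))  # 12
    print(selective_sends(10, {2}))     # 11
```

In this toy model a single loss costs go-back-N the rest of the window, while selective retransmission resends one packet. The price of the selective scheme is that the receiver must remember which packets after the gap have already arrived, and that per-connection state is what the abstract describes MELO as keeping small.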


Published in

APNet '17: Proceedings of the First Asia-Pacific Workshop on Networking
August 2017
127 pages
ISBN: 9781450352444
DOI: 10.1145/3106989

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, Research, Refereed limited
