Abstract
The emergence of programmable network devices and the increasing data traffic of datacenters motivate the idea of in-network computation. By offloading compute operations onto intermediate networking devices (e.g., switches, network accelerators, middleboxes), one can (1) serve network requests on the fly with low latency; (2) reduce datacenter traffic and mitigate network congestion; and (3) save energy by running servers in a low-power mode. However, enabling in-network computation inside a datacenter is challenging because (1) existing switch technology does not provide general computing capabilities, and (2) commodity datacenter networks are complex (e.g., hierarchical fat-tree topologies, multipath communication).
In this paper, as a step towards in-network computing, we present IncBricks, an in-network caching fabric with basic computing primitives. IncBricks is a hardware-software co-designed system that supports caching in the network using a programmable network middlebox. As a key-value store accelerator, our prototype lowers request latency by over 30% and doubles throughput for 1024-byte values in a common cluster configuration. Our results demonstrate the effectiveness of in-network computing and show that efficient datacenter network request processing is possible when the computation is carefully split across the different programmable computing elements in a datacenter, including programmable switches, network accelerators, and end hosts.
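To make the idea of on-path caching concrete, the following is a minimal toy sketch in Python, not the IncBricks implementation. It models a middlebox that sits between clients and a key-value server, answers GET requests from its own cache when it can, and forwards everything else. The class names (`BackendStore`, `InNetworkCache`) and the invalidate-on-write policy are assumptions made for illustration only; the abstract does not specify the coherence protocol.

```python
from typing import Dict, Optional


class BackendStore:
    """Stand-in for a key-value server on an end host (hypothetical)."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.requests_served = 0  # requests that actually reached the server

    def get(self, key: str) -> Optional[bytes]:
        self.requests_served += 1
        return self.data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self.requests_served += 1
        self.data[key] = value


class InNetworkCache:
    """Toy on-path cache: serves GET hits itself, forwards all other traffic."""

    def __init__(self, backend: BackendStore) -> None:
        self.backend = backend
        self.cache: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[bytes]:
        if key in self.cache:
            # Hit: reply from the network device; the server never sees it.
            self.hits += 1
            return self.cache[key]
        # Miss: forward to the server and remember the reply on the way back.
        self.misses += 1
        value = self.backend.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def set(self, key: str, value: bytes) -> None:
        # Assumed policy (illustrative): drop the cached copy, then forward
        # the write so the server remains the source of truth.
        self.cache.pop(key, None)
        self.backend.set(key, value)
```

In this sketch, repeated reads of a popular key reach the server only once, which is the mechanism behind the latency and traffic benefits the abstract describes; the measured gains reported there come from the authors' hardware prototype, not from a model like this one.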
IncBricks: Toward In-Network Computation with an In-Network Cache
ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems