ABSTRACT
Processing workloads may have very high I/O demands, exceeding the capabilities provided by resource virtualization and requiring direct access to the physical hardware. For computers that are interconnected in PCI Express (PCIe) networks, we have previously proposed Device Lending as a solution for assigning devices to remote hosts. In this paper, we explain how we have extended our implementation with support for the Linux Kernel-based Virtual Machine (KVM) hypervisor. Using our extended Device Lending, it becomes possible to dynamically "pass through" physical remote devices to VM guests while still retaining the flexibility of virtualization, something that previously required extensive support in both the hypervisor and device drivers in the form of paravirtualization.
We have also improved our original implementation with support for interoperability between remote devices. We show that it is possible to use multiple devices residing in different hosts, while still achieving the same bandwidth and latency as native PCIe, and without requiring any additional support in device drivers.