Interference between I/O and MPI Traffic on Fat-tree Networks

Authors: Kevin A. Brown, Satoshi Matsuoka, Abhinav Bhatele

ICPP '18: Proceedings of the 47th International Conference on Parallel Processing

Article No.: 7, Pages 1 - 10

https://doi.org/10.1145/3225058.3225144

Published: 13 August 2018

Abstract

Network congestion arising from simultaneous data transfers can be a significant performance bottleneck for many applications, especially when network resources are shared by multiple concurrently running jobs. Many studies have focused on the impact of network congestion on either MPI performance or I/O performance, but the interaction between MPI and I/O traffic is rarely studied and not well understood. In this paper, we analyze and characterize the interference between MPI and I/O traffic on fat-tree networks, highlighting the role of important factors such as message sizes, communication intervals, and job sizes. We also investigate several strategies for reducing MPI-I/O interference and discuss the benefits and trade-offs of each approach in different scenarios.

Index Terms

Interference between I/O and MPI Traffic on Fat-tree Networks
1. General and reference
  1. Cross-computing tools and techniques
    1. Performance
2. Networks
  1. Network performance evaluation
    1. Network performance analysis
    2. Network simulations

Published In

ICPP '18: Proceedings of the 47th International Conference on Parallel Processing

August 2018

945 pages

ISBN:9781450365109

DOI:10.1145/3225058

Copyright © 2018 ACM.

© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPP 2018: 47th International Conference on Parallel Processing

August 13 - 16, 2018

Eugene, OR, USA

Acceptance Rates

ICPP '18 Paper Acceptance Rate 91 of 313 submissions, 29%;

Overall Acceptance Rate 91 of 313 submissions, 29%

Cited By

Bartelheimer N, Zhu Z, Neuwirth S. (2024). Automated Network Performance Characterization for HPC Systems. International Journal of Networking and Computing 14:1 (2-25). Online publication date: 2024.
https://doi.org/10.15803/ijnc.14.1_2

Luo C, Gu H, Yu X, Zhu L, Zhou Z, Zhang H, Hou W. (2024). Alarm: An Adaptive Routing Algorithm Based on One-Way Delay for Infiniband. IEEE Transactions on Network Science and Engineering 11:4 (3653-3666). Online publication date: Jul-2024.
https://doi.org/10.1109/TNSE.2024.3382295

Luo C, Gu H, Zhu L, Zhang H. (2024). FlowStar: Fast Convergence Per-Flow State Accurate Congestion Control for InfiniBand. IEEE/ACM Transactions on Networking 32:3 (2662-2674). Online publication date: Jun-2024.
https://doi.org/10.1109/TNET.2024.3363658

Awoleke O, Sachdev K, Brown K. (2024). On the utility of probabilistic closed-form proxy models for describing supercomputer network traffic data. International Journal of Data Science and Analytics. Online publication date: 23-Aug-2024.
https://doi.org/10.1007/s41060-024-00592-z

Salimi Beni M, Hunold S, Cosenza B. (2024). Analysis and prediction of performance variability in large-scale computing systems. The Journal of Supercomputing. Online publication date: 28-Mar-2024.
https://doi.org/10.1007/s11227-024-06040-w

Zhang Y, Meng Q, Liu Y, Ren F. (2023). Revisiting Congestion Detection in Lossless Networks. IEEE/ACM Transactions on Networking 31:5 (2361-2375). Online publication date: Oct-2023.
https://doi.org/10.1109/TNET.2023.3250484

Salimi Beni M, Cosenza B. (2023). An Analysis of Long-Tailed Network Latency Distribution and Background Traffic on Dragonfly+. Benchmarking, Measuring, and Optimizing (123-142). Online publication date: 13-May-2023.
https://doi.org/10.1007/978-3-031-31180-2_8

Zhang Y, Qian K, Ren F. (2021). Receiver-Driven Congestion Control for InfiniBand. Proceedings of the 50th International Conference on Parallel Processing (1-10). Online publication date: 9-Aug-2021.
https://dl.acm.org/doi/10.1145/3472456.3472466

Zhang Y, Liu Y, Meng Q, Ren F, Kuipers F, Caesar M. (2021). Congestion detection in lossless networks. Proceedings of the 2021 ACM SIGCOMM 2021 Conference (370-383). Online publication date: 9-Aug-2021.
https://dl.acm.org/doi/10.1145/3452296.3472899

Brown K, McGlohon N, Chunduri S, Borch E, Ross R, Carothers C, Harms K. (2021). A Tunable Implementation of Quality-of-Service Classes for HPC Networks. High Performance Computing (137-156). Online publication date: 24-Jun-2021.
https://dl.acm.org/doi/10.1007/978-3-030-78713-4_8
