DOI: 10.1145/3225058.3225144
research-article
Public Access

Interference between I/O and MPI Traffic on Fat-tree Networks

Published: 13 August 2018

Abstract

Network congestion arising from simultaneous data transfers can be a significant performance bottleneck for many applications, especially when network resources are shared by multiple concurrently running jobs. Many studies have focused on the impact of network congestion on either MPI performance or I/O performance, but the interaction between MPI and I/O traffic is rarely studied and not well understood. In this paper, we analyze and characterize the interference between MPI and I/O traffic on fat-tree networks, highlighting the role of important factors such as message sizes, communication intervals, and job sizes. We also investigate several strategies for reducing MPI-I/O interference, along with the benefits and trade-offs of each approach in different scenarios.

Published In

ACM Other conferences
ICPP '18: Proceedings of the 47th International Conference on Parallel Processing
August 2018
945 pages
ISBN: 9781450365109
DOI: 10.1145/3225058
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. I/O
  2. Interference
  3. MPI
  4. fat-tree
  5. network
  6. simulation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2018

Acceptance Rates

ICPP '18 Paper Acceptance Rate 91 of 313 submissions, 29%;
Overall Acceptance Rate 91 of 313 submissions, 29%

Cited By

  • (2024) Automated Network Performance Characterization for HPC Systems. International Journal of Networking and Computing 14:1 (2-25). DOI: 10.15803/ijnc.14.1_2. Online publication date: 2024
  • (2024) Alarm: An Adaptive Routing Algorithm Based on One-Way Delay for Infiniband. IEEE Transactions on Network Science and Engineering 11:4 (3653-3666). DOI: 10.1109/TNSE.2024.3382295. Online publication date: Jul-2024
  • (2024) FlowStar: Fast Convergence Per-Flow State Accurate Congestion Control for InfiniBand. IEEE/ACM Transactions on Networking 32:3 (2662-2674). DOI: 10.1109/TNET.2024.3363658. Online publication date: Jun-2024
  • (2024) On the utility of probabilistic closed-form proxy models for describing supercomputer network traffic data. International Journal of Data Science and Analytics. DOI: 10.1007/s41060-024-00592-z. Online publication date: 23-Aug-2024
  • (2024) Analysis and prediction of performance variability in large-scale computing systems. The Journal of Supercomputing. DOI: 10.1007/s11227-024-06040-w. Online publication date: 28-Mar-2024
  • (2023) Revisiting Congestion Detection in Lossless Networks. IEEE/ACM Transactions on Networking 31:5 (2361-2375). DOI: 10.1109/TNET.2023.3250484. Online publication date: Oct-2023
  • (2023) An Analysis of Long-Tailed Network Latency Distribution and Background Traffic on Dragonfly+. Benchmarking, Measuring, and Optimizing (123-142). DOI: 10.1007/978-3-031-31180-2_8. Online publication date: 13-May-2023
  • (2021) Receiver-Driven Congestion Control for InfiniBand. Proceedings of the 50th International Conference on Parallel Processing (1-10). DOI: 10.1145/3472456.3472466. Online publication date: 9-Aug-2021
  • (2021) Congestion detection in lossless networks. Proceedings of the 2021 ACM SIGCOMM 2021 Conference (370-383). DOI: 10.1145/3452296.3472899. Online publication date: 9-Aug-2021
  • (2021) A Tunable Implementation of Quality-of-Service Classes for HPC Networks. High Performance Computing (137-156). DOI: 10.1007/978-3-030-78713-4_8. Online publication date: 24-Jun-2021
