DOI: 10.1145/3219104.3229276

Navigating the Unexpected Realities of Big Data Transfers in a Cloud-based World

Published: 22 July 2018

ABSTRACT

The emergence of big data has created new challenges for researchers transmitting big data sets across campus networks to local (HPC) cloud resources, or over wide area networks to public cloud services. Unlike conventional HPC systems where the network is carefully architected (e.g., a high speed local interconnect, or a wide area connection between Data Transfer Nodes), today's big data communication often occurs over shared network infrastructures with many external and uncontrolled factors influencing performance.

This paper describes our efforts to understand and characterize the performance of big data transfer tools such as rclone, Cyberduck, and provider-specific CLI tools when moving data to and from public and private cloud resources. We analyze the parameter settings available in each of these tools and their impact on performance. Our experimental results give insight into the performance of cloud providers and transfer tools, and provide guidance on parameter settings for cloud transfer tools. We also explore performance when transfers originate from HPC Data Transfer Nodes (DTNs) as well as from researcher machines located deep in the campus network, and show that emerging SDN approaches such as the VIP Lanes system can deliver excellent performance even from researchers' machines.
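As a rough illustration of the kind of parameter study the abstract describes, the following Python sketch builds a small sweep of `rclone copy` invocations and converts a measured transfer into an average throughput figure. It is not the authors' method: the abstract does not say which settings were varied, so the choice of rclone's `--transfers` and `--checkers` flags is an assumption, and the source path, remote name, and helper function names are hypothetical.

```python
import itertools
import shlex


def build_rclone_command(src, dst, transfers, checkers):
    """Return an argv list for one `rclone copy` run.

    --transfers and --checkers are rclone concurrency flags; sweeping
    them here is an illustrative assumption, not the paper's setup.
    """
    return [
        "rclone", "copy", src, dst,
        "--transfers", str(transfers),
        "--checkers", str(checkers),
    ]


def throughput_mbps(num_bytes, seconds):
    """Average throughput in megabits per second for one transfer."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds / 1e6


def sweep(src, dst, transfers_values, checkers_values):
    """Yield one command per (transfers, checkers) combination."""
    for t, c in itertools.product(transfers_values, checkers_values):
        yield build_rclone_command(src, dst, t, c)


if __name__ == "__main__":
    # Hypothetical local path and remote; nothing is executed here,
    # the commands are only printed for inspection.
    for cmd in sweep("/data/sample", "remote:bucket/sample", [4, 16], [8]):
        print(shlex.join(cmd))
    # A 10 GB transfer that took 80 s averages 1000 Mbit/s.
    print(throughput_mbps(10 * 10**9, 80))
```

In practice each command would be timed (for example with `time.monotonic()` around `subprocess.run`) and repeated several times, since shared network paths make single measurements noisy.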

      • Published in

        PEARC '18: Proceedings of the Practice and Experience on Advanced Research Computing
        July 2018
        652 pages
        ISBN:9781450364461
        DOI:10.1145/3219104

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        PEARC '18 Paper Acceptance Rate: 79 of 123 submissions, 64%
        Overall Acceptance Rate: 133 of 202 submissions, 66%
