DOI: 10.1145/3093338.3104164
poster

Container-based Analysis Environments for Low-Barrier Access to Research Data

Published: 09 July 2017

ABSTRACT

The growing size of high-value sensor-born or computationally derived scientific datasets is pushing the boundaries of traditional models of data access and discovery. Due to their size, these datasets are often accessible only through the systems on which they were created. Access for scientific exploration and reproducibility is limited to transferring files or applying for access to the systems used to store or generate the original data, which is often infeasible. There is a growing trend toward providing access to large-scale research datasets in place via container-based analysis environments. This paper describes the National Data Service (NDS) Labs Workbench platform and the DataDNS initiative. The Labs Workbench platform is designed to provide scalable and low-barrier access to research data via container-based services. DataDNS is a new initiative designed to enable discovery, access, and in-place analysis for large-scale data. It provides a suite of interoperable services that let researchers, and the tools they are most familiar with, access and analyze these datasets where they reside.

    • Published in

      ACM Other conferences
      PEARC '17: Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact
      July 2017
      451 pages
      ISBN:9781450352727
      DOI:10.1145/3093338
      • General Chair: David Hart

      Copyright © 2017 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 July 2017

      Qualifiers

      • poster
      • Research
      • Refereed limited

      Acceptance Rates

      PEARC '17 Paper Acceptance Rate: 54 of 79 submissions, 68%
      Overall Acceptance Rate: 133 of 202 submissions, 66%
