ABSTRACT
The growing size of high-value sensor-born or computationally derived scientific datasets is pushing the boundaries of traditional models of data access and discovery. Due to their size, these datasets are often accessible only through the systems on which they were created. Access for scientific exploration and reproducibility is limited to file transfer or to applying for access to the systems used to store or generate the original data, which is often infeasible. There is a growing trend toward providing access to large-scale research datasets in place via container-based analysis environments. This paper describes the National Data Service (NDS) Labs Workbench platform and the DataDNS initiative. The Labs Workbench platform is designed to provide scalable, low-barrier access to research data via container-based services. DataDNS is a new initiative designed to enable discovery, access, and in-place analysis for large-scale data. It provides a suite of interoperable services that let researchers, using the tools they are most familiar with, access and analyze these datasets where they reside.