ABSTRACT
In numerous scientific disciplines, terabyte- and soon petabyte-scale data collections are emerging as critical community resources. A new class of Data Grid infrastructure is required to support management, transport, distributed access to, and analysis of these datasets by potentially thousands of users. Researchers who face this challenge include the climate modeling community, which performs long-duration computations accompanied by frequent output of very large files that must be further analyzed. We describe the Earth System Grid prototype, which brings together advanced analysis, replica management, data transfer, request management, and other technologies to support high-performance, interactive analysis of replicated data. We present performance results that demonstrate our ability to manage the location and movement of large datasets from the user's desktop. We report on experiments conducted over SciNET at SC'2000, where we achieved peak performance of 1.55 Gb/s and sustained performance of 512.9 Mb/s for data transfers between Texas and California.
Index Terms
- High-performance remote access to climate simulation data: a challenge problem for data grid technologies