Semantic-based distributed I/O with the ParaMEDIC framework
Source: Proceedings of the 17th International Symposium on High Performance Distributed Computing (HPDC), Boston, MA, USA
Session: Storage and I/O
Pages: 175-184
Year of Publication: 2008
ISBN: 978-1-59593-997-5

Authors
Pavan Balaji, Argonne National Laboratory, Argonne, IL, USA
Wuchun Feng, Virginia Tech, Blacksburg, VA, USA
Heshan Lin, North Carolina State University, Raleigh, NC, USA

Bibliometrics: Downloads (6 weeks): 7; Downloads (12 months): 28; Citation count: 0

ABSTRACT
Many large-scale applications rely on multiple resources simultaneously for efficient execution. For example, such applications may require both large compute and large storage resources; however, very few supercomputing centers can provide large quantities of both. Thus, data generated at the compute site often has to be moved to a remote storage site for storage, visualization, or analysis. This model is inefficient, especially when the two sites are separated by a wide-area network, because the entire output must then cross that comparatively slow link. We therefore present a framework called "ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing", which uses application-specific semantic information to convert the generated data into orders-of-magnitude smaller metadata at the compute site, transfer the metadata to the storage site, and re-process the metadata at the storage site to regenerate the output. In effect, ParaMEDIC trades a small amount of additional computation (in the form of data post-processing) for a potentially significant reduction in the amount of data that must be transferred in distributed environments.
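The pipeline described in the abstract (the compute site emits compact metadata, and the storage site re-processes it to regenerate the full output) can be illustrated with a toy Python sketch. This is not the ParaMEDIC implementation: the dataset, the `is_match` "search", the report format, and the `paramedic_pays_off` cost model are all hypothetical. The sketch also assumes that the input dataset is available at both sites, so that only the identifiers of matching records need to cross the wide-area network.

```python
import json
from typing import Dict, List

# Toy analogue of the metadata-based idea (NOT the paper's implementation).
# ASSUMPTION: the input dataset is replicated at both the compute site and
# the storage site, so only hit identifiers need to cross the WAN.


def make_database(n: int) -> Dict[int, str]:
    """Hypothetical dataset: n records keyed by integer ID."""
    return {i: f"seq{i}:" + "ACGT" * 64 for i in range(n)}


def is_match(query: str, rec_id: int, record: str) -> bool:
    """Hypothetical stand-in for an expensive search/scoring step."""
    return rec_id % 50 == 0 and query in record


def render(query: str, rec_id: int, record: str) -> str:
    """Formats one hit as a verbose report entry (the bulky output)."""
    return f">{rec_id} query={query} len={len(record)}\n{record}\n"


def naive_output(query: str, db: Dict[int, str]) -> bytes:
    """Baseline: the full report that would be shipped to the storage site."""
    return "".join(
        render(query, i, rec) for i, rec in db.items() if is_match(query, i, rec)
    ).encode()


def compute_site(query: str, db: Dict[int, str]) -> bytes:
    """Runs the full search but emits only the hit IDs as compact metadata."""
    hits = [i for i, rec in db.items() if is_match(query, i, rec)]
    return json.dumps(hits).encode()


def storage_site(query: str, metadata: bytes, db: Dict[int, str]) -> bytes:
    """Post-processing: regenerates the report from the metadata plus the
    local copy of the dataset, touching only the records it names."""
    hit_ids: List[int] = json.loads(metadata)
    return "".join(render(query, i, db[i]) for i in hit_ids).encode()


def paramedic_pays_off(full_bytes: float, meta_bytes: float,
                       bandwidth_bytes_per_s: float,
                       extra_compute_s: float) -> bool:
    """Simplified, hypothetical cost model: ignores latency and the cost of
    producing the metadata, and compares shipping the full output against
    shipping the metadata plus storage-site post-processing time."""
    naive_s = full_bytes / bandwidth_bytes_per_s
    metadata_s = meta_bytes / bandwidth_bytes_per_s + extra_compute_s
    return metadata_s < naive_s


if __name__ == "__main__":
    db = make_database(10_000)
    query = "ACGTACGT"
    full = naive_output(query, db)
    meta = compute_site(query, db)
    regenerated = storage_site(query, meta, db)
    assert regenerated == full  # identical output, far fewer bytes transferred
    print(f"full output: {len(full)} bytes, metadata: {len(meta)} bytes")
```

In this toy setup the metadata is roughly 50 times smaller than the report it reproduces; any real reduction depends on how much application-specific semantics the metadata can exploit. `paramedic_pays_off` captures the trade-off stated above: shipping metadata wins only when the transfer time it saves exceeds the added post-processing time.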