ABSTRACT
I/O read time has long been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. In this paper, we introduce a model that estimates the read time for a file stored in a parallel filesystem, given the file access pattern. Read time ultimately depends on how the file is stored and on the access pattern used to read it. The access pattern, in turn, is dictated by the type of parallel decomposition used. We employ spatio-temporal parallelism, which combines spatial and temporal parallelism, to allow greater flexibility in the possible file access patterns. Using our model, we configured the spatio-temporal parallelism to produce optimized read access patterns that resulted in a speedup factor of approximately 400 over traditional file access patterns.
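To make the idea of combining spatial and temporal parallelism concrete, the following is a minimal sketch of one way process ranks could be mapped to reads. It is not the paper's model or implementation: the function name, the parameter `compartment_size`, and the round-robin dealing of time steps are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def spatio_temporal_assignment(
    num_procs: int, compartment_size: int, num_time_steps: int
) -> Dict[int, List[Tuple[int, int]]]:
    """Map each rank to a list of (time_step, spatial_piece) reads.

    Hypothetical layout (an assumption, not taken from the paper):
    ranks are grouped into compartments of `compartment_size` ranks.
    Ranks inside a compartment split one time step spatially, while
    different compartments work on different time steps.
    """
    if compartment_size <= 0 or num_procs % compartment_size != 0:
        raise ValueError("num_procs must be a positive multiple of compartment_size")

    num_compartments = num_procs // compartment_size
    plan: Dict[int, List[Tuple[int, int]]] = {r: [] for r in range(num_procs)}

    for rank in range(num_procs):
        compartment = rank // compartment_size  # which group of ranks
        piece = rank % compartment_size         # spatial piece within the group
        # Time steps are dealt round-robin to compartments.
        for step in range(compartment, num_time_steps, num_compartments):
            plan[rank].append((step, piece))
    return plan


if __name__ == "__main__":
    # 4 ranks, 2 ranks per compartment, 4 time steps.
    for rank, reads in spatio_temporal_assignment(4, 2, 4).items():
        print(rank, reads)
```

In this sketch, setting `compartment_size` equal to `num_procs` gives purely spatial parallelism (every rank reads a piece of every time step), and setting it to 1 gives purely temporal parallelism (each rank reads whole time steps). Intermediate values change which parts of which files each rank touches, which is the kind of access-pattern choice a read-time model would be used to evaluate.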
A model for optimizing file access patterns using spatio-temporal parallelism