ABSTRACT
Severe I/O bottlenecks on High End Computing platforms call for running data analytics in situ. We show that typical high-end scientific simulations leave considerable compute-node resources unused, and we exploit this with GoldRush, an agile runtime that harvests those otherwise idle resources to run in situ data analytics efficiently. GoldRush uses fine-grained scheduling to "steal" idle resources in ways that minimize interference between the simulation and the in situ analytics. Doing so involves recognizing the potential causes of on-node resource contention and then applying scheduling methods that prevent them. Experiments with representative science applications at large scales show that resources harvested on compute nodes can be used to perform useful analytics, significantly improving resource efficiency, reducing the data movement costs incurred by alternative solutions, and imposing negligible impact on the scientific simulations.
Index Terms
- GoldRush: resource efficient in situ scientific data analytics using fine-grained interference aware execution