DOI: 10.1145/3149393.3149396
research-article

CoSS: proposing a contract-based storage system for HPC

Published: 12 November 2017

Abstract

Data management is a critical component of high-performance computing, with storage as its cornerstone. Yet the traditional model of parallel file systems fails to meet users' needs in terms of both performance and features. In this paper, we propose CoSS, a new storage model based on contracts. A contract encapsulates, in a single entity, the data model (type, dimensions, units, etc.) and the intended uses of the data. Contracts give the storage system much more knowledge about the input and output an application expects and about how the data should be exposed to the user. This knowledge enables CoSS to optimize data formatting and placement to best fit users' requirements, storage space, and performance. This concept paper introduces the idea of contract-based storage systems and presents some of the opportunities it offers, to motivate further research in this direction.
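To make the idea concrete, here is a minimal Python sketch of what a contract might look like. It is not CoSS's actual interface, which the abstract does not specify: the `Contract`, `Use`, `Placement`, and `plan` names, the three example uses, and the placement policy are all hypothetical. The sketch only illustrates how declaring the data model and the intended uses up front lets a storage layer choose a layout and a storage tier, and estimate the space needed, before any bytes are written.

```python
from dataclasses import dataclass, field
from enum import Enum
from math import prod


class Use(Enum):
    """Hypothetical intended uses a contract might declare."""
    CHECKPOINT = "checkpoint"          # written once, read back only on restart
    ANALYSIS_SLICE = "analysis_slice"  # read back as 2-D slices by an analysis code
    VISUALIZATION = "visualization"    # read at reduced resolution for display


@dataclass(frozen=True)
class Contract:
    """Hypothetical contract: data model and intended uses in one entity."""
    name: str
    dtype: str                # element type, e.g. "float64"
    dims: tuple[int, ...]     # array extents
    units: str                # physical units, e.g. "K"
    uses: frozenset[Use] = field(default_factory=frozenset)

    def nbytes(self) -> int:
        """Storage space is known from the data model before any write."""
        itemsize = {"float32": 4, "float64": 8, "int32": 4, "int64": 8}[self.dtype]
        return itemsize * prod(self.dims)


@dataclass(frozen=True)
class Placement:
    layout: str              # "contiguous" or "chunked"
    chunk: tuple[int, ...]   # chunk shape (equal to dims when contiguous)
    tier: str                # "fast" or "capacity"
    keep_reduced_copy: bool  # also store a downsampled copy for display


def plan(contract: Contract) -> Placement:
    """Toy policy (assumption): derive formatting and placement from the uses."""
    if Use.ANALYSIS_SLICE in contract.uses:
        # Make each 2-D slice over the last two dimensions a single chunk.
        chunk = (1,) * (len(contract.dims) - 2) + contract.dims[-2:]
        layout = "chunked"
    else:
        chunk = contract.dims
        layout = "contiguous"
    # Data needed only on restart can live on the slower capacity tier.
    tier = "capacity" if contract.uses == frozenset({Use.CHECKPOINT}) else "fast"
    keep_reduced = Use.VISUALIZATION in contract.uses
    return Placement(layout, chunk, tier, keep_reduced)


if __name__ == "__main__":
    temp = Contract("temperature", "float64", (100, 64, 512, 512), "K",
                    frozenset({Use.CHECKPOINT, Use.ANALYSIS_SLICE}))
    print(plan(temp))
    # Placement(layout='chunked', chunk=(1, 1, 512, 512), tier='fast', keep_reduced_copy=False)
```

The point of the design is that `plan` never inspects data, only the contract. With a plain POSIX file, the same decisions would require the system to guess from observed access patterns after the fact.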


Cited By

  • (2020) Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources. The International Journal of High Performance Computing Applications. DOI: 10.1177/1094342020913628. Online publication date: 27 March 2020.
  • (2018) Methodology for the Rapid Development of Scalable HPC Data Services. 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS), pp. 76-87, November 2018. DOI: 10.1109/PDSW-DISCS.2018.00013.


    Published In

    PDSW-DISCS '17: Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems
    November 2017
    74 pages
    ISBN:9781450351348
    DOI:10.1145/3149393
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. CoSS
    2. HPC
    3. I/O
    4. contract
    5. data model
    6. metadata
    7. storage


    Conference

    SC '17

    Acceptance Rates

    Overall Acceptance Rate 17 of 41 submissions, 41%
