research-article

Towards scalable array-oriented active storage: the pyramid approach

Published: 16 February 2012

Abstract

The recent explosion in the size of data manipulated by distributed scientific applications has created a need for specialized storage systems capable of handling specific access patterns in a scalable fashion. In this context, a large class of applications focuses on parallel array processing: small parts of huge multi-dimensional arrays are concurrently accessed by a large number of clients, both for reading and writing. A specialized storage system that deals with such an access pattern faces several challenges at the level of data and metadata management. We introduce Pyramid, an active array-oriented storage system that addresses these challenges. Experimental evaluation demonstrates substantial scalability improvements brought by Pyramid with respect to state-of-the-art approaches in both weak and strong scaling scenarios, with gains of 100% to 150%.

Published In

ACM SIGOPS Operating Systems Review, Volume 46, Issue 1
January 2012
99 pages
ISSN:0163-5980
DOI:10.1145/2146382
Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Cited By

  • (2016) A scalable storage system for structured data based on higher order index array. Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, pages 247-252. DOI: 10.1145/3006299.3006333. Online publication date: 6-Dec-2016
  • (2016) Towards an Efficient Maintenance of Address Space Overflow for Array Based Storage System. 2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pages 133-138. DOI: 10.1109/PDCAT.2016.040. Online publication date: Dec-2016
  • (2014) SELF-TUNING OPTIMIZATION ON STORAGE SERVERS IN PARALLEL FILE SYSTEMS. Journal of Circuits, Systems and Computers, 23:04, article 1450052. DOI: 10.1142/S0218126614500522. Online publication date: 15-Apr-2014
  • (2014) Region templates. Parallel Computing, 40:10, pages 589-610. DOI: 10.1016/j.parco.2014.09.003. Online publication date: 1-Dec-2014
  • (2013) Dynamical Re-striping Data on Storage Servers in Parallel File Systems. Proceedings of the 2013 IEEE 37th Annual Computer Software and Applications Conference, pages 65-73. DOI: 10.1109/COMPSAC.2013.13. Online publication date: 22-Jul-2013
