Abstract
Scaling up a RAID-0 volume with added disks can increase its storage capacity and I/O bandwidth simultaneously. To preserve a round-robin data distribution, existing scaling approaches require all the data to be migrated. Such large-scale data migration results in a long redistribution time and degrades application performance while it runs. In this article, we present a new approach to RAID-0 scaling called FastScale. First, FastScale minimizes data migration while maintaining a uniform data distribution. It moves only enough data blocks from the old disks to fill an appropriate fraction of the new disks. Second, FastScale optimizes data migration with access aggregation and lazy checkpoint. Access aggregation gives data migration a higher throughput by reducing the number of disk seeks. Lazy checkpoint minimizes the number of metadata writes without compromising data consistency. Using several real-system disk traces, we evaluate the performance of FastScale by comparing it with SLAS, one of the most efficient existing scaling approaches. The experiments show that FastScale can reduce redistribution time by up to 86.06% while also lowering application I/O latencies. The experiments also show that the performance of a RAID-0 volume scaled using FastScale is almost identical to, or even better than, that of a round-robin RAID-0 volume.
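The gap between full round-robin restriping and minimal migration can be made concrete with a small count. The sketch below is illustrative only and is not FastScale's addressing algorithm, which the abstract does not describe. It assumes a unit-block round-robin layout in which block `b` sits on disk `b % disks` at offset `b // disks`, and it reads "an appropriate fraction" as n/(m+n) for m old and n new disks, which is the lower bound implied by a uniform distribution. The function names are hypothetical.

```python
def round_robin_relocated(num_blocks: int, old_disks: int, new_disks: int) -> int:
    """Count blocks whose (disk, offset) location changes when a round-robin
    volume is restriped from old_disks to old_disks + new_disks disks.

    Assumed layout (not taken from the article): block b is stored on
    disk b % disks at offset b // disks.
    """
    total = old_disks + new_disks
    relocated = 0
    for b in range(num_blocks):
        before = (b % old_disks, b // old_disks)
        after = (b % total, b // total)
        if before != after:
            relocated += 1
    return relocated


def minimal_relocated(num_blocks: int, old_disks: int, new_disks: int) -> int:
    """Lower bound on blocks that must move so that every disk ends up with
    an equal share: the new disks together must hold new/(old+new) of the data.
    """
    total = old_disks + new_disks
    return (num_blocks * new_disks) // total


if __name__ == "__main__":
    blocks, m, n = 30, 3, 2  # hypothetical example: scale 3 disks to 5
    print("round-robin restriping moves:", round_robin_relocated(blocks, m, n))
    print("uniform-distribution lower bound:", minimal_relocated(blocks, m, n))
```

Under these assumptions, restriping relocates every block except the first `old_disks` of them, whereas the lower bound grows only with the share of capacity contributed by the new disks.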
Index Terms
- Design and Evaluation of a New Approach to RAID-0 Scaling