Abstract
The ever-growing amount of data requires highly scalable storage solutions. The most flexible approach is to use storage pools that can be expanded and scaled down by adding or removing storage devices. To make this approach usable, it is necessary to provide a solution to locate data items in such a dynamic environment. This article presents and evaluates the Random Slicing strategy, which incorporates lessons learned from table-based, rule-based, and pseudo-randomized hashing strategies and is able to provide a simple and efficient strategy that scales up to handle exascale data. Random Slicing keeps a small table with information about previous storage system insert and remove operations, drastically reducing the required amount of randomness while delivering a perfect load distribution.
- S. Amarasinghe, D. Campbell, W. Carlson, A. Chien, W. Dally, E. Elnohazy, M. Hall, et al. 2010. ExaScale software study: Software challenges in extreme scale systems. Tech. Rep., sponsored by DARPA IPTO in the context of the ExaScale Computing Study.Google Scholar
- A. Azagury, V. Dreizin, M. Factor, E. Henis, D. Naor, Y. Rinetzky, O. Rodeh, J. Satran, A. Tavory, and L. Yerushalmi. 2003. Towards an object store. In Proceedings of the 20th IEEE Conference on Mass Storage Systems and Technologies (MSST). 165--176. Google ScholarDigital Library
- Y. Azar, A. Broder, A. Karlin, and E. Upfal. 1999. Balanced allocations. SIAM J. Comput. 29, 1, 180--200. Google ScholarDigital Library
- J. L. Bentley. 1977. Solutions to Klees rectangle problems. Tech. Rep. Carnegie-Mellon University, Pittsburgh, PA.Google Scholar
- M. Blaum, J. Brady, J. Bruck, and J. Menon. 1994. EVENODD: An optimal scheme for tolerating double disk failures in RAID architectures. In Proceedings of the 21st International Symposium on Computer Architecture (ISCA). 245--254. Google ScholarDigital Library
- R. P. Brent. 1992. Uniform random number generators for supercomputers. In Proceedings of the 5th Australian Supercomputer Conference. 95--104.Google Scholar
- A. Brinkmann and S. Effert. 2008. Redundant data placement strategies for cluster storage environments. In Proceedings of the 12th International Conference on Principles of DIstributed Systems (OPODIS). Google ScholarDigital Library
- A. Brinkmann, S. Effert, F. Meyer Auf Der Heide, and C. Scheideler. 2007. Dynamic and redundant data placement. In Proceedings of the 27th IEEE International Conference on Distributed Computing Systems (ICDCS). Google ScholarDigital Library
- A. Brinkmann, M. Heidebuer, F. Meyer Auf Der Heide, U. Rückert, K. Salzwedel, and M. Vodisek. 2004. V: Drive - Costs and Benefits of an Out-of-Band Storage Virtualization System. In Proceedings of the 21st IEEE Conference on Mass Storage Systems and Technologies (MSST). 153--157.Google Scholar
- A. Brinkmann, K. Salzwedel, and C. Scheideler. 2000. Efficient, distributed data placement strategies for storage area networks. In Proceedings of the 12th ACM Symposium on Parallel Algorithms and Architectures (SPAA). 119--128. Google ScholarDigital Library
- A. Brinkmann, Kay Salzwedel, and C. Scheideler. 2002. Compact, adaptive placement schemes for non-uniform distribution requirements. In Proceedings of the 14th ACM Symposium on Parallel Algorithms and Architectures (SPAA). 53--62. Google ScholarDigital Library
- W. S. Chou. 1995. On inversive maximal period polynomials over finite fields. Appl. Algeb. Eng. Commun. Comput. 6, 4--5, 245--250.Google Scholar
- P. Corbett, B. English, A. Goel, T. Grcanac, S. Kleiman, J. Leong, and S. Sankar. 2004. Row-diagonal parity for double disk failure correction. In Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST). 1--14. Google ScholarDigital Library
- T. Cortes and J. Labarta. 2001. Extending heterogeneity to RAID level 5. In Proceedings of the USENIX Annual Technical Conference. 119--132. Google ScholarDigital Library
- M. De Berg, O. Cheong, and M. Van Kreveld. 2008. Computational Geometry: Algorithms and Applications. Springer. Google ScholarDigital Library
- G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. 2007. Dynamo: Amazon's highly available key-value store. ACM SIGOPS Oper. Syst. Rev. 41, 6, 205--220. Google ScholarDigital Library
- A. Devulapalli, D. Dalessandro, and P. Wyckoff. 2008. Data structure consistency using atomic operations in storage devices. In Proceedings of the 5th International Workshop on Storage Network Architecture and Parallel I/Os (SNAPI). 65--73. Google ScholarDigital Library
- D. Eastlake and P. Jones. 2001. US secure hash algorithm 1 (SHA1).Google Scholar
- J. Gonzalez and T. Cortes. 2008. Distributing orthogonal redundancy on adaptive disk arrays. In Proceedings of the International Conference on Grid Computing, High-Performance and Distributed Applications (GADA). Google ScholarDigital Library
- R. J. Honicky and E. L. Miller. 2003. A fast algorithm for online placement and reorganization of replicated data. In Proceedings of the 17th IEEE International Parallel and Distributed Processing Symposium (IPDPS). Google ScholarDigital Library
- R. J. Honicky and E. L. Miller. 2004. Replication under scalable hashing: A family of algorithms for scalable decentralized data distribution. In Proceedings of the 18th IEEE International Parallel and Distributed Processing Symposium (IPDPS).Google Scholar
- N. L. Johnson and S. Kotz. 1977. Urn Models and Their Applications. Wiley, New York.Google Scholar
- D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Panigrahy. 1997. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the 29th ACM Symposium on Theory of Computing (STOC). 654--663. Google ScholarDigital Library
- D. E. Knuth. 1997. Volume 2: Seminumerical Algorithms. In the Art of Computer Programming, 192.Google Scholar
- D. H. Lehmer. 1951. Mathematical methods in large-scale computing units. Ann. Comput. Lab. Harvard Univ. 26, 141--146.Google Scholar
- M. Luby. 1996. Pseudorandomness and Cryptographic Applications. Princeton University Press. Google ScholarDigital Library
- G. Marsaglia and A. Zaman. 1991. A new class of random number generators. Ann. Appl. Probab. 462--480.Google Scholar
- M. Mense and C. Scheideler. 2008. SPREAD: An adaptive scheme for redundant and fair storage in dynamic heterogeneous storage systems. In Proceedings of the 19th ACM-SIAM Symposium on Discrete Algorithms (SODA). Google ScholarDigital Library
- M. Mitzenmacher. 1996. The power of two choices in randomized load balancing. Ph.D. thesis. Computer Science Department, University of California, Berkeley. Google ScholarDigital Library
- D. A. Patterson, G. Gibson, and R. H. Katz. 1988. A case for redundant arrays of inexpensive disks (RAID). In Proceedings of the ACM Conference on Management of Data (SIGMOD). 109--116. Google ScholarDigital Library
- I. Popov, A. Brinkmann, and T. Friedetzky. 2012. On the influence of PRNGs on data distribution. In Proceedings of the 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). IEEE, 536--543. Google ScholarDigital Library
- M. Raab and A. Steger. 1998. Balls into BINSA simple and tight analysis. Random. Approx. Tech. Comput. Sci., 159--170. Google ScholarDigital Library
- P. Sanders. 2001. Reconciling simplicity and realism in parallel disk models. In Proceedings of the 12th ACM-SIAM Symposium on Discrete Algorithms (SODA). SIAM, 67--76. Google ScholarDigital Library
- C. Schindelhauer and G. Schomaker. 2005. Weighted distributed hash tables. In Proceedings of the 17th ACM Symposium on Parallel Algorithms and Architectures (SPAA). 218--227. Google ScholarDigital Library
- F. L. Severence. 2009. System Modeling and Simulation: An Introduction. Wiley.Google Scholar
- M. Stevens, A. Lenstra, and B. de Weger. 2007. Chosen-prefix collisions for md5 and colliding x.509 certificates for different identities. In Proceedings of the 26th Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT). 1--22. Google ScholarDigital Library
- M. Stevens, A. Sotirov, J. Appelbaum, A. Lenstra, D. Molnar, D. Osvik, and B. De Weger. 2009. Short Chosen-Prefix Collisions for MD5 and the Creation of a rogue CA certificate. In Proceedings of the 29th Annual International Cryptology Conference (CRYPTO). 55--69. Google ScholarDigital Library
- I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. Kaashoek, F. Dabek, and H. Balakrishnan. 2003. Chord: A scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw. 11, 1, 17--32. Google ScholarDigital Library
- J. Viega. 2003. Practical random number generation in software. In Proceedings of the 19th Annual Computer Security Applications Conference (ACSAC). 129--141. Google ScholarDigital Library
- J. Walker. 1998. ENT Test suite. http://www.fourmilab.ch/random.Google Scholar
- T. Wang. 2007. Integer hash function. http://www.concentric.net/ttwang/tech/inthash.htm.Google Scholar
- X. Wang, Y. Yin, and H. Yu. 2005. Finding Collisions in the Full SHA-1. In Proceedings of the 25th Annual International Cryptology Conference (CRYPTO). 17--36. Google ScholarDigital Library
- S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn. 2006a. Ceph: A scalable, high-performance distributed file system. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI). 307--320. Google ScholarDigital Library
- S. A. Weil, S. A. Brandt, E. L. Miller, and C. Maltzahn. 2006b. CRUSH: Controlled, scalable and decentralized placement of replicated data. In Proceedings of the ACM/IEEE Conference on Supercomputing. Google ScholarDigital Library
- W. Zheng and G. Zhang. 2011. FastScale: Accelerate RAID Scaling by minimizing data migration. In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST). Google ScholarDigital Library
Index Terms
- Random Slicing: Efficient and Scalable Data Placement for Large-Scale Storage Systems
Recommendations
Opportunities and challenges of storage grid enabled by grid service
The explosive growth of data generated worldwide by information digitization has been identified as the key driver of storage system evolution. In this paper, we discuss the existing storage system architectures and pose the challenges they are facing. ...
The Tiger Shark file system
COMPCON '96: Proceedings of the 41st IEEE International Computer ConferenceTiger Shark is a parallel file system for IBM's AIX operating system. It is designed to support interactive multimedia, particularly large-scale systems such as interactive television (ITV). Tiger Shark scales across the entire RS/6000 product line, ...
Efficient detection of large-scale redundancy in enterprise file systems
In order to catch and reduce waste in the exponentially increasing demand for disk storage, we have developed very efficient technology to detect approximate duplication of large directory hierarchies. Such duplication can be caused, for example, by ...
Comments