ABSTRACT
Declustering schemes allocate data blocks among multiple disks to enable parallel retrieval. Given a declustering scheme D, its response time with respect to a query Q, rt(Q), is defined to be the maximum number of disk blocks of the query stored by the scheme in any one of the disks. If |Q| is the number of data blocks in Q and M is the number of disks then rt(Q) is at least |Q|/M. One way to evaluate the performance of D with respect to a set of queries 𝑄 is to measure its additive error - the maximum difference between rt(Q) from |Q|/M over all range queries Q ε 𝑄.In this paper, we consider the problem of designing declustering schemes for uniform multidimensional data arranged in a d-dimensional grid so that their additive errors with respect to range queries are as small as possible. It has been shown that such declustering schemes will have an additive error of Ω(log M) when d = 2 and Ω(log d-1/2 M) when d > 2 with respect to range queries.Asymptotically optimal declustering schemes exist for 2-dimensional data. For data in larger dimensions, however, the best bound for additive errors is O(Md-1), which is extremely large. In this paper, we propose the two declustering schemes based on low discrepancy points in d-dimensions. When d is fixed, both schemes have an additive error of O(logd-1 M) with respect to range queries provided certain conditions are satisfied: the first scheme requires d ≥ 3 and M to be a power of a prime where the prime is at least d while the second scheme requires the size of the data to grow within some polynomial of M, with no restriction on M. These are the first known multidimensional declustering schemes with additive errors near optimal.
- NCR WorldMark/Teradata 1 TB TPC-D Executive Summary. www.tpc.org.]]Google Scholar
- Sun Fire 6800 midframe server with Sun Storedge arrays. www.sun.com/smi/Press/sunflash/2002-05/sunflash.20020514.2.html.]]Google Scholar
- K. Abdel-Ghaffar and A. E. Abbadi. Optimal allocation of two-dimensional data. In Proceedings of International Conference on Database Theory, pages 409-418, 1997.]] Google ScholarDigital Library
- M. Atallah and S. Prabhakar. (Almost) optimal parallel block access for range queries. In Proceedings of ACM Principles of Database Systems, pages 205-215, 2000.]] Google ScholarDigital Library
- R. Bhatia, R. K. Sinha, and C. M. Chen. Declustering using golden ratio sequences. In Proceedings of International Conference on Data Engineering, pages 271-280, Feb 2000.]] Google ScholarDigital Library
- R. Bhatia, R. K. Sinha, and C. M. Chen. Hierarchical declustering schemes for range queries. In Proceedings of Conference on Extending Database Technology, pages 525-537, Mar 2000.]] Google ScholarDigital Library
- C. Chang, B. Moon, A. Acharya, C. Shock, A. Sussman, and J. Saltz. Titan: a high-performance remote-sensing database. In Proceedings of International Conference on Data Engineering, pages 375-384, 1997.]] Google ScholarDigital Library
- C. Chen and R. Sinha, Analysis and comparison of declustering schemes for interactive navigation queries. IEEE Transactions on Knowledge and Data Engineering, 12(5):763-778, Sep./Oct. 2000.]] Google ScholarDigital Library
- C. Chen, R. Sinha, and R. Bhatia. Efficient disk allocation schemes for parallel retrieval of multidimensional grid data. In Proceedings of Scientific and Statistical Database Management, 2001.]]Google Scholar
- L. Chen and D. Rotem. Declustering objects for visualization. In Proceedings of the 19th International Conference, on Very Large Data Bases, 1993.]] Google ScholarDigital Library
- H. Du and J. Sobolewski. Disk allocation for cartesian product files on multiple disk systems. ACM Transactions of Database Systems, pages 82-101, 1982.]] Google ScholarDigital Library
- C. Faloutsos. Gray codes for partial match and range queries. IEEE Transactions on Software Engineering, 14(10):1381-1393, 1988.]] Google ScholarDigital Library
- C. Faloutsos and P. Bhagwat. Declustering using fractals. In Proceedings of International Conference on Parallel and Distributed Information Systems, pages 18-25, 1993.]] Google ScholarDigital Library
- M. Fang, R. Lee, and C. Chang. The idea of declustering and its applications. In Proceedings of International Conference on Very Large Databases, pages 181-188, 1986.]] Google ScholarDigital Library
- H. Faure. Discrepancy of sequences associated with a number system (in dimension s) (in french). Acta Arithmetica, 41(4):337-351, 1982.]]Google ScholarCross Ref
- N. Gershon and C. Miller. Dealing with the data deluge. IEEE Spectrum, pages 28-32, 1993.]] Google ScholarDigital Library
- J. H. Halton. On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numerische Mathematik, 2:84-90, 1960.]]Google ScholarDigital Library
- B. Himatsingka, J. Srivastava, J.-Z. Li, and D. Rotem. Latin hypercubes: A class of multidimensional declustering techniques, 1994. Technical Report TR94-05, Department of Computer Science, University of Minnesota.]]Google Scholar
- X. Jia, J. Richards, and D. Ricken, editors. Remote: sensing digital image analysis: an introduction. Springer-Verlag, Berlin, Germany, 1999.]] Google ScholarDigital Library
- M. Kim and S. Pramanik. Optimal file distribution for partial match retrieval. In Proceedings of ACM SIGMOD Conference, pages 173 -182, 1988.]] Google ScholarDigital Library
- S. Kou, M. Winslett, Y. Cho, and J. Lee. New GDM-based declustering methods for parallel range queries. In IDEAS Conf., pages 119-127, 1999.]] Google ScholarDigital Library
- J. Li, J. Srivastava, and D. Rotem. CMD: a multidimensional declustering method for parallel database systems. In Proceedings of International Conference on Very Large Databases, pages 3-14, 1992.]] Google ScholarDigital Library
- D. Liu and S. Shekhar. Partitioning similarity graphs: a framework for declustering problems. Information Systems, 21(6):475-496, 1996.]] Google ScholarDigital Library
- J. Matousek. Geometric discrepancy, an illustrated guide. Springer-Verlag, 1999.]]Google ScholarCross Ref
- J. May. Parallel I/O for high performance computing. Morgan Kaufmann, 2001.]] Google ScholarDigital Library
- B. Moon and J. Saltz. Scalability analysis of declustering methods for multidimensional range queries. IEEE Transactions on Knowledge and Data Engineering, 10(2):310-327, 1998.]] Google ScholarDigital Library
- H. Niederreiter. Nets, (t, s) sequences, and algebraic curves over finite fields with many rational points. Documenta Math. J. DMV, 3:377-386, 1998.]]Google Scholar
- S. Prabhakar, K. Abdel-Ghaffar, D. Agrawal, and A. Abbadi. Efficient retrieval of multidimensional datasets through parallel I/O. In Proceedings of High Performance Computing Conference, 1998.]] Google ScholarDigital Library
- S. Prabhakar, K. Abdel-Ghaffar, D. Agrawal, and A. E. Abbadi. Cyclic allocation of two-dimensional data. In Proceedings of International Conference on Data Engineering, pages 94-101, 1998.]] Google ScholarDigital Library
- H. E. Rose. A course in number theory. Oxford University Press, Walton Street, Oxford, 1988.]]Google Scholar
- K. F. Roth. On irregularities of distribution. Mathematika, 1:73-79, 1954.]]Google ScholarCross Ref
- W. Schmidt. On irregularities of distribution vii. Acta Arithmetica, 21:45-50, 1972.]]Google ScholarCross Ref
- R. K. Sinha, R. Bhatia, and C. M. Chen. Asymptotically optimal declustering schemes for range queries, 2001. Proceedings of International Conference on Database Theory.]] Google ScholarDigital Library
- J. Srivastava, T. Niccum, and B. Himatsingka. Data declustering in PADMA: A parallel database manager. Bulletin of Technical Communication on Data Engineering, 17(3):3-13, 1994.]]Google Scholar
- Y. Zhou, S. Shekhar, and M. Coyle. Disk allocation methods for parallelizing grid files. In Proceedings of International Conference on Data Engineering, pages 243-252, 1994.]] Google ScholarDigital Library
Index Terms
- From discrepancy to declustering: near-optimal multidimensional declustering strategies for range queries
Recommendations
From discrepancy to declustering: Near-optimal multidimensional declustering strategies for range queries
Declustering schemes allocate data blocks among multiple disks to enable parallel retrieval. Given a declustering scheme D, its response time with respect to a query Q, rt(Q), is defined to be the maximum number of data blocks of the query stored by the ...
Declustering Using Golden Ratio Sequences
ICDE '00: Proceedings of the 16th International Conference on Data EngineeringWe propose a new data declustering scheme for range queries. Our scheme is based on Golden Ratio Sequences (GRS), which have found applications in broadcast disks, hashing, packet routing, etc.We show by analysis and simulation that GRS is nearly the ...
Scalability Analysis of Declustering Methods for Multidimensional Range Queries
Efficient storage and retrieval of multiattribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multiattribute file structure for partial-match and ...
Comments