skip to main content
10.1145/543613.543618acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

From discrepancy to declustering: near-optimal multidimensional declustering strategies for range queries

Published:03 June 2002Publication History

ABSTRACT

Declustering schemes allocate data blocks among multiple disks to enable parallel retrieval. Given a declustering scheme D, its response time with respect to a query Q, rt(Q), is defined to be the maximum number of disk blocks of the query stored by the scheme in any one of the disks. If |Q| is the number of data blocks in Q and M is the number of disks then rt(Q) is at least |Q|/M. One way to evaluate the performance of D with respect to a set of queries 𝑄 is to measure its additive error - the maximum difference between rt(Q) from |Q|/M over all range queries Q ε 𝑄.In this paper, we consider the problem of designing declustering schemes for uniform multidimensional data arranged in a d-dimensional grid so that their additive errors with respect to range queries are as small as possible. It has been shown that such declustering schemes will have an additive error of Ω(log M) when d = 2 and Ω(log d-1/2 M) when d > 2 with respect to range queries.Asymptotically optimal declustering schemes exist for 2-dimensional data. For data in larger dimensions, however, the best bound for additive errors is O(Md-1), which is extremely large. In this paper, we propose the two declustering schemes based on low discrepancy points in d-dimensions. When d is fixed, both schemes have an additive error of O(logd-1 M) with respect to range queries provided certain conditions are satisfied: the first scheme requires d ≥ 3 and M to be a power of a prime where the prime is at least d while the second scheme requires the size of the data to grow within some polynomial of M, with no restriction on M. These are the first known multidimensional declustering schemes with additive errors near optimal.

References

  1. NCR WorldMark/Teradata 1 TB TPC-D Executive Summary. www.tpc.org.]]Google ScholarGoogle Scholar
  2. Sun Fire 6800 midframe server with Sun Storedge arrays. www.sun.com/smi/Press/sunflash/2002-05/sunflash.20020514.2.html.]]Google ScholarGoogle Scholar
  3. K. Abdel-Ghaffar and A. E. Abbadi. Optimal allocation of two-dimensional data. In Proceedings of International Conference on Database Theory, pages 409-418, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Atallah and S. Prabhakar. (Almost) optimal parallel block access for range queries. In Proceedings of ACM Principles of Database Systems, pages 205-215, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Bhatia, R. K. Sinha, and C. M. Chen. Declustering using golden ratio sequences. In Proceedings of International Conference on Data Engineering, pages 271-280, Feb 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. R. Bhatia, R. K. Sinha, and C. M. Chen. Hierarchical declustering schemes for range queries. In Proceedings of Conference on Extending Database Technology, pages 525-537, Mar 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. Chang, B. Moon, A. Acharya, C. Shock, A. Sussman, and J. Saltz. Titan: a high-performance remote-sensing database. In Proceedings of International Conference on Data Engineering, pages 375-384, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. C. Chen and R. Sinha, Analysis and comparison of declustering schemes for interactive navigation queries. IEEE Transactions on Knowledge and Data Engineering, 12(5):763-778, Sep./Oct. 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Chen, R. Sinha, and R. Bhatia. Efficient disk allocation schemes for parallel retrieval of multidimensional grid data. In Proceedings of Scientific and Statistical Database Management, 2001.]]Google ScholarGoogle Scholar
  10. L. Chen and D. Rotem. Declustering objects for visualization. In Proceedings of the 19th International Conference, on Very Large Data Bases, 1993.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. H. Du and J. Sobolewski. Disk allocation for cartesian product files on multiple disk systems. ACM Transactions of Database Systems, pages 82-101, 1982.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Faloutsos. Gray codes for partial match and range queries. IEEE Transactions on Software Engineering, 14(10):1381-1393, 1988.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. C. Faloutsos and P. Bhagwat. Declustering using fractals. In Proceedings of International Conference on Parallel and Distributed Information Systems, pages 18-25, 1993.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Fang, R. Lee, and C. Chang. The idea of declustering and its applications. In Proceedings of International Conference on Very Large Databases, pages 181-188, 1986.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. H. Faure. Discrepancy of sequences associated with a number system (in dimension s) (in french). Acta Arithmetica, 41(4):337-351, 1982.]]Google ScholarGoogle ScholarCross RefCross Ref
  16. N. Gershon and C. Miller. Dealing with the data deluge. IEEE Spectrum, pages 28-32, 1993.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. H. Halton. On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numerische Mathematik, 2:84-90, 1960.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. B. Himatsingka, J. Srivastava, J.-Z. Li, and D. Rotem. Latin hypercubes: A class of multidimensional declustering techniques, 1994. Technical Report TR94-05, Department of Computer Science, University of Minnesota.]]Google ScholarGoogle Scholar
  19. X. Jia, J. Richards, and D. Ricken, editors. Remote: sensing digital image analysis: an introduction. Springer-Verlag, Berlin, Germany, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Kim and S. Pramanik. Optimal file distribution for partial match retrieval. In Proceedings of ACM SIGMOD Conference, pages 173 -182, 1988.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Kou, M. Winslett, Y. Cho, and J. Lee. New GDM-based declustering methods for parallel range queries. In IDEAS Conf., pages 119-127, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Li, J. Srivastava, and D. Rotem. CMD: a multidimensional declustering method for parallel database systems. In Proceedings of International Conference on Very Large Databases, pages 3-14, 1992.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. Liu and S. Shekhar. Partitioning similarity graphs: a framework for declustering problems. Information Systems, 21(6):475-496, 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Matousek. Geometric discrepancy, an illustrated guide. Springer-Verlag, 1999.]]Google ScholarGoogle ScholarCross RefCross Ref
  25. J. May. Parallel I/O for high performance computing. Morgan Kaufmann, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. B. Moon and J. Saltz. Scalability analysis of declustering methods for multidimensional range queries. IEEE Transactions on Knowledge and Data Engineering, 10(2):310-327, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. H. Niederreiter. Nets, (t, s) sequences, and algebraic curves over finite fields with many rational points. Documenta Math. J. DMV, 3:377-386, 1998.]]Google ScholarGoogle Scholar
  28. S. Prabhakar, K. Abdel-Ghaffar, D. Agrawal, and A. Abbadi. Efficient retrieval of multidimensional datasets through parallel I/O. In Proceedings of High Performance Computing Conference, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Prabhakar, K. Abdel-Ghaffar, D. Agrawal, and A. E. Abbadi. Cyclic allocation of two-dimensional data. In Proceedings of International Conference on Data Engineering, pages 94-101, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. H. E. Rose. A course in number theory. Oxford University Press, Walton Street, Oxford, 1988.]]Google ScholarGoogle Scholar
  31. K. F. Roth. On irregularities of distribution. Mathematika, 1:73-79, 1954.]]Google ScholarGoogle ScholarCross RefCross Ref
  32. W. Schmidt. On irregularities of distribution vii. Acta Arithmetica, 21:45-50, 1972.]]Google ScholarGoogle ScholarCross RefCross Ref
  33. R. K. Sinha, R. Bhatia, and C. M. Chen. Asymptotically optimal declustering schemes for range queries, 2001. Proceedings of International Conference on Database Theory.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Srivastava, T. Niccum, and B. Himatsingka. Data declustering in PADMA: A parallel database manager. Bulletin of Technical Communication on Data Engineering, 17(3):3-13, 1994.]]Google ScholarGoogle Scholar
  35. Y. Zhou, S. Shekhar, and M. Coyle. Disk allocation methods for parallelizing grid files. In Proceedings of International Conference on Data Engineering, pages 243-252, 1994.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. From discrepancy to declustering: near-optimal multidimensional declustering strategies for range queries

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      PODS '02: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
      June 2002
      311 pages
      ISBN:1581135076
      DOI:10.1145/543613

      Copyright © 2002 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 3 June 2002

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      PODS '02 Paper Acceptance Rate24of109submissions,22%Overall Acceptance Rate642of2,707submissions,24%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader