skip to main content
10.1145/1055558.1055577acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Replicated declustering of spatial data

Published: 14 June 2004 Publication History

Abstract

The problem of disk declustering is to distribute data among multiple disks to reduce query response times through parallel I/O. A strictly optimal declustering technique is one that achieves optimal parallel I/O for all possible queries. In this paper, we focus on techniques that are optimized for spatial range queries. Current declustering techniques, which have single copies of the data, have been shown to be suboptimal for range queries. The lower bound on extra disk accesses is proved to be Ω(log N) for N disks even in the restricted case of an N-by-N grid, and all current approaches have been trying to achieve this bound. Replication is a well-known and effective solution for several problems in databases, especially for availability and load balancing. In this paper, we explore the idea of replication in the context of declustering and propose a framework where strictly optimal parallel I/O is achievable using a small amount of replication. We provide some theoretical foundations for replicated declustering, e.g., a bound for number of copies for strict optimality on any number of disks, and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal. The results for optimal disk allocation are extended for larger number of disks by increasing replication. Our techniques and results are valid for any arbitrary a-by-b grids, and any declustering scheme can be further improved using our replication framework. Using the framework, we perform experiments to identify a strictly optimal disk access schedule for any given arbitrary range query. In addition to the theoretical bounds, we compare the proposed replication based scheme to other existing techniques by performing experiments on real datasets.

References

[1]
K. A. S. Abdel-Ghaffar and A. El Abbadi. Optimal allocation of two-dimensional data. In International Conference on Database Theory, pages 409--418, Delphi, Greece, January 1997.]]
[2]
M. J. Atallah and S. Prabhakar. (Almost) optimal parallel block access for range queries. In Proc. ACM Symp. on Principles of Database Systems, pages 205--215, Dallas, Texas, May 2000.]]
[3]
N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger. The R* tree: An efficient and robust access method for points and rectangles. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 322--331, May 23--25 1990.]]
[4]
S. Berchtold, C. Bohm, B. Braunmuller, D. A. Keim, and H.-P. Kriegel. Fast parallel similarity search in multimedia databases. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1--12, Arizona, U.S.A., 1997.]]
[5]
R. Bhatia, R. K. Sinha, and C. Chen. Hierarchical declustering schemes for range queries. In Advances in Database Technology - EDBT 2000, 7th International Conference on Extending Database Technology, Lecture Notes in Computer Science, pages 525--537, Konstanz, Germany, March 2000.]]
[6]
R. Bose and S. Shrikhande. On the construction of sets of mutually orthogonal latin squares and the falsity of a conjecture of euler. Euler. Trans. Am. Math. Sm., 95:191--209, 1960.]]
[7]
C. Chen, R. Bhatia, and R. Sinha. Declustering using golden ratio sequences. In International Conference on Data Engineering, pages 271--280, San Diego, California, Feb 2000.]]
[8]
C. Chen and C. T. Cheng. From discrepancy to declustering: Near optimal multidimensional declustering strategies for range queries. In Proc. ACM Symp. on Principles of Database Systems, pages 29--38, Wisconsin, Madison, 2002.]]
[9]
L. Chen and D. Rotem. Optimal response time retrieval of replicated data. In Proc. ACM Symp. on Principles of Database Systems, pages 36--44, Minneapolis, Minnesota, May 1994.]]
[10]
M. Chen, H. Hsiao, C. Lie, and P. Yu. Using rotational mirrored declustering for replica placement in a disk array-based video server. In Proceedings of the ACM Multimedia, pages 121--130, 1995.]]
[11]
P. Ciaccia and A. Veronesi. Dynamic declustering methods for parallel grid files. In Proceedings of Third International ACPC Conference with Special Emphasis on Parallel Databases and Parallel I/O, pages 110--123, Berlin, Germany, Sept. 1996.]]
[12]
H. C. Du and J. S. Sobolewski. Disk allocation for cartesian product files on multiple-disk systems. ACM Transactions of Database Systems, 7(1):82--101, March 1982.]]
[13]
C. Faloutsos and P. Bhagwat. Declustering using fractals. In Proceedings of the 2nd International Conference on Parallel and Distributed Information Systems, pages 18--25, San Diego, CA, Jan 1993.]]
[14]
C. Faloutsos and D. Metaxas. Declustering using error correcting codes. In Proc. ACM Symp. on Principles of Database Systems, pages 253--258, 1989.]]
[15]
H. Ferhatosmanoglu, D. Agrawal, and A. E. Abbadi. Concentric hyperspaces and disk allocation for fast parallel range searching. In Proc. Int. Conf. Data Engineering, pages 608--615, Sydney, Australia, Mar. 1999.]]
[16]
H. Ferhatosmanoglu, A. S. Tosun, and A. Ramachandran. Replicated declustering of spatial data: A technical report. http://www.cse.ohiostate.edu/~hakan/repdec-report.pdf, 2003.]]
[17]
V. Gaede and O. Gunther. Multidimensional access methods. ACM Computing Surveys, 30:170--231, 1998.]]
[18]
S. Ghandeharizadeh and D. J. DeWitt. Hybrid-range partitioning strategy: A new declustering strategy for multiprocessor database machines. In Proceedings of 16th International Conference on Very Large Data Bases, pages 481--492, August 1990.]]
[19]
S. Ghandeharizadeh and D. J. DeWitt. A performance analysis of alternative multi-attribute declustering strategies. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 29--38, San Diego, 1992.]]
[20]
L. Golubchik, S. Khanna, S. Khuller, R. Thurimella, and A. Zhu. Approximation algorithms for data placement on parallel disks. In Symposium on Discrete Algorithms, pages 223--232, 2000.]]
[21]
J. Gray, B. Horst, and M. Walker. Parity striping of disc arrays: Low-cost reliable storage with acceptable throughput. In Proceedings of the Int. Conf. on Very Large Data Bases, pages 148--161, Washington DC., Aug. 1990.]]
[22]
A. Guttman. R-trees: A dynamic index structure for spatial searching. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 47--57, 1984.]]
[23]
K. A. Hua and H. C. Young. A general multidimensional data allocation method for multicomputer database systems. In Database and Expert System Applications, pages 401--409, Toulouse, France, Sept. 1997.]]
[24]
M. H. Kim and S. Pramanik. Optimal file distribution for partial match retrieval. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 173--182, Chicago, 1988.]]
[25]
J. Li, J. Srivastava, and D. Rotem. CMD: a multidimensional declustering method for parallel database systems. In Proceedings of the Int. Conf. on Very Large Data Bases, pages 3--14, Vancouver, Canada, Aug. 1992.]]
[26]
B. Moon, A. Acharya, and J. Saltz. Study of scalable declustering algorithms for parallel grid files. In Proc. of the Parallel Processing Symposium, Apr. 1996.]]
[27]
R. Muntz, J. Santos, and S. Berson. A parallel disk storage system for real-time multimedia applications. International Journal of Intelligent Systems, Special Issue on Multimedia Computing System, 13(12):1137--74, December 1998.]]
[28]
S. Prabhakar, K. Abdel-Ghaffar, D. Agrawal, and A. EI Abbadi. Cyclic allocation of two-dimensional data. In International Conference on Data Engineering, pages 94--101, Orlando, Florida, Feb 1998.]]
[29]
S. Prabhakar, D. Agrawal, and A. El Abbadi. Efficient disk allocation for fast similarity searching. In 10th International Symposium on Parallel Algorithms and Architectures, SPAA '98, pages 78--87, Puerto Vallarta, Mexico, June 1998.]]
[30]
H. Samet. The Design and Analysis of Spatial Structures. Addison Wesley Publishing Company, Inc., Massachusetts, 1989.]]
[31]
P. Sanders, S. Egner, and J. H. M. Korst. Fast concurrent access to parallel disks. In Symposium on Discrete Algorithms, pages 849--858, 2000.]]
[32]
J. Santos and R. Muntz. Design of the RIO (randomized I/O) storage server. Technical Report TR970032, UCLA Computer Science Department, 1997. http://mml.cs.ucla.edu/publications/papers/cstech970032.ps.]]
[33]
Seagate. Seagate specifications. http://www.seagate.com/pdf/datasheets/, December 2003.]]
[34]
R. K. Sinha, R. Bhatia, and C. Chen. Asymptotically optimal declustering schemes for range queries. In 8th International Conference on Database Theory, Lecture Notes in Computer Science, pages 144--158, London, UK, Jan. 2001. Springer.]]
[35]
A. S. Tosun. Replicated declustering for arbitrary queries. In 19th ACM Symposium on Applied Computing, March 2004.]]
[36]
A. S. Tosun and H. Ferhatosmanoglu. Optimal parallel I/O using replication. In Proceedings of International Workshops on Parallel Processing (ICPP), Vancouver, Canada, Aug. 2002.]]

Cited By

View all
  • (2018)A Cost-Effective Data Replica Placement Strategy Based on Hybrid Genetic Algorithm for Cloud ServicesResearch and Practical Issues of Enterprise Information Systems10.1007/978-3-319-99040-8_4(43-56)Online publication date: 15-Aug-2018
  • (2016)Multithreaded Maximum Flow Based Optimal Replica Selection Algorithm for Heterogeneous Storage ArchitecturesIEEE Transactions on Computers10.1109/TC.2015.245162065:5(1543-1557)Online publication date: 1-May-2016
  • (2014)Continuous Retrieval of Replicated Data from Heterogeneous Storage ArraysProceedings of the 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems10.1109/MASCOTS.2014.43(285-294)Online publication date: 9-Sep-2014
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2004
350 pages
ISBN:158113858X
DOI:10.1145/1055558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2004

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS04

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 30 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2018)A Cost-Effective Data Replica Placement Strategy Based on Hybrid Genetic Algorithm for Cloud ServicesResearch and Practical Issues of Enterprise Information Systems10.1007/978-3-319-99040-8_4(43-56)Online publication date: 15-Aug-2018
  • (2016)Multithreaded Maximum Flow Based Optimal Replica Selection Algorithm for Heterogeneous Storage ArchitecturesIEEE Transactions on Computers10.1109/TC.2015.245162065:5(1543-1557)Online publication date: 1-May-2016
  • (2014)Continuous Retrieval of Replicated Data from Heterogeneous Storage ArraysProceedings of the 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems10.1109/MASCOTS.2014.43(285-294)Online publication date: 9-Sep-2014
  • (2014)SWORDThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-014-0362-123:6(845-870)Online publication date: 1-Dec-2014
  • (2013)Generalized Optimal Response Time Retrieval of Replicated Data from Storage ArraysACM Transactions on Storage10.1145/2491472.24914749:2(1-36)Online publication date: 1-Jul-2013
  • (2012)Equivalent Disk AllocationsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2011.17723:3(538-546)Online publication date: 1-Mar-2012
  • (2012)Integrated Maximum Flow Algorithm for Optimal Response Time Retrieval of Replicated DataProceedings of the 2012 41st International Conference on Parallel Processing10.1109/ICPP.2012.34(11-20)Online publication date: 10-Sep-2012
  • (2012)Replication Based QoS Framework for Flash ArraysProceedings of the 2012 IEEE International Conference on Cluster Computing10.1109/CLUSTER.2012.53(182-190)Online publication date: 24-Sep-2012
  • (2009)Divide-and-conquer scheme for strictly optimal retrieval of range queriesACM Transactions on Storage10.1145/1629075.16290775:3(1-32)Online publication date: 30-Nov-2009
  • (2007)Equivalent disk allocationsProceedings of the 2007 ACM symposium on Applied computing10.1145/1244002.1244118(500-505)Online publication date: 11-Mar-2007
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media