skip to main content
10.1145/73721.73744acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article
Free Access

Clustered multiattribute hash files

Authors Info & Claims
Published:29 March 1989Publication History

ABSTRACT

Access methods for multidimensional data have attracted much research interest in recent years. In general, the data structures proposed for this problem partition the database into a set of disk pages (buckets). Access to the buckets is provided by searching a directory of some type such as a tree directory or inverted index or by computation of a multiattribute hash function. Examples of the first approach are Multidimensional B-trees[Sch82], K-D-B trees[Rob81] (see also [Sam84] for a survey of these methods) whereas multiattribute hashing methods are described for example in [Rot74],[Aho79],[Riv76] and [Ram83]. In addition, there are also hybrid methods which combine hashing with a directory of some type [Ore84],[Nie84], [Fag79].

In all the work mentioned above, the performance is measured in terms of the number of disk accesses made to retrieve the answer without distinguishing whether these are sequential or random. We argue that performance measurements must consider this factor in order to be realistic, especially in the single user environment. Some evidence to support this claim is given in [Sal88, pg. 22] with the IBM 3380 disk drive as an example. For this type of disk, a comparison between accessing m blocks randomly and accessing a contiguous cluster of m blocks is made. The results show that for m = 10, the random access is slower by a factor of about 8 than the clustered one whereas for m = 100 it is slower by a factor of 25.

Another motivation for this work are optical disks. In this case, there is a big advantage in clustering since the access mechanism on many of these drives is equipped with an adjustable mirror which allows slight deflections of the laser beam. This means that it may be possible to read a complete cluster from a sequence of adjacent tracks beneath the head with a single random seek [Chri88].

Our work is inspired by an interesting recent paper [Fal86] which proposes to organize the physical layout of a multiattribute hash file by encoding record signatures using gray code rather than simple binary code. In this way neighboring buckets contain records which differ on a single bit in their signatures. It is then proved that the records which form the answer to a partial match query will tend to be contained in a smaller number of clusters as compared with the binary arrangement. It is also shown that this idea is applicable to many other multiattribute hashing schemes with a small amount of overhead. In addition, it can improve access time to directories of grid type files, extendible hashing and file methods which employ the z-ordering [Ore84].

References

  1. Aho79.Aho, A. V. and Ullman. J.D. "Optimal Partial Match Retrieval when Fields are independently Specified", ACM TODS,vol 4, no 2,pp 168-179, June 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Bol79.Bolour, A," Optimality Properties of Multikey Hashing Functions',JACM vo12, no 6, April 1979 Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Chri88.Christodoulakis, S. and Ford D., "Performance Analysis and Funadamental Performance Tradeoffs for CLV Optical Disks", SIGMOD 1988, pp.286-294 Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Fag79.Fagin, R.,Nievergelt, J, Pippenger, N. and Strong, H., "Extendible Hashing- A fast Ae~ese Method for Dynamic Files", ACM TODS ,vol 4, no 3,pp, 315-344,Sept 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Fal86.Faloutsoe, C. "Multiattribute Hashing Using Graycodes" Proc SIGMOD 1986,pp. 227-238 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Mor83.Moran, S."On the Complexity of Designing Optimal Partial Match Retrieval Systems", ACM TODS Vol 8, No 4, pp. 543-551, December 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Nie84.Nievergelt, J.H.,Hinterberger. H. and Sevcik, K.C. "The Grid File: An Adaptable, Symmetric Multikey File Structure,"AM TODS, volg,no 1, pp 38-7!, March 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Ore84.Orenstein, J. and Merret, T. "A Class of Data Structures for Associative Searehing~,Proe SIGMOD 1984, pp. 181-190. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Ram83.Ramamohanarao K, Loyd, J. ud "morn, J." Partial Match Retrieval Using Hashing and Descriptom', ACM TODS, vol 8, no 4, pp. 552-576, Dee 1983 Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Rei77.Reingold,E.M., Nieverselt, J and Deo, N. Combinatorial Algorithm: Theory and Practice , Prentice Hall Inc, Englewood Cliffs, New Jersey, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Riv76.Rivest, R."Partial Match Retrieval Algorithms" ,SIAM J. of Computing , vol 5, no 1, pp. 19- 50, March 1976.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Rob81.Robinson, J.T.'The K-D-B tree, a Search Structure for Large Multidimensional Dynamic Indexes',Proc SIGMOD 1981, ACM, New York pp. 10-18. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Rot74.Rothnie, J.B. and Lozano, T. "Attribute Based File Organizations in a Paged Memory Environment",CACM, vol 17, no 2, pp 63-69, Feb 1974. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Sal88.Salzberg, Betty., File Structures, An Analytic Approach. Prentice Hall, New Jersey, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Sam84.Sammet, H. "The Quad Tree and Related Hierarchical Data Structures",ACM Computing Surveys(1984),vol 16, no 2, pp.187-260. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Sch82.Scheuermann, P. and Ouksel, M, "Multidimensional B-trees for associative searching in database Systems" ,Inf. Systems 1982 vol 7 no 2 pp. 123-137.Google ScholarGoogle Scholar

Index Terms

  1. Clustered multiattribute hash files

                Recommendations

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in
                • Published in

                  cover image ACM Conferences
                  PODS '89: Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
                  March 1989
                  401 pages
                  ISBN:0897913086
                  DOI:10.1145/73721

                  Copyright © 1989 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 29 March 1989

                  Permissions

                  Request permissions about this article.

                  Request Permissions

                  Check for updates

                  Qualifiers

                  • Article

                  Acceptance Rates

                  Overall Acceptance Rate642of2,707submissions,24%

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader