Article

Free Access

Clustered multiattribute hash files

Author:
D. Rotem

Lawrence Berkeley Laboratory, Univenity of California Berkeley, Berkeley, Ca

Lawrence Berkeley Laboratory, Univenity of California Berkeley, Berkeley, Ca
View Profile

PODS '89: Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systemsMarch 1989Pages 225–234https://doi.org/10.1145/73721.73744

Published:29 March 1989Publication History

PODS '89: Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

Pages 225–234

ABSTRACT

Access methods for multidimensional data have attracted much research interest in recent years. In general, the data structures proposed for this problem partition the database into a set of disk pages (buckets). Access to the buckets is provided by searching a directory of some type such as a tree directory or inverted index or by computation of a multiattribute hash function. Examples of the first approach are Multidimensional B-trees[Sch82], K-D-B trees[Rob81] (see also [Sam84] for a survey of these methods) whereas multiattribute hashing methods are described for example in [Rot74],[Aho79],[Riv76] and [Ram83]. In addition, there are also hybrid methods which combine hashing with a directory of some type [Ore84],[Nie84], [Fag79].

In all the work mentioned above, the performance is measured in terms of the number of disk accesses made to retrieve the answer without distinguishing whether these are sequential or random. We argue that performance measurements must consider this factor in order to be realistic, especially in the single user environment. Some evidence to support this claim is given in [Sal88, pg. 22] with the IBM 3380 disk drive as an example. For this type of disk, a comparison between accessing m blocks randomly and accessing a contiguous cluster of m blocks is made. The results show that for m = 10, the random access is slower by a factor of about 8 than the clustered one whereas for m = 100 it is slower by a factor of 25.

Another motivation for this work are optical disks. In this case, there is a big advantage in clustering since the access mechanism on many of these drives is equipped with an adjustable mirror which allows slight deflections of the laser beam. This means that it may be possible to read a complete cluster from a sequence of adjacent tracks beneath the head with a single random seek [Chri88].

Our work is inspired by an interesting recent paper [Fal86] which proposes to organize the physical layout of a multiattribute hash file by encoding record signatures using gray code rather than simple binary code. In this way neighboring buckets contain records which differ on a single bit in their signatures. It is then proved that the records which form the answer to a partial match query will tend to be contained in a smaller number of clusters as compared with the binary arrangement. It is also shown that this idea is applicable to many other multiattribute hashing schemes with a small amount of overhead. In addition, it can improve access time to directories of grid type files, extendible hashing and file methods which employ the z-ordering [Ore84].

References

Aho79.Aho, A. V. and Ullman. J.D. "Optimal Partial Match Retrieval when Fields are independently Specified", ACM TODS,vol 4, no 2,pp 168-179, June 1979. Google ScholarDigital Library
Bol79.Bolour, A," Optimality Properties of Multikey Hashing Functions',JACM vo12, no 6, April 1979 Google ScholarDigital Library
Chri88.Christodoulakis, S. and Ford D., "Performance Analysis and Funadamental Performance Tradeoffs for CLV Optical Disks", SIGMOD 1988, pp.286-294 Google ScholarDigital Library
Fag79.Fagin, R.,Nievergelt, J, Pippenger, N. and Strong, H., "Extendible Hashing- A fast Ae~ese Method for Dynamic Files", ACM TODS ,vol 4, no 3,pp, 315-344,Sept 1979. Google ScholarDigital Library
Fal86.Faloutsoe, C. "Multiattribute Hashing Using Graycodes" Proc SIGMOD 1986,pp. 227-238 Google ScholarDigital Library
Mor83.Moran, S."On the Complexity of Designing Optimal Partial Match Retrieval Systems", ACM TODS Vol 8, No 4, pp. 543-551, December 1983. Google ScholarDigital Library
Nie84.Nievergelt, J.H.,Hinterberger. H. and Sevcik, K.C. "The Grid File: An Adaptable, Symmetric Multikey File Structure,"AM TODS, volg,no 1, pp 38-7!, March 1984. Google ScholarDigital Library
Ore84.Orenstein, J. and Merret, T. "A Class of Data Structures for Associative Searehing~,Proe SIGMOD 1984, pp. 181-190. Google ScholarDigital Library
Ram83.Ramamohanarao K, Loyd, J. ud "morn, J." Partial Match Retrieval Using Hashing and Descriptom', ACM TODS, vol 8, no 4, pp. 552-576, Dee 1983 Google ScholarDigital Library
Rei77.Reingold,E.M., Nieverselt, J and Deo, N. Combinatorial Algorithm: Theory and Practice , Prentice Hall Inc, Englewood Cliffs, New Jersey, 1977. Google ScholarDigital Library
Riv76.Rivest, R."Partial Match Retrieval Algorithms" ,SIAM J. of Computing , vol 5, no 1, pp. 19- 50, March 1976.Google ScholarDigital Library
Rob81.Robinson, J.T.'The K-D-B tree, a Search Structure for Large Multidimensional Dynamic Indexes',Proc SIGMOD 1981, ACM, New York pp. 10-18. Google ScholarDigital Library
Rot74.Rothnie, J.B. and Lozano, T. "Attribute Based File Organizations in a Paged Memory Environment",CACM, vol 17, no 2, pp 63-69, Feb 1974. Google ScholarDigital Library
Sal88.Salzberg, Betty., File Structures, An Analytic Approach. Prentice Hall, New Jersey, 1988. Google ScholarDigital Library
Sam84.Sammet, H. "The Quad Tree and Related Hierarchical Data Structures",ACM Computing Surveys(1984),vol 16, no 2, pp.187-260. Google ScholarDigital Library
Sch82.Scheuermann, P. and Ouksel, M, "Multidimensional B-trees for associative searching in database Systems" ,Inf. Systems 1982 vol 7 no 2 pp. 123-137.Google Scholar

Index Terms

Recommendations

Random sampling from hash files
SIGMOD '90: Proceedings of the 1990 ACM SIGMOD international conference on Management of data

In this paper we discuss simple random sampling from hash files on secondary storage. We consider both iterative and batch sampling algorithms from both static and dynamic hashing methods. The static methods considered are open addressing hash files and ...
Read More
Random sampling from hash files

In this paper we discuss simple random sampling from hash files on secondary storage. We consider both iterative and batch sampling algorithms from both static and dynamic hashing methods. The static methods considered are open addressing hash files and ...
Read More
Generation and search of clustered files

A classified, or clustered file is one where related, or similar records are grouped into classes, or clusters of items in such a way that all items within a cluster are jointly retrievable. Clustered files are easily adapted to broad and narrow search ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PODS '89: Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
March 1989
401 pages
ISBN:0897913086
DOI:10.1145/73721
Chairman:
Avi Silberschatz
Univ. of Texas, Austin
Copyright © 1989 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 29 March 1989
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate642of2,707submissions,24%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 4
  Total Citations
  View Citations
- 325
  Total Downloads
- Downloads (Last 12 months)13
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Clustered multiattribute hash files

PODS '89: Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Random sampling from hash files

Random sampling from hash files

Generation and search of clustered files

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Clustered multiattribute hash files

PODS '89: Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Random sampling from hash files

Random sampling from hash files

Generation and search of clustered files

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media