ABSTRACT
Access methods for multidimensional data have attracted much research interest in recent years. In general, the data structures proposed for this problem partition the database into a set of disk pages (buckets). Access to the buckets is provided by searching a directory of some type such as a tree directory or inverted index or by computation of a multiattribute hash function. Examples of the first approach are Multidimensional B-trees[Sch82], K-D-B trees[Rob81] (see also [Sam84] for a survey of these methods) whereas multiattribute hashing methods are described for example in [Rot74],[Aho79],[Riv76] and [Ram83]. In addition, there are also hybrid methods which combine hashing with a directory of some type [Ore84],[Nie84], [Fag79].
In all the work mentioned above, the performance is measured in terms of the number of disk accesses made to retrieve the answer without distinguishing whether these are sequential or random. We argue that performance measurements must consider this factor in order to be realistic, especially in the single user environment. Some evidence to support this claim is given in [Sal88, pg. 22] with the IBM 3380 disk drive as an example. For this type of disk, a comparison between accessing m blocks randomly and accessing a contiguous cluster of m blocks is made. The results show that for m = 10, the random access is slower by a factor of about 8 than the clustered one whereas for m = 100 it is slower by a factor of 25.
Another motivation for this work are optical disks. In this case, there is a big advantage in clustering since the access mechanism on many of these drives is equipped with an adjustable mirror which allows slight deflections of the laser beam. This means that it may be possible to read a complete cluster from a sequence of adjacent tracks beneath the head with a single random seek [Chri88].
Our work is inspired by an interesting recent paper [Fal86] which proposes to organize the physical layout of a multiattribute hash file by encoding record signatures using gray code rather than simple binary code. In this way neighboring buckets contain records which differ on a single bit in their signatures. It is then proved that the records which form the answer to a partial match query will tend to be contained in a smaller number of clusters as compared with the binary arrangement. It is also shown that this idea is applicable to many other multiattribute hashing schemes with a small amount of overhead. In addition, it can improve access time to directories of grid type files, extendible hashing and file methods which employ the z-ordering [Ore84].
- Aho79.Aho, A. V. and Ullman. J.D. "Optimal Partial Match Retrieval when Fields are independently Specified", ACM TODS,vol 4, no 2,pp 168-179, June 1979. Google ScholarDigital Library
- Bol79.Bolour, A," Optimality Properties of Multikey Hashing Functions',JACM vo12, no 6, April 1979 Google ScholarDigital Library
- Chri88.Christodoulakis, S. and Ford D., "Performance Analysis and Funadamental Performance Tradeoffs for CLV Optical Disks", SIGMOD 1988, pp.286-294 Google ScholarDigital Library
- Fag79.Fagin, R.,Nievergelt, J, Pippenger, N. and Strong, H., "Extendible Hashing- A fast Ae~ese Method for Dynamic Files", ACM TODS ,vol 4, no 3,pp, 315-344,Sept 1979. Google ScholarDigital Library
- Fal86.Faloutsoe, C. "Multiattribute Hashing Using Graycodes" Proc SIGMOD 1986,pp. 227-238 Google ScholarDigital Library
- Mor83.Moran, S."On the Complexity of Designing Optimal Partial Match Retrieval Systems", ACM TODS Vol 8, No 4, pp. 543-551, December 1983. Google ScholarDigital Library
- Nie84.Nievergelt, J.H.,Hinterberger. H. and Sevcik, K.C. "The Grid File: An Adaptable, Symmetric Multikey File Structure,"AM TODS, volg,no 1, pp 38-7!, March 1984. Google ScholarDigital Library
- Ore84.Orenstein, J. and Merret, T. "A Class of Data Structures for Associative Searehing~,Proe SIGMOD 1984, pp. 181-190. Google ScholarDigital Library
- Ram83.Ramamohanarao K, Loyd, J. ud "morn, J." Partial Match Retrieval Using Hashing and Descriptom', ACM TODS, vol 8, no 4, pp. 552-576, Dee 1983 Google ScholarDigital Library
- Rei77.Reingold,E.M., Nieverselt, J and Deo, N. Combinatorial Algorithm: Theory and Practice , Prentice Hall Inc, Englewood Cliffs, New Jersey, 1977. Google ScholarDigital Library
- Riv76.Rivest, R."Partial Match Retrieval Algorithms" ,SIAM J. of Computing , vol 5, no 1, pp. 19- 50, March 1976.Google ScholarDigital Library
- Rob81.Robinson, J.T.'The K-D-B tree, a Search Structure for Large Multidimensional Dynamic Indexes',Proc SIGMOD 1981, ACM, New York pp. 10-18. Google ScholarDigital Library
- Rot74.Rothnie, J.B. and Lozano, T. "Attribute Based File Organizations in a Paged Memory Environment",CACM, vol 17, no 2, pp 63-69, Feb 1974. Google ScholarDigital Library
- Sal88.Salzberg, Betty., File Structures, An Analytic Approach. Prentice Hall, New Jersey, 1988. Google ScholarDigital Library
- Sam84.Sammet, H. "The Quad Tree and Related Hierarchical Data Structures",ACM Computing Surveys(1984),vol 16, no 2, pp.187-260. Google ScholarDigital Library
- Sch82.Scheuermann, P. and Ouksel, M, "Multidimensional B-trees for associative searching in database Systems" ,Inf. Systems 1982 vol 7 no 2 pp. 123-137.Google Scholar
Index Terms
- Clustered multiattribute hash files
Recommendations
Random sampling from hash files
SIGMOD '90: Proceedings of the 1990 ACM SIGMOD international conference on Management of dataIn this paper we discuss simple random sampling from hash files on secondary storage. We consider both iterative and batch sampling algorithms from both static and dynamic hashing methods. The static methods considered are open addressing hash files and ...
Random sampling from hash files
In this paper we discuss simple random sampling from hash files on secondary storage. We consider both iterative and batch sampling algorithms from both static and dynamic hashing methods. The static methods considered are open addressing hash files and ...
Generation and search of clustered files
A classified, or clustered file is one where related, or similar records are grouped into classes, or clusters of items in such a way that all items within a cluster are jointly retrievable. Clustered files are easily adapted to broad and narrow search ...
Comments