ABSTRACT
Each tuple in a valid-time relation includes an interval attribute T that represents the tuple's valid time. The overlap join between two valid-time relations determines all pairs of tuples with overlapping intervals. Although overlap joins are common, existing partitioning and indexing schemes are inefficient if the data includes long-lived tuples or if intervals intersect partition boundaries.
We propose Overlap Interval Partitioning (OIP), a new partitioning approach for data with an interval. OIP divides the time range of a relation into k base granules and defines overlapping partitions for sequences of contiguous granules. OIP is the first partitioning method for interval data that gives a constant clustering guarantee: the difference in duration between the interval of a tuple and the interval of its partition is independent of the duration of the tuple's interval. We offer a detailed analysis of the average false hit ratio and the average number of partition accesses for queries with overlap predicates, and we prove that the average false hit ratio is independent of the number of short- and long-lived tuples. To compute the overlap join, we propose the Overlap Interval Partition Join (OIPJoin), which uses OIP to partition the input relations on-the-fly. Only the tuples from overlapping partitions have to be joined to compute the result. We analytically derive the optimal number of granules, k, for partitioning the two input relations, from the size of the data, the cost of CPU operations, and the cost of main memory or disk IOs. Our experiments confirm the analytical results and show that the OIPJoin outperforms state-of-the-art techniques for the overlap join.
- The webkit open source project. http://www.webkit.org, 2012.Google Scholar
- N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient and robust access method for points and rectangles. In SIGMOD, pages 322--331, 1990. Google ScholarDigital Library
- N. Beckmann and B. Seeger. A revised R*-tree in comparison with related index structures. In SIGMOD, pages 799--812, 2009. Google ScholarDigital Library
- J. L. Bentley. Solutions to klee's rectangle problems. Technical report, Carnegie Mellon University, 1977.Google Scholar
- M. Berg, M. Kreveld, M. Overmars, and O. Schwarzkopf. More geometric data structures. In Computational Geometry, pages 211--233. Springer Berlin Heidelberg, 2000.Google ScholarCross Ref
- A. E. Brouwer, A. M. Cohen, and A. Neumaier. Distance Regular Graphs. Springer-Verlag, 1989.Google ScholarCross Ref
- A. Dignös, M. Böhlen, and J. Gamper. Query time scaling of attribute values in interval timestamped databases. In ICDE, pages 1304--1307, 2013. Google ScholarDigital Library
- A. Dignös, M. H. Böhlen, and J. Gamper. Temporal alignment. In SIGMOD, pages 433--444, 2012. Google ScholarDigital Library
- J. Enderle, M. Hampel, and T. Seidl. Joining interval data in relational databases. In SIGMOD, pages 683--694, 2004. Google ScholarDigital Library
- R. A. Finkel and J. L. Bentley. Quad trees: A data structure for retrieval on composite keys. Acta Inf., 4:1--9, 1974.Google ScholarDigital Library
- D. Gao, C. S. Jensen, R. T. Snodgrass, and M. D. Soo. Join operations in temporal databases. VLDB J., 14(1):2--29, 2005. Google ScholarDigital Library
- J. A. G. Gendrano, R. Shah, R. T. Snodgrass, and J. Yang. University information system (uis) dataset. TimeCenter CD-1, 1998.Google Scholar
- N. Koudas and K. C. Sevcik. Size separation spatial join. In SIGMOD, pages 324--335, 1997. Google ScholarDigital Library
- H.-P. Kriegel, M. Pötke, and T. Seidl. Managing intervals efficiently in object-relational databases. In VLDB, pages 407--418, 2000. Google ScholarDigital Library
- M.-L. Lo and C. V. Ravishankar. Spatial hash-joins. In SIGMOD, pages 247--258, 1996. Google ScholarDigital Library
- H. Lu, B. C. Ooi, and K.-L. Tan. On spatially partitioned temporal join. In VLDB, pages 546--557, 1994. Google ScholarDigital Library
- S. Nobari, F. Tauheed, T. Heinis, P. Karras, S. Bressan, and A. Ailamaki. Touch: In-memory spatial join by hierarchical data-oriented partitioning. In SIGMOD, pages 701--712, 2013. Google ScholarDigital Library
- H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005. Google ScholarDigital Library
- H. Samet, J. Sankaranarayanan, and M. Auerbach. Indexing methods for moving object databases: games and other applications. In SIGMOD, pages 169--180, 2013. Google ScholarDigital Library
- A. Silberschatz, P. B. Galvin, and G. Gagne. Operating System Concepts. Wiley Publishing, 9th edition, 2012. Google ScholarDigital Library
- M. D. Soo, R. T. Snodgrass, and C. S. Jensen. Efficient evaluation of the valid-time natural join. In ICDE, pages 282--292, 1994. Google ScholarDigital Library
- V. J. Tsotras and N. Kangerlaris. The snapshot index: An i/o-optimal access method for timeslice queries. Inf. Syst., 20(3):237--260, 1995. Google ScholarDigital Library
- T. Ulrich. Loose octrees. In Game Programming Gems, pages 444--453. Charles River Media, 2000.Google Scholar
Index Terms
- Overlap interval partition join
Recommendations
To Partition, or Not to Partition, That is the Join Question in a Real System
SIGMOD '21: Proceedings of the 2021 International Conference on Management of DataAn efficient implementation of a hash join has been a highly researched problem for decades. Recently, the radix join has been shown to have superior performance over the alternatives (e.g., the non-partitioned hash join), albeit on synthetic ...
Leveraging range joins for the computation of overlap joins
AbstractJoins are essential and potentially expensive operations in database management systems. When data is associated with time periods, joins commonly include predicates that require pairs of argument tuples to overlap in order to qualify for the ...
Optimal Interval Partitioning for Temporal Databases
BIWIT '97: Proceedings of the 3rd Basque International Workshop on Information Technology (BIWIT '97)We analyze the problem of partitioning a collection of intervals into two or more subcollections, each of them holding intervals that intersect with a certain range of the intervals' domain. Intervals that intersect with more than one range impose a ...
Comments