Article

Lazy, adaptive rid-list intersection, and its application to index anding

Authors:
Vijayshankar Raman

IBM Almaden Research Center, San Jose, CA

IBM Almaden Research Center, San Jose, CA
View Profile

,
Lin Qiao

IBM Almaden Research Center, San Jose, CA

IBM Almaden Research Center, San Jose, CA
View Profile

,
Wei Han

IBM Almaden Research Center, San Jose, CA

IBM Almaden Research Center, San Jose, CA
View Profile

,
Inderpal Narang

IBM Almaden Research Center, San Jose, CA

IBM Almaden Research Center, San Jose, CA
View Profile

,
Ying-Lin Chen

IBM Silicon Valley Lab, San Jose, CA

IBM Silicon Valley Lab, San Jose, CA
View Profile

,
Kou-Horng Yang

IBM Silicon Valley Lab, San Jose, CA

IBM Silicon Valley Lab, San Jose, CA
View Profile

,
Fen-Ling Ling

IBM Silicon Valley Lab, San Jose, CA

IBM Silicon Valley Lab, San Jose, CA
View Profile

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of dataJune 2007Pages 773–784https://doi.org/10.1145/1247480.1247566

Published:11 June 2007Publication History

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

Pages 773–784

ABSTRACT

RID-List (row id list) intersection is a common strategy in query processing, used in star joins, column stores, and even search engines. To apply a conjunction of predicates on a table, a query process ordoes index lookups to form sorted RID-lists (or bitmap) of the rows matching each predicate, then intersects the RID-lists via an AND-tree, and finally fetches the corresponding rows to apply any residual predicates and aggregates.

This process can be expensive when the RID-lists are large. Furthermore, the performance is sensitive to the order in which RID lists are intersected together, and to treating the right predicates as residuals. If the optimizer chooses a wrong order or a wrong residual, due to a poor cardinality estimate, the resulting plan can run orders of magnitude slower than expected.

We present a new algorithm for RID-list intersection that is both more efficient and more robust than this standard algorithm. First, we avoid forming the RID-lists up front, and instead form this lazily as part of the intersection. This reduces the associated IO and sort cost significantly, especially when the data distribution is skewed. It also ameliorates the problem of wrong residual table selection. Second, we do not intersect the RID-lists via an AND-tree, because this is vulnerable to cardinality mis-estimations. Instead, we use an adaptive set intersection algorithm that performs well even when the cardinality estimates are wrong.

We present detailed experiments of this algorithm on data with varying distributions to validate its efficiency and predictability.

References

R. Avnur and J. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD, 2000. Google ScholarDigital Library
A. Balmin, T. Eliaz, J. Hornibrook, L. Lim, G. M. Lohman, D. Simmen, M. Wang, and C. Zhang. Cost-based optimization in db2 xml. IBM Systems Journal, 45(2), 2006. Google ScholarDigital Library
J. Barbay and C. Kenyon. Adaptive intersection and t-threshold problems. In SODA, 2002. Google ScholarDigital Library
J. Barbay, A. López-Ortiz, and T. Lu. Faster adaptive set intersections for text searching. In Intl. Workshop on Experimental Algorithms, 2006. Google ScholarDigital Library
N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal xml pattern matching. In SIGMOD, 2002. Google ScholarDigital Library
E. Demaine, A. López-Ortiz, and J. Munro. Adaptive set intersections, unions, and differences. In SODA, 2000. Google ScholarDigital Library
Y. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD, 1991.Google ScholarDigital Library
M. Kamath and K. Ramamritham. Bucket skip merge join: A scalable algorithm for join processing in very large databases using indexes. Technical Report UM-CS-1996-020, University of Massachusetts at Amherst, 1996. Google ScholarDigital Library
C. Mohan, Don Haderle, Yun Wang, and Josephine Cheng. Single table access using multiple indexes: Optimization, execution, and concurrency control techniques. In Proceedings of the International Conference on Extending Database Technology, 1990.Google ScholarCross Ref
Shivnath Babu, Rajeev Motwani, Kamesh Mungala, Itaru Nishizawa, and Jennifer Widom. Adaptive ordering of pipelined streamfilters. In SIGMOD, 2004. Google ScholarDigital Library
P. O'Neil and D. Quass. Improved query performance with variant indexes. In SIGMOD, 1997.Google ScholarDigital Library
Oracle White Paper. Query Optimization in Oracle Database 10g Release 2, June 2005.Google Scholar
SAP. SAP Business Information Warehouse Benchmark.Google Scholar
M. Stillger, G. Lohman, V. Markl, and M. Kandil. LEO: DB2's LEarning Optimizer. In VLDB, 2001. Google ScholarDigital Library
Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, and Stan Zdonik. C-Store: A Column-oriented DBMS. In VLDB, 2005. Google ScholarDigital Library

Index Terms

Lazy, adaptive rid-list intersection, and its application to index anding
1. Information systems
  1. Data management systems

Recommendations

An Algorithm for Computing the Union, Intersection and Difference of Two Polygons
ICCRD '10: Proceedings of the 2010 Second International Conference on Computer Research and Development

An new idea for setting operations on pairs of polygons algorithm is presented. The algorithm uses a boundary representation for the input and output polygons. Its domain includes polygons as well as polygons with holes within the area of the polygon. ...
Read More
Union and intersection contracts are hard, actually
DLS 2021: Proceedings of the 17th ACM SIGPLAN International Symposium on Dynamic Languages

Union and intersection types are a staple of gradually typed languages such as TypeScript. While it's long been recognized that union and intersection types are difficult to verify statically, it may appear at first that the dynamic part of gradual ...
Read More
Parallel intersection detection in massive sets of cubes
BigSpatial '17: Proceedings of the 6th ACM SIGSPATIAL Workshop on Analytics for Big Geospatial Data

We present ParCube, which finds the pairwise intersections in a set of millions of congruent cubes. This operation is required when computing boolean combinations of meshes or polyhedra in CAD/CAM and additive manufacturing, and in determining close ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
June 2007
1210 pages
ISBN:9781595936868
DOI:10.1145/1247480
General Chairs:
Lizhu Zhou
Tsinghua University, China
,
Tok Wang Ling
National University of Singapore, Singapore
,
Program Chair:
Beng Chin Ooi
National University of Singapore, Singapore
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 11 June 2007
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
AND tree
intersection
lazy
sorting
star join
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 21
  Total Citations
  View Citations
- 59
  Total Downloads
- Downloads (Last 12 months)14
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Lazy, adaptive rid-list intersection, and its application to index anding

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

An Algorithm for Computing the Union, Intersection and Difference of Two Polygons

Union and intersection contracts are hard, actually

Parallel intersection detection in massive sets of cubes

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Lazy, adaptive rid-list intersection, and its application to index anding

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

An Algorithm for Computing the Union, Intersection and Difference of Two Polygons

Union and intersection contracts are hard, actually

Parallel intersection detection in massive sets of cubes

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media