Abstract
The skyline of a d-dimensional dataset contains the points that are not dominated by any other point on all dimensions. Skyline computation has recently received considerable attention in the database community, especially for progressive methods that can quickly return the initial results without reading the entire database. All the existing algorithms, however, have some serious shortcomings which limit their applicability in practice. In this article we develop branch-and-bound skyline (BBS), an algorithm based on nearest-neighbor search, which is I/O optimal, that is, it performs a single access only to those nodes that may contain skyline points. BBS is simple to implement and supports all types of progressive processing (e.g., user preferences, arbitrary dimensionality, etc). Furthermore, we propose several interesting variations of skyline computation, and show how BBS can be applied for their efficient processing.
- Acharya, S., Poosala, V., and Ramaswamy, S. 1999. Selectivity estimation in spatial databases. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; Philadelphia, PA, June 1--3). 13--24.]] Google ScholarDigital Library
- Balke, W., Gunzer, U., and Zheng, J. 2004. Efficient distributed skylining for Web information systems. In Proceedings of the International Conference on Extending Database Technology (EDBT; Heraklio, Greece, Mar. 14--18). 256--273.]]Google Scholar
- Beckmann, N., Kriegel, H., Schneider, R., and Seeger, B. 1990. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; Atlantic City, NJ, May 23--25). 322--331.]] Google ScholarDigital Library
- Berchtold, S., Keim, D., and Kriegel, H. 1996. The X-tree: An index structure for high-dimensional data. In Proceedings of the Very Large Data Bases Conference (VLDB; Mumbai, India, Sep. 3--6). 28--39.]] Google ScholarDigital Library
- Böhm, C. and Kriegel, H. 2001. Determining the convex hull in large multidimensional databases. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery (DaWaK; Munich, Germany, Sep. 5--7). 294--306.]] Google ScholarDigital Library
- Borzsonyi, S., Kossmann, D., and Stocker, K. 2001. The skyline operator. In Proceedings of the IEEE International Conference on Data Engineering (ICDE; Heidelberg, Germany, Apr. 2--6). 421--430.]] Google ScholarDigital Library
- Buchta, C. 1989. On the average number of maxima in a set of vectors. Inform. Process. Lett., 33, 2, 63--65.]] Google ScholarDigital Library
- Chang, Y., Bergman, L., Castelli, V., Li, C., Lo, M., and Smith, J. 2000. The Onion technique: Indexing for linear optimization queries. In Proceedings of the ACM Conference on the Management of data (SIGMOD; Dallas, TX, May 16--18). 391--402.]] Google ScholarDigital Library
- Chomicki, J., Godfrey, P., Gryz, J., and Liang, D. 2003. Skyline with pre-sorting. In Proceedings of the IEEE International Conference on Data Engineering (ICDE; Bangalore, India, Mar. 5--8). 717--719.]]Google Scholar
- Fagin, R., Lotem, A., and Naor, M. 2001. Optimal aggregation algorithms for middleware. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS; Santa Barbara, CA, May 21--23). 102--113.]] Google ScholarDigital Library
- Ferhatosmanoglu, H., Stanoi, I., Agrawal, D., and Abbadi, A. 2001. Constrained nearest neighbor queries. In Proceedings of the International Symposium on Spatial and Temporal Databases (SSTD; Redondo Beach, CA, July 12--15). 257--278.]] Google ScholarDigital Library
- Guttman, A. 1984. R-trees: A dynamic index structure for spatial searching. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; Boston, MA, June 18--21). 47--57.]] Google ScholarDigital Library
- Hellerstein, J., Anvur, R., Chou, A., Hidber, C., Olston, C., Raman, V., Roth, T., and Haas, P. 1999. Interactive data analysis: The control project. IEEE Comput. 32, 8, 51--59.]] Google ScholarDigital Library
- Henrich, A. 1994. A distance scan algorithm for spatial access structures. In Proceedings of the ACM Workshop on Geographic Information Systems (ACM GIS; Gaithersburg, MD, Dec.). 136--143.]]Google Scholar
- Hjaltason, G. and Samet, H. 1999. Distance browsing in spatial databases. ACM Trans. Database Syst. 24, 2, 265--318.]] Google ScholarDigital Library
- Hristidis, V., Koudas, N., and Papakonstantinou, Y. 2001. PREFER: A system for the efficient execution of multi-parametric ranked queries. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; May 21--24). 259--270.]] Google ScholarDigital Library
- Kossmann, D., Ramsak, F., and Rost, S. 2002. Shooting stars in the sky: An online algorithm for skyline queries. In Proceedings of the Very Large Data Bases Conference (VLDB; Hong Kong, China, Aug. 20--23). 275--286.]] Google ScholarDigital Library
- Kung, H., Luccio, F., and Preparata, F. 1975. On finding the maxima of a set of vectors. J. Assoc. Comput. Mach., 22, 4, 469--476.]] Google ScholarDigital Library
- Matousek, J. 1991. Computing dominances in En. Inform. Process. Lett. 38, 5, 277--278.]] Google ScholarDigital Library
- Mclain, D. 1974. Drawing contours from arbitrary data points. Comput. J. 17, 4, 318--324.]]Google ScholarCross Ref
- Muralikrishna, M. and Dewitt, D. 1988. Equi-depth histograms for estimating selectivity factors for multi-dimensional queries. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; Chicago, IL, June 1--3). 28--36.]] Google ScholarDigital Library
- Natsev, A., Chang, Y., Smith, J., Li., C., and Vitter. J. 2001. Supporting incremental join queries on ranked inputs. In Proceedings of the Very Large Data Bases Conference (VLDB; Rome, Italy, Sep. 11--14). 281--290.]] Google ScholarDigital Library
- Papadias, D., Tao, Y., Fu, G., and Seeger, B. 2003. An optimal and progressive algorithm for skyline queries. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; San Diego, CA, June 9--12). 443--454.]] Google ScholarDigital Library
- Papadias, D., Kalnis, P., Zhang, J., and Tao, Y. 2001. Efficient OLAP operations in spatial data warehouses. In Proceedings of International Symposium on Spatial and Temporal Databases (SSTD; Redondo Beach, CA, July 12--15). 443--459.]] Google ScholarDigital Library
- Preparata, F. and Shamos, M. 1985. Computational Geometry---An Introduction. Springer, Berlin, Germany.]] Google ScholarDigital Library
- Roussopoulos, N., Kelly, S., and Vincent, F. 1995. Nearest neighbor queries. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; San Jose, CA, May 22--25). 71--79.]] Google ScholarDigital Library
- Sakurai, Y., Yoshikawa, M., Uemura, S., and Kojima, H. 2000. The A-tree: An index structure for high-dimensional spaces using relative approximation. In Proceedings of the Very Large Data Bases Conference (VLDB; Cairo, Egypt, Sep. 10--14). 516--526.]] Google ScholarDigital Library
- Salzberg, B. and Tsotras, V. 1999. A comparison of access methods for temporal data. ACM Comput. Surv. 31, 2, 158--221.]] Google ScholarDigital Library
- Sellis, T., Roussopoulos, N., and Faloutsos, C. 1987. The R+-tree: A dynamic index for multi-dimensional objects. In Proceedings of the Very Large Data Bases Conference (VLDB; Brighton, England, Sep. 1--4). 507--518.]] Google ScholarDigital Library
- Steuer, R. 1986. Multiple Criteria Optimization. Wiley, New York, NY.]]Google Scholar
- Tan, K., Eng, P., and Ooi, B. 2001. Efficient progressive skyline computation. In Proceedings of the Very Large Data Bases Conference (VLDB; Rome, Italy, Sep. 11--14). 301--310.]] Google ScholarDigital Library
- Theodoridis, Y., Stefanakis, E., and Sellis, T. 2000. Efficient cost models for spatial queries using R-trees. IEEE Trans. Knowl. Data Eng. 12, 1, 19--32.]] Google ScholarDigital Library
Index Terms
- Progressive skyline computation in database systems
Recommendations
U-Skyline: A New Skyline Query for Uncertain Databases
The skyline query, aiming at identifying a set of skyline tuples that are not dominated by any other tuple, is particularly useful for multicriteria data analysis and decision making. For uncertain databases, a probabilistic skyline query, called P-...
Towards multidimensional subspace skyline analysis
The skyline operator is important for multicriteria decision-making applications. Although many recent studies developed efficient methods to compute skyline objects in a given space, none of them considers skylines in multiple subspaces simultaneously. ...
Efficient Skyline Computation of Multiple Range Skyline Queries
iiWAS2021: The 23rd International Conference on Information Integration and Web IntelligenceSkyline query which returns a set of skyline objects by filtering those objects that are dominated by others from a potentially large multidimensional data set has attracted significant research attention especially in the database community. Many ...
Comments