skip to main content
article

Progressive skyline computation in database systems

Published:01 March 2005Publication History
Skip Abstract Section

Abstract

The skyline of a d-dimensional dataset contains the points that are not dominated by any other point on all dimensions. Skyline computation has recently received considerable attention in the database community, especially for progressive methods that can quickly return the initial results without reading the entire database. All the existing algorithms, however, have some serious shortcomings which limit their applicability in practice. In this article we develop branch-and-bound skyline (BBS), an algorithm based on nearest-neighbor search, which is I/O optimal, that is, it performs a single access only to those nodes that may contain skyline points. BBS is simple to implement and supports all types of progressive processing (e.g., user preferences, arbitrary dimensionality, etc). Furthermore, we propose several interesting variations of skyline computation, and show how BBS can be applied for their efficient processing.

References

  1. Acharya, S., Poosala, V., and Ramaswamy, S. 1999. Selectivity estimation in spatial databases. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; Philadelphia, PA, June 1--3). 13--24.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Balke, W., Gunzer, U., and Zheng, J. 2004. Efficient distributed skylining for Web information systems. In Proceedings of the International Conference on Extending Database Technology (EDBT; Heraklio, Greece, Mar. 14--18). 256--273.]]Google ScholarGoogle Scholar
  3. Beckmann, N., Kriegel, H., Schneider, R., and Seeger, B. 1990. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; Atlantic City, NJ, May 23--25). 322--331.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Berchtold, S., Keim, D., and Kriegel, H. 1996. The X-tree: An index structure for high-dimensional data. In Proceedings of the Very Large Data Bases Conference (VLDB; Mumbai, India, Sep. 3--6). 28--39.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Böhm, C. and Kriegel, H. 2001. Determining the convex hull in large multidimensional databases. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery (DaWaK; Munich, Germany, Sep. 5--7). 294--306.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Borzsonyi, S., Kossmann, D., and Stocker, K. 2001. The skyline operator. In Proceedings of the IEEE International Conference on Data Engineering (ICDE; Heidelberg, Germany, Apr. 2--6). 421--430.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Buchta, C. 1989. On the average number of maxima in a set of vectors. Inform. Process. Lett., 33, 2, 63--65.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Chang, Y., Bergman, L., Castelli, V., Li, C., Lo, M., and Smith, J. 2000. The Onion technique: Indexing for linear optimization queries. In Proceedings of the ACM Conference on the Management of data (SIGMOD; Dallas, TX, May 16--18). 391--402.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Chomicki, J., Godfrey, P., Gryz, J., and Liang, D. 2003. Skyline with pre-sorting. In Proceedings of the IEEE International Conference on Data Engineering (ICDE; Bangalore, India, Mar. 5--8). 717--719.]]Google ScholarGoogle Scholar
  10. Fagin, R., Lotem, A., and Naor, M. 2001. Optimal aggregation algorithms for middleware. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS; Santa Barbara, CA, May 21--23). 102--113.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Ferhatosmanoglu, H., Stanoi, I., Agrawal, D., and Abbadi, A. 2001. Constrained nearest neighbor queries. In Proceedings of the International Symposium on Spatial and Temporal Databases (SSTD; Redondo Beach, CA, July 12--15). 257--278.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Guttman, A. 1984. R-trees: A dynamic index structure for spatial searching. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; Boston, MA, June 18--21). 47--57.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Hellerstein, J., Anvur, R., Chou, A., Hidber, C., Olston, C., Raman, V., Roth, T., and Haas, P. 1999. Interactive data analysis: The control project. IEEE Comput. 32, 8, 51--59.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Henrich, A. 1994. A distance scan algorithm for spatial access structures. In Proceedings of the ACM Workshop on Geographic Information Systems (ACM GIS; Gaithersburg, MD, Dec.). 136--143.]]Google ScholarGoogle Scholar
  15. Hjaltason, G. and Samet, H. 1999. Distance browsing in spatial databases. ACM Trans. Database Syst. 24, 2, 265--318.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Hristidis, V., Koudas, N., and Papakonstantinou, Y. 2001. PREFER: A system for the efficient execution of multi-parametric ranked queries. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; May 21--24). 259--270.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Kossmann, D., Ramsak, F., and Rost, S. 2002. Shooting stars in the sky: An online algorithm for skyline queries. In Proceedings of the Very Large Data Bases Conference (VLDB; Hong Kong, China, Aug. 20--23). 275--286.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Kung, H., Luccio, F., and Preparata, F. 1975. On finding the maxima of a set of vectors. J. Assoc. Comput. Mach., 22, 4, 469--476.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Matousek, J. 1991. Computing dominances in En. Inform. Process. Lett. 38, 5, 277--278.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Mclain, D. 1974. Drawing contours from arbitrary data points. Comput. J. 17, 4, 318--324.]]Google ScholarGoogle ScholarCross RefCross Ref
  21. Muralikrishna, M. and Dewitt, D. 1988. Equi-depth histograms for estimating selectivity factors for multi-dimensional queries. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; Chicago, IL, June 1--3). 28--36.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Natsev, A., Chang, Y., Smith, J., Li., C., and Vitter. J. 2001. Supporting incremental join queries on ranked inputs. In Proceedings of the Very Large Data Bases Conference (VLDB; Rome, Italy, Sep. 11--14). 281--290.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Papadias, D., Tao, Y., Fu, G., and Seeger, B. 2003. An optimal and progressive algorithm for skyline queries. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; San Diego, CA, June 9--12). 443--454.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Papadias, D., Kalnis, P., Zhang, J., and Tao, Y. 2001. Efficient OLAP operations in spatial data warehouses. In Proceedings of International Symposium on Spatial and Temporal Databases (SSTD; Redondo Beach, CA, July 12--15). 443--459.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Preparata, F. and Shamos, M. 1985. Computational Geometry---An Introduction. Springer, Berlin, Germany.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Roussopoulos, N., Kelly, S., and Vincent, F. 1995. Nearest neighbor queries. In Proceedings of the ACM Conference on the Management of Data (SIGMOD; San Jose, CA, May 22--25). 71--79.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Sakurai, Y., Yoshikawa, M., Uemura, S., and Kojima, H. 2000. The A-tree: An index structure for high-dimensional spaces using relative approximation. In Proceedings of the Very Large Data Bases Conference (VLDB; Cairo, Egypt, Sep. 10--14). 516--526.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Salzberg, B. and Tsotras, V. 1999. A comparison of access methods for temporal data. ACM Comput. Surv. 31, 2, 158--221.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Sellis, T., Roussopoulos, N., and Faloutsos, C. 1987. The R+-tree: A dynamic index for multi-dimensional objects. In Proceedings of the Very Large Data Bases Conference (VLDB; Brighton, England, Sep. 1--4). 507--518.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Steuer, R. 1986. Multiple Criteria Optimization. Wiley, New York, NY.]]Google ScholarGoogle Scholar
  31. Tan, K., Eng, P., and Ooi, B. 2001. Efficient progressive skyline computation. In Proceedings of the Very Large Data Bases Conference (VLDB; Rome, Italy, Sep. 11--14). 301--310.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Theodoridis, Y., Stefanakis, E., and Sellis, T. 2000. Efficient cost models for spatial queries using R-trees. IEEE Trans. Knowl. Data Eng. 12, 1, 19--32.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Progressive skyline computation in database systems

      Recommendations

      Reviews

      Ned Chapin

      Searching to retrieve specific records from databases can be slowed by complicating factors. One source of complication addressed in this paper is the query selection criteria when multiple dimensions are involved, a complication made worse when the dimensions use nonkey data. A classic multiple dimension search often used in the literature, and used in this paper, is for a two-dimension retrieval, from a database about hotels, of the set of hotels that are closest to the beach and lowest in price: a so-called skyline retrieval. This paper starts by reporting, based on a survey of the literature, an evaluation and comparison of prominent algorithms for doing skyline searches. The six algorithms reported on are the divide-and-conquer, the block nested loop, the sort first skyline, the bit map, the index, and the nearest neighbor. The comparison emphasizes progressiveness, absence of false misses, absence of false hits, fairness, incorporation of retrieval references, and universality. Building on the nearest neighbor algorithm, the paper proposes an algorithm it claims is better than the six compared algorithms, one it terms the branch-and-bound algorithm. The paper explains the branch-and-bound algorithm, and offers an extensive formal proof of its correctness. The paper then briefly examines the effect of additions to and deletions from the searched databases. Following that, the paper identifies six "novel variations" on skyline search to document the superiority of the branch-and-bound algorithm: constrained skylines, ranked skylines, group-by skylines, dynamic skylines, enumerating and K-dominating queries, and skybands. Only the branch-and-bound is compatible with all six variations, and, on top of that, the paper introduces a new variation, approximate skybands, on which it also shines. The paper then provides an experimental evaluation of the efficiency of the branch-and-bound algorithm compared to the nearest neighbor algorithm (generally, the best of the six prominent algorithms for most skyline searches). For a non-novel search, the branch-and-bound algorithm makes a better showing on each of three rating factors: dimensionality, cardinality, and progressive behavior. On each of the six novel variations, the paper's branch-and-bound algorithm also winds up working from about as well as to much better than the nearest neighbor algorithm. As the paper points out, saving skylines relevant for frequently asked types of queries could offer important savings for active database users. To me, the branch-and-bound algorithm's claimed desirable properties suggest that data administrators should investigate the algorithm's applicability for their own databases. Online Computing Reviews Service

      Access critical reviews of Computing literature here

      Become a reviewer for Computing Reviews.

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader