skip to main content
10.1145/3132847.3133012acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article
Public Access

Efficient Computation of Subspace Skyline over Categorical Domains

Published: 06 November 2017 Publication History

Abstract

Platforms such as AirBnB, Zillow, Yelp, and related sites have transformed the way we search for accommodation, restaurants, etc. The underlying datasets in such applications have numerous attributes that are mostly Boolean or Categorical. Discovering the skyline of such datasets over a subset of attributes would identify entries that stand out while enabling numerous applications. There are only a few algorithms designed to compute the skyline over categorical attributes, yet are applicable only when the number of attributes is small. In this paper, we place the problem of skyline discovery over categorical attributes into perspective and design efficient algorithms for two cases. (i) In the absence of indices, we propose two algorithms, ST-S and ST-P, that exploit the categorical characteristics of the datasets, organizing tuples in a tree data structure, supporting efficient dominance tests over the candidate set. (ii) We then consider the existence of widely used precomputed sorted lists. After discussing several approaches, and studying their limitations, we propose TA-SKY, a novel threshold style algorithm that utilizes sorted lists. Moreover, we further optimize TA-SKY and explore its progressive nature, making it suitable for applications with strict interactive requirements. In addition to the extensive theoretical analysis of the proposed algorithms, we conduct a comprehensive experimental evaluation of the combination of real (including the entire AirBnB data collection) and synthetic datasets to study the practicality of the proposed algorithms. The results showcase the superior performance of our techniques, outperforming applicable approaches by orders of magnitude.

References

[1]
Abolfazl Asudeh, Saravanan Thirumuruganathan, Nan Zhang, and Gautam Das. 2016. Discovering the skyline of web databases. VLDB (2016).
[2]
Abolfazl Asudeh, Gensheng Zhang, Naeemul Hassan, Chengkai Li, and Gergely V Zaruba. 2015. Crowdsourcing pareto-optimal object finding by pairwise comparisons CIKM.
[3]
Ilaria Bartolini, Paolo Ciaccia, and Marco Patella. 2008. Efficient sort-based skyline evaluation. TODS, Vol. 33, 4 (2008).
[4]
Stephan Borzsony, Donald Kossmann, and Konrad Stocker. 2001. The skyline operator Data Engineering.
[5]
Jan Chomicki, Parke Godfrey, Jarek Gryz, and Dongming Liang. 2005. Skyline with presorting: Theory and optimizations. IIPWM.
[6]
Gautam Das, Dimitrios Gunopulos, Nick Koudas, and Dimitris Tsirogiannis. 2006. Answering top-k queries using views. In VLDB.
[7]
Ronald Fagin. 1996. Combining fuzzy information from multiple systems. ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems.
[8]
Ronald Fagin, Amnon Lotem, and Moni Naor. 2003. Optimal aggregation algorithms for middleware. Journal of computer and system sciences (2003).
[9]
Parke Godfrey, Ryan Shipley, and Jarek Gryz. 2005. Maximal vector computation in large data sets. In VLDB.
[10]
Ashish Gupta, Venky Harinarayan, and Dallan Quass. 1995. Aggregate-query processing in data warehousing environments. (1995).
[11]
Alon Y Halevy. 2001. Answering queries using views: A survey. VLDBJ (2001).
[12]
Ching-Lai Hwang and Kwangsun Yoon. 2012. Multiple attribute decision making: methods and applications a state-of-the-art survey. Vol. Vol. 186. SSBM.
[13]
Ihab F Ilyas, George Beskales, and Mohamed A Soliman. 2008. A survey of top-k query processing techniques in relational database systems. CSUR (2008).
[14]
Donald Kossmann, Frank Ramsak, and Steffen Rost. 2002. Shooting stars in the sky: An online algorithm for skyline queries VLDB.
[15]
Jongwuk Lee and Seung-Won Hwang. 2014 a. Scalable skyline computation using a balanced pivot selection technique. Information Systems Vol. 39 (2014).
[16]
Jongwuk Lee and Seung-won Hwang. 2014 b. Toward efficient multidimensional subspace skyline computation. The VLDB Journal, Vol. 23, 1 (2014), 129--145.
[17]
Ken CK Lee, Baihua Zheng, Huajing Li, and Wang-Chien Lee. 2007. Approaching the skyline in Z order. In VLDB.
[18]
Sofian Maabout, Carlos Ordonez, Patrick Kamnang Wanko, and Nicolas Hanusse. 2016. Skycube Materialization Using the Topmost Skyline or Functional Dependencies. TODS, Vol. 41, 4 (2016).
[19]
Michael Morse, Jignesh M Patel, and Hosagrahar V Jagadish. 2007. Efficient skyline computation over low-cardinality domains VLDB.
[20]
Dimitris Papadias, Yufei Tao, Greg Fu, and Bernhard Seeger. 2003. An optimal and progressive algorithm for skyline queries SIGMOD.
[21]
Jian Pei, Wen Jin, Martin Ester, and Yufei Tao. 2005. Catching the best views of skyline: A semantic approach based on decisive subspaces VLDB.
[22]
Timotheus Preisinger and Werner Kießling. 2007. The hexagon algorithm for pareto preference queries M-PREF.
[23]
Md Farhadur Rahman, Abolfazl Asudeh, Nick Koudas, and Gautam Das. 2017. Efficient Computation of Subspace Skyline over Categorical Domains. arXiv preprint arXiv:1703.00080 (2017).
[24]
Yufei Tao, Xiaokui Xiao, and Jian Pei. 2006. Subsky: Efficient computation of skylines in subspaces ICDE.
[25]
Tian Xia and Donghui Zhang. 2006. Refreshing the sky: the compressed skycube with efficient support for frequent updates SIGMOD.
[26]
Tian Xia, Donghui Zhang, Zheng Fang, Cindy Chen, and Jie Wang. 2012. Online subspace skyline query processing using the compressed skycube. TODS (2012).
[27]
Yidong Yuan, Xuemin Lin, Qing Liu, Wei Wang, Jeffrey Xu Yu, and Qing Zhang. 2005. Efficient computation of the skyline cube. In VLDB.
[28]
Shiming Zhang, Nikos Mamoulis, and David W Cheung. 2009. Scalable skyline computation using object-based space partitioning SIGMOD.

Cited By

View all
  • (2022)On Finding Rank Regret RepresentativesACM Transactions on Database Systems10.1145/353105447:3(1-37)Online publication date: 18-Aug-2022
  • (2020)Efficient column-oriented processing for mutual subspace skyline queriesSoft Computing10.1007/s00500-020-04875-yOnline publication date: 21-Mar-2020
  • (2020)ARTful Skyline Computation for In-Memory Database SystemsNew Trends in Databases and Information Systems10.1007/978-3-030-54623-6_1(3-12)Online publication date: 17-Aug-2020
  • Show More Cited By

Index Terms

  1. Efficient Computation of Subspace Skyline over Categorical Domains

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
    November 2017
    2604 pages
    ISBN:9781450349185
    DOI:10.1145/3132847
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 November 2017

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. categorical domains
    2. sorted list
    3. subspace skyline computation
    4. tree

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    CIKM '17
    Sponsor:

    Acceptance Rates

    CIKM '17 Paper Acceptance Rate 171 of 855 submissions, 20%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Upcoming Conference

    CIKM '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)96
    • Downloads (Last 6 weeks)8
    Reflects downloads up to 08 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)On Finding Rank Regret RepresentativesACM Transactions on Database Systems10.1145/353105447:3(1-37)Online publication date: 18-Aug-2022
    • (2020)Efficient column-oriented processing for mutual subspace skyline queriesSoft Computing10.1007/s00500-020-04875-yOnline publication date: 21-Mar-2020
    • (2020)ARTful Skyline Computation for In-Memory Database SystemsNew Trends in Databases and Information Systems10.1007/978-3-030-54623-6_1(3-12)Online publication date: 17-Aug-2020
    • (2019)A unified optimization algorithm for solving "regret-minimizing representative" problemsProceedings of the VLDB Endowment10.14778/3368289.336829113:3(239-251)Online publication date: 1-Nov-2019
    • (2019)RRRProceedings of the 2019 International Conference on Management of Data10.1145/3299869.3300080(263-280)Online publication date: 25-Jun-2019
    • (2019)Designing Fair Ranking SchemesProceedings of the 2019 International Conference on Management of Data10.1145/3299869.3300079(1259-1276)Online publication date: 25-Jun-2019
    • (2018)On obtaining stable rankingsProceedings of the VLDB Endowment10.14778/3291264.329126912:3(237-250)Online publication date: 1-Nov-2018
    • (2017)Cost Effective Accommodation Planning in a Trip by Using Accommodation Advisor Query (AA-Query) in STPF2017 International Conference on Networking and Network Applications (NaNA)10.1109/NaNA.2017.60(330-336)Online publication date: Oct-2017

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media