skip to main content
research-article

Algorithms and metrics for processing multiple heterogeneous continuous queries

Published: 21 March 2008 Publication History

Abstract

The emergence of monitoring applications has precipitated the need for Data Stream Management Systems (DSMSs), which constantly monitor incoming data feeds (through registered continuous queries), in order to detect events of interest. In this article, we examine the problem of how to schedule multiple Continuous Queries (CQs) in a DSMS to optimize different Quality of Service (QoS) metrics. We show that, unlike traditional online systems, scheduling policies in DSMSs that optimize for average response time will be different from policies that optimize for average slowdown, which is a more appropriate metric to use in the presence of a heterogeneous workload. Towards this, we propose policies to optimize for the average-case performance for both metrics. Additionally, we propose a hybrid scheduling policy that strikes a fine balance between performance and fairness, by looking at both the average- and worst-case performance, for both metrics. We also show how our policies can be adaptive enough to handle the inherent dynamic nature of monitoring applications. Furthermore, we discuss how our policies can be efficiently implemented and extended to exploit sharing in optimized multi-query plans and multi-stream CQs. Finally, we experimentally show using real data that our policies consistently outperform currently used ones.

References

[1]
Acharya, S. and Muthukrishnan, S. 1998. Scheduling on-demand broadcasts: New metrics and algorithms. In Proceedings of the ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom).
[2]
Aksoy, D. and Franklin, M. 1999. RxW: A scheduling approach for large-scale on-demand data broadcast. IEEE/ACM Trans. Network. 7, 6, 846--860.
[3]
Avnur, R. and Hellerstein, J. M. 2000. Eddies: Continuously adaptive query processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data.
[4]
Babcock, B., Babu, S., Datar, M., and Motwani, R. 2003. Chain: Operator scheduling for memory minimization in data stream systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data.
[5]
Babcock, B., Babu, S., Datar, M., Motwani, R., and Thomas, D. 2004. Operator scheduling in data stream systems. Int. J. VLDB 13, 4.
[6]
Bansal, N. and Pruhs, K. 2003. Server scheduling in the lp norm: A rising tide lifts all boats. In Proceedings of the ACM Symposium on Theory of Computing (STOC).
[7]
Bender, M. A., Chakrabarti, S., and Muthukrishnan, S. 1998. Flow and stretch metrics for scheduling continuous job streams. In Proceedings of the ACM Symposium on Discrete Algorithms (SODA).
[8]
Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Seidman, G., Stonebraker, M., Tatbul, N., and Zdonik, S. 2002. Monitoring streams: A new class of data management applications. In Proceedings of the Very Large Data Bases Conference (VLDB).
[9]
Carney, D., Cetintemel, U., Rasin, A., Zdonik, S., Cherniack, M., and Stonebraker, M. 2003. Operator scheduling in a data stream manager. In Proceedings of the Very Large Data Bases Conference (VLDB).
[10]
Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M. J., Hellerstein, J. M., Hong, W., Krishnamurthy, S., Madden, S., Raman, V., Reiss, F., and Shah, M. A. 2003. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR).
[11]
Chen, J., DeWitt, D. J., and Naughton, J. F. 2002. Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In Proceedings of the International Conference on Data Engineering (ICDE).
[12]
Cranor, C., Johnson, T., Spataschek, O., and Shkapenyuk, V. 2003. Gigascope: A stream database for network applications. In Proceedings of the ACM SIGMOD International Conference.
[13]
Fagin, R., Lotem, A., and Naor, M. 2001. Optimal aggregation algorithms for middleware. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS).
[14]
Golab, L. and Ozsu, M. T. 2003. Processing sliding window multi-joins in continuous queries over data streams. In Proceedings of the Very Large Data Bases Conference (VLDB).
[15]
Hammad, M., Franklin, M., Aref, W., and Elmagarmid, A. K. 2003. Scheduling for shared window joins over data streams. In Proceedings of the Very Large Data Bases Conference (VLDB).
[16]
Hammad, M. A., Mokbel, M. F., Ali, M. H., Aref, W. G., Catlin, A. C., Elmagarmid, A. K., Eltabakh, M., Elfeky, M. G., Ghanem, T. M., Gwadera, R., Ilyas, I. F., Marzouk, M., and Xiong, X. 2004. Nile: A query processing engine for data streams. In Proceedings of the International Conference on Data Engineering (ICDE).
[17]
Jacobson, V. 1988. Congestion avoidance and control. In Proceedings of the ACM SIGCOMM International Conference on Communications Architectures and Protocols.
[18]
Kang, J., Naughton, J. F., and Viglas, S. D. 2003. Evaluating window joins over unbounded streams. In Proceedings of the International Conference on Data Engineering (ICDE).
[19]
Li, J., Maier, D., Tufte, K., Papadimos, V., and Tucker, P. A. 2005. No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. SIGMOD Record 34, 1, 39--44.
[20]
Madden, S., Shah, M. A., Hellerstein, J. M., and Raman, V. 2002. Continuously adaptive continuous queries over streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data.
[21]
Mehta, M. and DeWitt, D. J. 1993. Dynamic memory allocation for multiple-query workloads. In Proceedings of the Very Large Data Bases Conference (VLDB).
[22]
Motwani, R., Widom, J., Arasu, A., Babcock, B., Babu, S., Datar, M., Manku, G., Olston, C., Rosenstein, J., and Varma, R. 2003. Query processing, resource management, and approximation in a data stream management system. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR).
[23]
Muthukrishnan, S., Rajaraman, R., Shaheen, A., and Gehrke, J. E. 1999. Online scheduling to minimize average stretch. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS).
[24]
Sellis, T. K. 1988. Multiple-query optimization. ACM Trans. Datab. Syst. 13, 1.
[25]
Sharaf, M. A. 2007. Metrics and algorithms for processing multiple continuous queries. Ph.D. thesis, University of Pittsburgh.
[26]
Sharaf, M. A., Chrysanthis, P. K., and Labrinidis, A. 2005. Preemptive rate-based operator scheduling in a data stream management system. In Proceedings of the 3rd ACS/IEEE International Conference on Computer Systems and Applications (AICCSA'05). Cairo, Egypt.
[27]
Sharaf, M. A., Chrysanthis, P. K., Labrinidis, A., and Pruhs, K. 2006. Efficient scheduling of heterogeneous continuous queries. In Proceedings of the Very Large Data Bases Conference (VLDB).
[28]
Sharaf, M. A., Labrinidis, A., Chrysanthis, P. K., and Pruhs, K. 2005. Freshness-aware scheduling of continuous queries in the dynamic web. In Proceedings of the International Workshop on Web and Databases (WebDB).
[29]
Sutherland, T., Pielech, B., Zhu, Y., Ding, L., and Rundensteiner, E. A. 2005. An adaptive multi-objective scheduling selection framework for continuous query processing. In Proceedings of the International Database Engineering and Applications Symposium (IDEAS).
[30]
Tian, F. and DeWitt, D. J. 2003. Tuple routing strategies for distributed eddies. In Proceedings of the Very Large Data Bases Conference (VLDB).
[31]
Urhan, T. and Franklin, M. J. 2001. Dynamic pipeline scheduling for improving interactive query performance. In Proceedings of the Very Large Data Bases Conference (VLDB).
[32]
Viglas, S. D. and Naughton, J. F. 2002. Rate-based query optimization for streaming information sources. In Proceedings of the ACM SIGMOD International Conference on Management of Data.
[33]
Waldspurger, C. A. and Weihl, W. E. 1994. Lottery scheduling: Flexible proportional-share resource management. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI).
[34]
Wilschut, A. and Apers, P. 1991. Dataflow query execution in a parallel main-memory environment. In Proceedings of the International Conference on Parallel and Distributed Information Systems (PDIS).

Cited By

View all
  • (2024)Managing Metaverse Data Tsunami: Actionable InsightsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.335496036:12(7423-7441)Online publication date: 1-Dec-2024
  • (2023)Accelerating Stream Processing Queries with Congestion-aware Scheduling and Real-time Linux ThreadsProceedings of the 20th ACM International Conference on Computing Frontiers10.1145/3587135.3592202(144-153)Online publication date: 9-May-2023
  • (2023)The Metaverse Data Deluge: What Can We Do About It?2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00296(3675-3687)Online publication date: Apr-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 33, Issue 1
March 2008
211 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1331904
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 March 2008
Accepted: 01 September 2007
Revised: 01 July 2007
Received: 01 January 2007
Published in TODS Volume 33, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Data stream management system
  2. continuous queries
  3. operator scheduling

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)0
Reflects downloads up to 07 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Managing Metaverse Data Tsunami: Actionable InsightsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.335496036:12(7423-7441)Online publication date: 1-Dec-2024
  • (2023)Accelerating Stream Processing Queries with Congestion-aware Scheduling and Real-time Linux ThreadsProceedings of the 20th ACM International Conference on Computing Frontiers10.1145/3587135.3592202(144-153)Online publication date: 9-May-2023
  • (2023)The Metaverse Data Deluge: What Can We Do About It?2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00296(3675-3687)Online publication date: Apr-2023
  • (2022)Runtime Adaptation of Data Stream Processing Systems: The State of the ArtACM Computing Surveys10.1145/351449654:11s(1-36)Online publication date: 9-Sep-2022
  • (2022)Multi-Query Optimization of Incrementally Evaluated Sliding-Window AggregationsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.302977034:8(3899-3911)Online publication date: 1-Aug-2022
  • (2021)LachesisProceedings of the 22nd International Middleware Conference10.1145/3464298.3493407(365-378)Online publication date: 6-Dec-2021
  • (2020)Thrifty Query Execution via IncrementabilityProceedings of the 2020 ACM SIGMOD International Conference on Management of Data10.1145/3318464.3389756(1241-1256)Online publication date: 11-Jun-2020
  • (2020)GTFS-Madrid-Bench: A benchmark for virtual knowledge graph access in the transport domainJournal of Web Semantics10.1016/j.websem.2020.100596(100596)Online publication date: Aug-2020
  • (2019)Accessing Data From Multiple Heterogeneous Distributed Database SystemsApplying Integration Techniques and Methods in Distributed Systems and Technologies10.4018/978-1-5225-8295-3.ch008(192-219)Online publication date: 2019
  • (2019)HarenProceedings of the 20th International Middleware Conference Demos and Posters10.1145/3366627.3368108(19-20)Online publication date: 9-Dec-2019
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media