ABSTRACT
Data Stream Management Systems (DSMS) represent a vibrant research area that is rich in technical challenges, which many projects have approached by extending database query languages and models for continuous queries on data streams [1, 3, 4, 9, 5]. These database-inspired approaches have delivered remarkable systems and applications, but have yet to produce solid conceptual foundations for DSMS data models and query languages---particularly if we compare with the extraordinary ones of relational databases. A cornerstone of the success of relational databases was the tight coupling between their data model and their logic-based query languages. In this paper, we show that a similar approach can succeed for data streams and propose a tight-coupled design for DSMS data models and query languages. To express more naturally the behavior of a data stream and attain more powerful on-line queries, we abandon the set-of-tuples model of relational databases, and instead use sequences of tuples ordered by their time-stamps as our data stream model. This approach allows us to overcome the blocking problem that severely impairs the expressive power of data stream query languages. As elucidated in [1]: A blocking query operator is one that cannot produce the first tuple of the output until it has seen the entire input. Previous work had characterized blocking query operators by their non-monotonic behavior [7, 6, 8]. In this paper, we instead use the closed-world assumption [11, 10] to characterize blocking/nonblocking behaviors with respect to the incompleteness/completeness of the streaming database. From this, we infer simple syntactic conditions that make Datalog rules immune from blocking. A significant and surprising new result is that the use of negated goals in the bodies of rules does not imply a blocking behavior: in fact, many very useful nonblocking queries can be expressed using negation. The flip side of this exciting result is that additional conditions must then be imposed on the rules to ensure that (i) the results produced by Datalog programs are ordered according to their time-stamps, and (ii) possible time-skews between streams are also managed explicitly by the rules [2]. These problems, and their possible remedies, are captured and expressed quite naturally using Datalog, which thus emerges as a powerful framework for analyzing and expressing continuous queries. Related problems, including the treatment of data streams without time-stamps, the characterization of monotonic query operators [7, 6, 8], and the use of more general closed-world assumptions were also studied and answered in the course of this research [12].
- B. Babcock, S. Babu, M. Datar, R. Motawani, and J. Widom. Models and issues in data stream systems. In PODS, 2002. Google ScholarDigital Library
- Y. Bai, H. Thakkar, H. Wang, and C. Zaniolo. Timestamp management and query execution models in data stream management systems. IEEE Internet Computing, 12(6):13--21, 2008. Google ScholarDigital Library
- S. Chandrasekaran and M. Franklin. Streaming queries over streaming data. In VLDB, 2002. Google ScholarDigital Library
- C. Cranor, Y. Gao, T. Johnson, V. Shkapenyuk, and O. Spatscheck. Gigascope: High performance network monitoring with an SQL interface. In SIGMOD, page 623. ACM Press, 2002. Google ScholarDigital Library
- L. Golab and M. T. Özsu. Issues in data stream management. ACM SIGMOD Record, 32(2):5--14, 2003. Google ScholarDigital Library
- Y. Gurevich, D. Leinders, and J. V. den Bussche. A theory of stream queries. In Database Programming Languages, 11th International Symposium, DBPL 2007, pages 153--168. Springer, 2007. Google ScholarDigital Library
- Y.-N. Law, H. Wang, and C. Zaniolo. Data models and query language for data streams. In VLDB, pages 492--503, 2004. Google ScholarDigital Library
- Y.-N. Law, H. Wang, and C. Zaniolo. Relational languages and data models for continuous queries on sequences and data streams. ACM Transactions on Database Systems-to appear, 2011. Google ScholarDigital Library
- S. Madden, M. A. Shah, J. M. Hellerstein, and V. Raman. Continuously adaptive continuous queries over streams. In SIGMOD, pages 49--61, 2002. Google ScholarDigital Library
- A. Rajasekar, J. Lobo, and J. Minker. Weak generalized closed world assumption. J. Autom. Reasoning, 5(3):293--307, 1989. Google ScholarDigital Library
- R. Reiter. Deductive question-answering on relational data bases. In H. Gallaire and J. Minker, editors, Logic and Data Bases, Symposium on Logic and Data Bases, Toulouse, 1977, pages 149--177, 1977.Google Scholar
- C. Zaniolo. Bringing logic and order to data stream management systems. UCLA Technical Report, 2011.Google Scholar
Index Terms
- The logic of query languages for data streams
Recommendations
Condensative stream query language for data streams
ADC '07: Proceedings of the eighteenth conference on Australasian database - Volume 63In contrast to traditional database queries, a query on stream data is continuous in that it is periodically evaluated over fractions (sliding windows) of the data stream. This introduces challenges beyond those encountered when processing traditional ...
Comments