Abstract
This article deals with continuous conjunctive queries with arithmetic comparisons and optional aggregation over multiple data streams. An algorithm is presented for determining whether or not any given query can be evaluated using a bounded amount of memory for all possible instances of the data streams. For queries that can be evaluated using bounded memory, an execution strategy based on constant-sized synopses of the data streams is proposed. For queries that cannot be evaluated using bounded memory, data stream scenarios are identified in which evaluating the queries requires memory linear in the size of the unbounded streams.
Supplemental Material
Available for Download
Article appendix
- Alon, N., Matias, Y., and Szegedy, M. 1999. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58, 1 (Feb.), 137--147. Google ScholarDigital Library
- Arasu, A., Babcock, B., Babu, S., McAlister, J., and Widom, J. 2002a. Characterizing memory requirements for queries over continuous data streams. In Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, New York, 221--232. Google ScholarDigital Library
- Arasu, A., Babu, S., and Widom, J. 2002b. An abstract semantics and concrete language for continuous queries over streams and relations. Tech. Rep. http://dbpubs.stanford.edu/pub/2002-57, Stanford University. Nov.Google Scholar
- Babcock, B., Babu, S., Datar, M., Motwani, R., and Widom, J. 2002. Models and issues in data stream systems. In Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, New York, 1--16. Google ScholarDigital Library
- Babcock, B., Datar, M., and Motwani, R. 2002. Sampling from a moving window over streaming data. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, 633--634. Google ScholarDigital Library
- Babu, S. and Widom, J. 2002. Exploiting k-constraints to reduce memory overhead in continuous queries over data streams. Tech. Rep. http://dbpubs.stanford.edu/pub/2002-52, Stanford University. Nov.Google Scholar
- Carney, D., Centintemel, U., Cherniack, M., Convey, C., Lee, S., Seidman, G., Stonebraker, M., Tatbul, N., and Zdonik, S. B. 2002. Monitoring streams---A new class of data management applications. In Proceedings of the 28th International Conference on Very Large Data Bases. Morgan-Kaufmann, San Mateo, Calif., 215--226. Google ScholarDigital Library
- Chandrasekharan, S., Cooper, O., Deshpande, A., Franklin, M. J., Hellerstein, J. M., Hong, W., Krishnamurthy, S., Madden, S., Raman, V., Resis, F., and Shah, M. A. 2003. TelegraphCQ: Continuous dataflow processing for an uncertain world. In Proceedings of the 1st Conference on Innovative Data Systems Research. 269--280.Google Scholar
- Chandrasekharan, S. and Franklin, M. J. 2002. Streaming queries over streaming data. In Proceedings of the 28th International Conference on Very Large Data Bases. Morgan-Kaufmann, San Mateo, Calif., 203--214. Google ScholarDigital Library
- Chen, J., DeWitt, D. J., Tian, F., and Wang, Y. 2000. Niagaracq: A scalable continuous query system for internet databases. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. ACM, New York, 379--390. Google ScholarDigital Library
- Chomicki, J. 1995. Efficient checking of temporal integrity constraints using bounded history encoding. ACM Trans. Datab. Syst. 20, 2, 149--186. Google ScholarDigital Library
- Cranor, C., Johnson, T., Spataschek, O., and Shkapenyuk, V. 2003. Gigascope: A stream database for network applications. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. ACM, New York, 647--651. Google ScholarDigital Library
- Datar, M., Gionis, A., Indyk, P., and Motwani, R. 2002. Maintaining stream statistics over sliding windows. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, 635--644. Google ScholarDigital Library
- Dobra, A., Garofalakis, M. N., Gehrke, J., and Rastogi, R. 2002. Processing complex aggregate queries over data streams. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. ACM, New York, 61--72. Google ScholarDigital Library
- Feigenbaum, J., Kannan, S., Strauss, M., and Viswanathan, M. 1999. An approximate l1-difference algorithm for massive data streams. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., 501--511. Google ScholarDigital Library
- Gehrke, J. 2003. Special issue on data stream processing. IEEE Comput. Soc. Bull. Tech. Comm. Data Eng. 26, 1 (Mar.).Google Scholar
- Gehrke, J., Korn, F., and Srivastava, D. 2001. On computing correlated aggregates over continual data streams. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. ACM, New York, 13--24. Google ScholarDigital Library
- Gilbert, A. C., Guha, S., Indyk, P., Muthukrishnan, S., and Strauss, M. 2002. Fast, small-space algorithms for approximate histogram maintenance. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing. ACM, New York, 389--398. Google ScholarDigital Library
- Golab, L. and Ozsu, M. T. 2003. Issues in data stream management. SIGMOD Record 32, 2 (June), 5--14. Google ScholarDigital Library
- Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, A., and Venkatrao, M. 1997. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min. Knowl. Disc. 1, 1 (Mar.), 29--53. Google ScholarDigital Library
- Greenwald, M. and Khanna, S. 2001. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. ACM, New York, 58--66. Google ScholarDigital Library
- Guha, S., Koudas, N., and Shim, K. 2001. Data-streams and histograms. In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing. ACM, New York, 471--475. Google ScholarDigital Library
- Madden, S., Shah, M. A., Hellerstein, J. M., and Raman, V. 2002. Continuously adaptive continuous queries over streams. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. ACM, New York, 49--60. Google ScholarDigital Library
- Manku, G. S., Rajagopalan, S., and Lindsay, B. G. 1999. Random sampling techniques for space efficient online computation of order statistics of large datasets. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data. ACM, New York, 251--262. Google ScholarDigital Library
- Shah, M., Hellerstein, J. M., Chandrasekharan, S., and Franklin, M. J. 2003. Flux: An adaptive partitioning operator for continuous query systems. In Proceedings of the 19th International Conference on Data Engineering. IEEE Computer Society Press, Los Alamitos, Calif.Google Scholar
- STREAM. 2003. Stanford stream data management project. http://www-db.stanford.edu/stream.Google Scholar
- Terry, D. B., Goldberg, D., Nichols, D., and Oki, B. M. 1992. Continuous queries over append-only databases. In Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data. ACM, New York, 321--330. Google ScholarDigital Library
- Traderbot. 2003. Traderbot home page. http://www.traderbot.com.Google Scholar
- Tucker, P. A., Maier, D., Sheard, T., and Fegaras, L. 2003. Exploiting punctuation semantics in continuous data streams. IEEE Trans. Knowl. Data Eng. 15, 3, 555--568. Google ScholarDigital Library
- Ullman, J. D. 1988. Principles of Database and Knowledge-Base Systems. Vol. I. Computer Science Press, Rockville, Md., Chapter 3, 121--122. Google ScholarDigital Library
- Ullman, J. D. 1989. Principles of Database and Knowledge-Base Systems. Vol. II. Computer Science Press, Rockville, Md., Chapter 14, 892--893. Google ScholarDigital Library
- Vitter, J. 1985. Random sampling with a reservoir. ACM Trans. Math. Softw. 11, 1 (Mar.), 37--57. Google ScholarDigital Library
Index Terms
Characterizing memory requirements for queries over continuous data streams
Recommendations
Exploiting Punctuation Semantics in Continuous Data Streams
As most current query processing architectures are already pipelined, it seems logical to apply them to data streams. However, two classes of query operators are impractical for processing long or infinite data streams. Unbounded stateful operators ...
Characterizing memory requirements for queries over continuous data streams
PODS '02: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsWe consider conjunctive queries with arithmetic comparisons over multiple continuous data streams. We specify an algorithm for determining whether or not a query can be evaluated using a bounded amount of memory for all possible instances of the data ...
Sketching distributed sliding-window data streams
While traditional data management systems focus on evaluating single, ad hoc queries over static data sets in a centralized setting, several emerging applications require (possibly, continuous) answers to queries on dynamic data that is widely ...
Comments