ACM Home Page
Please provide us with feedback. Feedback
Query-aware partitioning for monitoring massive network data streams
Full text PdfPdf (429 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2008 ACM SIGMOD international conference on Management of data table of contents
Vancouver, Canada
SESSION: Industrial Session 3: Streams, Conversations and Verification: table of contents
Pages 1135-1146  
Year of Publication: 2008
ISBN:978-1-60558-102-6
Authors
Theodore Johnson  AT&T Labs - Research, Florham Park, NJ, USA
Muthu S. Muthukrishnan  Rutgers University, Piscataway, NJ, USA
Vladislav Shkapenyuk  AT&T Labs - Research, Florham Park, NJ, USA
Oliver Spatscheck  AT&T Labs - Research, Florham Park, NJ, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 45,   Downloads (12 Months): 155,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1376616.1376730
What is a DOI?

ABSTRACT

Data Stream Management Systems (DSMS) are gaining acceptance for applications that need to process very large volumes of data in real time. The load generated by such applications frequently exceeds by far the computation capabilities of a single centralized server. In particular, a single-server instance of our DSMS, Gigascope, cannot keep up with the processing demands of the new OC-786 networks, which can generate more than 100 million packets per second. In this paper, we explore a mechanism for the distributed processing of very high speed data streams.

Existing distributed DSMSs employ two mechanisms for distributing the load across the participating machines: partitioning of the query execution plans and partitioning of the input data stream in a query-independent fashion. However, for a large class of queries, both approaches fail to reduce the load as compared to centralized system, and can even lead to an increase in the load. In this paper we present an alternative approach - query-aware data stream partitioning that allows for more efficient scaling. We present methods for analyzing any given query set and choose the optimal partitioning scheme, and show how to reconcile potentially conflicting requirements that different queries might place on partitioning. We conclude with experiments on a small cluster of processing nodes on high-rate network traffic feed that demonstrates with different query sets that our methods effectively distribute the load across all processing nodes and facilitate efficient scaling whenever more processing nodes becomes available.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
D. J. Abadi et al. The Design of the Borealis Stream Processing Engine, CIDR 2005.
 
2
D. J. Abadi et al.. Aurora: A new model and architecture for data stream management. VLDB Journal, 12(2):120--139, 2003.
 
3
D. J. Abadi W. Lindner, S. Madden, and J. Schuler. An Integration Framework for Sensor Networks and Data Stream Management Systems. Demonstration. VLDB 2004
 
4
A. Arasu et al. STREAM: The Stanford stream data manager. IEEE Data Engineering Bulletin, 26(1):19--26, 2003.
 
5
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proc. ACM PODS, pages 1--16, 2002.
 
6
M. Balazinska, H. Balakrishnan, M. Stonebraker. Contract-Based Load Management in Federated Distributed Systems. NSDI 2004, San Francisco, CA, March 2004.
 
7
S. Chandrasekaran et al. TelegraphCQ: Continuous dataflow processing for an uncertain world. CIDR 2003.
 
8
J. Chen, D.J. DeWitt, F. Tian and Y. Wang, NiagaraCQ: A Scalable Continuous Query System for Internet Databases. SIGMOD 2000 pg. 379--390.
 
9
M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Cetintemel, Y. Xing, and S. Zdonik. Scalable Distributed Stream Processing. CIDR 2003.
 
10
G. Cormode, T. Johnson, F. Korn, S. Muthukrishnan, O. Spatscheck, and D. Srivastava. Holistic UDAFs at streaming speeds. SIGMOD Conference, pages 35--46. ACM, 2004.
 
11
C. Cranor, T. Johnson, O. Spatscheck, and V. Shkapenyuk. Gigascope: A stream database for network applications. ACM SIGMOD, pages 647--651, 2003.
 
12
D. DeWitt, J. Gray. Parallel database systems: the future of high performance database systems. Communications of the ACM, v.35 n.6, p.85--98, June 1992.
 
13
M. Ivanova and T. Risch. Customizable Parallel Execution of Scientific Stream Queries. VLDB 2005.
 
14
R. R. Kompella, S. Singh, and G. Varghese. On scalable attack detection in the network. In ACM Internet Measurement Conference IMC 2004, pages 187 -- 200.
 
15
D. Kossmann. The state of the art in distributed query processing. ACM Computing Surveys, 32(4):422--469, 2000.
 
16
Per-Ake Larson. Data Reduction by Partial Preaggregation. 18th International Conference on Data Engineering (ICDE'02), 2002.
 
17
J. Li, D. Maier, K. Tufte, V. Papadimos, P. A. Tucker: No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. SIGMOD Record 34(1): 39--44 (2005)
 
18
S. Muthukrishnan. Data Streams: Algorithms and Applications. Foundations and Trends in Theoretical Computer Science, Vol 2, 2005.
 
19
J. Rao, C. Zhang, N. Megiddo, G. M. Lohman: Automating physical database design in a parallel database. SIGMOD Conference 2002: 558--569
 
20
M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, M. J. Franklin. Flux: An Adaptive Partitioning Operator for Continuous Query Systems. ICDE 2003
 
21
M. Sullivan and A. Heybey. Tribeca: A system for managing large databases of network traffic. In Proc.USENIX Annual Technical Conf., 1998

Collaborative Colleagues:
Theodore Johnson: colleagues
Muthu S. Muthukrishnan: colleagues
Vladislav Shkapenyuk: colleagues
Oliver Spatscheck: colleagues