research-article

Public Access

Drizzle: Fast and Adaptable Stream Processing at Scale

Authors:
Shivaram Venkataraman

UC Berkeley

UC Berkeley
View Profile

,
Aurojit Panda

UC Berkeley

UC Berkeley
View Profile

,
Kay Ousterhout

UC Berkeley

UC Berkeley
View Profile

,
Michael Armbrust

Databricks

Databricks
View Profile

,
Ali Ghodsi

Databricks, UC Berkeley

Databricks, UC Berkeley
View Profile

,
Michael J. Franklin

University of Chicago

University of Chicago
View Profile

,
Benjamin Recht

UC Berkeley

UC Berkeley
View Profile

,
Ion Stoica

Databricks, UC Berkeley

Databricks, UC Berkeley
View Profile

SOSP '17: Proceedings of the 26th Symposium on Operating Systems PrinciplesOctober 2017Pages 374–389https://doi.org/10.1145/3132747.3132750

Published:14 October 2017Publication History

SOSP '17: Proceedings of the 26th Symposium on Operating Systems Principles

Pages 374–389

ABSTRACT

Large scale streaming systems aim to provide high throughput and low latency. They are often used to run mission-critical applications, and must be available 24x7. Thus such systems need to adapt to failures and inherent changes in workloads, with minimal impact on latency and throughput. Unfortunately, existing solutions require operators to choose between achieving low latency during normal operation and incurring minimal impact during adaptation. Continuous operator streaming systems, such as Naiad and Flink, provide low latency during normal execution but incur high overheads during adaptation (e.g., recovery), while micro-batch systems, such as Spark Streaming and FlumeJava, adapt rapidly at the cost of high latency during normal operations.

Our key observation is that while streaming workloads require millisecond-level processing, workload and cluster properties change less frequently. Based on this, we develop Drizzle, a system that decouples the processing interval from the coordination interval used for fault tolerance and adaptability. Our experiments on a 128 node EC2 cluster show that on the Yahoo Streaming Benchmark, Drizzle can achieve end-to-end record processing latencies of less than 100ms and can get 2-3x lower latency than Spark. Drizzle also exhibits better adaptability, and can recover from failures 4x faster than Flink while having up to 13x lower latency during recovery.

Supplemental Material

drizzle.mp4

mp4

2 GB

Download

References

Abadi, D. J., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., and Zdonik, S. Aurora: a new model and architecture for data stream management. VLDB (2003). Google ScholarDigital Library
Akidau, T., Balikov, A., Bekiroglu, K., Chernyak, S., Haberman, J., Lax, R., McVeety, S., Mills, D., Nordstrom, P., and Whittle, S. Millwheel: Fault-tolerant stream processing at internet scale. In VLDB (2013), pp. 734--746. Google ScholarDigital Library
Akidau, T., Bradshaw, R., Chambers, C., Chernyak, S., Fernãąndez -Moctezuma, R. J., Lax, R., McVeety, S., Mills, D., Perry, F., Schmidt, E., and Whittle, S. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. VLDB (2015), 1792--1803. Google ScholarDigital Library
Ananthanarayanan, G., Ghodsi, A., Shenker, S., and Stoica, I. Effective straggler mitigation: Attack of the clones. In NSDI (2013). Google ScholarDigital Library
Ananthanarayanan, G., Ghodsi, A., Wang, A., Borthakur, D., Kandula, S., Shenker, S., and Stoica, I. Pacman: Coordinated memory caching for parallel jobs. In NSDI (2012). Google ScholarDigital Library
Ananthanarayanan, G., Kandula, S., Greenberg, A., Stoica, I., Lu, Y., Saha, B., and Harris, E. Reining in the outliers in Map-Reduce clusters using Mantri. In OSDI (2010). Google ScholarDigital Library
Apache Hadoop NextGen MapReduce (YARN). Retrieved 9/24/2013, URL: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html.Google Scholar
Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., Meng, X., Kaftan, T., Franklin, M. J., Ghodsi, A., et al. Spark SQL: Relational data processing in Spark. In SIGMOD (2015). Google ScholarDigital Library
Bingmann, T., Axtmann, M., Jöbstl, E., Lamm, S., Nguyen, H. C., Noe, A., Schlag, S., Stumpp, M., Sturm, T., and Sanders, P. Thrill: High-performance algorithmic distributed batch data processing with c++. CoRR abs/1608.05634 (2016).Google Scholar
Boncz, P. A., Zukowski, M., and Nes, N. Monetdb/x100: Hyper-pipelining query execution. In CIDR (2005), vol. 5, pp. 225--237.Google Scholar
Boutin, E., Ekanayake, J., Lin, W., Shi, B., Zhou, J., Qian, Z., Wu, M., and Zhou, L. Apollo: scalable and coordinated scheduling for cloud-scale computing. In OSDI (2014). Google ScholarDigital Library
Brakmo, L. S., and Peterson, L. L. TCP Vegas: End to End congestion avoidance on a global Internet. IEEE Journal on Selected Areas in Communications 13, 8 (Oct. 1995), 1465--1480. Google ScholarDigital Library
Carbone, P., Fóra, G., Ewen, S., Haridi, S., and Tzoumas, K. Lightweight asynchronous snapshots for distributed dataflows. CoRR abs/1506.08603 (2015).Google Scholar
Carbone, P., Katsifodimos, A., Ewen, S., Markl, V., Haridi, S., and Tzoumas, K. Apache Flink: Stream and Batch Processing in a Single Engine. IEEE Data Engineering Bulletin (2015).Google Scholar
Chaiken, R., Jenkins, B., Larson, P.-Å., Ramsey, B., Shakib, D., Weaver, S., and Zhou, J. SCOPE: easy and efficient parallel processing of massive data sets. VLDB (2008), 1265--1276. Google ScholarDigital Library
Chambers, C., Raniwala, A., Perry, F., Adams, S., Henry, R., Bradshaw, R., and Nathan. FlumeJava: Easy, Efficient Data-Parallel Pipelines. In PLDI (2010). Google ScholarDigital Library
Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M. J., Hellerstein, J. M., Hong, W., Krishnamurthy, S., Madden, S. R., Reiss, F., and Shah, M. A. TelegraphCQ: continuous dataflow processing. In SIGMOD (2003), ACM. Google ScholarDigital Library
Chandy, K. M., and Lamport, L. Distributed snapshots: determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS) 3, 1 (1985), 63--75. Google ScholarDigital Library
Chowdhury, M., Zaharia, M., Ma, J., Jordan, M. I., and Stoica, I. Managing data transfers in computer clusters with orchestra. In SIGCOMM (2011). Google ScholarDigital Library
Das, T., Zhong, Y., Stoica, I., and Shenker, S. Adaptive stream processing using dynamic batch sizing. In SOCC (2014). Google ScholarDigital Library
Extending the Yahoo! Streaming Benchmark. http://data-artisans.com/extending-the-yahoo-streaming-benchmark.Google Scholar
Structured Streaming In Apache Spark: A new high-level API for streaming. https://databricks.com/blog/2016/07/28/structured-streaming-in-apache-spark.html.Google Scholar
Datanami. Kafka Tops 1 Trillion Messages Per Day at LinkedIn. https://goo.gl/cY7VOz.Google Scholar
Dean, J., and Ghemawat, S. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM 51, 1 (2008). Google ScholarDigital Library
Dragojević, A., Narayanan, D., Nightingale, E. B., Renzelmann, M., Shamis, A., Badam, A., and Castro, M. No Compromises: Distributed Transactions with Consistency, Availability, and Performance. In SOSP (2015). Google ScholarDigital Library
Floratou, A., Agrawal, A., Graham, B., Rao, S., and Ramasamy, K. Dhalion: self-regulating stream processing in heron. Proceedings of the VLDB Endowment 10, 12 (2017), 1825--1836. Google ScholarDigital Library
Ford, D., Labelle, F., Popovici, F. I., Stokely, M., Truong, V. - A., Barroso, L., Grimes, C., and Quinlan, S. Availability in globally distributed storage systems. In OSDI (2010), pp. 61--74. Google ScholarDigital Library
Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., and Stoica, I. Dominant resource fairness: Fair allocation of multiple resource types. In NSDI (2011). Google ScholarDigital Library
Graefe, G. Encapsulation of parallelism in the volcano query processing system. In SIGMOD (1990), pp. 102--111. Google ScholarDigital Library
Gray, C., and Cheriton, D. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. In SOSP (1989), pp. 202--210. Google ScholarDigital Library
Grosvenor, M. P., Schwarzkopf, M., Gog, I., Watson, R. N. M., Moore, A. W., Hand, S., and Crowcroft, J. Queues don't matter when you can jump them! In NSDI (2015). Google ScholarDigital Library
Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A. D., Katz, R., Shenker, S., and Stoica, I. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI (2011). Google ScholarDigital Library
Isard, M., and Abadi, M. Falkirk wheel: Rollback recovery for dataflow systems. arXiv preprint arXiv:1503.08877 (2015).Google Scholar
Isard, M., Budiu, M., Yu, Y., Birrell, A., and Fetterly, D. Dryad: distributed data-parallel programs from sequential building blocks. In Eurosys (2007). Google ScholarDigital Library
Isard, M., Prabhakaran, V., Currey, J., Wieder, U., Talwar, K., and Goldberg, A. Quincy: Fair scheduling for distributed computing clusters. In SOSP (2009). Google ScholarDigital Library
Jacobson, V. Congestion avoidance and control. ACM SIGCOMM Computer Communication Review 18, 4 (1988), 314--329. Google ScholarDigital Library
Jiang, J., Sekar, V., Milner, H., Shepherd, D., Stoica, I., and Zhang, H. CFA: A Practical Prediction System for Video QoE Optimization. In NSDI (2016), pp. 137--150. Google ScholarDigital Library
Johnston, W. M., Hanna, J., and Millar, R. J. Advances in dataflow programming languages. ACM Computing Surveys (CSUR) 36, 1 (2004), 1--34. Google ScholarDigital Library
Ke, Q., Isard, M., and Yu, Y. Optimus: a dynamic rewriting framework for data-parallel execution plans. In Eurosys (2013), pp. 15--28. Google ScholarDigital Library
Kreps, J., Narkhede, N., Rao, J., et al. Kafka: A distributed messaging system for log processing. In NetDB (2011).Google Scholar
Kulkarni, S., Bhagat, N., Fu, M., Kedigehalli, V., Kellogg, C., Mittal, S., Patel, J. M., Ramasamy, K., and Taneja, S. Twitter heron: Stream processing at scale. In SIGMOD (2015), pp. 239--250. Google ScholarDigital Library
Lin, W., Qian, Z., Xu, J., Yang, S., Zhou, J., and Zhou, L. Streamscope: continuous reliable distributed processing of big data streams. In NSDI (2016), pp. 439--453. Google ScholarDigital Library
Mashayekhi, O., Qu, H., Shah, C., and Levis, P. Scalable, fast cloud computing with execution templates. CoRR abs/1606.01972 (2016).Google Scholar
McSherry, F., Isard, M., and Murray, D. G. Scalability! but at what cost? In 15th Workshop on Hot Topics in Operating Systems (HotOS XV) (2015). Google ScholarDigital Library
Meisner, D., Sadler, C. M., Barroso, L. A., Weber, W.-D., and Wenisch, T. F. Power management of online data-intensive services. In ISCA (2011). Google ScholarDigital Library
SLA for Stream Analytics. https://azure.microsoft.com/en-us/support/legal/sla/stream-analytics/v1_0/.Google Scholar
Motwani, R., Widom, J., Arasu, A., Babcock, B., Babu, S., Datar, M., Manku, G., Olston, C., Rosenstein, J., and Varma, R. Query processing, resource management, and approximation in a data stream management system. In CIDR (2003).Google Scholar
Murray, D. G., McSherry, F., Isaacs, R., Isard, M., Barham, P., and Abadi, M. Naiad: a timely dataflow system. In SOSP (2013), pp. 439--455. Google ScholarDigital Library
Murray, D. G., Schwarzkopf, M., Smowton, C., Smith, S., Madhavapeddy, A., and Hand, S. Ciel: a universal execution engine for distributed data-flow computing. In NSDI (2011), pp. 113--126. Google ScholarDigital Library
Stream-processing with Mantis. http://techblog.netflix.com/2016/03/stream-processing-with-mantis.html.Google Scholar
Ousterhout, K., Panda, A., Rosen, J., Venkataraman, S., Xin, R., Ratnasamy, S., Shenker, S., and Stoica, I. The case for tiny tasks in compute clusters. In HotOS (2013). Google ScholarDigital Library
Ousterhout, K., Wendell, P., Zaharia, M., and Stoica, I. Sparrow: distributed, low latency scheduling. In SOSP (2013), pp. 69--84. Google ScholarDigital Library
Recht, B., Re, C., Wright, S., and Niu, F. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems (2011), pp. 693--701. Google ScholarDigital Library
Schelter, S., Ewen, S., Tzoumas, K., and Markl, V. All roads lead to rome: optimistic recovery for distributed iterative data processing. In CIKM (2013). Google ScholarDigital Library
Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., and Price, T. G. Access path selection in a relational database management system. In SIGMOD (1979), pp. 23--34. Google ScholarDigital Library
Stonebraker, M., Çetintemel, U., and Zdonik, S. The 8 requirements of real-time stream processing. SIGMOD Record 34, 4 (Dec. 2005), 42--47. Google ScholarDigital Library
Toshniwal, A., Taneja, S., Shukla, A., Ramasamy, K., Patel, J. M., Kulkarni, S., Jackson, J., Gade, K., Fu, M., Donham, J., et al. Storm at twitter. In SIGMOD (2014).Google Scholar
Observability at Twitter: technical overview. https://goo.gl/wAHi2I.Google Scholar
Apache Spark, Preparing for the Next Wave of Reactive Big Data. http://goo.gl/FqEh94.Google Scholar
Verma, A., Cho, B., Zea, N., Gupta, I., and Campbell, R. H. Breaking the mapreduce stage barrier. Cluster computing 16, 1 (2013), 191--206. Google ScholarDigital Library
Verma, A., Pedrosa, L., Korupolu, M., Oppenheimer, D., Tune, E., and Wilkes, J. Large-scale cluster management at google with borg. In Eurosys (2015). Google ScholarDigital Library
Yadwadkar, N. J., Ananthanarayanan, G., and Katz, R. Wrangler: Predictable and faster jobs using fewer resources. In SOCC (2014). Google ScholarDigital Library
Benchmarking Streaming Computation Engines at Yahoo! https://yahooeng.tumblr.com/post/135321837876.Google Scholar
Yu, Y., Isard, M., Fetterly, D., Budiu, M., Erlingsson, Ú., Gunda, P., and Currey, J. Dryadlinq: A system for general-purpose distributed data-parallel computing using a high-level language. In OSDI (2008). Google ScholarDigital Library
Zaharia, M., Borthakur, D., Sen Sarma, J., Elmeleegy, K., Shenker, S., and Stoica, I. Delay scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling. In Eurosys (2010). Google ScholarDigital Library
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M., Shenker, S., and Stoica, I. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI (2012). Google ScholarDigital Library
Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., and Stoica, I. Discretized streams: Fault-tolerant streaming computation at scale. In SOSP (2013). Google ScholarDigital Library
Zhang, T., Chowdhery, A., Bahl, P. V., Jamieson, K., and Banerjee, S. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking (2015), ACM, pp. 426--438. Google ScholarDigital Library

Index Terms

Drizzle: Fast and Adaptable Stream Processing at Scale
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
  2. Dependable and fault-tolerant systems and networks

Recommendations

Dual-Paradigm Stream Processing
ICPP '18: Proceedings of the 47th International Conference on Parallel Processing

Existing stream processing frameworks operate either under data stream paradigm processing data record by record to favor low latency, or under operation stream paradigm processing data in micro-batches to desire high throughput. For complex and mutable ...
Read More
ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks
ICPE '24: Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering

Distributed stream processing frameworks help building scalable and reliable applications that perform transformations and aggregations on continuous data streams. This paper introduces ShuffleBench, a novel benchmark to evaluate the performance of ...
Read More
ReHype: enabling VM survival across hypervisor failures
VEE '11: Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments

With existing virtualized systems, hypervisor failures lead to overall system failure and the loss of all the work in progress of virtual machines (VMs) running on the system. We introduce ReHype, a mechanism for recovery from hypervisor failures by ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in

SOSP '17: Proceedings of the 26th Symposium on Operating Systems Principles
October 2017
677 pages
ISBN:9781450350853
DOI:10.1145/3132747

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 October 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Performance
Reliability
Stream Processing
Qualifiers
- research-article
- Research
- Refereed limited
Conference

Acceptance Rates
Overall Acceptance Rate131of716submissions,18%
Upcoming Conference
SOSP '24

Sponsor:

sigops

ACM SIGOPS 29th Symposium on Operating Systems Principles

November 5 - 8, 2024

Austin , TX , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 110
  Total Citations
  View Citations
- 3,581
  Total Downloads
- Downloads (Last 12 months)381
- Downloads (Last 6 weeks)41
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Drizzle: Fast and Adaptable Stream Processing at Scale

SOSP '17: Proceedings of the 26th Symposium on Operating Systems Principles

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Dual-Paradigm Stream Processing

ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks

ReHype: enabling VM survival across hypervisor failures

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Drizzle: Fast and Adaptable Stream Processing at Scale

SOSP '17: Proceedings of the 26th Symposium on Operating Systems Principles

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Dual-Paradigm Stream Processing

ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks

ReHype: enabling VM survival across hypervisor failures

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media