ABSTRACT
Large scale streaming systems aim to provide high throughput and low latency. They are often used to run mission-critical applications, and must be available 24x7. Thus such systems need to adapt to failures and inherent changes in workloads, with minimal impact on latency and throughput. Unfortunately, existing solutions require operators to choose between achieving low latency during normal operation and incurring minimal impact during adaptation. Continuous operator streaming systems, such as Naiad and Flink, provide low latency during normal execution but incur high overheads during adaptation (e.g., recovery), while micro-batch systems, such as Spark Streaming and FlumeJava, adapt rapidly at the cost of high latency during normal operations.
Our key observation is that while streaming workloads require millisecond-level processing, workload and cluster properties change less frequently. Based on this, we develop Drizzle, a system that decouples the processing interval from the coordination interval used for fault tolerance and adaptability. Our experiments on a 128 node EC2 cluster show that on the Yahoo Streaming Benchmark, Drizzle can achieve end-to-end record processing latencies of less than 100ms and can get 2-3x lower latency than Spark. Drizzle also exhibits better adaptability, and can recover from failures 4x faster than Flink while having up to 13x lower latency during recovery.
Supplemental Material
- Abadi, D. J., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., and Zdonik, S. Aurora: a new model and architecture for data stream management. VLDB (2003). Google ScholarDigital Library
- Akidau, T., Balikov, A., Bekiroglu, K., Chernyak, S., Haberman, J., Lax, R., McVeety, S., Mills, D., Nordstrom, P., and Whittle, S. Millwheel: Fault-tolerant stream processing at internet scale. In VLDB (2013), pp. 734--746. Google ScholarDigital Library
- Akidau, T., Bradshaw, R., Chambers, C., Chernyak, S., Fernãąndez -Moctezuma, R. J., Lax, R., McVeety, S., Mills, D., Perry, F., Schmidt, E., and Whittle, S. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. VLDB (2015), 1792--1803. Google ScholarDigital Library
- Ananthanarayanan, G., Ghodsi, A., Shenker, S., and Stoica, I. Effective straggler mitigation: Attack of the clones. In NSDI (2013). Google ScholarDigital Library
- Ananthanarayanan, G., Ghodsi, A., Wang, A., Borthakur, D., Kandula, S., Shenker, S., and Stoica, I. Pacman: Coordinated memory caching for parallel jobs. In NSDI (2012). Google ScholarDigital Library
- Ananthanarayanan, G., Kandula, S., Greenberg, A., Stoica, I., Lu, Y., Saha, B., and Harris, E. Reining in the outliers in Map-Reduce clusters using Mantri. In OSDI (2010). Google ScholarDigital Library
- Apache Hadoop NextGen MapReduce (YARN). Retrieved 9/24/2013, URL: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html.Google Scholar
- Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., Meng, X., Kaftan, T., Franklin, M. J., Ghodsi, A., et al. Spark SQL: Relational data processing in Spark. In SIGMOD (2015). Google ScholarDigital Library
- Bingmann, T., Axtmann, M., Jöbstl, E., Lamm, S., Nguyen, H. C., Noe, A., Schlag, S., Stumpp, M., Sturm, T., and Sanders, P. Thrill: High-performance algorithmic distributed batch data processing with c++. CoRR abs/1608.05634 (2016).Google Scholar
- Boncz, P. A., Zukowski, M., and Nes, N. Monetdb/x100: Hyper-pipelining query execution. In CIDR (2005), vol. 5, pp. 225--237.Google Scholar
- Boutin, E., Ekanayake, J., Lin, W., Shi, B., Zhou, J., Qian, Z., Wu, M., and Zhou, L. Apollo: scalable and coordinated scheduling for cloud-scale computing. In OSDI (2014). Google ScholarDigital Library
- Brakmo, L. S., and Peterson, L. L. TCP Vegas: End to End congestion avoidance on a global Internet. IEEE Journal on Selected Areas in Communications 13, 8 (Oct. 1995), 1465--1480. Google ScholarDigital Library
- Carbone, P., Fóra, G., Ewen, S., Haridi, S., and Tzoumas, K. Lightweight asynchronous snapshots for distributed dataflows. CoRR abs/1506.08603 (2015).Google Scholar
- Carbone, P., Katsifodimos, A., Ewen, S., Markl, V., Haridi, S., and Tzoumas, K. Apache Flink: Stream and Batch Processing in a Single Engine. IEEE Data Engineering Bulletin (2015).Google Scholar
- Chaiken, R., Jenkins, B., Larson, P.-Å., Ramsey, B., Shakib, D., Weaver, S., and Zhou, J. SCOPE: easy and efficient parallel processing of massive data sets. VLDB (2008), 1265--1276. Google ScholarDigital Library
- Chambers, C., Raniwala, A., Perry, F., Adams, S., Henry, R., Bradshaw, R., and Nathan. FlumeJava: Easy, Efficient Data-Parallel Pipelines. In PLDI (2010). Google ScholarDigital Library
- Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M. J., Hellerstein, J. M., Hong, W., Krishnamurthy, S., Madden, S. R., Reiss, F., and Shah, M. A. TelegraphCQ: continuous dataflow processing. In SIGMOD (2003), ACM. Google ScholarDigital Library
- Chandy, K. M., and Lamport, L. Distributed snapshots: determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS) 3, 1 (1985), 63--75. Google ScholarDigital Library
- Chowdhury, M., Zaharia, M., Ma, J., Jordan, M. I., and Stoica, I. Managing data transfers in computer clusters with orchestra. In SIGCOMM (2011). Google ScholarDigital Library
- Das, T., Zhong, Y., Stoica, I., and Shenker, S. Adaptive stream processing using dynamic batch sizing. In SOCC (2014). Google ScholarDigital Library
- Extending the Yahoo! Streaming Benchmark. http://data-artisans.com/extending-the-yahoo-streaming-benchmark.Google Scholar
- Structured Streaming In Apache Spark: A new high-level API for streaming. https://databricks.com/blog/2016/07/28/structured-streaming-in-apache-spark.html.Google Scholar
- Datanami. Kafka Tops 1 Trillion Messages Per Day at LinkedIn. https://goo.gl/cY7VOz.Google Scholar
- Dean, J., and Ghemawat, S. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM 51, 1 (2008). Google ScholarDigital Library
- Dragojević, A., Narayanan, D., Nightingale, E. B., Renzelmann, M., Shamis, A., Badam, A., and Castro, M. No Compromises: Distributed Transactions with Consistency, Availability, and Performance. In SOSP (2015). Google ScholarDigital Library
- Floratou, A., Agrawal, A., Graham, B., Rao, S., and Ramasamy, K. Dhalion: self-regulating stream processing in heron. Proceedings of the VLDB Endowment 10, 12 (2017), 1825--1836. Google ScholarDigital Library
- Ford, D., Labelle, F., Popovici, F. I., Stokely, M., Truong, V. - A., Barroso, L., Grimes, C., and Quinlan, S. Availability in globally distributed storage systems. In OSDI (2010), pp. 61--74. Google ScholarDigital Library
- Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., and Stoica, I. Dominant resource fairness: Fair allocation of multiple resource types. In NSDI (2011). Google ScholarDigital Library
- Graefe, G. Encapsulation of parallelism in the volcano query processing system. In SIGMOD (1990), pp. 102--111. Google ScholarDigital Library
- Gray, C., and Cheriton, D. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. In SOSP (1989), pp. 202--210. Google ScholarDigital Library
- Grosvenor, M. P., Schwarzkopf, M., Gog, I., Watson, R. N. M., Moore, A. W., Hand, S., and Crowcroft, J. Queues don't matter when you can jump them! In NSDI (2015). Google ScholarDigital Library
- Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A. D., Katz, R., Shenker, S., and Stoica, I. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI (2011). Google ScholarDigital Library
- Isard, M., and Abadi, M. Falkirk wheel: Rollback recovery for dataflow systems. arXiv preprint arXiv:1503.08877 (2015).Google Scholar
- Isard, M., Budiu, M., Yu, Y., Birrell, A., and Fetterly, D. Dryad: distributed data-parallel programs from sequential building blocks. In Eurosys (2007). Google ScholarDigital Library
- Isard, M., Prabhakaran, V., Currey, J., Wieder, U., Talwar, K., and Goldberg, A. Quincy: Fair scheduling for distributed computing clusters. In SOSP (2009). Google ScholarDigital Library
- Jacobson, V. Congestion avoidance and control. ACM SIGCOMM Computer Communication Review 18, 4 (1988), 314--329. Google ScholarDigital Library
- Jiang, J., Sekar, V., Milner, H., Shepherd, D., Stoica, I., and Zhang, H. CFA: A Practical Prediction System for Video QoE Optimization. In NSDI (2016), pp. 137--150. Google ScholarDigital Library
- Johnston, W. M., Hanna, J., and Millar, R. J. Advances in dataflow programming languages. ACM Computing Surveys (CSUR) 36, 1 (2004), 1--34. Google ScholarDigital Library
- Ke, Q., Isard, M., and Yu, Y. Optimus: a dynamic rewriting framework for data-parallel execution plans. In Eurosys (2013), pp. 15--28. Google ScholarDigital Library
- Kreps, J., Narkhede, N., Rao, J., et al. Kafka: A distributed messaging system for log processing. In NetDB (2011).Google Scholar
- Kulkarni, S., Bhagat, N., Fu, M., Kedigehalli, V., Kellogg, C., Mittal, S., Patel, J. M., Ramasamy, K., and Taneja, S. Twitter heron: Stream processing at scale. In SIGMOD (2015), pp. 239--250. Google ScholarDigital Library
- Lin, W., Qian, Z., Xu, J., Yang, S., Zhou, J., and Zhou, L. Streamscope: continuous reliable distributed processing of big data streams. In NSDI (2016), pp. 439--453. Google ScholarDigital Library
- Mashayekhi, O., Qu, H., Shah, C., and Levis, P. Scalable, fast cloud computing with execution templates. CoRR abs/1606.01972 (2016).Google Scholar
- McSherry, F., Isard, M., and Murray, D. G. Scalability! but at what cost? In 15th Workshop on Hot Topics in Operating Systems (HotOS XV) (2015). Google ScholarDigital Library
- Meisner, D., Sadler, C. M., Barroso, L. A., Weber, W.-D., and Wenisch, T. F. Power management of online data-intensive services. In ISCA (2011). Google ScholarDigital Library
- SLA for Stream Analytics. https://azure.microsoft.com/en-us/support/legal/sla/stream-analytics/v1_0/.Google Scholar
- Motwani, R., Widom, J., Arasu, A., Babcock, B., Babu, S., Datar, M., Manku, G., Olston, C., Rosenstein, J., and Varma, R. Query processing, resource management, and approximation in a data stream management system. In CIDR (2003).Google Scholar
- Murray, D. G., McSherry, F., Isaacs, R., Isard, M., Barham, P., and Abadi, M. Naiad: a timely dataflow system. In SOSP (2013), pp. 439--455. Google ScholarDigital Library
- Murray, D. G., Schwarzkopf, M., Smowton, C., Smith, S., Madhavapeddy, A., and Hand, S. Ciel: a universal execution engine for distributed data-flow computing. In NSDI (2011), pp. 113--126. Google ScholarDigital Library
- Stream-processing with Mantis. http://techblog.netflix.com/2016/03/stream-processing-with-mantis.html.Google Scholar
- Ousterhout, K., Panda, A., Rosen, J., Venkataraman, S., Xin, R., Ratnasamy, S., Shenker, S., and Stoica, I. The case for tiny tasks in compute clusters. In HotOS (2013). Google ScholarDigital Library
- Ousterhout, K., Wendell, P., Zaharia, M., and Stoica, I. Sparrow: distributed, low latency scheduling. In SOSP (2013), pp. 69--84. Google ScholarDigital Library
- Recht, B., Re, C., Wright, S., and Niu, F. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems (2011), pp. 693--701. Google ScholarDigital Library
- Schelter, S., Ewen, S., Tzoumas, K., and Markl, V. All roads lead to rome: optimistic recovery for distributed iterative data processing. In CIKM (2013). Google ScholarDigital Library
- Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., and Price, T. G. Access path selection in a relational database management system. In SIGMOD (1979), pp. 23--34. Google ScholarDigital Library
- Stonebraker, M., Çetintemel, U., and Zdonik, S. The 8 requirements of real-time stream processing. SIGMOD Record 34, 4 (Dec. 2005), 42--47. Google ScholarDigital Library
- Toshniwal, A., Taneja, S., Shukla, A., Ramasamy, K., Patel, J. M., Kulkarni, S., Jackson, J., Gade, K., Fu, M., Donham, J., et al. Storm at twitter. In SIGMOD (2014).Google Scholar
- Observability at Twitter: technical overview. https://goo.gl/wAHi2I.Google Scholar
- Apache Spark, Preparing for the Next Wave of Reactive Big Data. http://goo.gl/FqEh94.Google Scholar
- Verma, A., Cho, B., Zea, N., Gupta, I., and Campbell, R. H. Breaking the mapreduce stage barrier. Cluster computing 16, 1 (2013), 191--206. Google ScholarDigital Library
- Verma, A., Pedrosa, L., Korupolu, M., Oppenheimer, D., Tune, E., and Wilkes, J. Large-scale cluster management at google with borg. In Eurosys (2015). Google ScholarDigital Library
- Yadwadkar, N. J., Ananthanarayanan, G., and Katz, R. Wrangler: Predictable and faster jobs using fewer resources. In SOCC (2014). Google ScholarDigital Library
- Benchmarking Streaming Computation Engines at Yahoo! https://yahooeng.tumblr.com/post/135321837876.Google Scholar
- Yu, Y., Isard, M., Fetterly, D., Budiu, M., Erlingsson, Ú., Gunda, P., and Currey, J. Dryadlinq: A system for general-purpose distributed data-parallel computing using a high-level language. In OSDI (2008). Google ScholarDigital Library
- Zaharia, M., Borthakur, D., Sen Sarma, J., Elmeleegy, K., Shenker, S., and Stoica, I. Delay scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling. In Eurosys (2010). Google ScholarDigital Library
- Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M., Shenker, S., and Stoica, I. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI (2012). Google ScholarDigital Library
- Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., and Stoica, I. Discretized streams: Fault-tolerant streaming computation at scale. In SOSP (2013). Google ScholarDigital Library
- Zhang, T., Chowdhery, A., Bahl, P. V., Jamieson, K., and Banerjee, S. The design and implementation of a wireless video surveillance system. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking (2015), ACM, pp. 426--438. Google ScholarDigital Library
Index Terms
- Drizzle: Fast and Adaptable Stream Processing at Scale
Recommendations
Dual-Paradigm Stream Processing
ICPP '18: Proceedings of the 47th International Conference on Parallel ProcessingExisting stream processing frameworks operate either under data stream paradigm processing data record by record to favor low latency, or under operation stream paradigm processing data in micro-batches to desire high throughput. For complex and mutable ...
ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks
ICPE '24: Proceedings of the 15th ACM/SPEC International Conference on Performance EngineeringDistributed stream processing frameworks help building scalable and reliable applications that perform transformations and aggregations on continuous data streams. This paper introduces ShuffleBench, a novel benchmark to evaluate the performance of ...
ReHype: enabling VM survival across hypervisor failures
VEE '11: Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environmentsWith existing virtualized systems, hypervisor failures lead to overall system failure and the loss of all the work in progress of virtual machines (VMs) running on the system. We introduce ReHype, a mechanism for recovery from hypervisor failures by ...
Comments