research-article

Speeding up distributed request-response workflows

Authors:

Virajith Jalaparti,

Srikanth Kandula,

Mikhail Rybalkin,

Chenyu YanAuthors Info & Claims

ACM SIGCOMM Computer Communication Review, Volume 43, Issue 4

Pages 219 - 230

https://doi.org/10.1145/2534169.2486028

Published: 27 August 2013 Publication History

Abstract

We found that interactive services at Bing have highly variable datacenter-side processing latencies because their processing consists of many sequential stages, parallelization across 10s-1000s of servers and aggregation of responses across the network. To improve the tail latency of such services, we use a few building blocks: reissuing laggards elsewhere in the cluster, new policies to return incomplete results and speeding up laggards by giving them more resources. Combining these building blocks to reduce the overall latency is non-trivial because for the same amount of resource (e.g., number of reissues), different stages improve their latency by different amounts. We present Kwiken, a framework that takes an end-to-end view of latency improvements and costs. It decomposes the problem of minimizing latency over a general processing DAG into a manageable optimization over individual stages. Through simulations with production traces, we show sizable gains; the 99th percentile of latency improves by over 50% when just 0.1% of the responses are allowed to have partial results and by over 40% for 25% of the services when just 5% extra resources are used for reissues.

References

[1]

Amazon Elastic Compute Cloud (Amazon EC2). http://aws.amazon.com/ec2/.

[2]

M. Alizadeh, A. Greenberg, D. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data Center TCP (DCTCP). In SIGCOMM, 2010.

Digital Library

[3]

M. Alizadeh, S. Yang, S. Katti, N. McKeown, B. Prabhakar, and S. Shenker. Deconstructing Datacenter Packet Transport. In Hotnets, 2012.

Digital Library

[4]

G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris. Reining in the Outliers in MapReduce Clusters Using Mantri. In OSDI, 2010.

Digital Library

[5]

D. G. Andersen, H. Balakrishnan, M. F. Kaashoek, and R. N. Rao. Improving web availability for clients with MONET. In NSDI, 2005.

Digital Library

[6]

W. Baek and T. M. Chilimbi. Green: A Framework for Supporting Energy-Conscious Programming using Controlled Approximation. In PLDI, 2010.

Digital Library

[7]

S. Boucheron, G. Lugosi, and O. Bousquet. Concentration inequalities. Advanced Lectures on Machine Learning, 2004.

[8]

J. Brutlag. Speed matters for Google web search. http://googleresearch.blogspot.com/2009/06/speedmatters. html, 2009.

[9]

J. R. Dabrowski and E. V. Munson. Is 100 Milliseconds Too Fast? In CHI, 2001.

Digital Library

[10]

J. Dean and L. A. Barroso. The tail at scale. Commun. ACM, 56(2):74--80, Feb. 2013.

Digital Library

[11]

J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, 2004.

Digital Library

[12]

G. Decandia et al. Dynamo : Amazon's Highly Available Key-value Store. In SOSP, 2007.

Digital Library

[13]

A. D. Ferguson, P. Bodik, S. Kandula, E. Boutin, and R. Fonseca. Jockey: Guaranteed Job Latency in Data Parallel Clusters. In EuroSys, 2012.

Digital Library

[14]

D. Han, A. Anand, A. Akella, and S. Seshan. RPT: Re-architecting Loss Protection for Content-Aware Networks. In NSDI, 2012.

Digital Library

[15]

Y. He, S. Elnikety, J. Larus, and C. Yan. Zeta: Scheduling Interactive Services with Partial Execution. In SOCC, 2012.

Digital Library

[16]

H. Hoffmann, S. Sidiroglou, M. Carbin, S. Misailovic, A. Agarwal, and M. Rinard. Dynamic Knobs for Responsive Power-Aware Computing. In ASPLOS, 2011.

Digital Library

[17]

C. Y. Hong, M. Caesar, and P. B. Godfrey. Finishing Flows Quickly with Preemptive Scheduling. In SIGCOMM, 2012.

Digital Library

[18]

R. Kapoor, G. Porter, M. Tewari, G. M. Voelker, and A. Vahdat. Chronos: Predictable Low Latency for Data Center Applications. In SOCC, 2012.

Digital Library

[19]

Y. Kwok and I. Ahmad. Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors. ACM Computing Surveys (CSUR), 1999.

Digital Library

[20]

R. Nishtala et al. Scaling Memcache at Facebook. In NSDI, 2013.

Digital Library

[21]

L. Ravindranath, J. Padhye, S. Agarwal, R. Mahajan, I. Obermiller, and S. Shayandeh. AppInsight: Mobile App Performance Monitoring in the Wild. In OSDI, 2012.

Digital Library

[22]

E. Schurman and J. Brutlag. The User and Business Impact of Server Delays, Additional Bytes, and Http Chunking in Web Search. http://velocityconf.com/velocity2009/public/schedule/detail/8523, 2009.

[23]

A. Vulimiri, O. Michel, P. B. Godfrey, and S. Shenker. More is Less: Reducing Latency via Redundancy. In HotNets, 2012.

Digital Library

[24]

X. S. Wang et al. Demystifying Page Load Performance with WProf. In NSDI, 2013.

Digital Library

[25]

C. Wilson et al. Better Never than Late: Meeting Deadlines in Datacenter Networks. In SIGCOMM, 2011.

Digital Library

[26]

H. Wu, Z. Feng, C. Guo, and Y. Zhang. ICTCP: Incast Congestion Control for TCP in Data Center Networks. In CONEXT, 2010.

Digital Library

[27]

Y. Xu, Z. Musgrave, B. Noble, and M. Bailey. Bobtail: Avoiding Long Tails in the Cloud. In NSDI, 2013.

Digital Library

[28]

M. Zaharia, A. Konwinski, A. Joseph, R. Katz, and I. Stoica. Improving MapReduce Performance in Heterogeneous Environments. In OSDI, 2008.

Digital Library

[29]

D. Zats et al. DeTail: Reducing the Flow Completion Time Tail in Datacenter Networks. In SIGCOMM, 2012.

Digital Library

[30]

S. Zilberstein. Using Anytime Algorithms in Intelligent Systems. AI Magazine, 17(3):73--83, 1996.

Cited By

Xu QYang JZhang FPan JChen KShen YZhou ADu X(2025)Tribase: A Vector Data Query Engine for Reliable and Lossless Pruning Compression using Triangle InequalitiesProceedings of the ACM on Management of Data10.1145/37097433:1(1-28)Online publication date: 11-Feb-2025
https://dl.acm.org/doi/10.1145/3709743
Liu LXiao FHan LFan WHe X(2024)GTCC: A Game Theoretic Approach for Efficient Congestion Control in Datacenter NetworksIEEE Transactions on Network Science and Engineering10.1109/TNSE.2024.344309911:6(6328-6344)Online publication date: Nov-2024
https://doi.org/10.1109/TNSE.2024.3443099
Huang HDu XLi TWang HXu KWang MDai H(2024)Re-Architecting Buffer Management in Lossless EthernetIEEE/ACM Transactions on Networking10.1109/TNET.2024.343098932:6(4749-4764)Online publication date: Dec-2024
https://doi.org/10.1109/TNET.2024.3430989
Show More Cited By

Index Terms

Speeding up distributed request-response workflows
1. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles
        Organizing principles for web applications

Recommendations

Speeding up distributed request-response workflows
SIGCOMM '13: Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM

We found that interactive services at Bing have highly variable datacenter-side processing latencies because their processing consists of many sequential stages, parallelization across 10s-1000s of servers and aggregation of responses across the ...
Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services
ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems

Interactive services, such as Web search, recommendations, games, and finance, must respond quickly to satisfy customers. Achieving this goal requires optimizing tail (e.g., 99th+ percentile) latency. Although every server is multicore, parallelizing ...
Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services
ASPLOS '15

Interactive services, such as Web search, recommendations, games, and finance, must respond quickly to satisfy customers. Achieving this goal requires optimizing tail (e.g., 99th+ percentile) latency. Although every server is multicore, parallelizing ...

Comments

Information & Contributors

Information

Published In

cover image ACM SIGCOMM Computer Communication Review

ACM SIGCOMM Computer Communication Review Volume 43, Issue 4

October 2013

595 pages

ISSN:0146-4833

DOI:10.1145/2534169

Issue’s Table of Contents

SIGCOMM '13: Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
August 2013
580 pages
ISBN:9781450320566
DOI:10.1145/2486001
General Chairs:
Dah Ming Chiu
Chinese University of Hong Kong, China
,
Jia Wang
AT&T Labs - Research, USA
,
Program Chairs:
Paul Barford
University of Wisconsin, USA
,
Srinivasan Seshan
Carnegie Mellon University, USA

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 August 2013

Published in SIGCOMM-CCR Volume 43, Issue 4

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

119
Total Citations
View Citations
1,040
Total Downloads

Downloads (Last 12 months)164
Downloads (Last 6 weeks)25

Reflects downloads up to 20 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Xu QYang JZhang FPan JChen KShen YZhou ADu X(2025)Tribase: A Vector Data Query Engine for Reliable and Lossless Pruning Compression using Triangle InequalitiesProceedings of the ACM on Management of Data10.1145/37097433:1(1-28)Online publication date: 11-Feb-2025
https://dl.acm.org/doi/10.1145/3709743
Liu LXiao FHan LFan WHe X(2024)GTCC: A Game Theoretic Approach for Efficient Congestion Control in Datacenter NetworksIEEE Transactions on Network Science and Engineering10.1109/TNSE.2024.344309911:6(6328-6344)Online publication date: Nov-2024
https://doi.org/10.1109/TNSE.2024.3443099
Huang HDu XLi TWang HXu KWang MDai H(2024)Re-Architecting Buffer Management in Lossless EthernetIEEE/ACM Transactions on Networking10.1109/TNET.2024.343098932:6(4749-4764)Online publication date: Dec-2024
https://doi.org/10.1109/TNET.2024.3430989
Zhang JWang YZhong XYu MPan HZhang YGuan ZChe BWan ZPan THuang T(2024)PACC: A Proactive CNP Generation Scheme for Datacenter NetworksIEEE/ACM Transactions on Networking10.1109/TNET.2024.336177132:3(2586-2599)Online publication date: Jun-2024
https://doi.org/10.1109/TNET.2024.3361771
Malebary SAlesawi SChe H(2023)Tail Prediction for Heterogeneous Data Center ClustersProcesses10.3390/pr1102040711:2(407)Online publication date: 30-Jan-2023
https://doi.org/10.3390/pr11020407
Wen SHan RLiu CChen L(2023)Fast DRL-based scheduler configuration tuning for reducing tail latency in edge-cloud jobsJournal of Cloud Computing: Advances, Systems and Applications10.1186/s13677-023-00465-z12:1Online publication date: 17-Jun-2023
https://dl.acm.org/doi/10.1186/s13677-023-00465-z
Jiang WLi HYan YJi FHuang JWang JZhang T(2023)Consistent Low Latency Scheduler for Distributed Key-Value StoresIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.331577734:12(3012-3027)Online publication date: 1-Dec-2023
https://dl.acm.org/doi/10.1109/TPDS.2023.3315777
Jiang WQiu YChen YJi FXie HZhou XChen JHuang JWang JLi Y(2023)Accelerated Information Dissemination for Replica Selection in Distributed Key-Value Store SystemsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2022.322164234:1(358-371)Online publication date: 1-Jan-2023
https://doi.org/10.1109/TPDS.2022.3221642
Mazloum AKfoury EGomez JCrichigno J(2023)A Survey on Rerouting Techniques with P4 Programmable Data Plane SwitchesComputer Networks: The International Journal of Computer and Telecommunications Networking10.1016/j.comnet.2023.109795230:COnline publication date: 1-Jul-2023
https://dl.acm.org/doi/10.1016/j.comnet.2023.109795
Cui PPan HLi ZZhang PMiao TZhou JGuan HXie G(2022)Enabling In-Network Floating-Point Arithmetic for Efficient Computation OffloadingIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2022.320842533:12(4918-4934)Online publication date: 1-Dec-2022
https://doi.org/10.1109/TPDS.2022.3208425
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents