skip to main content
research-article

Speeding up distributed request-response workflows

Published: 27 August 2013 Publication History

Abstract

We found that interactive services at Bing have highly variable datacenter-side processing latencies because their processing consists of many sequential stages, parallelization across 10s-1000s of servers and aggregation of responses across the network. To improve the tail latency of such services, we use a few building blocks: reissuing laggards elsewhere in the cluster, new policies to return incomplete results and speeding up laggards by giving them more resources. Combining these building blocks to reduce the overall latency is non-trivial because for the same amount of resource (e.g., number of reissues), different stages improve their latency by different amounts. We present Kwiken, a framework that takes an end-to-end view of latency improvements and costs. It decomposes the problem of minimizing latency over a general processing DAG into a manageable optimization over individual stages. Through simulations with production traces, we show sizable gains; the 99th percentile of latency improves by over 50% when just 0.1% of the responses are allowed to have partial results and by over 40% for 25% of the services when just 5% extra resources are used for reissues.

References

[1]
Amazon Elastic Compute Cloud (Amazon EC2). http://aws.amazon.com/ec2/.
[2]
M. Alizadeh, A. Greenberg, D. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data Center TCP (DCTCP). In SIGCOMM, 2010.
[3]
M. Alizadeh, S. Yang, S. Katti, N. McKeown, B. Prabhakar, and S. Shenker. Deconstructing Datacenter Packet Transport. In Hotnets, 2012.
[4]
G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris. Reining in the Outliers in MapReduce Clusters Using Mantri. In OSDI, 2010.
[5]
D. G. Andersen, H. Balakrishnan, M. F. Kaashoek, and R. N. Rao. Improving web availability for clients with MONET. In NSDI, 2005.
[6]
W. Baek and T. M. Chilimbi. Green: A Framework for Supporting Energy-Conscious Programming using Controlled Approximation. In PLDI, 2010.
[7]
S. Boucheron, G. Lugosi, and O. Bousquet. Concentration inequalities. Advanced Lectures on Machine Learning, 2004.
[8]
J. Brutlag. Speed matters for Google web search. http://googleresearch.blogspot.com/2009/06/speedmatters. html, 2009.
[9]
J. R. Dabrowski and E. V. Munson. Is 100 Milliseconds Too Fast? In CHI, 2001.
[10]
J. Dean and L. A. Barroso. The tail at scale. Commun. ACM, 56(2):74--80, Feb. 2013.
[11]
J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, 2004.
[12]
G. Decandia et al. Dynamo : Amazon's Highly Available Key-value Store. In SOSP, 2007.
[13]
A. D. Ferguson, P. Bodik, S. Kandula, E. Boutin, and R. Fonseca. Jockey: Guaranteed Job Latency in Data Parallel Clusters. In EuroSys, 2012.
[14]
D. Han, A. Anand, A. Akella, and S. Seshan. RPT: Re-architecting Loss Protection for Content-Aware Networks. In NSDI, 2012.
[15]
Y. He, S. Elnikety, J. Larus, and C. Yan. Zeta: Scheduling Interactive Services with Partial Execution. In SOCC, 2012.
[16]
H. Hoffmann, S. Sidiroglou, M. Carbin, S. Misailovic, A. Agarwal, and M. Rinard. Dynamic Knobs for Responsive Power-Aware Computing. In ASPLOS, 2011.
[17]
C. Y. Hong, M. Caesar, and P. B. Godfrey. Finishing Flows Quickly with Preemptive Scheduling. In SIGCOMM, 2012.
[18]
R. Kapoor, G. Porter, M. Tewari, G. M. Voelker, and A. Vahdat. Chronos: Predictable Low Latency for Data Center Applications. In SOCC, 2012.
[19]
Y. Kwok and I. Ahmad. Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors. ACM Computing Surveys (CSUR), 1999.
[20]
R. Nishtala et al. Scaling Memcache at Facebook. In NSDI, 2013.
[21]
L. Ravindranath, J. Padhye, S. Agarwal, R. Mahajan, I. Obermiller, and S. Shayandeh. AppInsight: Mobile App Performance Monitoring in the Wild. In OSDI, 2012.
[22]
E. Schurman and J. Brutlag. The User and Business Impact of Server Delays, Additional Bytes, and Http Chunking in Web Search. http://velocityconf.com/velocity2009/public/schedule/detail/8523, 2009.
[23]
A. Vulimiri, O. Michel, P. B. Godfrey, and S. Shenker. More is Less: Reducing Latency via Redundancy. In HotNets, 2012.
[24]
X. S. Wang et al. Demystifying Page Load Performance with WProf. In NSDI, 2013.
[25]
C. Wilson et al. Better Never than Late: Meeting Deadlines in Datacenter Networks. In SIGCOMM, 2011.
[26]
H. Wu, Z. Feng, C. Guo, and Y. Zhang. ICTCP: Incast Congestion Control for TCP in Data Center Networks. In CONEXT, 2010.
[27]
Y. Xu, Z. Musgrave, B. Noble, and M. Bailey. Bobtail: Avoiding Long Tails in the Cloud. In NSDI, 2013.
[28]
M. Zaharia, A. Konwinski, A. Joseph, R. Katz, and I. Stoica. Improving MapReduce Performance in Heterogeneous Environments. In OSDI, 2008.
[29]
D. Zats et al. DeTail: Reducing the Flow Completion Time Tail in Datacenter Networks. In SIGCOMM, 2012.
[30]
S. Zilberstein. Using Anytime Algorithms in Intelligent Systems. AI Magazine, 17(3):73--83, 1996.

Cited By

View all
  • (2025)Tribase: A Vector Data Query Engine for Reliable and Lossless Pruning Compression using Triangle InequalitiesProceedings of the ACM on Management of Data10.1145/37097433:1(1-28)Online publication date: 11-Feb-2025
  • (2024)GTCC: A Game Theoretic Approach for Efficient Congestion Control in Datacenter NetworksIEEE Transactions on Network Science and Engineering10.1109/TNSE.2024.344309911:6(6328-6344)Online publication date: Nov-2024
  • (2024)Re-Architecting Buffer Management in Lossless EthernetIEEE/ACM Transactions on Networking10.1109/TNET.2024.343098932:6(4749-4764)Online publication date: Dec-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGCOMM Computer Communication Review
ACM SIGCOMM Computer Communication Review  Volume 43, Issue 4
October 2013
595 pages
ISSN:0146-4833
DOI:10.1145/2534169
Issue’s Table of Contents
  • cover image ACM Conferences
    SIGCOMM '13: Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
    August 2013
    580 pages
    ISBN:9781450320566
    DOI:10.1145/2486001
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 August 2013
Published in SIGCOMM-CCR Volume 43, Issue 4

Check for updates

Author Tags

  1. distributed services.
  2. incomplete results
  3. interactive services
  4. optimization
  5. reissues
  6. tail latency

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)164
  • Downloads (Last 6 weeks)25
Reflects downloads up to 20 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Tribase: A Vector Data Query Engine for Reliable and Lossless Pruning Compression using Triangle InequalitiesProceedings of the ACM on Management of Data10.1145/37097433:1(1-28)Online publication date: 11-Feb-2025
  • (2024)GTCC: A Game Theoretic Approach for Efficient Congestion Control in Datacenter NetworksIEEE Transactions on Network Science and Engineering10.1109/TNSE.2024.344309911:6(6328-6344)Online publication date: Nov-2024
  • (2024)Re-Architecting Buffer Management in Lossless EthernetIEEE/ACM Transactions on Networking10.1109/TNET.2024.343098932:6(4749-4764)Online publication date: Dec-2024
  • (2024)PACC: A Proactive CNP Generation Scheme for Datacenter NetworksIEEE/ACM Transactions on Networking10.1109/TNET.2024.336177132:3(2586-2599)Online publication date: Jun-2024
  • (2023)Tail Prediction for Heterogeneous Data Center ClustersProcesses10.3390/pr1102040711:2(407)Online publication date: 30-Jan-2023
  • (2023)Fast DRL-based scheduler configuration tuning for reducing tail latency in edge-cloud jobsJournal of Cloud Computing: Advances, Systems and Applications10.1186/s13677-023-00465-z12:1Online publication date: 17-Jun-2023
  • (2023)Consistent Low Latency Scheduler for Distributed Key-Value StoresIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.331577734:12(3012-3027)Online publication date: 1-Dec-2023
  • (2023)Accelerated Information Dissemination for Replica Selection in Distributed Key-Value Store SystemsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2022.322164234:1(358-371)Online publication date: 1-Jan-2023
  • (2023)A Survey on Rerouting Techniques with P4 Programmable Data Plane SwitchesComputer Networks: The International Journal of Computer and Telecommunications Networking10.1016/j.comnet.2023.109795230:COnline publication date: 1-Jul-2023
  • (2022)Enabling In-Network Floating-Point Arithmetic for Efficient Computation OffloadingIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2022.320842533:12(4918-4934)Online publication date: 1-Dec-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media