
Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent

Yudong Chen, Lili Su, and Jiaming Xu
Published: 19 December 2017

Abstract

We consider the distributed statistical learning problem over decentralized systems that are prone to adversarial attacks. This setup arises in many practical applications, including Google's Federated Learning. Formally, we focus on a decentralized system that consists of a parameter server and m working machines; each working machine keeps N/m data samples, where N is the total number of samples. In each iteration, up to q of the m working machines suffer Byzantine faults -- a faulty machine in a given iteration behaves arbitrarily badly against the system and has complete knowledge of the system. Moreover, the set of faulty machines may change across iterations. Our goal is to design robust algorithms such that the system can learn the underlying true parameter, which is of dimension d, despite the disruption caused by the Byzantine attacks.
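In schematic notation (ours, not the abstract's), the fault model can be written as follows: the message the server receives from machine i at iteration t is the honest local gradient unless i lies in the current Byzantine set.

```latex
% Schematic fault model (our notation: \mathcal{B}_t is the Byzantine set at
% iteration t, f_i the empirical risk on machine i's N/m local samples).
\[
g_i^{(t)} =
\begin{cases}
  \nabla f_i\bigl(\theta^{(t)}\bigr), & i \notin \mathcal{B}_t \quad \text{(honest)}, \\[4pt]
  \star, & i \in \mathcal{B}_t \quad \text{(arbitrary)},
\end{cases}
\qquad \bigl|\mathcal{B}_t\bigr| \le q,
\]
```

Here ⋆ may be any vector, chosen with complete knowledge of the system, and the set B_t may change from iteration to iteration.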

In this paper, based on the geometric median of means of the gradients, we propose a simple variant of the classical gradient descent method. We show that our method can tolerate q Byzantine failures up to 2(1+ε)q ≤ m for an arbitrarily small but fixed constant ε > 0. The parameter estimate converges in O(log N) rounds with an estimation error on the order of max{√(dq/N), √(d/N)}, which is larger than the minimax-optimal error rate √(d/N) in the centralized and failure-free setting by at most a factor of √q. The total computational complexity of our algorithm is O((Nd/m) log N) at each working machine and O(md + kd log³ N) at the central server, where k is the number of batches used in the median-of-means aggregation; the total communication cost is O(md log N). We further provide an application of our general results to the linear regression problem.
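To make the aggregation rule concrete, here is a minimal sketch, in our own code rather than the authors' implementation, of the median-of-means Byzantine gradient descent loop for a least-squares objective; the Weiszfeld solver, the step size eta, the attack (large random vectors), and the batch count k are illustrative assumptions:

```python
# Minimal sketch of Byzantine gradient descent with geometric-median-of-means
# aggregation (illustrative; not the authors' implementation).
import numpy as np

def geometric_median(points, iters=100, tol=1e-8):
    """Approximate geometric median of the rows of `points` via Weiszfeld's
    fixed-point iteration; any sufficiently accurate solver would do."""
    z = points.mean(axis=0)
    for _ in range(iters):
        dist = np.maximum(np.linalg.norm(points - z, axis=1), tol)
        w = 1.0 / dist
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

def byzantine_gd(X_parts, y_parts, q, k, eta=0.5, rounds=50):
    """Parameter server with m = len(X_parts) workers; each round, q workers
    report adversarial gradients. Honest reports are least-squares gradients;
    the server averages reports within k batches and descends along the
    geometric median of the batch means."""
    m, d = len(X_parts), X_parts[0].shape[1]
    theta = np.zeros(d)
    batches = np.array_split(np.arange(m), k)            # fixed median-of-means grouping
    for _ in range(rounds):
        grads = [X.T @ (X @ theta - y) / len(y)          # local empirical gradient
                 for X, y in zip(X_parts, y_parts)]
        for j in np.random.choice(m, q, replace=False):  # Byzantine set changes per round
            grads[j] = 1e3 * np.random.randn(d)          # arbitrary adversarial message
        means = np.array([np.mean([grads[j] for j in b], axis=0) for b in batches])
        theta -= eta * geometric_median(means)           # robust descent step
    return theta

# Toy run mirroring the linear regression application: d = 5, m = 20 workers
# (100 samples each), q = 3 Byzantine, k = 8 batches.
rng = np.random.default_rng(0)
theta_star = rng.normal(size=5)
X = rng.normal(size=(2000, 5))
y = X @ theta_star + 0.1 * rng.normal(size=2000)
theta_hat = byzantine_gd(np.array_split(X, 20), np.array_split(y, 20), q=3, k=8)
print(np.linalg.norm(theta_hat - theta_star))            # small despite the attacks
```

The geometric median is what buys robustness here: at most q of the k batch means can be corrupted, and the geometric median of a point set is stable as long as strictly fewer than half the points are corrupted, so any k with 2q < k ≤ m works -- exactly the room that the condition 2(1+ε)q ≤ m provides.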

A key challenge in the above problem is that Byzantine failures create arbitrary and unspecified dependencies among the iterations and the aggregated gradients. To handle this issue in the analysis, we prove that the aggregated gradient, viewed as a function of the model parameter, converges uniformly to the true gradient function.
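Schematically, and in our own notation rather than the paper's, the uniform-convergence guarantee has the following shape, with the deviation matching the error rate quoted above:

```latex
% Schematic statement of the uniform convergence guarantee (our notation):
% G(theta) is the geometric-median-of-means aggregate of the reported
% gradients, F(theta) the population risk, and Theta the search space.
\[
\sup_{\theta \in \Theta}
  \bigl\| G(\theta) - \nabla F(\theta) \bigr\|_2
  \;\lesssim\;
  \max\!\left\{ \sqrt{\frac{dq}{N}},\; \sqrt{\frac{d}{N}} \right\}
  \quad \text{with high probability.}
\]
```

Because the bound holds simultaneously over all θ, it remains valid at the algorithm's iterates even though those iterates are data-dependent and adversarially influenced, which is what breaks the circular dependency between the Byzantine behavior and the trajectory of the descent.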


• Published in

  Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 1, Issue 2, December 2017
  480 pages
  EISSN: 2476-1249
  DOI: 10.1145/3175501

            Copyright © 2017 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States
