
CLC number: TP14

On-line Access: 2021-11-15

Received: 2020-11-08

Revision Accepted: 2021-02-15

Crosschecked: 2021-04-01

 ORCID:

Bihao Sun

https://orcid.org/0000-0002-2790-2528

Huaqing Li

https://orcid.org/0000-0001-6310-8965

Frontiers of Information Technology & Electronic Engineering 

Accepted manuscript available online (unedited version)


A distributed stochastic optimization algorithm with gradient-tracking and distributed heavy-ball acceleration


Author(s):  Bihao Sun, Jinhui Hu, Dawen Xia, Huaqing Li

Affiliation(s):  1. Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China; 2. College of Data Science and Information Engineering, Guizhou Minzu University, Guiyang 550025, China

Corresponding email(s):  huaqingli@swu.edu.cn

Key Words:  Distributed optimization, High-performance algorithm, Multi-agent system, Machine-learning problem, Stochastic gradient


Bihao Sun, Jinhui Hu, Dawen Xia, Huaqing Li. A distributed stochastic optimization algorithm with gradient-tracking and distributed heavy-ball acceleration[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2000615

@article{title="A distributed stochastic optimization algorithm with gradient-tracking and distributed heavy-ball acceleration",
author="Bihao Sun, Jinhui Hu, Dawen Xia, Huaqing Li",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="https://doi.org/10.1631/FITEE.2000615"
}

%0 Journal Article
%T A distributed stochastic optimization algorithm with gradient-tracking and distributed heavy-ball acceleration
%A Bihao Sun
%A Jinhui Hu
%A Dawen Xia
%A Huaqing Li
%J Frontiers of Information Technology & Electronic Engineering
%P 1463-1476
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R https://doi.org/10.1631/FITEE.2000615

TY - JOUR
T1 - A distributed stochastic optimization algorithm with gradient-tracking and distributed heavy-ball acceleration
A1 - Bihao Sun
A1 - Jinhui Hu
A1 - Dawen Xia
A1 - Huaqing Li
JO - Frontiers of Information Technology & Electronic Engineering
SP - 1463
EP - 1476
SN - 2095-9184
Y1 - in press
PB - Zhejiang University Press & Springer
DO - https://doi.org/10.1631/FITEE.2000615
ER -


Abstract: 
Distributed optimization has developed rapidly in recent years owing to its wide applications in machine learning and signal processing. In this paper, we investigate distributed optimization for minimizing a global objective that is the sum of smooth and strongly convex local cost functions distributed over an undirected network of n nodes. In contrast to existing works, we apply a distributed heavy-ball term to improve the convergence performance of the proposed algorithm: to accelerate existing distributed stochastic first-order gradient methods, a momentum term is combined with a gradient-tracking technique. It is shown that the proposed algorithm achieves better acceleration than GT-SAGA without increasing the complexity. Extensive experiments on real-world datasets verify the effectiveness and correctness of the proposed algorithm.
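The exact recursion and gradient estimator are given in the full text. Purely as an illustrative sketch (not the authors' stated update), the standard template for combining gradient tracking with a distributed heavy-ball term over an undirected network with doubly stochastic weights $W=[w_{ij}]$ reads

\begin{align}
x_i^{k+1} &= \sum_{j=1}^{n} w_{ij}\,x_j^{k} - \alpha\,y_i^{k} + \beta\bigl(x_i^{k} - x_i^{k-1}\bigr),\\
y_i^{k+1} &= \sum_{j=1}^{n} w_{ij}\,y_j^{k} + g_i^{k+1} - g_i^{k},
\end{align}

where $x_i^k$ is node $i$'s local estimate, $\alpha>0$ is the step size, $\beta\ge 0$ is the heavy-ball momentum parameter, $y_i^k$ tracks the network-average gradient, and $g_i^k$ is a local stochastic (e.g., variance-reduced) gradient estimate. With $\beta=0$ and a SAGA-type estimator, this template reduces to a GT-SAGA-style update, the baseline against which the abstract compares.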

A distributed stochastic optimization algorithm with gradient-tracking and distributed heavy-ball acceleration

Bihao Sun^1, Jinhui Hu^1, Dawen Xia^2, Huaqing Li^1
1 Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
2 College of Data Science and Information Engineering, Guizhou Minzu University, Guiyang 550025, China
Abstract: Distributed optimization has developed rapidly in recent years owing to its wide applications in machine learning and signal processing. This paper studies distributed optimization for minimizing a global objective, namely the sum of smooth and strongly convex local cost functions distributed over an undirected network of n nodes. Unlike existing works, we use a distributed heavy-ball term to improve the convergence performance of the algorithm. To accelerate the convergence of existing distributed stochastic first-order gradient methods, a momentum term is combined with a gradient-tracking technique. Simulation results show that, without increasing the complexity, the proposed algorithm achieves a faster convergence rate than GT-SAGA. Numerical experiments on real-world datasets verify the effectiveness and correctness of the algorithm.

Key words: Distributed optimization; High-performance algorithm; Multi-agent system; Machine-learning problem; Stochastic gradient
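The following is a minimal, hypothetical NumPy sketch (not the authors' code) of the template shown after the abstract: a gradient-tracking update with a heavy-ball momentum term, run with mini-batch stochastic gradients on a toy distributed least-squares problem over a ring network. The data, step size alpha, momentum beta, and mini-batch estimator are illustrative stand-ins for the problem setup and variance-reduced estimator analyzed in the paper.

# Minimal, hypothetical sketch of gradient tracking + heavy-ball momentum
# with mini-batch stochastic gradients; not the authors' reference code.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 8, 5, 50                       # nodes, dimension, samples per node

# Local data: node i holds (A_i, b_i) and f_i(x) = ||A_i x - b_i||^2 / (2m)
A = rng.normal(size=(n, m, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=(n, m))

# Doubly stochastic mixing matrix for a ring graph (uniform weights)
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3

def stoch_grad(i, x, batch=5):
    """Mini-batch stochastic gradient of f_i at x (illustrative estimator)."""
    idx = rng.choice(m, size=batch, replace=False)
    Ai, bi = A[i, idx], b[i, idx]
    return Ai.T @ (Ai @ x - bi) / batch

alpha, beta, T = 0.05, 0.4, 300          # step size, momentum, iterations (illustrative)
x = np.zeros((n, d))                     # current iterates
x_prev = np.zeros((n, d))                # previous iterates (for the heavy-ball term)
g = np.array([stoch_grad(i, x[i]) for i in range(n)])
y = g.copy()                             # gradient trackers, initialized at local gradients

for _ in range(T):
    # x-update: consensus + step along tracked gradient + heavy-ball momentum
    x_new = W @ x - alpha * y + beta * (x - x_prev)
    g_new = np.array([stoch_grad(i, x_new[i]) for i in range(n)])
    # y-update: consensus on trackers + local gradient increments (gradient tracking)
    y = W @ y + g_new - g
    x_prev, x, g = x, x_new, g_new

print("mean distance to x_true:", np.linalg.norm(x - x_true, axis=1).mean())

With beta = 0 the momentum term drops out and the loop becomes a plain stochastic gradient-tracking iteration, which makes the role of the heavy-ball term easy to isolate in experiments.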

