CLC number: TP391
On-line Access: 2019-11-11
Received: 2019-07-09
Revision Accepted: 2019-09-16
Crosschecked: 2019-10-25
Rui Liu, Wei-chu Sun, Tao Hou, Chun-hong Hu, Lin-bo Qiao. Block coordinate descent with time perturbation for nonconvex nonsmooth problems in real-world studies. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.1900341
Chinese title (translated): Block coordinate descent with time perturbation for nonconvex nonsmooth problems in real-world studies