CLC number: TP391
On-line Access: 2019-11-11
Received: 2019-07-09
Revision Accepted: 2019-09-16
Crosschecked: 2019-10-25
Cited: 0
Clicked: 5757
Rui Liu, Wei-chu Sun, Tao Hou, Chun-hong Hu, Lin-bo Qiao. Block coordinate descent with time perturbation for nonconvex nonsmooth problems in real-world studies[J]. Frontiers of Information Technology & Electronic Engineering, 2019, 20(10): 1390-1403.
@article{title="Block coordinate descent with time perturbation for nonconvex nonsmooth problems in real-world studies",
author="Rui Liu, Wei-chu Sun, Tao Hou, Chun-hong Hu, Lin-bo Qiao",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="20",
number="10",
pages="1390-1403",
year="2019",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1900341"
}
%0 Journal Article
%T Block coordinate descent with time perturbation for nonconvex nonsmooth problems in real-world studies
%A Rui Liu
%A Wei-chu Sun
%A Tao Hou
%A Chun-hong Hu
%A Lin-bo Qiao
%J Frontiers of Information Technology & Electronic Engineering
%V 20
%N 10
%P 1390-1403
%@ 2095-9184
%D 2019
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1900341
TY - JOUR
T1 - Block coordinate descent with time perturbation for nonconvex nonsmooth problems in real-world studies
A1 - Rui Liu
A1 - Wei-chu Sun
A1 - Tao Hou
A1 - Chun-hong Hu
A1 - Lin-bo Qiao
J0 - Frontiers of Information Technology & Electronic Engineering
VL - 20
IS - 10
SP - 1390
EP - 1403
%@ 2095-9184
Y1 - 2019
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/FITEE.1900341
Abstract: The era of big data in healthcare is here, and this era will significantly improve medicine and especially oncology. However, traditional machine learning algorithms need to be promoted to solve such large-scale real-world problems due to a large amount of data that needs to be analyzed and the difficulty in solving problems with nonconvex nonlinear settings. We aim to minimize the composite of a smooth nonlinear function and a block-separable nonconvex function on a large number of block variables with inequality constraints. We propose a novel parallel first-order optimization method, called asynchronous block coordinate descent with time perturbation (ATP), which adopts a time perturbation technique that escapes from saddle points and sub-optimal local points. The details of the proposed method are presented with analyses of convergence and iteration complexity properties. Experiments conducted on real-world machine learning problems validate the efficacy of our proposed method. The experimental results demonstrate that time perturbation enables ATP to escape from saddle points and sub-optimal points, providing a promising way to handle nonconvex optimization problems with inequality constraints employing asynchronous block coordinate descent. The asynchronous parallel implementation on shared memory multi-core platforms indicates that the proposed algorithm, ATP, has strong scalability.
[1]Anandkumar A, Ge R, 2016. Efficient approaches for escaping higher order saddle points in non-convex optimization. Proc 29th Annual Conf on Learning Theory, p.81-102.
[2]Bertsekas DP, Tsitsiklis JN, 1989. Parallel and Distributed Computation: Numerical Methods. Prentice Hall, Englewood Cliffs, NJ, USA.
[3]Bertsimas D, Bjarnadóttir MV, Kane MA, et al., 2008. Algorithmic prediction of health-care costs. Oper Res, 56(6):1382-1392.
[4]Cannelli L, Facchinei F, Kungurtsev V, et al., 2018. Asynchronous parallel algorithms for nonconvex optimization. https://arxiv.org/abs/1607.04818
[5]Dagum L, Menon R, 1998. OpenMP: an industry standard API for shared-memory programming. IEEE Comput Sci Eng, 5(1):46-55.
[6]Fan JQ, Li RZ, 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc, 96(456):1348-1360.
[7]Friedman J, Hastie T, Tibshirani R, 2001. The Elements of Statistical Learning. Springer, Berlin, Germany.
[8]Ge R, Huang FR, Jin C, et al., 2015. Escaping from saddle points—online stochastic gradient for tensor decomposition. Proc 28th Conf on Learning Theory, p.797-842.
[9]Guennebaud G, Jacob B, 2010. Eigen v3. http://eigen.tuxfamily.org
[10]Hazan E, Levy KY, Shalev-Shwartz S, 2016. On graduated optimization for stochastic non-convex problems. Proc 33rd Int Conf on Machine Learning, p.1833-1841.
[11]Hong MY, Luo ZQ, Razaviyayn M, 2016. Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J Optim, 26(1):337-364.
[12]James GM, Paulson C, Rusmevichientong P, 2012. The Constrained Lasso. Working Paper, University of Southern California, Los Angeles, California, USA.
[13]Jiang B, Lin TY, Ma SQ, et al., 2019. Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. Comput Optim Appl, 72(1):115-157.
[14]Jin C, Ge R, Netrapalli P, et al., 2017. How to escape saddle points efficiently. Proc 34th Int Conf on Machine Learning, p.1724-1732.
[15]Khozin S, Blumenthal GM, Pazdur R, 2017. Real-world data for clinical evidence generation in oncology. J Nat Cancer Inst, 109(11), Article djx187.
[16]Li D, Lai Z, Ge K, et al., 2019. {HPDL: towards a general framework for high-performance distributed deep learning. Proc 39th IEEE Int Conf on Distributed Computing Systems,} p.1742-1753.
[17]Li JQ, Vachani A, Epstein A, et al., 2018. A doubly robust approach for cost-effectiveness estimation from observational data. Stat Methods Med Res, 27(10):3126-3138.
[18]Liu R, 2019. Asynchronous block coordinate descent method for large-scale nonconvex problem in real world study. Proc IEEE 21st Int Conf on High Performance Computing and Communications, jointly with IEEE 17th Int Conf on Smart City and IEEE 5th Int Conf on Data Science and Systems, p.2033-2038.
[19]Nesterov Y, 2012. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J Optim, 22(2):341-362.
[20]Nesterov Y, Nemirovskii A, 1994. {Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia, USA.
[21]Palomar DP, Chiang M, 2006. A tutorial on decomposition methods for network utility maximization. IEEE J Sel Areas Commun, 24(8):1439-1451.
[22]Peng ZM, Xu YY, Yan M, et al., 2019. On the convergence of asynchronous parallel iteration with unbounded delays. J Oper Res Soc China, 7(1):5-42.
[23]Qiao LB, Zhang BF, Su JS, et al., 2016a. Linearized alternating direction method of multipliers for constrained nonconvex regularized optimization. Proc 8th Asian Conf on Machine Learning, p.97-109.
[24]Qiao LB, Lin TY, Jiang YG, et al., 2016b. On stochastic primal-dual hybrid gradient approach for compositely regularized minimization. Proc 22nd European Conf on Artificial Intelligence, p.167-174.
[25]Qiao LB, Zhang BF, Lu XC, et al., 2017. Adaptive linearized alternating direction method of multipliers for non-convex compositely regularized optimization problems. Tsinghua Sci Technol, 22(3):328-341.
[26]Qiao LB, Lin TY, Qin Q, et al., 2018. On the iteration complexity analysis of stochastic primal-dual hybrid gradient approach with high probability. Neurocomputing, 307:78-90.
[27]Razaviyayn M, Hong MY, Luo ZQ, et al., 2014. Parallel successive convex approximation for nonsmooth nonconvex optimization. Proc 27th Int Conf on Neural Information Processing Systems, p.1440-1448.
[28]Rockafellar RT, Wets RJB, 2009. Variational Analysis. Springer, New York, USA.
[29]Shen L, Liu W, Yuan G, et al., 2017. GSOS: Gauss-Seidel operator splitting algorithm for multi-term nonsmooth convex composite optimization. Proc 34th Int Conf on Machine Learning, p.3125-3134.
[30]Shen L, Sun P, Wang YT, et al., 2018. {An algorithmic framework of variable metric over-relaxed hybrid proximal extra-gradient method. https://arxiv.org/abs/1805.06137}
[31]Sun MZ, Jiang Y, Sun C, et al., 2019. The associations between smoking and obesity in northeast China: a quantile regression analysis. Sci Rep, 9, Article 3732.
[32]Sun T, Hannah R, Yin W, et al., 2017. Asynchronous coordinate descent under more realistic assumptions. Proc Advances in Neural Information Processing Systems, p.6182-6190.
[33]Wang X, Ma SQ, Goldfarb D, et al., 2017. Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM J Optim, 27(2):927-956.
[34]Wang Y, Yin W, Zeng J, 2019. Global convergence of ADMM in nonconvex nonsmooth optimization. J Sci Comput, 78(1):29-63.
[35]Xiao L, Johansson M, Boyd SP, 2004. Simultaneous routing and resource allocation via dual decomposition. IEEE Trans Commun, 52(7):1136-1144.
[36]Xu YY, 2019. Asynchronous parallel primal-dual block coordinate update methods for affinely constrained convex programs. https://arxiv.org/abs/1705.06391
[37]Xu YY, Yin W, 2017. A globally convergent algorithm for nonconvex optimization based on block coordinate update. J Sci Comput, 72(2):700-734.
[38]Zhang CH, 2010. Nearly unbiased variable selection under minimax concave penalty. Ann Stat, 38(2):894-942.
[39]Zhang T, 2010. Analysis of multi-stage convex relaxation for sparse regularization. J Mach Learn Res, 11:1081-1107.
[40]Zhang X, Liu J, Zhu ZY, 2018. Taming convergence for asynchronous stochastic gradient descent with unbounded delay in non-convex learning. https://arxiv.org/abs/1805.09470
[41]Zou FY, Shen L, Jie ZQ, et al., 2018. A sufficient condition for convergences of Adam and RMSProp. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.11127-11135. https://arxiv.org/abs/1811.09358
[42]Zou FY, Shen L, Jie ZQ, et al., 2019. {Weighted AdaGrad with unified momentum. https://arxiv.org/abs/1808.03408}
Open peer comments: Debate/Discuss/Question/Opinion
<1>