Full Text:   <264>

Summary:  <83>

CLC number: TP391

On-line Access: 2019-12-10

Received: 2018-09-25

Revision Accepted: 2019-06-23

Crosschecked: 2019-10-10

Cited: 0

Clicked: 574

Citations:  Bibtex RefMan EndNote GB/T7714

 ORCID:

Meng-long Lu

http://orcid.org/0000-0003-0011-0368

Da-wei Feng

http://orcid.org/0000-0002-7587-8905

-   Go to

Article info.
Open peer comments

Frontiers of Information Technology & Electronic Engineering  2019 Vol.20 No.11 P.1551-1563

10.1631/FITEE.1800596


Mini-batch cutting plane method for regularized risk minimization


Author(s):  Meng-long Lu, Lin-bo Qiao, Da-wei Feng, Dong-sheng Li, Xi-cheng Lu

Affiliation(s):  Science and Technology on Parallel and Distributed Laboratory, National University of Defense Technology, Changsha 410073, China

Corresponding email(s):   lumenglong2018@163.com, davyfeng.c@gmail.com

Key Words:  Machine learning, Optimization methods, Gradient methods, Cutting plane method


Meng-long Lu, Lin-bo Qiao, Da-wei Feng, Dong-sheng Li, Xi-cheng Lu. Mini-batch cutting plane method for regularized risk minimization[J]. Frontiers of Information Technology & Electronic Engineering, 2019, 20(11): 1551-1563.

@article{title="Mini-batch cutting plane method for regularized risk minimization",
author="Meng-long Lu, Lin-bo Qiao, Da-wei Feng, Dong-sheng Li, Xi-cheng Lu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="20",
number="11",
pages="1551-1563",
year="2019",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1800596"
}

%0 Journal Article
%T Mini-batch cutting plane method for regularized risk minimization
%A Meng-long Lu
%A Lin-bo Qiao
%A Da-wei Feng
%A Dong-sheng Li
%A Xi-cheng Lu
%J Frontiers of Information Technology & Electronic Engineering
%V 20
%N 11
%P 1551-1563
%@ 2095-9184
%D 2019
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1800596

TY - JOUR
T1 - Mini-batch cutting plane method for regularized risk minimization
A1 - Meng-long Lu
A1 - Lin-bo Qiao
A1 - Da-wei Feng
A1 - Dong-sheng Li
A1 - Xi-cheng Lu
J0 - Frontiers of Information Technology & Electronic Engineering
VL - 20
IS - 11
SP - 1551
EP - 1563
%@ 2095-9184
Y1 - 2019
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/FITEE.1800596


Abstract: 
Although concern has been recently expressed with regard to the solution to the non-convex problem, convex optimization is still important in machine learning, especially when the situation requires an interpretable model. Solution to the convex problem is a global minimum, and the final model can be explained mathematically. Typically, the convex problem is re-casted as a regularized risk minimization problem to prevent overfitting. The cutting plane method (CPM) is one of the best solvers for the convex problem, irrespective of whether the objective function is differentiable or not. However, CPM and its variants fail to adequately address large-scale data-intensive cases because these algorithms access the entire dataset in each iteration, which substantially increases the computational burden and memory cost. To alleviate this problem, we propose a novel algorithm named the mini-batch cutting plane method (MBCPM), which iterates with estimated cutting planes calculated on a small batch of sampled data and is capable of handling large-scale problems. Furthermore, the proposed MBCPM adopts a “sink” operation that detects and adjusts noisy estimations to guarantee convergence. Numerical experiments on extensive real-world datasets demonstrate the effectiveness of MBCPM, which is superior to the bundle methods for regularized risk minimization as well as popular stochastic gradient descent methods in terms of convergence speed.

正则风险最小化的小批量割平面法

摘要:虽然最近求解非凸问题的研究十分热门,尤其在机器学习需要可解释性模型情况下,凸优化仍然重要。求解凸问题可得到全局最优解,故而最终模型可用数学方法解释。通常为防止过度拟合,凸问题被重新描述为一个正则风险最小化问题。无论目标函数是否可微,割平面法是求解凸问题最佳方法之一。然而,割平面法及其变体无法充分应对大规模密集型数据,因为这些算法每次迭代都需访问整个数据集,大大增加了计算负担和内存成本。为解决这一问题,提出一种新的小批量割平面法。该算法通过对小批量采样数据进行计算,得到估计切割平面用于迭代,使其能处理大规模数据。此外,小批量割平面法使用“sink”算子检测和调整噪声估计以保证收敛性。在大量实际数据集上的数值实验证明了小批量割平面法有效性,收敛速度优于正则风险最小化的bundle法和普遍使用的随机梯度下降法。

关键词:机器学习;优化方法;梯度法;割平面法

Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article

Reference

[1]Belloni A, 2005. Introduction to Bundle Methods. Lecture Notes for IAP, Operations Research Center, MIT, USA.

[2]Bennett KP, Mangasarian OL, 1992. Robust linear programming discrimination of two linearly inseparable sets. Optim Methods Softw, 1(1):23-34.

[3]Bottou L, 2010. Large-scale machine learning with stochastic gradient descent. Proc 19th Int Conf on Computational Statistics, p.177-187.

[4]Crammer K, Singer Y, 2003. Ultraconservative online algorithms for multiclass problems. J Mach Learn Res, 3:951-991.

[5]Duarte M, Hu YH, 2004. Vehicle classification in distributed sensor networks. J Parall Distr Comput, 64(7):826-838.

[6]Duchi J, Hazan E, Singer Y, 2011. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res, 12:2121-2159.

[7]Franc V, Sonnenburg S, 2008. Optimized cutting plane algorithm for support vector machines. Proc 25th Int Conf on Machine Learning, p.320-327.

[8]Hiriart-Urruty JB, Lemaréchal C, 1993. Convex Analysis and Minimization Algorithms I. Springer, Berlin, Germany.

[9]Kelley JEJr, 1960. The cutting-plane method for solving convex programs. J Soc Ind Appl Math, 8(4):703-712.

[10]Kingma DP, Ba J, 2014. Adam: a method for stochastic optimization. https://arxiv.org/abs/1412.6980

[11]Kiwiel KC, 1983. An aggregate subgradient method for nonsmooth convex minimization. Math Program, 27(3):320-341.

[12]Kiwiel KC, 1990. Proximity control in bundle methods for convex nondifferentiable minimization. Math Program, 46(1):105-122.

[13]LeCun Y, Bottou L, Bengio Y, et al., 1998. Gradient-based learning applied to document recognition. Proc IEEE, 86(11):2278-2324.

[14]Lemaréchal C, Nemirovskii A, Nesterov Y, 1995. New variants of bundle methods. Math Program, 69(1):111-147.

[15]Ma J, Saul LK, Savage S, et al., 2009. Identifying suspicious URLs: an application of large-scale online learning. Proc 26th Int Conf on Machine Learning, p.681-688.

[16]Mokhtari A, Eisen M, Ribeiro A, 2018. IQN: an incremental quasi-Newton method with local superlinear convergence rate. SIAM J Opt, 28(2):1670-1698.

[17]Mou LL, Men R, Li G, et al., 2016. Natural language inference by tree-based convolution and heuristic matching. Proc 54th Annual Meeting of the Association for Computational Linguistics, p.130-136.

[18]Qian N, 1999. On the momentum term in gradient descent learning algorithms. Neur Netw, 12(1):145-151.

[19]Schramm H, Zowe J, 1992. A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numerical results. SIAM J Opt, 2(1):121-152.

[20]Sonnenburg S, Franc V, 2010. COFFIN: a computational framework for linear SVMs. Proc 27th Int Conf on Machine Learning, p.999-1006.

[21]Teo CH, Vishwanthan SVN, Smola AJ, et al., 2010. Bundle methods for regularized risk minimization. J Mach Learn Res, 11:311-365.

[22]Tsochantaridis I, Joachims T, Hofmann T, et al., 2005. Large margin methods for structured and interdependent output variables. J Mach Learn Res, 6:1453-1484.

Open peer comments: Debate/Discuss/Question/Opinion

<1>

Please provide your name, email address and a comment





Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - Journal of Zhejiang University-SCIENCE