CLC number: TP18

On-line Access: 2012-08-02

Received: 2012-01-11

Revision Accepted: 2012-06-21

Crosschecked: 2012-07-06

Journal of Zhejiang University SCIENCE C 2012 Vol.13 No.8 P.585-592


Negative effects of sufficiently small initial weights on back-propagation neural networks

Author(s):  Yan Liu, Jie Yang, Long Li, Wei Wu

Affiliation(s):  School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China; …

Corresponding email(s):   liuyan@dlpu.edu.cn, yangjiee@dlut.edu.cn, long_li1982@163.com, wuweiw@dlut.edu.cn

Key Words:  Neural networks, Back-propagation, Gradient learning method, Convergence

Yan Liu, Jie Yang, Long Li, Wei Wu. Negative effects of sufficiently small initial weights on back-propagation neural networks[J]. Journal of Zhejiang University Science C, 2012, 13(8): 585-592.

DOI:  10.1631/jzus.C1200008

ISSN:  1869-1951

Publisher:  Zhejiang University Press & Springer

In the training of feedforward neural networks, it is usually suggested that the initial weights should be small in magnitude in order to prevent premature saturation. The aim of this paper is to point out the other side of the story: in some cases, the gradient of the error function vanishes not only in the limit of infinitely large weights but also at zero weights. As a result, sufficiently small initial weights often cause slow convergence at the beginning of training. We therefore suggest that, in these cases, the initial weights should be neither too large nor too small. For instance, a typical range for the initial weights might be (−0.4, −0.1)∪(0.1, 0.4), rather than (−0.1, 0.1) as suggested by the usual strategy. We also extend this recommendation of medium-sized weights to a few commonly used transfer functions and error functions. Numerical experiments are carried out to support the theoretical findings.
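The argument above can be illustrated with a small sketch. The setting is an assumption of ours, not necessarily the paper's exact model: one hidden layer of tanh units, a linear output unit, no separate bias terms (a constant input of 1 plays that role), a squared error summed over the samples, and made-up toy data. Because tanh(0) = 0, setting every weight to zero makes all hidden outputs and the network output zero, so both partial derivatives computed below vanish and the all-zero point is stationary; weights close to zero therefore give gradients close to zero. The sampler `sample_medium_weight` draws from the example range (−0.4, −0.1)∪(0.1, 0.4) quoted in the abstract. All function names are hypothetical.

```python
import math
import random


def sample_medium_weight(rng, low=0.1, high=0.4):
    """Draw w with |w| in [low, high] and a random sign.

    Defaults mirror the example range (-0.4, -0.1) U (0.1, 0.4) from the abstract.
    """
    magnitude = rng.uniform(low, high)
    return magnitude if rng.random() < 0.5 else -magnitude


def sample_small_weight(rng, bound=0.1):
    """Conventional symmetric draw from (-bound, bound), kept for comparison."""
    return rng.uniform(-bound, bound)


def init_weights(n_hidden, n_in, sampler, rng):
    """Hidden weights W (n_hidden x n_in) and output weights v (n_hidden)."""
    W = [[sampler(rng) for _ in range(n_in)] for _ in range(n_hidden)]
    v = [sampler(rng) for _ in range(n_hidden)]
    return W, v


def forward(W, v, x):
    """Hypothetical toy net: tanh hidden layer, linear output, no biases."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    y = sum(vj * hj for vj, hj in zip(v, h))
    return h, y


def gradient(W, v, data):
    """Batch gradient of E = 0.5 * sum (y - t)^2 with respect to W and v."""
    gW = [[0.0] * len(row) for row in W]
    gv = [0.0] * len(v)
    for x, t in data:
        h, y = forward(W, v, x)
        err = y - t
        for j, hj in enumerate(h):
            gv[j] += err * hj                     # dE/dv_j is zero when h_j = 0
            delta = err * v[j] * (1.0 - hj * hj)  # back-propagated through tanh
            for i, xi in enumerate(x):
                gW[j][i] += delta * xi            # dE/dW_ji is zero when v_j = 0
    return gW, gv


def grad_norm(W, v, data):
    """Euclidean norm of the full gradient (all entries of W and v)."""
    gW, gv = gradient(W, v, data)
    return math.sqrt(sum(g * g for row in gW for g in row) + sum(g * g for g in gv))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (assumed): the last input component is a constant 1 acting as a bias.
    data = [([1.0, 0.5, 1.0], 1.0), ([-1.0, 0.2, 1.0], -1.0), ([0.3, -0.8, 1.0], 0.5)]
    zeros = ([[0.0] * 3 for _ in range(4)], [0.0] * 4)
    print("zero init:", grad_norm(*zeros, data))  # exactly 0.0: the origin is stationary
    for name, sampler in [("(-0.1, 0.1)", sample_small_weight),
                          ("medium", sample_medium_weight)]:
        W, v = init_weights(4, 3, sampler, rng)
        print(name, grad_norm(W, v, data))
```

Under these assumptions, with a single input x = 1, target 1 and both weights equal to w, the gradient norm is about 0.014 for w = 0.01 but about 0.365 for w = 0.3, which is the slow start the abstract describes. The bounds `low` and `high` are left as parameters because the abstract presents (−0.4, −0.1)∪(0.1, 0.4) only as a typical example, not a fixed rule.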
