JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

Accepted manuscript available online (unedited version)

An effective fault prediction model developed using an extreme learning machine with various kernel methods

Author(s): Lov Kumar, Anand Tirkey, Santanu-Ku. Rath
Affiliation(s): Department of Computer Science and Engineering, National Institute of Technology Rourkela, Rourkela 769008, India
Corresponding email(s): lovkumar505@gmail.com, andy9c@gmail.com, skrath@nitrkl.ac.in
Key Words: CK metrics, Cost analysis, Extreme learning machine, Feature selection techniques, Object-oriented software

Lov Kumar, Anand Tirkey, Santanu-Ku. Rath. An effective fault prediction model developed using an extreme learning machine with various kernel methods[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.1601501

@article{Kumar_FITEE1601501,
title="An effective fault prediction model developed using an extreme learning machine with various kernel methods",
author="Lov Kumar and Anand Tirkey and Santanu-Ku. Rath",
journal="Frontiers of Information Technology \& Electronic Engineering",
year="in press",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.1601501"
}

%0 Journal Article
%T An effective fault prediction model developed using an extreme learning machine with various kernel methods
%A Lov Kumar
%A Anand Tirkey
%A Santanu-Ku. Rath
%J Frontiers of Information Technology & Electronic Engineering
%P 864-888
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1601501

TY  - JOUR
T1  - An effective fault prediction model developed using an extreme learning machine with various kernel methods
A1  - Lov Kumar
A1  - Anand Tirkey
A1  - Santanu-Ku. Rath
JF  - Frontiers of Information Technology & Electronic Engineering
SP  - 864
EP  - 888
SN  - 2095-9184
Y1  - in press
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1601501
ER  - 

Abstract: System analysts often use software fault prediction models to identify fault-prone modules during the design phase of the software development life cycle. These models predict faulty modules from the software metrics supplied to them as input. In this study, we consider 20 types of metrics to develop a model using an extreme learning machine associated with various kernel methods. We evaluate the effectiveness of the model using a proposed framework based on the cost and efficiency of the testing phases. The evaluation is carried out through case studies of 30 object-oriented software systems. Experimental results demonstrate that applying a fault prediction model is suitable for projects whose percentage of faulty classes is below a certain threshold, which depends on the efficiency of fault identification (low: 47.28%; median: 39.24%; high: 25.72%). We consider nine feature selection techniques to remove irrelevant metrics and to select the best set of source code metrics for fault prediction.
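The abstract names an extreme learning machine with kernel methods but does not spell out the formulation. The sketch below is a minimal illustration, assuming the standard regularized kernel ELM of Huang et al. (output weights beta = (I/C + Omega)^-1 T with Omega_ij = K(x_i, x_j)), a Gaussian (RBF) kernel, and labels coded as +1 (faulty class) and -1 (non-faulty class). The class name `KernelELM`, the values of `C` and `gamma`, and the toy metric vectors are hypothetical choices for illustration, not the authors' implementation or settings.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


class KernelELM:
    """Binary kernel ELM (assumed formulation): beta = (I/C + Omega)^-1 T,
    decision f(x) = sum_i beta_i * K(x, x_i)."""

    def __init__(self, C=100.0, gamma=2.0):
        self.C = C          # regularization constant (hypothetical default)
        self.gamma = gamma  # RBF kernel width (hypothetical default)

    def fit(self, X, y):
        """X: list of metric vectors; y: labels, +1 = faulty, -1 = non-faulty."""
        self.X = [list(row) for row in X]
        n = len(self.X)
        # Omega plus I/C on the diagonal
        omega = [
            [rbf_kernel(self.X[i], self.X[j], self.gamma) + (1.0 / self.C if i == j else 0.0)
             for j in range(n)]
            for i in range(n)
        ]
        self.beta = solve(omega, [float(t) for t in y])
        return self

    def decision(self, x):
        return sum(b * rbf_kernel(x, xi, self.gamma) for b, xi in zip(self.beta, self.X))

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1


# Hypothetical, already-normalized values of two class-level metrics.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y_train = [-1, -1, 1, 1]
model = KernelELM().fit(X_train, y_train)
print(model.predict([0.15, 0.15]), model.predict([0.85, 0.85]))  # prints: -1 1
```

Because the kernel ELM solves a single regularized linear system instead of iteratively tuning hidden-layer weights, swapping the kernel (linear, polynomial, RBF) only changes `rbf_kernel`; the training and prediction code stays the same.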

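An effective fault prediction model based on an extreme learning machine with different kernel functions

Summary: In the design phase of the software development life cycle, system analysts often use software fault prediction models to identify fault-prone modules. Fault prediction models predict defective modules from software metrics. A fault prediction model is built using an extreme learning machine with different kernel functions, combined with 20 types of metrics. The efficiency of the model is evaluated with a proposed framework based on software testing cost and efficiency, and case studies are carried out on 30 object-oriented software systems. Experimental results show that, depending on the efficiency of fault identification (low: 47.28%; median: 39.24%; high: 25.72%), the proposed fault prediction model is suitable for projects whose proportion of faulty classes is below a certain threshold. Nine different feature selection methods are considered to eliminate irrelevant metrics and to select the best set of source code metrics for fault prediction.

Keywords: CK metrics; Cost analysis; Extreme learning machine; Feature selection methods; Object-oriented software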
