Full Text:   <5486>

Summary:  <369>

CLC number: TP311.5

On-line Access: 2022-05-19

Received: 2021-09-30

Revision Accepted: 2022-05-19

Crosschecked: 2022-02-05

Cited: 0

Clicked: 2711

Citations:  Bibtex RefMan EndNote GB/T7714

 ORCID:

Jinfu CHEN

https://orcid.org/0000-0002-3124-5452

Saihua CAI

https://orcid.org/0000-0003-0743-1156

-   Go to

Article info.
Open peer comments

Frontiers of Information Technology & Electronic Engineering  2022 Vol.23 No.5 P.715-731

http://doi.org/10.1631/FITEE.2100468


A software defect prediction method with metric compensation based on feature selection and transfer learning


Author(s):  Jinfu CHEN, Xiaoli WANG, Saihua CAI, Jiaping XU, Jingyi CHEN, Haibo CHEN

Affiliation(s):  School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China; more

Corresponding email(s):   caisaih@ujs.edu.cn

Key Words:  Defect prediction, Feature selection, Transfer learning, Metric compensation


Jinfu CHEN, Xiaoli WANG, Saihua CAI, Jiaping XU, Jingyi CHEN, Haibo CHEN. A software defect prediction method with metric compensation based on feature selection and transfer learning[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(5): 715-731.

@article{title="A software defect prediction method with metric compensation based on feature selection and transfer learning",
author="Jinfu CHEN, Xiaoli WANG, Saihua CAI, Jiaping XU, Jingyi CHEN, Haibo CHEN",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="5",
pages="715-731",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2100468"
}

%0 Journal Article
%T A software defect prediction method with metric compensation based on feature selection and transfer learning
%A Jinfu CHEN
%A Xiaoli WANG
%A Saihua CAI
%A Jiaping XU
%A Jingyi CHEN
%A Haibo CHEN
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 5
%P 715-731
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2100468

TY - JOUR
T1 - A software defect prediction method with metric compensation based on feature selection and transfer learning
A1 - Jinfu CHEN
A1 - Xiaoli WANG
A1 - Saihua CAI
A1 - Jiaping XU
A1 - Jingyi CHEN
A1 - Haibo CHEN
J0 - Frontiers of Information Technology & Electronic Engineering
VL - 23
IS - 5
SP - 715
EP - 731
%@ 2095-9184
Y1 - 2022
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/FITEE.2100468


Abstract: 
Cross-project software defect prediction solves the problem of insufficient training data for traditional defect prediction, and overcomes the challenge of applying models learned from multiple different source projects to target project. At the same time, two new problems emerge: (1) too many irrelevant and redundant features in the model training process will affect the training efficiency and thus decrease the prediction accuracy of the model; (2) the distribution of metric values will vary greatly from project to project due to the development environment and other factors, resulting in lower prediction accuracy when the model achieves cross-project prediction. In the proposed method, the Pearson feature selection method is introduced to address data redundancy, and the metric compensation based transfer learning technique is used to address the problem of large differences in data distribution between the source project and target project. In this paper, we propose a software defect prediction method with metric compensation based on feature selection and transfer learning. The experimental results show that the model constructed with this method achieves better results on area under the receiver operating characteristic curve (AUC) value and F1-measure metric.

一种基于特征选择与迁移学习的度量补偿软件缺陷预测方法

陈锦富1,2,王小丽1,2,蔡赛华1,2,徐家平1,陈静怡1,陈海波1
1江苏大学计算机科学与通信工程学院,中国镇江市,212013
2江苏大学工业网络空间安全技术江苏省重点实验室,中国镇江市,212013
摘要:跨项目软件缺陷预测解决了传统缺陷预测中训练数据不足的问题,克服了将多个不同源项目中学习的模型应用于目标项目的挑战。与此同时,出现两个新问题:(1)模型训练过程中过多无关和冗余特征影响训练效率,降低了模型预测精度;(2)由于开发环境等因素,度量值的分布因项目而异,当模型用于跨项目预测时,预测精度较低。本文引入皮尔逊特征选择方法解决数据冗余问题,采用基于迁移学习的度量补偿技术解决源项目和目标项目之间数据分布差异较大的问题。提出一种基于特征选择和迁移学习的度量补偿软件缺陷预测方法。实验结果表明,用该方法构建的模型在AUC(接收器工作特性曲线下面积)值和F1度量指标上取得较好结果。

关键词:缺陷预测;特征选择;迁移学习;度量补偿

Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article

Reference

[1]Amasaki S, Kawata K, Yokogawa T, 2015. Improving cross-project defect prediction methods with data simplification. Proc 41st Euromicro Conf on Software Engineering and Advanced Applications, p.96-103.

[2]Briand LC, Melo WL, Wüst J, 2002. Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Trans Softw Eng, 28(7):706-720.

[3]Cai JC, Xu K, Zhu YH, et al., 2020. Prediction and analysis of net ecosystem carbon exchange based on gradient boosting regression and random forest. Appl Energy, 262:114566.

[4]Chen JY, Yang YT, Hu KK, et al., 2019. Multiview transfer learning for software defect prediction. IEEE Access, 7:8901-8916.

[5]Chen JY, Hu KK, Yu Y, et al., 2020. Software visualization and deep transfer learning for effective software defect prediction. Proc ACM/IEEE 42nd Int Conf on Software Engineering, p.578-589.

[6]Chen X, Zhao YQ, Wang QP, et al., 2018. MULTI: multi-objective effort-aware just-in-time software defect prediction. Inform Softw Technol, 93:1-13.

[7]Fukushima T, Kamei Y, McIntosh S, et al., 2014. An empirical study of just-in-time defect prediction using cross-project models. Proc 11th Working Conf on Mining Software Repositories, p.172-181.

[8]Grimm LG, Nesselroade KP Jr, 2018. Statistical Applications for the Behavioral and Social Sciences (2nd Ed.). John Wiley & Sons, Hoboken, USA.

[9]Guo YC, Shepperd M, Li N, 2018. Bridging effort-aware prediction and strong classification: a just-in-time software defect prediction study. Proc 40th Int Conf on Software Engineering: Companion Proceeedings, p.325-326.

[10]Habibi PA, Amrizal V, Bahaweres RB, 2018. Cross-project defect prediction for web application using naive Bayes (case study: petstore web application). Proc Int Workshop on Big Data and Information Security, p.13-18.

[11]Hall T, Beecham S, Bowes D, et al., 2012. A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng, 38(6):1276-1304.

[12]He P, Li B, Liu X, et al., 2015. An empirical study on software defect prediction with a simplified metric set. Inform Softw Technol, 59:170-190.

[13]Herbold S, Trautsch A, Grabowski J, 2018. A comparative study to benchmark cross-project defect prediction approaches. Proc 40th Int Conf on Software Engineering, p.1063.

[14]Iqbal T, Cao Y, Kong QQ, et al., 2020. Learning with out-of-distribution data for audio classification. Proc IEEE Int Conf on Acoustics, Speech and Signal Processing, p.636-640.

[15]Kamei Y, Fukushima T, McIntosh S, et al., 2016. Studying just-in-time defect prediction using cross-project models. Empir Softw Eng, 21(5):2072-2106.

[16]Li K, Xiang ZL, Chen T, et al., 2020a. BILO-CPDP: bi-level programming for automated model discovery in cross-project defect prediction. Proc 35th IEEE/ACM Int Conf on Automated Software Engineering, p.573-584.

[17]Li K, Xiang ZL, Chen T, et al., 2020b. Understanding the automated parameter optimization on transfer learning for cross-project defect prediction: an empirical study. Proc ACM/IEEE 42nd Int Conf on Software Engineering, p.566-577.

[18]Liu C, Yang D, Xia X, et al., 2019. A two-phase transfer learning model for cross-project defect prediction. Inform Softw Technol, 107:125-136.

[19]Lv WD, 2019. Method and application of data defect analysis based on linear discriminant regression of far subspace. Cluster Comput, 22(2):4277-4282.

[20]Madeyski L, Jureczko M, 2015. Which process metrics can significantly improve defect prediction models? An empirical study. Softw Qual J, 23(3):393-422.

[21]Malhotra R, 2015. A systematic review of machine learning techniques for software fault prediction. Appl Soft Comput, 27:504-518.

[22]Marian Z, Mircea IG, Czibula IG, et al., 2016. A novel approach for software defect prediction using fuzzy decision trees. Proc 18th Int Symp on Symbolic and Numeric Algorithms for Scientific Computing, p.240-247.

[23]McBride R, Wang K, Ren ZY, et al., 2019. Cost-sensitive learning to rank. Proc 33rd AAAI Conf on Artificial Intelligence, p.4570-4577.

[24]Nam J, Pan SJ, Kim S, 2013. Transfer defect learning. Proc 35th Int Conf on Software Engineering, p.382-391.

[25]Peng ML, Zhang Q, Xing XY, et al., 2019. Trainable undersampling for class-imbalance learning. Proc 33rd AAAI Conf on Artificial Intelligence, p.4707-4714.

[26]Purnami SW, Trapsilasiwi RK, 2017. SMOTE-least square support vector machine for classification of multiclass imbalanced data. Proc 9th Int Conf on Machine Learning and Computing, p.107-111.

[27]Rahman F, Devanbu P, 2013. How, and why, process metrics are better. Proc 35th Int Conf on Software Engineering, p.432-441.

[28]Ryu D, Choi O, Baik J, 2014. Improving prediction robustness of VAB-SVM for cross-project defect prediction. Proc IEEE 17th Int Conf on Computational Science and Engineering, p.994-999.

[29]Ryu D, Choi O, Baik J, 2016. Value-cognitive boosting with a support vector machine for cross-project defect prediction. Empir Softw Eng, 21(1):43-71.

[30]Ryu D, Jang JI, Baik J, 2017. A transfer cost-sensitive boosting approach for cross-project defect prediction. Softw Qual J, 25(1):235-272.

[31]Saidi R, Bouaguel W, Essoussi N, 2019. Hybrid feature selection method based on the genetic algorithm and Pearson correlation coefficient. In: Hassanien AE (Ed.), Machine Learning Paradigms: Theory and Application. Springer, Cham, p.3-24.

[32]Shippey T, Bowes D, Hall T, 2019. Automatically identifying code features for software defect prediction: using AST N-grams. Inform Softw Technol, 106:142-160.

[33]Shuai B, Li HF, Li MJ, et al., 2013. Software defect prediction using dynamic support vector machine. Proc 9th Int Conf on Computational Intelligence and Security, p.260-263.

[34]Siers MJ, Islam Z, 2015. Software defect prediction using a cost sensitive decision forest and voting, and a potential solution to the class imbalance problem. Inform Syst, 51:62-71.

[35]Tabassum S, Minku LL, Feng DY, et al., 2020. An investigation of cross-project learning in online just-in-time software defect prediction. Proc ACM/IEEE 42nd Int Conf on Software Engineering, p.554-565.

[36]Thejas GS, Garg R, Iyengar SS, et al., 2021. Metric and accuracy ranked feature inclusion: hybrids of filter and wrapper feature selection approaches. IEEE Access, 9:128687-128701.

[37]Tsuda N, Washizaki H, Honda K, et al., 2019. WSQF: comprehensive software quality evaluation framework and benchmark based on SQuaRE. Proc IEEE/ACM 41st Int Conf on Software Engineering: Software Engineering in Practice, p.312-321.

[38]Wahono RS, 2015. A systematic literature review of software defect prediction: research trends, datasets, methods and frameworks. J Softw Eng, 1(1):1-16.

[39]Wan ZY, Xia X, Hassan AE, et al., 2020. Perceptions, expectations, and challenges in defect prediction. IEEE Trans Softw Eng, 46(11):1241-1266.

[40]Wang HJ, Khoshgoftaar TM, Napolitano A, 2010. A comparative study of ensemble feature selection techniques for software defect prediction. Proc 9th Int Conf on Machine Learning and Applications, p.135-140.

[41]Watanabe S, Kaiya H, Kaijiri K, 2008. Adapting a fault prediction model to allow inter languagereuse. Proc 4th Int Workshop on Predictor Models in Software Engineering, p.19-24.

[42]Wu F, Jing XY, Dong XW, et al., 2017. Cross-project and within-project semi-supervised software defect prediction problems study using a unified solution. Proc IEEE/ACM 39th Int Conf on Software Engineering Companion, p.195-197.

[43]Yang XL, Lo D, Xia X, et al., 2017. TLEL: a two-layer ensemble learning approach for just-in-time defect prediction. Inform Softw Technol, 87:206-220.

[44]Yu JL, Benesty J, Huang GP, et al., 2015. Optimal single-channel noise reduction filtering matrices from the Pearson correlation coefficient perspective. Proc IEEE Int Conf on Acoustics, Speech and Signal Processing, p.201-205.

Open peer comments: Debate/Discuss/Question/Opinion

<1>

Please provide your name, email address and a comment





Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2024 Journal of Zhejiang University-SCIENCE