CLC number: TP301

On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2021-02-14

ORCID:

Ming-gang Dong

https://orcid.org/0000-0001-7078-3942

Chao Jing

https://orcid.org/0000-0002-4695-8746

Frontiers of Information Technology & Electronic Engineering  2022 Vol.23 No.2 P.278-290

http://doi.org/10.1631/FITEE.2000417


One-against-all-based Hellinger distance decision tree for multiclass imbalanced learning


Author(s):  Minggang DONG, Ming LIU, Chao JING

Affiliation(s):  School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China

Corresponding email(s):   jingchao@glut.edu.cn

Key Words:  Decision trees, Multiclass imbalanced learning, Node splitting criterion, Hellinger distance, One-against-all scheme



Abstract: 
The skewed distribution of multiclass data poses a major challenge to machine learning algorithms, because traditional methods are sensitive to class imbalance and do not account for the characteristics of multiclass imbalance problems. To tackle these issues, we propose a new decision-tree splitting criterion, the one-against-all-based Hellinger distance (OAHD). Two crucial elements are included in OAHD. First, the one-against-all scheme is integrated into the computation of the Hellinger distance, thereby extending the Hellinger distance decision tree to cope with the multiclass imbalance problem. Second, a modified Gini index is designed for the multiclass imbalance problem, taking into account both the class distribution and the number of distinct classes. Moreover, we give theoretical proofs of the properties of OAHD, including its skew insensitivity and its ability to seek purer nodes in the decision tree. Finally, we collect 20 public real-world imbalanced data sets from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository and the University of California, Irvine (UCI) repository. Experimental and statistical results show that OAHD significantly outperforms five other well-known decision trees in terms of precision, F-measure, and multiclass area under the receiver operating characteristic curve (MAUC). Moreover, the Friedman and Nemenyi tests confirm the advantage of OAHD over the five other decision trees.
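The one-against-all Hellinger splitting idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact criterion: it uses the standard binary Hellinger distance between the branch-wise class-conditional distributions (as in the original Hellinger distance decision tree), treats each class as "positive" in turn, and aggregates by summation; the aggregation rule is an assumption, and the paper's modified Gini index component is omitted.

```python
import numpy as np

def hellinger_binary(left_counts, right_counts):
    # Binary Hellinger distance between the branch distributions of the
    # positive class and of the negative ("all other") class.
    # Each counts pair is [positives_in_branch, negatives_in_branch].
    pos_total = left_counts[0] + right_counts[0]
    neg_total = left_counts[1] + right_counts[1]
    if pos_total == 0 or neg_total == 0:
        return 0.0  # degenerate split: one side of the OAA problem is empty
    d = 0.0
    for branch in (left_counts, right_counts):
        d += (np.sqrt(branch[0] / pos_total) - np.sqrt(branch[1] / neg_total)) ** 2
    return np.sqrt(d)

def oaa_hellinger(y_left, y_right, classes):
    # One-against-all extension: each class plays the "positive" role in
    # turn; the per-class binary distances are summed (assumed aggregation).
    total = 0.0
    for c in classes:
        left = [np.sum(y_left == c), np.sum(y_left != c)]
        right = [np.sum(y_right == c), np.sum(y_right != c)]
        total += hellinger_binary(left, right)
    return total
```

A perfectly class-separating split maximizes the score (each OAA subproblem reaches the binary maximum of sqrt(2)), while a split that leaves identical class proportions in both branches scores 0, which is the skew-insensitivity property the abstract refers to: the measure depends on branch proportions within each class, not on overall class frequencies.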


Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2025 Journal of Zhejiang University-SCIENCE