CLC number: TP391
Crosschecked: 2018-05-10
Ke Guo, Xia-bi Liu, Lun-hao Guo, Zong-jie Li, Zeng-min Geng. A new constrained maximum margin approach to discriminative learning of Bayesian classifiers[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(5): 639-650.
@article{FITEE.1700007,
title="A new constrained maximum margin approach to discriminative learning of Bayesian classifiers",
author="Ke Guo and Xia-bi Liu and Lun-hao Guo and Zong-jie Li and Zeng-min Geng",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="19",
number="5",
pages="639-650",
year="2018",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1700007"
}
%0 Journal Article
%T A new constrained maximum margin approach to discriminative learning of Bayesian classifiers
%A Ke Guo
%A Xia-bi Liu
%A Lun-hao Guo
%A Zong-jie Li
%A Zeng-min Geng
%J Frontiers of Information Technology & Electronic Engineering
%V 19
%N 5
%P 639-650
%@ 2095-9184
%D 2018
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1700007
TY - JOUR
T1 - A new constrained maximum margin approach to discriminative learning of Bayesian classifiers
A1 - Ke Guo
A1 - Xia-bi Liu
A1 - Lun-hao Guo
A1 - Zong-jie Li
A1 - Zeng-min Geng
JO - Frontiers of Information Technology & Electronic Engineering
VL - 19
IS - 5
SP - 639
EP - 650
SN - 2095-9184
Y1 - 2018
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1700007
ER -
Abstract: We propose a novel discriminative learning approach for Bayesian pattern classification, called ‘constrained maximum margin (CMM)’. We define the margin between two classes as the difference between the minimum decision value for positive samples and the maximum decision value for negative samples. The learning problem is to maximize the margin under the constraint that each training pattern is classified correctly. This nonlinear programming problem is solved using the sequential unconstrained minimization technique. We applied the proposed CMM approach to learn Bayesian classifiers based on Gaussian mixture models, and conducted experiments on 10 UCI datasets. The performance of our approach was compared with those of the expectation-maximization algorithm, the support vector machine, and other state-of-the-art approaches. The experimental results demonstrated the effectiveness of our approach.
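To make the definitions in the abstract concrete, the following minimal Python sketch computes the CMM margin and a penalty-style objective of the kind used in sequential unconstrained minimization. It assumes the decision value of a sample is the log-likelihood ratio under two class-conditional Gaussian mixtures; all names (gmm_log_density, cmm_margin, penalty_objective, the penalty weight r) are illustrative rather than from the paper, and the exterior-penalty form shown is one standard SUMT variant, not necessarily the authors' exact formulation.

import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_density(x, weights, means, covs):
    # Log density of a Gaussian mixture at point x.
    p = sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
            for w, m, c in zip(weights, means, covs))
    return np.log(p + 1e-300)  # guard against log(0)

def decision_value(x, pos_gmm, neg_gmm):
    # Assumed decision rule: log-likelihood ratio between the two classes.
    return gmm_log_density(x, *pos_gmm) - gmm_log_density(x, *neg_gmm)

def cmm_margin(X_pos, X_neg, pos_gmm, neg_gmm):
    # Margin as defined in the abstract: minimum decision value over the
    # positive samples minus maximum decision value over the negative ones.
    f_pos = np.array([decision_value(x, pos_gmm, neg_gmm) for x in X_pos])
    f_neg = np.array([decision_value(x, pos_gmm, neg_gmm) for x in X_neg])
    return f_pos.min() - f_neg.max()

def penalty_objective(X_pos, X_neg, pos_gmm, neg_gmm, r):
    # Exterior-penalty surrogate for the constrained problem: minimize the
    # negated margin plus a penalty, weighted by r, for each violated
    # correct-classification constraint (f > 0 on positives, f < 0 on
    # negatives). In SUMT, r is increased over successive minimizations.
    f_pos = np.array([decision_value(x, pos_gmm, neg_gmm) for x in X_pos])
    f_neg = np.array([decision_value(x, pos_gmm, neg_gmm) for x in X_neg])
    margin = f_pos.min() - f_neg.max()
    violation = np.maximum(0.0, -f_pos).sum() + np.maximum(0.0, f_neg).sum()
    return -margin + r * violation

# Toy usage with single-Gaussian "mixtures" in two dimensions.
rng = np.random.default_rng(0)
pos_gmm = ([1.0], [np.array([1.0, 1.0])], [np.eye(2)])
neg_gmm = ([1.0], [np.array([-1.0, -1.0])], [np.eye(2)])
X_pos = rng.normal(size=(20, 2)) + 1.0
X_neg = rng.normal(size=(20, 2)) - 1.0
print(cmm_margin(X_pos, X_neg, pos_gmm, neg_gmm))
print(penalty_objective(X_pos, X_neg, pos_gmm, neg_gmm, r=10.0))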
[1]Alcalá-Fdez J, Sánchez L, García S, et al., 2009. KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput, 13(3):307-318.
[2]Alcalá-Fdez J, Fernández A, Luengo J, et al., 2011. KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Multi-Valued Log Soft Comput, 17(2-3):255-287.
[3]Bredensteiner EJ, Bennett KP, 1999. Multicategory classification by support vector machines. In: Pang JS (Ed.), Computational Optimization. Springer US, New York, p.53-79.
[4]Dempster AP, Laird NM, Rubin DB, 1977. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B, 39(1):1-38.
[5]Demšar J, 2006. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res, 7(Jan):1-30.
[6]Dong W, Zhou M, 2014. Gaussian classifier-based evolutionary strategy for multimodal optimization. IEEE Trans Neur Netw Learn Syst, 25(6):1200-1216.
[7]Dvořák J, Savický P, 2007. Softening splits in decision trees using simulated annealing. Int Conf on Adaptive and Natural Computing Algorithms, p.721-729.
[8]Fiacco AV, McCormick GP, 1990. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. SIAM, Philadelphia.
[9]Forsythe GE, Malcolm MA, Moler CB, 1977. Computer Methods for Mathematical Computations (1st Ed.). Prentice Hall, New Jersey.
[10]Friedman N, Geiger D, Goldszmidt M, 1997. Bayesian network classifiers. Mach Learn, 29(2-3):131-163.
[11]Gorman RP, Sejnowski TJ, 1988. Analysis of hidden units in a layered network trained to classify sonar targets. Neur Netw, 1(1):75-89.
[12]Hall M, Frank E, Holmes G, et al., 2009. The WEKA data mining software: an update. ACM SIGKDD Explor Newsl, 11(1):10-18.
[13]Jiang H, 2010. Discriminative training of HMMs for automatic speech recognition: a survey. Comput Speech Lang, 24(4):589-608.
[14]Jiang L, Zhang H, Cai Z, 2009. A novel Bayes model: hidden naïve Bayes. IEEE Trans Knowl Data Eng, 21(10):1361-1371.
[15]Jiang L, Zhang H, Cai Z, et al., 2012. Weighted average of one-dependence estimators. J Exp Theor Artif Intell, 24(2):219-230.
[16]Jiang Y, Zhou ZH, 2004. Editing training data for kNN classifiers with neural network ensemble. Advances in Neural Networks—Int Symp on Neural Networks, p.356-361.
[17]Juang BH, Katagiri S, 1992. Discriminative learning for minimum error classification (pattern recognition). IEEE Trans Signal Process, 40(12):3043-3054.
[18]Karabatak M, 2015. A new classifier for breast cancer detection based on naïve Bayesian. Measurement, 72:32-36.
[19]Kim BH, Pfister HD, 2011. An iterative joint linear-programming decoding of LDPC codes and finite-state channels. IEEE Int Conf on Communications, p.1-6.
[20]Kwok JTY, 1999. Moderating the outputs of support vector machine classifiers. IEEE Trans Neur Netw, 10(5):1018-1031.
[21]Moerland P, 1999. A comparison of mixture models for density estimation. 9th Int Conf on Artificial Neural Networks, p.25-30.
[22]Nádas A, 1983. A decision theoretic formulation of a training problem in speech recognition and a comparison of training by unconditional versus conditional maximum likelihood. IEEE Trans Acoust Speech Signal Process, 31(4):814-817.
[23]OpenCV Team, 2015. Open Source Computer Vision Library. http://opencv.org [Accessed on July 15, 2016].
[24]Pernkopf F, Wohlmayr M, 2010. Large margin learning of Bayesian classifiers based on Gaussian mixture models. Joint European Conf on Machine Learning and Knowledge Discovery in Databases, p.50-66.
[25]Pernkopf F, Wohlmayr M, Tschiatschek S, 2012. Maximum margin Bayesian network classifiers. IEEE Trans Patt Anal Mach Intell, 34(3):521-532.
[26]Povey D, Woodland PC, 2002. Minimum phone error and I-smoothing for improved discriminative training. IEEE Int Conf on Acoustics, Speech, and Signal Processing, p.105-108.
[27]University of California, 2013. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml [Accessed on Aug. 10, 2016].
[28]Vapnik V, 2013. The Nature of Statistical Learning Theory (2nd Ed.). Springer-Verlag, New York.
[29]Vlassis N, Likas A, 1999. A kurtosis-based dynamic approach to Gaussian mixture modeling. IEEE Trans Syst Man Cybern A, 29(4):393-399.
[30]Webb GI, Boughton JR, Wang Z, 2005. Not so naïve Bayes: aggregating one-dependence estimators. Mach Learn, 58(1):5-24.
[31]Woodland PC, Povey D, 2002. Large scale discriminative training of hidden Markov models for speech recognition. Comput Speech Lang, 16(1):25-47.