
CLC number: TP391
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2018-02-08
You-wei Wang, Li-zhou Feng. A new feature selection method for handling redundant information in text classification[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.1601761
Chinese title (translated): A new feature selection method with redundancy removal for text classification

