CLC number: TP31

Received: 2003-12-05

Revision Accepted: 2004-06-26


Journal of Zhejiang University SCIENCE A 2005 Vol.6 No.1 P.49~55


An improved TF-IDF approach for text classification

Author(s):  Yun-tao Zhang1,2, Ling Gong2, Yong-cheng Wang2

Affiliation(s):  1. Network & Information Center, Shanghai Jiaotong University, Shanghai 200030, China

Corresponding email(s):   ytzhang@mail.sjtu.edu.cn

Key Words:  Term frequency/inverse document frequency (TF-IDF), Text classification, Confidence, Support, Characteristic words

ZHANG Yun-tao, GONG Ling, WANG Yong-cheng. An improved TF-IDF approach for text classification[J]. Journal of Zhejiang University Science A, 2005, 6(1): 49~55.

ISSN: 1673-565X
DOI: 10.1631/jzus.2005.A0049
Publisher: Zhejiang University Press & Springer

This paper presents an improved term frequency/inverse document frequency (TF-IDF) approach that uses confidence, support, and characteristic words to enhance the recall and precision of text classification. Synonyms defined by a lexicon are also processed in the improved TF-IDF approach. We discuss and analyze in detail the relationship among confidence, recall, and precision. Experiments on science and technology material gave promising results: the new TF-IDF approach improved the precision and recall of text classification compared with the conventional TF-IDF approach.
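The abstract does not give the paper's improved weighting formula, so the following is only a minimal sketch of the two ingredients it names: the conventional TF-IDF baseline, and support and confidence in their usual association-rule sense for a rule "term implies class". The function names `tf_idf` and `support_confidence`, the toy corpus, and the class labels are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Conventional TF-IDF: (count / doc length) * log(N / df) per term."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def support_confidence(docs, labels, term, cls):
    """Support and confidence of the rule `term -> cls` (assumed definitions)."""
    labels_with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    both = sum(1 for lab in labels_with_term if lab == cls)
    support = both / len(docs)  # share of all docs with the term and the class
    confidence = both / len(labels_with_term) if labels_with_term else 0.0
    return support, confidence


# Hypothetical toy corpus: tokenized documents with class labels.
docs = [
    ["neural", "network", "training"],
    ["network", "protocol", "routing"],
    ["neural", "model", "training"],
    ["protein", "folding", "model"],
]
labels = ["cs", "cs", "cs", "bio"]

weights = tf_idf(docs)
print(support_confidence(docs, labels, "model", "cs"))   # (0.25, 0.5)
print(support_confidence(docs, labels, "neural", "cs"))  # (0.5, 1.0)
```

In this toy data, "neural" occurs only in "cs" documents, so its confidence for that class is 1.0, while "model" is split between classes and reaches only 0.5. A term with high confidence and adequate support is the kind of candidate one might treat as a characteristic word, although how the paper combines these quantities with TF-IDF is not stated in the abstract.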
