CLC number: TP391.1

Received: 2006-07-04

Revision Accepted: 2006-10-07

Journal of Zhejiang University SCIENCE A 2007 Vol.8 No.1 P.79~87


Using LSA and text segmentation to improve automatic Chinese dialogue text summarization

Author(s):  LIU Chuan-han, WANG Yong-cheng, ZHENG Fei, LIU De-rong

Affiliation(s):  Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200030, China

Corresponding email(s):   uuchliu@163.com

Key Words:  Automatic text summarization, Latent semantic analysis (LSA), Text segmentation, Dialogue style, Coherence, Question-answer pairs

LIU Chuan-han, WANG Yong-cheng, ZHENG Fei, LIU De-rong. Using LSA and text segmentation to improve automatic Chinese dialogue text summarization[J]. Journal of Zhejiang University Science A, 2007, 8(1): 79~87.

%0 Journal Article
%T Using LSA and text segmentation to improve automatic Chinese dialogue text summarization
%A LIU Chuan-han
%A WANG Yong-cheng
%A ZHENG Fei
%A LIU De-rong
%J Journal of Zhejiang University SCIENCE A
%V 8
%N 1
%P 79~87
%@ 1673-565X
%D 2007
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.2007.A0079

Automatic summarization of Chinese dialogue-style text is a relatively new research area. In this paper, latent semantic analysis (LSA) is first used to extract semantic knowledge from a given document. All question paragraphs are then identified, and an automatic text segmentation approach analogous to TextTiling is used to improve the precision of matching question paragraphs with their answer paragraphs. Finally, "important" sentences are extracted from the generic content and from the question-answer pairs to generate a complete summary. Experimental results show that the approach is highly efficient and significantly improves the coherence of the summary without compromising informativeness.
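
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the term weighting, the segmentation criterion, or the question detector, so everything below is an assumption made for illustration. The function names (`tokenize`, `lsa_vectors`, `adjacent_similarities`, `is_question`, `pair_questions`), the fixed similarity `threshold`, the question-mark heuristic, and the toy English paragraphs (standing in for word-segmented Chinese text) are all hypothetical. Hearst's TextTiling places boundaries using depth scores at similarity valleys; a plain threshold is swapped in here to keep the example short. Sentence scoring and extraction are not shown.

```python
import numpy as np

def tokenize(paragraph):
    # Assumption: text is already word-segmented and space-separated;
    # real Chinese input would need a word segmenter first.
    # Punctuation-only tokens such as "?" are dropped.
    return [w for w in paragraph.split() if w.isalnum()]

def lsa_vectors(paragraphs, k=2):
    """Return one k-dimensional LSA vector per paragraph."""
    docs = [tokenize(p) for p in paragraphs]
    vocab = sorted({w for d in docs for w in d})
    row = {w: i for i, w in enumerate(vocab)}
    # Term-by-paragraph count matrix (rows: terms, columns: paragraphs).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d:
            A[row[w], j] += 1.0
    # SVD: A = U diag(s) Vt, with singular values in descending order.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    # Paragraph j is row j of V_k, scaled by the top-k singular values.
    return Vt[:k].T * s[:k]

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def adjacent_similarities(vectors):
    """Similarity at each gap between paragraph i and paragraph i + 1."""
    return [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]

def is_question(paragraph):
    # Hypothetical heuristic: a paragraph ending in a question mark.
    return paragraph.rstrip().endswith(("?", "？"))

def pair_questions(paragraphs, sims, threshold=0.5):
    """Attach to each question the following paragraphs of the same segment.

    A gap whose similarity falls below `threshold` is treated as a
    segment boundary (a simplification of TextTiling's depth scores).
    """
    pairs = []
    for i, p in enumerate(paragraphs):
        if not is_question(p):
            continue
        answers, j = [], i
        while (j < len(sims) and sims[j] >= threshold
               and not is_question(paragraphs[j + 1])):
            answers.append(j + 1)
            j += 1
        pairs.append((i, answers))
    return pairs

# Toy dialogue with two topics and one question per topic.
paragraphs = [
    "how much is the ticket ?",
    "the ticket is ten yuan",
    "the ticket price is cheap",
    "where does rain fall ?",
    "rain fall often here",
]
sims = adjacent_similarities(lsa_vectors(paragraphs, k=2))
pairs = pair_questions(paragraphs, sims, threshold=0.5)
print(pairs)
```

On this toy input the two topics share no vocabulary, so the similarity drops to about zero at the gap between the third and fourth paragraphs, and each question is paired only with the paragraphs that follow it inside its own segment. With real dialogue the choice of `k` and of the boundary criterion would need tuning.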
