CLC number: TP391

On-line Access: 2010-11-04

Received: 2010-09-01

Revision Accepted: 2010-09-16

Crosschecked: 2010-09-01

Journal of Zhejiang University SCIENCE C 2010 Vol.11 No.11 P.844-849

http://doi.org/10.1631/jzus.C1001003


Importance of retrieving noun phrases and named entities from digital library content


Author(s):  Ratna Sanyal, Kushal Keshri, Vidya Nand

Affiliation(s):  Indian Institute of Information Technology, Allahabad 211012, India

Corresponding email(s):   rsanyal@iiita.ac.in, iit2006031@iiita.ac.in, iit2006032@iiita.ac.in

Key Words:  Coreference resolution, Hybrid approach, Filtering, Rule based and J48 algorithm


Ratna Sanyal, Kushal Keshri, Vidya Nand. Importance of retrieving noun phrases and named entities from digital library content[J]. Journal of Zhejiang University Science C, 2010, 11(11): 844-849.

@article{Sanyal2010,
title="Importance of retrieving noun phrases and named entities from digital library content",
author="Ratna Sanyal and Kushal Keshri and Vidya Nand",
journal="Journal of Zhejiang University Science C",
volume="11",
number="11",
pages="844--849",
year="2010",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1001003"
}

%0 Journal Article
%T Importance of retrieving noun phrases and named entities from digital library content
%A Ratna Sanyal
%A Kushal Keshri
%A Vidya Nand
%J Journal of Zhejiang University SCIENCE C
%V 11
%N 11
%P 844-849
%@ 1869-1951
%D 2010
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C1001003

TY  - JOUR
T1  - Importance of retrieving noun phrases and named entities from digital library content
A1  - Ratna Sanyal
A1  - Kushal Keshri
A1  - Vidya Nand
JO  - Journal of Zhejiang University Science C
VL  - 11
IS  - 11
SP  - 844
EP  - 849
SN  - 1869-1951
Y1  - 2010
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.C1001003
ER  - 


Abstract: 
We present a novel approach for extracting noun phrases in general, and named entities in particular, from a digital repository of text documents. The problem of coreference resolution has been divided into two subproblems: pronoun resolution and non-pronominal resolution. A rule-based technique was used for pronoun resolution, while a learning approach was used for non-pronominal resolution. For named entity resolution, ambiguity arises mainly from polysemy and synonymy. The proposed approach addresses both problems with the help of WordNet and the Word Sense Disambiguation tool. To our knowledge, the proposed approach outperforms several baseline techniques, achieving a higher balanced F-measure, which is the harmonic mean of recall and precision. The improvements in system performance are due to the filtering of candidate antecedents for the anaphor based on several linguistic disagreements, the use of a hybrid approach, and the enlargement of the feature vector to include more linguistic details in the learning technique.
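
Two ideas in the abstract can be illustrated with a minimal sketch: filtering candidate antecedents that disagree linguistically with the anaphor, and computing the balanced F-measure as the harmonic mean of precision and recall. The abstract does not say which disagreements the authors test or how mentions are represented, so the `Mention` class, the gender and number features, and the function names below are assumptions made for illustration only, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A noun phrase or pronoun with hypothetical agreement features."""
    text: str
    gender: str  # "male", "female", "neuter", or "unknown"
    number: str  # "singular", "plural", or "unknown"


def agrees(a: str, b: str) -> bool:
    """Two feature values agree when they match or either one is unknown."""
    return a == b or "unknown" in (a, b)


def filter_antecedents(anaphor: Mention, candidates: list[Mention]) -> list[Mention]:
    """Drop candidates that disagree with the anaphor in gender or number.

    Assumption: gender and number stand in for the 'several linguistic
    disagreements' mentioned in the abstract.
    """
    return [
        c for c in candidates
        if agrees(c.gender, anaphor.gender) and agrees(c.number, anaphor.number)
    ]


def balanced_f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    she = Mention("she", "female", "singular")
    candidates = [
        Mention("Mary", "female", "singular"),
        Mention("the engineers", "unknown", "plural"),
        Mention("John", "male", "singular"),
    ]
    kept = filter_antecedents(she, candidates)
    print([m.text for m in kept])
    print(balanced_f_measure(0.5, 1.0))
```

In a hybrid setup of the kind the abstract describes, a filter like this would run before either resolver, so that the rule-based pronoun resolver and the learned non-pronominal classifier only see plausible candidates. The values passed to `balanced_f_measure` are illustrative and are not results from the paper.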
