Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2017 Vol.18 No.2 P.195-205

An easy-to-use evaluation framework for benchmarking entity recognition and disambiguation systems

Author(s): Hui Chen, Bao-gang Wei, Yi-ming Li, Yong-huai Liu, Wen-hao Zhu
Affiliation(s): 1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): chenhuicn@126.com, wbg@zju.edu.cn
Key Words: Entity recognition and disambiguation (ERD), Evaluation framework, Information extraction


Hui Chen, Bao-gang Wei, Yi-ming Li, Yong-huai Liu, Wen-hao Zhu. An easy-to-use evaluation framework for benchmarking entity recognition and disambiguation systems[J]. Frontiers of Information Technology & Electronic Engineering, 2017, 18(2): 195-205.

@article{title="An easy-to-use evaluation framework for benchmarking entity recognition and disambiguation systems",
author="Hui Chen and Bao-gang Wei and Yi-ming Li and Yong-huai Liu and Wen-hao Zhu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="18",
number="2",
pages="195-205",
year="2017",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1500473"
}

%0 Journal Article
%T An easy-to-use evaluation framework for benchmarking entity recognition and disambiguation systems
%A Hui Chen
%A Bao-gang Wei
%A Yi-ming Li
%A Yong-huai Liu
%A Wen-hao Zhu
%J Frontiers of Information Technology & Electronic Engineering
%V 18
%N 2
%P 195-205
%@ 2095-9184
%D 2017
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1500473

TY - JOUR
T1 - An easy-to-use evaluation framework for benchmarking entity recognition and disambiguation systems
A1 - Hui Chen
A1 - Bao-gang Wei
A1 - Yi-ming Li
A1 - Yong-huai Liu
A1 - Wen-hao Zhu
JO - Frontiers of Information Technology & Electronic Engineering
VL - 18
IS - 2
SP - 195
EP - 205
SN - 2095-9184
Y1 - 2017
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1500473
ER -


Abstract: Entity recognition and disambiguation (ERD) is a crucial technique for knowledge base population and information extraction. In recent years, numerous papers have been published on this subject, and various ERD systems have been developed. However, the field still lacks a fair and complete comparison of these systems, which leaves their relative merits unclear. A unified evaluation framework is therefore needed. In this paper, we present an easy-to-use evaluation framework (EUEF), which aims to simplify the evaluation process and to give a fair comparison of ERD systems. EUEF is well designed and released to the public as open source, and thus can be easily extended with novel ERD systems, datasets, and evaluation metrics. With EUEF, it is easy to identify the advantages and disadvantages of a specific ERD system and its components. We use EUEF to compare several popular, publicly available ERD systems and draw some interesting conclusions from a detailed analysis.
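
The abstract does not say which evaluation metrics EUEF implements. As an illustration only, the sketch below scores an ERD system's output against gold annotations with micro-averaged precision, recall, and F1 under strict matching (same document, same character span, same entity), which is one common convention in ERD benchmarking. The `Annotation` type, the function name, and the sample data are all hypothetical and are not taken from EUEF.

```python
from typing import NamedTuple, Set, Tuple


class Annotation(NamedTuple):
    """One linked mention: a character span plus a knowledge-base entity ID.

    Hypothetical type for illustration; not part of EUEF.
    """
    doc_id: str
    start: int
    end: int
    entity: str


def precision_recall_f1(
    gold: Set[Annotation], predicted: Set[Annotation]
) -> Tuple[float, float, float]:
    """Micro-averaged scores under strict matching (same span, same entity)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up gold standard and system output for two short documents.
gold = {
    Annotation("d1", 0, 5, "Paris"),
    Annotation("d1", 20, 26, "France"),
    Annotation("d2", 3, 9, "Berlin"),
}
system_output = {
    Annotation("d1", 0, 5, "Paris"),              # correct span and entity
    Annotation("d1", 20, 26, "French_language"),  # correct span, wrong entity
}

p, r, f = precision_recall_f1(gold, system_output)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Strict matching penalizes a correct span with a wrong entity as both a false positive and a false negative. Comparing spans alone, or entities alone, would separate recognition errors from disambiguation errors, which is the kind of per-component analysis the abstract describes.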

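An easy-to-use evaluation framework for entity recognition and disambiguation systems

Summary: Entity recognition and disambiguation (ERD) is one of the key techniques for knowledge base population and information extraction. In recent years the field has produced many research results, and many ERD systems have been proposed. However, because these systems have not been thoroughly evaluated and compared, it remains hard to tell good systems from poor ones. It is therefore necessary to design an evaluation framework that assesses all systems in a uniform way. This paper proposes a unified evaluation framework for ERD systems, which is used to compare their performance fairly. The framework's code is open source and can be extended with new systems, datasets, and evaluation mechanisms. Evaluating an ERD system with the framework reveals the strengths and weaknesses of its individual modules. This paper analyzes and compares several publicly available ERD systems and draws some useful conclusions.

Key words: Entity recognition and disambiguation; Evaluation framework; Information extraction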


