CLC number: TP391.1

On-line Access: 2020-06-12

Received: 2018-11-22

Revision Accepted: 2019-04-17

Crosschecked: 2019-08-19

ORCID:

Dong-sheng Li: http://orcid.org/0000-0001-9743-2034

Zhen-zhen Li: http://orcid.org/0000-0002-4116-5077

Frontiers of Information Technology & Electronic Engineering  2020 Vol.21 No.6 P.903-916

10.1631/FITEE.1800743


Learning to select pseudo labels: a semi-supervised method for named entity recognition


Author(s):  Zhen-zhen Li, Da-wei Feng, Dong-sheng Li, Xi-cheng Lu

Affiliation(s):  College of Computer, National University of Defense Technology, Changsha 410073, China

Corresponding email(s):   lizhenzhen14@nudt.edu.cn, davyfeng.c@gmail.com, dsli@nudt.edu.cn, xclu@nudt.edu.cn

Key Words:  Named entity recognition, Unlabeled data, Deep learning, Semi-supervised method


Zhen-zhen Li, Da-wei Feng, Dong-sheng Li, Xi-cheng Lu. Learning to select pseudo labels: a semi-supervised method for named entity recognition[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(6): 903-916.

@article{FITEE1800743,
title="Learning to select pseudo labels: a semi-supervised method for named entity recognition",
author="Zhen-zhen Li and Da-wei Feng and Dong-sheng Li and Xi-cheng Lu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="21",
number="6",
pages="903-916",
year="2020",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1800743"
}

%0 Journal Article
%T Learning to select pseudo labels: a semi-supervised method for named entity recognition
%A Zhen-zhen Li
%A Da-wei Feng
%A Dong-sheng Li
%A Xi-cheng Lu
%J Frontiers of Information Technology & Electronic Engineering
%V 21
%N 6
%P 903-916
%@ 2095-9184
%D 2020
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1800743

TY  - JOUR
T1  - Learning to select pseudo labels: a semi-supervised method for named entity recognition
A1  - Zhen-zhen Li
A1  - Da-wei Feng
A1  - Dong-sheng Li
A1  - Xi-cheng Lu
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 21
IS  - 6
SP  - 903
EP  - 916
SN  - 2095-9184
Y1  - 2020
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1800743
ER  -


Abstract:
Deep learning models have achieved state-of-the-art performance in named entity recognition (NER); this performance, however, relies heavily on substantial amounts of labeled data. In some specific areas such as the medical, financial, and military domains, labeled data is very scarce, while unlabeled data is readily available. Previous studies have used unlabeled data to enrich word representations, but they neglect the large amount of entity information in unlabeled data, which may be beneficial to the NER task. In this study, we propose a semi-supervised method for NER tasks, which learns to create high-quality labeled data by applying a pre-trained module to filter out erroneous pseudo labels. Pseudo labels are automatically generated for unlabeled data and used as if they were true labels. Our semi-supervised framework includes three steps: constructing an optimal single neural model for a specific NER task, learning a module that evaluates pseudo labels, and iteratively creating new labeled data and improving the NER model. Experimental results on two English NER tasks and one Chinese clinical NER task demonstrate that our method further improves the performance of the best single neural model. Even when we use only pre-trained static word embeddings and do not rely on any external knowledge, our method achieves performance comparable to that of state-of-the-art models on the CoNLL-2003 and OntoNotes 5.0 English NER tasks.
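
The three-step framework described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the word-lookup tagger stands in for the neural NER model, and a fixed confidence threshold stands in for the learned pseudo-label evaluation module. All function names (`train_tagger`, `predict`, `select`, `self_train`), the threshold value, and the toy data are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def train_tagger(labeled):
    """Step 1 (stand-in): count tag frequencies per word instead of training a neural model."""
    counts = defaultdict(Counter)
    for words, tags in labeled:
        for word, tag in zip(words, tags):
            counts[word.lower()][tag] += 1
    return counts

def predict(tagger, words):
    """Generate pseudo labels with per-token confidence; unseen words get 'O' with confidence 0."""
    tags, confs = [], []
    for word in words:
        seen = tagger.get(word.lower())
        if not seen:
            tags.append("O")
            confs.append(0.0)
        else:
            tag, n = seen.most_common(1)[0]
            tags.append(tag)
            confs.append(n / sum(seen.values()))
    return tags, confs

def select(confs, threshold=0.9):
    """Step 2 (stand-in): a hypothetical threshold rule replaces the learned evaluator.
    A sentence is kept only if every token's confidence reaches the threshold."""
    return bool(confs) and min(confs) >= threshold

def self_train(labeled, unlabeled, rounds=3, threshold=0.9):
    """Step 3: iteratively add accepted pseudo-labeled sentences and retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    tagger = train_tagger(labeled)
    for _ in range(rounds):
        kept, rest = [], []
        for words in pool:
            tags, confs = predict(tagger, words)
            (kept if select(confs, threshold) else rest).append((words, tags))
        if not kept:  # nothing passed the filter, so further rounds cannot change anything
            break
        labeled.extend(kept)
        pool = [words for words, _ in rest]
        tagger = train_tagger(labeled)
    return tagger, labeled, pool

# Toy data (illustrative only)
seed = [(["Paris", "is", "nice"], ["B-LOC", "O", "O"])]
raw = [["Paris", "is", "nice"], ["Berlin", "is", "nice"]]
tagger, grown, leftover = self_train(seed, raw)
```

In this toy run the sentence containing an unseen word is rejected by the filter and stays in the unlabeled pool, which mirrors the idea of discarding pseudo labels that are likely to be wrong rather than training on them.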

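Learning to select pseudo labels: a semi-supervised learning method for named entity recognition

Zhen-zhen Li, Da-wei Feng, Dong-sheng Li, Xi-cheng Lu
College of Computer, National University of Defense Technology, Changsha 410073, China

Abstract: Deep learning models have achieved state-of-the-art performance in named entity recognition (NER); however, this performance depends largely on large amounts of labeled data. In certain specific domains, such as medicine, finance, and the military, labeled data is very scarce, whereas unlabeled data is easy to obtain. Previous studies have used unlabeled data to enrich word representations, but have overlooked the large amount of entity information in unlabeled data that is likely to help the NER task. This paper proposes a semi-supervised method for NER tasks, which learns a discriminative module to filter out erroneous pseudo labels in order to create high-quality labeled data. Pseudo labels are labels generated automatically for unlabeled data and are used as true labels to train the model. The semi-supervised framework comprises three steps: constructing the best single neural network model for a specific NER task, learning a module that evaluates pseudo labels, and iteratively creating new labeled data and improving the NER model. Experimental results on two English NER tasks and one Chinese medical NER task show that the method further improves the performance of the best single neural model. When using only pre-trained static word embeddings and no external knowledge, the method achieves performance comparable to that of state-of-the-art models on the CoNLL-2003 and OntoNotes 5.0 English NER tasks.

Key words: Named entity recognition; Unlabeled data; Deep learning; Semi-supervised learning method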

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - Journal of Zhejiang University-SCIENCE