CLC number: TP391.1
On-line Access: 2020-06-12
Received: 2018-11-22
Revision Accepted: 2019-04-17
Crosschecked: 2019-08-19
Zhen-zhen Li, Da-wei Feng, Dong-sheng Li, Xi-cheng Lu. Learning to select pseudo labels: a semi-supervised method for named entity recognition[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.1800743
Learning to select pseudo labels: a semi-supervised learning method for named entity recognition

College of Computer, National University of Defense Technology, Changsha 410073, China

Abstract: Deep learning models have achieved state-of-the-art performance in named entity recognition (NER); however, their good performance depends heavily on large amounts of labeled data. In certain domains, such as medicine, finance, and the military, labeled data are very scarce while unlabeled data are readily available. Previous studies have used unlabeled data to enrich word representations, but ignored the large amount of entity information in unlabeled data that is likely to be helpful for the NER task. This paper proposes a semi-supervised method for NER that creates high-quality labeled data by learning a discrimination module to filter out incorrect pseudo labels. Pseudo labels are labels generated automatically for unlabeled data and treated as true labels to train the model. The semi-supervised framework consists of three steps: constructing the best single neural network model for the specific NER task, learning a module that evaluates pseudo labels, and iteratively creating new labeled data and improving the NER model. Experimental results on two English NER tasks and one Chinese clinical NER task show that the method further improves the performance of the best single neural model. When only pre-trained static word embeddings are used and no external knowledge is relied on, the method achieves performance comparable to that of state-of-the-art models on the CoNLL-2003 and OntoNotes 5.0 English NER tasks.
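The three-step framework in the abstract can be sketched as a self-training loop in which a discrimination module filters the model's pseudo labels before they are added to the training set. The sketch below is a minimal illustration, not the authors' implementation: `train_ner` and `discriminator` are hypothetical stand-ins (a token-memorizing tagger and a trivial accept rule) for the neural NER model and the learned pseudo-label evaluation module.

```python
def train_ner(labeled):
    """Stand-in for training the best single NER model on labeled data.
    Here it simply memorizes token -> tag pairs seen during training;
    unseen tokens default to the outside tag "O"."""
    memory = {}
    for tokens, tags in labeled:
        for tok, tag in zip(tokens, tags):
            memory[tok] = tag
    return lambda tokens: [memory.get(t, "O") for t in tokens]

def discriminator(sentence, pseudo_tags):
    """Stand-in for the learned module that filters out wrong pseudo
    labels. Here: accept a sentence only if some entity tag was predicted."""
    return any(tag != "O" for tag in pseudo_tags)

def self_training(labeled, unlabeled, rounds=3):
    """Step 3 of the framework: iteratively predict pseudo labels on the
    unlabeled sentences, keep only those the discriminator accepts as new
    labeled data, and retrain the NER model on the enlarged set."""
    labeled = list(labeled)
    for _ in range(rounds):
        model = train_ner(labeled)
        kept, remaining = [], []
        for sent in unlabeled:
            tags = model(sent)
            (kept if discriminator(sent, tags) else remaining).append((sent, tags))
        labeled += kept                      # accepted pseudo labels become training data
        unlabeled = [s for s, _ in remaining]
        if not kept:                         # no new data accepted: stop early
            break
    return train_ner(labeled)
```

In the real method the discriminator is itself learned, so the quality bar for accepting a pseudo-labeled sentence adapts to the task rather than being a fixed rule as in this toy version.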