CLC number: TP393.08

On-line Access: 2019-07-08

Received: 2018-08-31

Revision Accepted: 2019-03-11

Crosschecked: 2019-06-11

ORCID: Ya Qin, https://orcid.org/0000-0002-2685-3445

Frontiers of Information Technology & Electronic Engineering 

Accepted manuscript available online (unedited version)


A network security entity recognition method based on feature template and CNN-BiLSTM-CRF


Author(s):  Ya Qin, Guo-wei Shen, Wen-bo Zhao, Yan-ping Chen, Miao Yu, Xin Jin

Affiliation(s):  College of Computer Science and Technology, Guizhou University, Guiyang 550025, China

Corresponding email(s):  qyamail@163.com, gwshen@gzu.edu.cn

Key Words:  Network security entity, Security knowledge graph (SKG), Entity recognition, Feature template, Neural network


Ya Qin, Guo-wei Shen, Wen-bo Zhao, Yan-ping Chen, Miao Yu, Xin Jin. A network security entity recognition method based on feature template and CNN-BiLSTM-CRF[J]. Frontiers of Information Technology & Electronic Engineering, in press, p.872-884. ISSN 2095-9184. Zhejiang University Press & Springer. https://doi.org/10.1631/FITEE.1800520


Abstract: 
Network security threat intelligence analysis based on a security knowledge graph (SKG) allows multi-source threat intelligence data to be analyzed in a fine-grained manner, and it has received extensive attention. Traditional named entity recognition methods have difficulty identifying mixed Chinese-English security entities in the network security domain, and they struggle to identify network security entities accurately because the features they extract are insufficient. In this paper, we propose FT-CNN-BiLSTM-CRF, a novel security entity recognition method that combines a neural network CNN-BiLSTM-CRF model with a feature template (FT). The feature template extracts local context features, and the neural network model automatically extracts character features and global text features. Experimental results show that our method achieves an F-score of 86% on a large-scale network security dataset and outperforms other methods.
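The abstract does not spell out the feature template itself. As a rough illustration only, the sketch below shows what a window-based template for local context features could look like. The function name `template_features`, the window size, the feature keys, and the example sentence are all assumptions made here for illustration, not the authors' template.

```python
from typing import Dict, List


def template_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Local context features for tokens[i] from a hypothetical window template."""
    if window < 1:
        raise ValueError("window must be at least 1")
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        # Positions outside the sentence are filled with a padding symbol.
        feats[f"w[{offset}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    # Bigram templates over adjacent positions.
    feats["w[-1]|w[0]"] = feats["w[-1]"] + "|" + feats["w[0]"]
    feats["w[0]|w[1]"] = feats["w[0]"] + "|" + feats["w[1]"]
    # Shape cue for mixed Chinese/English text: is the token pure ASCII?
    feats["ascii"] = str(tokens[i].isascii())
    return feats


# Made-up example sentence mixing Chinese tokens with an English entity name.
tokens = ["攻击者", "利用", "WannaCry", "加密", "文件"]
features = [template_features(tokens, k) for k in range(len(tokens))]
```

In a pipeline like the one described, such per-token features would be encoded and combined with the learned character features before the sequence model; how the paper performs that combination is not stated in the abstract.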

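Chinese summary (translated): Building a network security knowledge graph from massive network security threat intelligence data, and performing deep association analysis and mining on it, can help identify security threats and propose corresponding defensive measures. This has become a research hotspot in network security. This paper studies entity recognition algorithms for network security text data, laying a foundation for constructing a network security knowledge graph. Traditional methods struggle to recognize new entities and mixed Chinese-English security entities in the network security domain, and the features they extract are insufficient. Building on a neural network model, this paper proposes a feature-template-based CNN-BiLSTM-CRF algorithm for network security entity recognition. First, a manually designed feature template is constructed to extract local context features. Next, a CNN extracts character features, which are combined with the local context features and passed to a BiLSTM model to extract semantic features. Finally, a CRF labels the security entities. The results show that, on a large-scale network security dataset, the method outperforms other algorithms and reaches an F-score of 86%.

Key words (translated): Network security knowledge graph; Network security entity; Feature template; Entity recognition; Neural network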
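The final step of the described pipeline uses a CRF to label security entities. A CRF layer is typically decoded with the Viterbi algorithm, which picks the tag sequence with the best total score rather than the best tag at each position. The sketch below is a generic Viterbi decoder over additive scores, not the authors' implementation. The B/I/O tag set, the scores, and the function name `viterbi` are illustrative assumptions.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence (assumes a non-empty input)."""
    # best[t][tag] is the best score of any path ending in `tag` at step t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


tags = ["B", "I", "O"]
# Hypothetical scores: all transitions are neutral except O -> I, which is penalized.
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["I"] = -10.0
emissions = [
    {"B": 1.0, "I": 0.0, "O": 1.2},
    {"B": 0.0, "I": 2.0, "O": 0.5},
]
decoded = viterbi(emissions, transitions, tags)
```

With these made-up scores, choosing the best tag per position independently would give the invalid sequence O, I. The transition penalty steers the decoder to B, I instead, which is the kind of label consistency a CRF layer contributes on top of per-token scores.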
