CLC number: TP391
On-line Access: 2025-07-28
Received: 2024-06-11
Revision Accepted: 2024-10-10
Crosschecked: 2025-07-30
Yuankang SUN, Bing LI, Lexiang LI, Peng YANG, Dongmei YANG. Shared-weight multimodal translation model for recognizing Chinese variant characters[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(7): 1066-1082.
@article{Sun2025FITEE,
title="Shared-weight multimodal translation model for recognizing Chinese variant characters",
author="Yuankang SUN and Bing LI and Lexiang LI and Peng YANG and Dongmei YANG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="26",
number="7",
pages="1066-1082",
year="2025",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2400504"
}
%0 Journal Article
%T Shared-weight multimodal translation model for recognizing Chinese variant characters
%A Yuankang SUN
%A Bing LI
%A Lexiang LI
%A Peng YANG
%A Dongmei YANG
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 7
%P 1066-1082
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2400504
TY - JOUR
T1 - Shared-weight multimodal translation model for recognizing Chinese variant characters
A1 - Yuankang SUN
A1 - Bing LI
A1 - Lexiang LI
A1 - Peng YANG
A1 - Dongmei YANG
JO - Frontiers of Information Technology & Electronic Engineering
VL - 26
IS - 7
SP - 1066
EP - 1082
SN - 2095-9184
Y1 - 2025
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2400504
ER -
Abstract: The task of recognizing Chinese variant characters aims to address the challenges of semantic ambiguity and confusion, which pose risks to the security of Web content and complicate the governance of sensitive words. Most existing approaches prioritize acquiring contextual knowledge from Chinese corpora and vocabularies during pretraining, often overlooking the inherent phonological and morphological characteristics of the Chinese language. To address these issues, we propose a shared-weight multimodal translation model (SMTM) based on the multimodal information of Chinese characters, which integrates the phonology of Pinyin and the morphology of fonts into each Chinese character token to learn the deeper semantics of variant text. Specifically, we encode the Pinyin features of Chinese characters using an embedding layer, and extract the font features of Chinese characters directly with convolutional neural networks. Considering the multimodal similarity between the source and target sentences in the Chinese variant-character-recognition task, we design a shared-weight embedding mechanism that generates target sentences using heuristic information from the source sentences during training. Simulation results show that the proposed SMTM achieves remarkable performance of 89.550% and 79.480% on the bilingual evaluation understudy (BLEU) and F1 metrics, respectively, a significant improvement over state-of-the-art baseline models.
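As a rough illustration of the architecture the abstract describes, the sketch below is ours, not the authors' released code: all layer shapes, vocabulary sizes, and names (e.g., MultimodalCharEmbedding, SMTMSketch) are hypothetical. It fuses a character-token embedding, a Pinyin embedding (phonology), and CNN-extracted glyph features (morphology) into one per-character representation, and shares that embedding between the source and target sides of an encoder-decoder translation model, with the output projection tied to the same weights.

```python
# Minimal PyTorch sketch under assumed sizes; not the published SMTM implementation.
import torch
import torch.nn as nn

class MultimodalCharEmbedding(nn.Module):
    """Fuses token, Pinyin (phonology), and glyph (morphology) features per character."""
    def __init__(self, vocab_size=21128, pinyin_vocab=1300, d_model=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)     # character ids
        self.pinyin_emb = nn.Embedding(pinyin_vocab, d_model)  # Pinyin ids (hypothetical inventory)
        # Small CNN over grayscale font images of each character.
        self.glyph_cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        self.fuse = nn.Linear(3 * d_model, d_model)

    def forward(self, tokens, pinyin, glyphs):
        # tokens, pinyin: (batch, seq); glyphs: (batch, seq, 1, H, W) font images
        b, s = tokens.shape
        g = self.glyph_cnn(glyphs.flatten(0, 1)).view(b, s, -1)
        x = torch.cat([self.token_emb(tokens), self.pinyin_emb(pinyin), g], dim=-1)
        return self.fuse(x)  # (batch, seq, d_model) fused character representation

class SMTMSketch(nn.Module):
    """Encoder-decoder model; source and target sides share one embedding module."""
    def __init__(self, vocab_size=21128, d_model=512):
        super().__init__()
        self.embed = MultimodalCharEmbedding(vocab_size, d_model=d_model)  # shared weights
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size, bias=False)
        self.out.weight = self.embed.token_emb.weight  # tie output to the shared embedding

    def forward(self, src, src_py, src_glyph, tgt, tgt_py, tgt_glyph):
        # A causal target mask would be needed for real training; omitted in this sketch.
        h = self.transformer(self.embed(src, src_py, src_glyph),
                             self.embed(tgt, tgt_py, tgt_glyph))
        return self.out(h)  # logits over the standard-character vocabulary
```

Sharing one embedding between the source (variant) and target (standard) sentences means gradients from decoding the target refine the same character representations the encoder reads, which is one way to realize the shared-weight embedding mechanism the abstract describes.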