Affiliation(s): 1Department of Computer Science, University of Brasilia, Brasilia 70919-900, Brazil
2Department of Mathematics, Federal Institute of Brasilia, Brasilia 71200-020, Brazil
Li WEIGANG1, Pedro Carvalho BROM2. Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2500298
@article{title="Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation", author="Li WEIGANG1, Pedro Carvalho BROM2", journal="Frontiers of Information Technology & Electronic Engineering", year="in press", publisher="Zhejiang University Press & Springer", doi="https://doi.org/10.1631/FITEE.2500298" }
%0 Journal Article %T Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation %A Li WEIGANG1 %A Pedro Carvalho BROM2 %J Frontiers of Information Technology & Electronic Engineering %P %@ 2095-9184 %D in press %I Zhejiang University Press & Springer doi="https://doi.org/10.1631/FITEE.2500298"
TY - JOUR T1 - Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation A1 - Li WEIGANG1 A1 - Pedro Carvalho BROM2 J0 - Frontiers of Information Technology & Electronic Engineering SP - EP - %@ 2095-9184 Y1 - in press PB - Zhejiang University Press & Springer ER - doi="https://doi.org/10.1631/FITEE.2500298"
Abstract: Large language models (LLMs) excel in multilingual translation tasks, yet often struggle with culturally and semantically rich Chinese texts. This study introduces LLM-BT, a back-translation (BT) framework powered by LLMs, to evaluate Chinese → intermediate language → Chinese translation quality across five LLMs and three traditional systems. We construct a diverse corpus of scientific abstracts, historical paradoxes, and literary metaphors that reflects the complexity of Chinese at the lexical and semantic levels. Using our modular NLPMetrics system (including bilingual evaluation understudy [BLEU], character F-score [CHRF], translation edit rate [TER], and semantic similarity [SS]), we find that LLMs outperform traditional tools in cultural and literary tasks. However, the results also reveal a high-dimensional behavioral phenomenon, the paradox of poetic intent, in which surface fluency is preserved but metaphorical or emotional depth is lost. Additionally, some models exhibit verbatim back-translation, suggesting a form of data-driven quasi-self-awareness, particularly under repeated or cross-model evaluation. To address BLEU's limitations for Chinese, we propose a Jieba-segmentation BLEU variant that incorporates word-frequency and n-gram weighting, improving sensitivity to lexical segmentation and term consistency. Supplementary tests show that in certain semantic dimensions, LLM outputs approach the fidelity of human poetic translations, despite lacking deeper metaphorical intent. Overall, this study reframes the traditional fidelity-versus-fluency evaluation as a richer, multi-layered analysis of LLM behavior, offering a transparent framework that contributes to explainable AI (XAI) and identifies new research pathways in cultural natural language processing (NLP) and multilingual LLM alignment.
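To make the segmentation-aware BLEU idea concrete, the following is a minimal sketch of a BLEU score with a pluggable segmenter and adjustable n-gram weights, applied to an original Chinese sentence and its back-translation. It is not the authors' variant: the abstract does not specify the word-frequency weighting, so that part is omitted, and the function names (`char_segment`, `ngrams`, `bleu`), the add-one smoothing, and the character-level default segmenter are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def char_segment(text: str) -> List[str]:
    """Fallback segmenter (assumption): one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference: str,
         hypothesis: str,
         segment: Callable[[str], List[str]] = char_segment,
         weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Sentence-level BLEU with add-one smoothing (illustrative only).

    `weights` gives the weight of each n-gram order (1-gram first) and is
    assumed to sum to 1. `segment` turns a string into tokens.
    """
    ref, hyp = segment(reference), segment(hypothesis)
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n, w in enumerate(weights, start=1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped overlap: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        # Add-one smoothing keeps short sentences from scoring exactly zero.
        log_precision += w * math.log((overlap + 1) / (total + 1))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision)


original = "床前明月光"
back_translated = "床前看月光"
print(bleu(original, original))         # verbatim back-translation
print(bleu(original, back_translated))  # one character changed
```

If the Jieba package is installed, word-level scoring can be approximated by passing `segment=jieba.lcut`; shifting `weights` toward lower or higher n-gram orders changes how strongly the score rewards term-level versus phrase-level agreement.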