JZUS - Journal of Zhejiang University SCIENCE

Journal of Zhejiang University SCIENCE C 2014 Vol.15 No.4 P.241-253

Topic-aware pivot language approach for statistical machine translation

Author(s): Jin-song Su, Xiao-dong Shi, Yan-zhou Huang, Yang Liu, Qing-qiang Wu, Yi-dong Chen, Huai-lin Dong
Affiliation(s): Software School, Xiamen University, Xiamen 361005, China
Corresponding email(s): jssu@xmu.edu.cn
Key Words: Natural language processing, Pivot-based statistical machine translation, Topical context information


Jin-song Su, Xiao-dong Shi, Yan-zhou Huang, Yang Liu, Qing-qiang Wu, Yi-dong Chen, Huai-lin Dong. Topic-aware pivot language approach for statistical machine translation[J]. Journal of Zhejiang University Science C, 2014, 15(4): 241-253.

@article{title="Topic-aware pivot language approach for statistical machine translation",
author="Jin-song Su, Xiao-dong Shi, Yan-zhou Huang, Yang Liu, Qing-qiang Wu, Yi-dong Chen, Huai-lin Dong",
journal="Journal of Zhejiang University Science C",
volume="15",
number="4",
pages="241-253",
year="2014",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1300208"
}

%0 Journal Article
%T Topic-aware pivot language approach for statistical machine translation
%A Jin-song Su
%A Xiao-dong Shi
%A Yan-zhou Huang
%A Yang Liu
%A Qing-qiang Wu
%A Yi-dong Chen
%A Huai-lin Dong
%J Journal of Zhejiang University SCIENCE C
%V 15
%N 4
%P 241-253
%@ 1869-1951
%D 2014
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C1300208

TY - JOUR
T1 - Topic-aware pivot language approach for statistical machine translation
A1 - Jin-song Su
A1 - Xiao-dong Shi
A1 - Yan-zhou Huang
A1 - Yang Liu
A1 - Qing-qiang Wu
A1 - Yi-dong Chen
A1 - Huai-lin Dong
JO - Journal of Zhejiang University Science C
VL - 15
IS - 4
SP - 241
EP - 253
SN - 1869-1951
Y1 - 2014
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.C1300208
ER -


Abstract: The pivot language approach for statistical machine translation (SMT) is an effective way to overcome the resource bottleneck for certain language pairs. However, conventional implementations make little use of pivot-side context information, which leads to erroneous estimates of translation probabilities. In this study, we propose two topic-aware pivot language approaches that use different levels of pivot-side context. The first method takes advantage of document-level context by assuming that bridged phrase pairs should be similar in their document-level topic distributions. The second method focuses on the effect of local context. Central to this approach are two assumptions: that phrase sense can be reflected by local context in the form of probabilistic topics, and that bridged phrase pairs should be compatible in their latent sense distributions. We then build an interpolated model that combines the two methods to further enhance system performance. Experimental results on French-Spanish and French-German translation using English as the pivot language demonstrate the effectiveness of topic-based context in pivot-based SMT.
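The idea of bridging phrase pairs through a pivot, and of using topic similarity to down-weight incompatible bridges, can be illustrated with a minimal sketch. This is not the authors' formulation, which this page does not give. The function names `cosine` and `triangulate`, the toy probabilities, the two-topic distributions, the choice of cosine similarity, and the final renormalization are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two topic distributions given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def triangulate(src_piv, piv_tgt, topics_sp=None, topics_pt=None):
    """Bridge source-pivot and pivot-target phrase tables.

    src_piv[s][p] holds p(p|s) and piv_tgt[p][t] holds p(t|p).
    If topic distributions are given for each (s, p) and (p, t) pair,
    every bridge is weighted by their similarity (an assumed scheme).
    """
    scores = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_piv.items():
        for p, prob_p_given_s in pivots.items():
            for t, prob_t_given_p in piv_tgt.get(p, {}).items():
                weight = 1.0
                if topics_sp is not None and topics_pt is not None:
                    weight = cosine(topics_sp[(s, p)], topics_pt[(p, t)])
                scores[s][t] += prob_p_given_s * prob_t_given_p * weight
    # Renormalize so the scores for each source phrase sum to one.
    table = {}
    for s, targets in scores.items():
        total = sum(targets.values())
        table[s] = {t: v / total for t, v in targets.items()} if total else dict(targets)
    return table


# Toy data: French "rive" (river bank) bridged to Spanish via English "bank".
src_piv = {"rive": {"bank": 1.0}}
piv_tgt = {"bank": {"banco": 0.8, "orilla": 0.2}}

# Hypothetical topic distributions over (finance, nature).
topics_sp = {("rive", "bank"): [0.1, 0.9]}
topics_pt = {("bank", "banco"): [0.9, 0.1], ("bank", "orilla"): [0.1, 0.9]}

baseline = triangulate(src_piv, piv_tgt)
topic_aware = triangulate(src_piv, piv_tgt, topics_sp, topics_pt)
```

In this toy case the conventional bridge prefers the financial sense "banco" because the ambiguous pivot "bank" carries no context, while the similarity-weighted version prefers "orilla".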

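Topic-aware pivot-language statistical machine translation

Objective: The pivot language approach is one way to address the lack of bilingual training corpora when building statistical machine translation models. Conventional pivot language approaches ignore the ambiguity of pivot-language text, so the probabilities in the resulting translation model are not accurate enough. To address this, this paper uses topic models to model context at different levels and integrates this context into the construction of pivot-based statistical machine translation models in order to improve them.
Innovation: Representing context with a conventional vector space model suffers from data sparsity. This paper uses topic models to express context at different levels probabilistically, so that the context of pivot-language text can be integrated into the probability computation of the translation model, thereby improving it.
Method: Topic models are used to obtain reduced-dimension representations of context at different levels. The modeling formulas of the conventional pivot language approach are modified to treat context as a latent variable or as a similarity, and the translation model probabilities are readjusted accordingly.
Conclusions: Experiments show that topic models represent context at different levels well, and that pivot-based statistical machine translation models incorporating topic-model context outperform models built with the conventional pivot language approach.

Key words: Statistical machine translation; Pivot language; Topic model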
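The summary mentions treating context as a latent variable in the pivot modeling formula. One way to read this, offered only as a sketch and not as the paper's own equations, starts from the standard triangulation of a source phrase $s$, pivot phrase $p$, and target phrase $t$. The latent topic (or sense) index $z$ and both conditional independence assumptions are introduced here for illustration.

```latex
% Conventional triangulation, assuming t is independent of s given p:
p(t \mid s) = \sum_{p} p(t \mid p)\, p(p \mid s)

% Latent-variable variant, assuming t is independent of s given (p, z):
p(t \mid s) = \sum_{p} \sum_{z} p(t \mid p, z)\, p(p, z \mid s)
```

If $p(t \mid p, z)$ does not depend on $z$, the second form reduces to the first, because $\sum_{z} p(p, z \mid s) = p(p \mid s)$.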

