research-article

Chinese-Japanese Machine Translation Exploiting Chinese Characters

Authors:

Toshiaki Nakazawa,

Daisuke Kawahara,

Sadao Kurohashi

ACM Transactions on Asian Language Information Processing (TALIP), Volume 12, Issue 4

Article No.: 16, Pages 1 - 25

https://doi.org/10.1145/2523057.2523059

Published: 01 October 2013

Abstract

Chinese and Japanese both use Chinese characters. Because the Chinese characters used in Japanese originated in ancient China, the two languages have many Chinese characters in common. Since Chinese characters carry significant semantic information and common Chinese characters share the same meaning in the two languages, they can be quite useful in Chinese-Japanese machine translation (MT). We therefore propose a method for creating a Chinese character mapping table for Japanese, traditional Chinese, and simplified Chinese, with the aim of constructing a complete resource of common Chinese characters. Furthermore, we point out two main problems in Chinese word segmentation for Chinese-Japanese MT, namely unknown words and word segmentation granularity, and propose an approach that exploits common Chinese characters to solve them. We also propose a statistical method for detecting semantically equivalent Chinese characters beyond the common ones, and a method for exploiting shared Chinese characters in phrase alignment. Experiments on a state-of-the-art phrase-based statistical MT system and an example-based MT system show that the proposed approaches improve MT performance significantly, verifying the effectiveness of shared Chinese characters for Chinese-Japanese MT.
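
To make the idea of a character mapping table concrete, the following is a minimal Python sketch. It is not the authors' resource or method: the four-entry table `KANJI_TABLE`, the helper names `is_han`, `kanji_to_simplified`, and `shared_characters`, and the sample sentence pair are all hypothetical and chosen only for illustration. The sketch assumes that a kanji counts as "shared" when its simplified Chinese form (or the kanji itself, if the table has no entry) appears in the Chinese sentence.

```python
# Toy mapping table: Japanese kanji -> (traditional Chinese, simplified Chinese).
# Hypothetical four-entry sample; a real resource would cover far more characters.
KANJI_TABLE = {
    "愛": ("愛", "爱"),
    "気": ("氣", "气"),
    "発": ("發", "发"),
    "国": ("國", "国"),
}


def is_han(ch):
    """True if ch lies in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def kanji_to_simplified(ch):
    """Map a Japanese kanji to its simplified Chinese form, or return it unchanged."""
    entry = KANJI_TABLE.get(ch)
    return entry[1] if entry else ch


def shared_characters(japanese, chinese):
    """List (kanji, simplified) pairs where a kanji's mapped form occurs in the Chinese text."""
    chinese_chars = set(chinese)
    pairs = []
    for ch in japanese:
        if not is_han(ch):
            continue  # skip kana, punctuation, Latin letters
        mapped = kanji_to_simplified(ch)
        if mapped in chinese_chars:
            pairs.append((ch, mapped))
    return pairs


if __name__ == "__main__":
    print(shared_characters("日本の発展", "日本的发展"))
```

In this toy pair, 発 links to 发 only through the table, while 日, 本, and 展 match because their shapes are identical in both languages. Such character-level links are the kind of signal that could, under the assumptions above, inform segmentation or alignment decisions.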

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation

Information

Published In

ACM Transactions on Asian Language Information Processing, Volume 12, Issue 4

October 2013

86 pages

ISSN: 1530-0226

EISSN: 1558-3430

DOI: 10.1145/2523057

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 October 2013

Accepted: 01 April 2013

Revised: 01 February 2013

Received: 01 August 2012

Published in TALIP Volume 12, Issue 4

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 9

Total Downloads: 394

Downloads (Last 12 months): 19

Downloads (Last 6 weeks): 0

Reflects downloads up to 19 Feb 2025

Cited By

Li S, Zhou C, Wang K (2025). WA-Net. Engineering Applications of Artificial Intelligence 140:C. DOI: 10.1016/j.engappai.2024.109674. Online publication date: 15-Jan-2025.
https://dl.acm.org/doi/10.1016/j.engappai.2024.109674
Delnevo G, Im M, Tse R, Lam C, Tang S, Salomoni P, Pau G, Ghini V, Mirri S (2023). Italian-Chinese Neural Machine Translation: results and lessons learnt. Proceedings of the 2023 ACM Conference on Information Technology for Social Good, 455-461. DOI: 10.1145/3582515.3609567. Online publication date: 6-Sep-2023.
https://dl.acm.org/doi/10.1145/3582515.3609567
Li Y, Liang J, Huang X (2020). Ancient Chinese Lexicon Construction Based on Unsupervised Algorithm of Minimum Entropy and CBDB Optimization. Human Centered Computing, 143-149. DOI: 10.1007/978-3-030-70626-5_15. Online publication date: 14-Dec-2020.
https://dl.acm.org/doi/10.1007/978-3-030-70626-5_15
Che C, Zhao H, Wu X, Zhou D, Zhang Q (2019). A Word Segmentation Method of Ancient Chinese Based on Word Alignment. Natural Language Processing and Chinese Computing, 761-772. DOI: 10.1007/978-3-030-32233-5_59. Online publication date: 9-Oct-2019.
https://dl.acm.org/doi/10.1007/978-3-030-32233-5_59
Liu W, Wang L, Zhang X (2018). Fast-Syntax-Matching-Based Japanese-Chinese Limited Machine Translation. Computational Linguistics and Intelligent Text Processing, 63-73. DOI: 10.1007/978-3-319-75487-1_6. Online publication date: 21-Mar-2018.
https://doi.org/10.1007/978-3-319-75487-1_6
Xu J, Chen Y, Ru K, Zhang Y, Araki K (2017). An Approach for Chinese-Japanese Named Entity Equivalents Extraction Using Inductive Learning and Hanzi-Kanji Mapping Table. IEICE Transactions on Information and Systems E100.D:8, 1882-1892. DOI: 10.1587/transinf.2016EDP7425. Online publication date: 2017.
https://doi.org/10.1587/transinf.2016EDP7425
Liu W, Wang L (2016). Fast-Syntax-Matching-Based Japanese-Chinese Limited Machine Translation. Natural Language Understanding and Intelligent Applications, 621-630. DOI: 10.1007/978-3-319-50496-4_55. Online publication date: 2-Dec-2016.
https://doi.org/10.1007/978-3-319-50496-4_55
Chu C, Nakazawa T, Kurohashi S (2015). Integrated Parallel Sentence and Fragment Extraction from Comparable Corpora. ACM Transactions on Asian and Low-Resource Language Information Processing 15:2, 1-22. DOI: 10.1145/2833089. Online publication date: 11-Dec-2015.
https://dl.acm.org/doi/10.1145/2833089
Mo Y, Guo J, Yu Z, Luo L, Gao S (2014). A bilingual word alignment algorithm of Vietnamese-Chinese based on feature constraint. International Journal of Machine Learning and Cybernetics 6:4, 537-543. DOI: 10.1007/s13042-014-0293-6. Online publication date: 26-Aug-2014.
https://doi.org/10.1007/s13042-014-0293-6
