Abstract
A design for an Arabic-to-English translation system is presented. The core of the system implements a standard phrase-based statistical machine translation architecture, extended with a local discriminative phrase selection model that addresses the semantic ambiguity of Arabic. Local classifiers are trained to translate each phrase using linguistic information and context, which significantly increases phrase selection accuracy compared with the most-frequent-translation baseline traditionally used. These classifiers are integrated into the translation system so that the global translation task benefits from the discriminative learning. As a result, we obtain significant improvements in the full translation task at the lexical, syntactic, and semantic levels, as measured by a heterogeneous set of automatic evaluation metrics.
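The contrast between a context-aware local classifier and the most-frequent-translation baseline can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the system described here: a multiclass perceptron stands in for whatever discriminative learner is actually used, the features are only the phrase plus neighbouring words in a small window, classifier scores are turned into probabilities with a softmax so they can act as a log-probability feature in a log-linear decoder score, and the names `context_features`, `PerceptronPhraseSelector` and `loglinear_score` as well as the romanized toy data are hypothetical.

```python
import math
from collections import Counter


def context_features(tokens, start, end, window=2):
    """Hypothetical sparse features for the source phrase tokens[start:end]:
    the phrase itself plus up to `window` words on each side."""
    feats = ["phrase=" + "_".join(tokens[start:end])]
    for offset in range(1, window + 1):
        if start - offset >= 0:
            feats.append(f"L{offset}={tokens[start - offset]}")
        if end - 1 + offset < len(tokens):
            feats.append(f"R{offset}={tokens[end - 1 + offset]}")
    return feats


class MostFrequentBaseline:
    """Always picks the translation seen most often in training."""

    def fit(self, examples):
        self.best = Counter(label for _, label in examples).most_common(1)[0][0]
        return self

    def predict(self, feats):
        return self.best


class PerceptronPhraseSelector:
    """Multiclass perceptron for ONE source phrase (an assumed stand-in
    for the actual discriminative learner)."""

    def __init__(self, epochs=5):
        self.epochs = epochs
        self.w = {}  # (feature, translation) -> weight
        self.labels = []

    def score(self, feats, label):
        return sum(self.w.get((f, label), 0.0) for f in feats)

    def predict(self, feats):
        # max() keeps the first label on ties; labels are sorted for determinism.
        return max(self.labels, key=lambda lab: self.score(feats, lab))

    def fit(self, examples):
        self.labels = sorted({label for _, label in examples})
        for _ in range(self.epochs):
            for feats, gold in examples:
                guess = self.predict(feats)
                if guess != gold:  # promote the gold translation, demote the guess
                    for f in feats:
                        self.w[(f, gold)] = self.w.get((f, gold), 0.0) + 1.0
                        self.w[(f, guess)] = self.w.get((f, guess), 0.0) - 1.0
        return self

    def prob(self, feats, label):
        """Softmax over classifier scores (assumed score-to-probability mapping)."""
        scores = {lab: self.score(feats, lab) for lab in self.labels}
        top = max(scores.values())
        z = sum(math.exp(s - top) for s in scores.values())
        return math.exp(scores[label] - top) / z


def loglinear_score(features, weights):
    """Log-linear combination sum_i lambda_i * h_i of decoder feature values."""
    return sum(weights[name] * value for name, value in features.items())


# Toy, hypothetical data: romanized contexts around the ambiguous word "Eyn"
# (which can mean "eye" or "spring" of water).
train = [
    (["shrb", "mn", "Eyn", "mA'"], 2, "spring"),
    (["Eyn", "mA'", "bArdp"], 0, "spring"),
    (["Alm", "fy", "Eyn", "Alymny"], 2, "eye"),
    (["Eyn", "Alymny", "mtwrmp"], 0, "eye"),
    (["ftH", "Eyn", "Alymny"], 1, "eye"),
]
examples = [(context_features(t, i, i + 1), y) for t, i, y in train]
baseline = MostFrequentBaseline().fit(examples)
selector = PerceptronPhraseSelector().fit(examples)

test = context_features(["nbE", "mn", "Eyn", "mA'"], 2, 3)
print(baseline.predict(test))  # eye
print(selector.predict(test))  # spring

# Hypothetical feature weights; in practice they are typically tuned,
# e.g. with minimum error rate training.
weights = {"lm": 0.5, "dps": 0.3}
scores = {
    cand: loglinear_score({"lm": -4.0, "dps": math.log(selector.prob(test, cand))}, weights)
    for cand in selector.labels
}
```

On this toy data the baseline always answers "eye" because that label is more frequent, while the classifier uses the neighbouring word `mA'` to choose "spring". With the language-model value held equal, the discriminative feature is what separates the two candidates in `scores`.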
Index Terms
- Discriminative Phrase-Based Models for Arabic Machine Translation