Discriminative Phrase-Based Models for Arabic Machine Translation

Authors:
Cristina España-Bonet, TALP, Universitat Politècnica de Catalunya
Jesús Giménez, TALP, Universitat Politècnica de Catalunya
Lluís Màrquez, TALP, Universitat Politècnica de Catalunya

ACM Transactions on Asian Language Information Processing, Volume 8, Issue 4, Article No. 15, pp. 1–20. https://doi.org/10.1145/1644879.1644882

Published: 01 December 2009

Abstract

A design for an Arabic-to-English translation system is presented. The core of the system implements a standard phrase-based statistical machine translation architecture, extended with a local discriminative phrase selection model that addresses the semantic ambiguity of Arabic. Local classifiers are trained to translate a phrase using linguistic information and context, which significantly increases phrase selection accuracy with respect to the most frequent translation traditionally considered. These classifiers are integrated into the translation system so that the global task benefits from the discriminative learning. As a result, we obtain significant improvements in the full translation task at the lexical, syntactic, and semantic levels, as measured by a heterogeneous set of automatic evaluation metrics.
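
The contrast the abstract draws between the most frequent translation and a context-aware local classifier can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy data uses English stand-in tokens for readability, the feature set is only the surrounding words plus a bias term, and a simple multiclass perceptron stands in for the classifiers, since the abstract does not specify the learner or features used in the article. The names `TRAIN`, `features`, `most_frequent_baseline`, `train_perceptron`, and `predict` are hypothetical.

```python
from collections import Counter, defaultdict

# Invented toy data: (source phrase, context words, chosen target phrase).
TRAIN = [
    ("bank", ["river", "water"], "shore"),
    ("bank", ["river", "boat"], "shore"),
    ("bank", ["money", "loan"], "financial institution"),
    ("bank", ["money", "account"], "financial institution"),
    ("bank", ["loan", "interest"], "financial institution"),
]

BIAS = "<bias>"  # always-on feature (an assumption of this sketch)


def features(context):
    """Context words plus a bias feature."""
    return list(context) + [BIAS]


def most_frequent_baseline(train):
    """Map each source phrase to its most frequent target, ignoring context."""
    counts = defaultdict(Counter)
    for phrase, _context, target in train:
        counts[phrase][target] += 1
    return {phrase: c.most_common(1)[0][0] for phrase, c in counts.items()}


def predict(weights, targets, phrase, context):
    """Pick the candidate target with the highest linear score."""
    feats = features(context)

    def score(target):
        return sum(weights[phrase][target][f] for f in feats)

    # sorted() makes tie-breaking deterministic.
    return max(sorted(targets[phrase]), key=score)


def train_perceptron(train, epochs=5):
    """Train one multiclass perceptron per source phrase (toy stand-in)."""
    weights = defaultdict(lambda: defaultdict(Counter))
    targets = defaultdict(set)
    for phrase, _context, target in train:
        targets[phrase].add(target)
    for _ in range(epochs):
        for phrase, context, gold in train:
            guess = predict(weights, targets, phrase, context)
            if guess != gold:
                # Standard perceptron update: reward gold, penalize the guess.
                for f in features(context):
                    weights[phrase][gold][f] += 1
                    weights[phrase][guess][f] -= 1
    return weights, targets


baseline = most_frequent_baseline(TRAIN)
weights, targets = train_perceptron(TRAIN)
print(baseline["bank"])
print(predict(weights, targets, "bank", ["river", "bridge"]))
print(predict(weights, targets, "bank", ["money", "deposit"]))
```

On this toy data the baseline returns the majority translation regardless of context, while the classifier's choice changes with the surrounding words. How such local scores are integrated into the full translation system is not covered by this sketch.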


Index Terms

Discriminative Phrase-Based Models for Arabic Machine Translation
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation
  2. Machine learning


Published in

ACM Transactions on Asian Language Information Processing Volume 8, Issue 4
December 2009
121 pages
ISSN:1530-0226
EISSN:1558-3430
DOI:10.1145/1644879

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 December 2009
- Accepted: 1 September 2009
- Revised: 1 August 2009
- Received: 1 March 2009
Author Tags
Arabic
English
discriminative learning
statistical machine translation
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 8
- Total Downloads: 308
- Downloads (Last 12 months): 3
- Downloads (Last 6 weeks): 1