research-article

Discriminative Phrase-Based Models for Arabic Machine Translation

Published: 01 December 2009

Abstract

This article presents the design of an Arabic-to-English translation system. The core of the system implements a standard phrase-based statistical machine translation architecture, extended with a local discriminative phrase selection model that addresses the semantic ambiguity of Arabic. Local classifiers are trained to translate a phrase using linguistic information and the context in which the phrase occurs, which significantly increases phrase selection accuracy over the traditional baseline of choosing the most frequent translation. These classifiers are integrated into the translation system so that the global task benefits from the discriminative learning. As a result, we obtain significant improvements in the full translation task at the lexical, syntactic, and semantic levels, as measured by a heterogeneous set of automatic evaluation metrics.
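To make the contrast in the abstract concrete, the following is a minimal sketch of phrase selection with and without context. It is not the authors' system: the training examples, the context features, and the function names are all hypothetical, and a simple multiclass perceptron stands in for whatever learner the article actually uses. The transliterated source word "ayn" is used only as a familiar example of an Arabic word that can be rendered as "eye" or "spring". Integration of the classifier scores into the decoder is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (source phrase, context features, reference translation).
TRAIN = [
    ("ayn", ("water", "mountain"), "spring"),
    ("ayn", ("doctor", "pain"), "eye"),
    ("ayn", ("doctor", "left"), "eye"),
    ("ayn", ("water", "drink"), "spring"),
    ("ayn", ("pain", "left"), "eye"),
]


def most_frequent_baseline(examples):
    """Map each source phrase to its most frequent training translation."""
    counts = defaultdict(Counter)
    for phrase, _context, translation in examples:
        counts[phrase][translation] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}


def predict(weights, candidates, phrase, context):
    """Pick the candidate translation with the highest context score."""
    def score(translation):
        return sum(weights.get((phrase, translation, f), 0.0) for f in context)
    # Sorting makes tie-breaking deterministic (alphabetically first wins).
    return max(sorted(candidates[phrase]), key=score)


def train_perceptron(examples, epochs=10):
    """Train a local multiclass perceptron per source phrase (assumed learner)."""
    weights = defaultdict(float)  # (phrase, translation, feature) -> weight
    candidates = defaultdict(set)
    for phrase, _context, translation in examples:
        candidates[phrase].add(translation)
    for _ in range(epochs):
        for phrase, context, gold in examples:
            guess = predict(weights, candidates, phrase, context)
            if guess != gold:
                # Standard perceptron update: reward gold, penalize the guess.
                for feat in context:
                    weights[(phrase, gold, feat)] += 1.0
                    weights[(phrase, guess, feat)] -= 1.0
    return weights, candidates


baseline = most_frequent_baseline(TRAIN)
weights, candidates = train_perceptron(TRAIN)

# The baseline ignores context; the classifier can use it.
new_context = ("water", "cold")
print("baseline:  ", baseline["ayn"])
print("classifier:", predict(weights, candidates, "ayn", new_context))
```

On this toy data the baseline always returns the majority translation, while the classifier's choice depends on the context features it sees, which is the behaviour the abstract attributes to local discriminative phrase selection.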



      • Published in

        ACM Transactions on Asian Language Information Processing, Volume 8, Issue 4
        December 2009
        121 pages
        ISSN:1530-0226
        EISSN:1558-3430
        DOI:10.1145/1644879

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 December 2009
        • Accepted: 1 September 2009
        • Revised: 1 August 2009
        • Received: 1 March 2009


        Qualifiers

        • research-article
        • Research
        • Refereed
