
Discriminative Training for Log-Linear Based SMT: Global or Local Methods

Published: 19 December 2014

Abstract

In statistical machine translation, standard methods such as MERT tune a single weight vector on a given development set. However, these methods suffer from two problems caused by the diversity and uneven distribution of source sentences. First, their performance depends heavily on the choice of development set, which can lead to unstable test performance. Second, sentence-level translation quality is not assured, since tuning is performed at the document level rather than the sentence level. In contrast to standard global training, in which a single weight vector is learned, we propose novel local training methods to address these two problems. We perform training and testing in one step by locally learning a sentence-wise weight vector for each input sentence. Since each tuning step takes non-negligible time, and learning sentence-wise weights for an entire test set requires many passes of tuning, efficiency is a major challenge for local training. We propose an efficient two-phase method that makes local training practical by employing the ultraconservative update. On NIST Chinese-to-English translation tasks with both medium and large scales of training data, our local training methods significantly outperform standard methods, with maximal improvements of up to 2.0 BLEU points, while their efficiency remains comparable to that of the standard methods.
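The ultraconservative update mentioned above belongs to the family of passive-aggressive online updates: the weight vector is moved just enough to fix the current ranking mistake and no further. As a rough illustration only (a minimal PA-I-style sketch, not the paper's exact algorithm; the feature vectors, loss value, and clipping constant `C` here are invented for the example):

```python
def ultraconservative_update(w, feat_better, feat_worse, loss, C=1.0):
    """One passive-aggressive-style (ultraconservative) update:
    move w minimally so that the better translation hypothesis
    outscores the worse one by a margin tied to their quality gap."""
    diff = [a - b for a, b in zip(feat_better, feat_worse)]
    margin = sum(wi * di for wi, di in zip(w, diff))   # current score gap
    hinge = max(0.0, loss - margin)                    # margin violation
    norm_sq = sum(d * d for d in diff)
    if hinge == 0.0 or norm_sq == 0.0:
        return w                                       # no violation: keep w unchanged
    tau = min(C, hinge / norm_sq)                      # clipped step size (PA-I style)
    return [wi + tau * di for wi, di in zip(w, diff)]

def score(w, feats):
    """Log-linear model score: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, feats))

# toy example: two hypotheses described by three features
w = [0.0, 0.0, 0.0]
better = [1.0, 0.5, 0.0]   # features of the higher-BLEU hypothesis
worse = [0.0, 1.0, 1.0]    # features of a lower-BLEU hypothesis
w = ultraconservative_update(w, better, worse, loss=1.0)
```

The "ultraconservative" character is in the `min(C, ...)` clipping: the step is the smallest one that satisfies the margin constraint, so the sentence-wise weights stay close to their starting point across updates.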


    Published In

    ACM Transactions on Asian Language Information Processing  Volume 13, Issue 4
    December 2014
    84 pages
    ISSN:1530-0226
    EISSN:1558-3430
    DOI:10.1145/2701119

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 December 2014
    Accepted: 01 June 2014
    Revised: 01 March 2014
    Received: 01 October 2013
    Published in TALIP Volume 13, Issue 4


    Author Tags

    1. global training
    2. local training
    3. log-linear model
    4. ultraconservative update

    Qualifiers

    • Research-article
    • Research
    • Refereed
