
Discriminative Training for Log-Linear Based SMT: Global or Local Methods

Published: 19 December 2014

Abstract

In statistical machine translation, standard methods such as MERT tune a single weight vector on a given development set. However, these methods suffer from two problems caused by the diversity and uneven distribution of source sentences. First, their performance depends heavily on the choice of development set, which can lead to unstable test performance. Second, sentence-level translation quality is not assured, since tuning is performed at the document level rather than the sentence level. In contrast to standard global training, in which a single weight vector is learned, we propose novel local training methods to address these two problems. We perform training and testing in one step by locally learning a sentence-wise weight vector for each input sentence. Since each tuning step takes non-negligible time, and learning sentence-wise weights for an entire test set requires many passes of tuning, efficiency is a major challenge for local training. We propose an efficient two-phase method that makes local training practical by employing the ultraconservative update. On NIST Chinese-to-English translation tasks with both medium and large scales of training data, our local training methods significantly outperform standard methods, with maximal improvements of up to 2.0 BLEU points, while their efficiency remains comparable to that of the standard methods.
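The ultraconservative update mentioned above belongs to the family of passive-aggressive online updates: the weight vector is moved just enough to fix the current ranking mistake and no further. As a rough illustration only (a minimal PA-I-style sketch, not the paper's exact algorithm; the feature vectors, loss value, and clipping constant `C` here are invented for the example):

```python
def ultraconservative_update(w, feat_better, feat_worse, loss, C=1.0):
    """One passive-aggressive-style (ultraconservative) update:
    move w minimally so that the better translation hypothesis
    outscores the worse one by a margin tied to their quality gap."""
    diff = [a - b for a, b in zip(feat_better, feat_worse)]
    margin = sum(wi * di for wi, di in zip(w, diff))   # current score gap
    hinge = max(0.0, loss - margin)                    # margin violation
    norm_sq = sum(d * d for d in diff)
    if hinge == 0.0 or norm_sq == 0.0:
        return w                                       # no violation: keep w unchanged
    tau = min(C, hinge / norm_sq)                      # clipped step size (PA-I style)
    return [wi + tau * di for wi, di in zip(w, diff)]

def score(w, feats):
    """Log-linear model score: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, feats))

# toy example: two hypotheses described by three features
w = [0.0, 0.0, 0.0]
better = [1.0, 0.5, 0.0]   # features of the higher-BLEU hypothesis
worse = [0.0, 1.0, 1.0]    # features of a lower-BLEU hypothesis
w = ultraconservative_update(w, better, worse, loss=1.0)
```

The "ultraconservative" character is in the `min(C, ...)` clipping: the step is the smallest one that satisfies the margin constraint, so the sentence-wise weights stay close to their starting point across updates.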


    Published In

    ACM Transactions on Asian Language Information Processing  Volume 13, Issue 4
    December 2014
    84 pages
    ISSN:1530-0226
    EISSN:1558-3430
    DOI:10.1145/2701119

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 December 2014
    Accepted: 01 June 2014
    Revised: 01 March 2014
    Received: 01 October 2013
    Published in TALIP Volume 13, Issue 4


    Author Tags

    1. global training
    2. local training
    3. log-linear model
    4. ultraconservative update

    Qualifiers

    • Research-article
    • Research
    • Refereed
