Two-Stage Hypotheses Generation for Spoken Language Translation

Authors:

Ai Ti Aw

ACM Transactions on Asian Language Information Processing (TALIP), Volume 8, Issue 1

Article No.: 4, Pages 1 - 22

https://doi.org/10.1145/1482343.1482347

Published: 01 March 2009

Abstract

Spoken Language Translation (SLT) is the research area that focuses on the translation of speech or text between two spoken languages. Phrase-based and syntax-based methods represent the state of the art in statistical machine translation (SMT). The phrase-based method specializes in modeling local reorderings and translations of multiword expressions. The syntax-based method adds syntactic knowledge, which can better model long-distance word reorderings, discontinuous phrases, and syntactic structure. In this article, we leverage the strengths of these two methods and propose a strategy based on multiple hypotheses generation in a two-stage framework for spoken language translation. The hypotheses are generated in two stages, namely, decoding and regeneration. In the decoding stage, we apply state-of-the-art phrase-based and syntax-based methods to generate basic translation hypotheses. Then, in the regeneration stage, many more hypotheses that cannot be captured by the decoding algorithms are produced from the basic hypotheses. In this second stage, we study three regeneration methods: redecoding, n-gram expansion, and confusion networks. Finally, an additional reranking pass selects the translation outputs using a linear combination of rescoring models. Experimental results on the Chinese-to-English IWSLT-2006 challenge task of translating transcriptions of spontaneous speech show that the proposed mechanism achieves a significant improvement of about 2.80 BLEU points over the baseline.
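The final reranking pass described in the abstract can be illustrated with a minimal sketch. It assumes that hypotheses from both stages have been pooled and that each carries a dictionary of rescoring-model scores. The feature names (`lm`, `tm`, `length`), the weights, and the example sentences are hypothetical and are not taken from the article, which does not list its rescoring models here.

```python
from typing import Dict, List, Tuple

# A hypothesis is (translation text, scores from each rescoring model).
Hypothesis = Tuple[str, Dict[str, float]]


def rerank(hypotheses: List[Hypothesis],
           weights: Dict[str, float]) -> List[Tuple[float, str]]:
    """Score each hypothesis as a weighted sum of its model scores, best first."""
    scored = []
    for text, features in hypotheses:
        # Linear combination: sum over models of weight * model score.
        # A model score missing from a hypothesis is treated as 0.0.
        total = sum(w * features.get(name, 0.0) for name, w in weights.items())
        scored.append((total, text))
    # Highest combined score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical pooled hypotheses with made-up log-domain scores.
pool = [
    ("i want a room", {"lm": -4.0, "tm": -3.0, "length": 4.0}),
    ("i would like a room", {"lm": -3.5, "tm": -3.2, "length": 5.0}),
    ("i like room", {"lm": -6.0, "tm": -2.5, "length": 3.0}),
]
# Hypothetical weights; in practice these would be tuned on a development set.
weights = {"lm": 1.0, "tm": 1.0, "length": 0.1}

best_score, best_text = rerank(pool, weights)[0]
```

In this toy pool the second hypothesis wins because its slightly better language-model score and length bonus outweigh its weaker translation-model score. The sketch only shows the selection step; how the regeneration methods produce the extra hypotheses is not covered by it.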

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation

Published In

ACM Transactions on Asian Language Information Processing Volume 8, Issue 1

March 2009

75 pages

ISSN:1530-0226

EISSN:1558-3430

DOI:10.1145/1482343

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2009

Accepted: 01 November 2008

Revised: 01 August 2008

Received: 01 February 2008

Published in TALIP Volume 8, Issue 1

Qualifiers

Research-article
Research
Refereed
