research-article

Two-Stage Hypotheses Generation for Spoken Language Translation

Published: 01 March 2009

Abstract

Spoken Language Translation (SLT) is the research area that focuses on the translation of speech or text between two spoken languages. Phrase-based and syntax-based methods represent the state of the art in statistical machine translation (SMT). The phrase-based method specializes in modeling local reorderings and translations of multiword expressions. The syntax-based method is enhanced by syntactic knowledge, which lets it better model long-distance word reorderings, discontinuous phrases, and syntactic structure. In this article, we leverage the strengths of these two methods and propose a strategy based on multiple hypotheses generation in a two-stage framework for spoken language translation. The hypotheses are generated in two stages, namely, decoding and regeneration. In the decoding stage, we apply state-of-the-art phrase-based and syntax-based methods to generate basic translation hypotheses. Then, in the regeneration stage, many more hypotheses that cannot be captured by the decoding algorithms are produced from the basic hypotheses. We study three regeneration methods in the second stage: redecoding, n-gram expansion, and confusion network. Finally, an additional reranking pass is introduced to select the translation outputs by a linear combination of rescoring models. Experimental results on the Chinese-to-English IWSLT-2006 challenge task of translating the transcription of spontaneous speech show that the proposed mechanism achieves a significant improvement of about 2.80 BLEU points over the baseline.
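
The final reranking pass described in the abstract can be illustrated with a minimal sketch. It assumes each pooled hypothesis carries a small set of rescoring-model scores and that the weights are already tuned; the feature names (`lm`, `tm`, `length`), the weights, and the example sentences are hypothetical and are not taken from the article.

```python
from typing import Dict, List, Tuple

# A hypothesis is (translation text, {rescoring model name: score}).
Hypothesis = Tuple[str, Dict[str, float]]


def rerank(nbest: List[Hypothesis], weights: Dict[str, float]) -> List[Tuple[float, str]]:
    """Score each hypothesis as a weighted sum of its model scores, best first."""
    scored = []
    for text, features in nbest:
        # Linear combination: sum over models of weight * score.
        total = sum(weights.get(name, 0.0) * value for name, value in features.items())
        scored.append((total, text))
    # Highest combined score first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical hypotheses pooled from the decoding and regeneration stages.
pooled = [
    ("where is the station", {"lm": -4.0, "tm": -3.0, "length": 4.0}),
    ("where the station is", {"lm": -6.0, "tm": -2.5, "length": 4.0}),
    ("station where", {"lm": -7.0, "tm": -2.0, "length": 2.0}),
]
# Hypothetical weights; in practice these would be tuned on a development set.
weights = {"lm": 1.0, "tm": 0.5, "length": 0.25}

ranked = rerank(pooled, weights)
best_score, best_text = ranked[0]
```

Because the combination is linear, adding another rescoring model in this sketch only requires one more key in each feature dictionary and one more weight.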

    Published In

    ACM Transactions on Asian Language Information Processing  Volume 8, Issue 1
    March 2009
    75 pages
    ISSN:1530-0226
    EISSN:1558-3430
    DOI:10.1145/1482343

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 March 2009
    Accepted: 01 November 2008
    Revised: 01 August 2008
    Received: 01 February 2008
    Published in TALIP Volume 8, Issue 1

    Author Tags

    1. Spoken language translation
    2. hypotheses generation
    3. statistical machine translation

    Qualifiers

    • Research-article
    • Research
    • Refereed
