research-article

Distributed speech translation technologies for multiparty multilingual communication

Published: 02 August 2012

Abstract

Developing a multilingual speech translation system requires efforts in constructing automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS) components for all possible source and target languages. If the numerous ASR, MT, and TTS systems for different language pairs developed independently in different parts of the world could be connected, multilingual speech translation systems for a multitude of language pairs could be achieved. Yet, there is currently no common, flexible framework that can provide an entire speech translation process by bringing together heterogeneous speech translation components. In this article we therefore propose a distributed architecture framework for multilingual speech translation in which all speech translation components are provided on distributed servers and cooperate over a network. This framework can facilitate the connection of different components and functions. To show the overall mechanism, we first present our state-of-the-art technologies for multilingual ASR, MT, and TTS components, and then describe how to combine those systems into the proposed network-based framework. The client applications are implemented on a handheld mobile terminal device, and all data exchanges among client users and spoken language technology servers are managed through a Web protocol. To support multiparty communication, an additional communication server is provided for simultaneously distributing the speech translation results from one user to multiple users. Field testing shows that the system is capable of realizing multiparty multilingual speech translation for real-time and location-independent communication.
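
The multiparty flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `CommunicationServer`, the functions `stub_asr`, `stub_mt`, and `stub_tts`, and the message shapes are all hypothetical, and the stubs run locally where the real components would sit on separate servers reached over a Web protocol. The sketch only shows the order of operations: recognize once in the speaker's language, then translate and synthesize once per listener.

```python
from typing import Callable, Dict, Tuple


# Hypothetical stand-ins for remote ASR, MT, and TTS servers.
# Each would be a network request in a distributed deployment.
def stub_asr(audio: bytes, lang: str) -> str:
    # Pretend the audio payload already carries its own transcript.
    return audio.decode("utf-8")


def stub_mt(text: str, src: str, tgt: str) -> str:
    # Tag the text with the language pair instead of translating it.
    return text if src == tgt else f"[{src}->{tgt}] {text}"


def stub_tts(text: str, lang: str) -> bytes:
    # Wrap the text so the output is recognizably "synthesized".
    return f"<speech lang={lang}>{text}</speech>".encode("utf-8")


class CommunicationServer:
    """Distributes one speaker's utterance to all other participants."""

    def __init__(
        self,
        asr: Callable[[bytes, str], str] = stub_asr,
        mt: Callable[[str, str, str], str] = stub_mt,
        tts: Callable[[str, str], bytes] = stub_tts,
    ) -> None:
        self.asr, self.mt, self.tts = asr, mt, tts
        self.users: Dict[str, str] = {}  # user id -> preferred language

    def join(self, user_id: str, lang: str) -> None:
        self.users[user_id] = lang

    def speak(self, speaker_id: str, audio: bytes) -> Dict[str, Tuple[str, bytes]]:
        """Return {listener id: (translated text, synthesized audio)}."""
        src = self.users[speaker_id]
        recognized = self.asr(audio, src)  # one ASR pass per utterance
        results: Dict[str, Tuple[str, bytes]] = {}
        for user_id, tgt in self.users.items():
            if user_id == speaker_id:
                continue  # the speaker does not receive their own utterance
            translated = self.mt(recognized, src, tgt)
            results[user_id] = (translated, self.tts(translated, tgt))
        return results


server = CommunicationServer()
server.join("alice", "ja")
server.join("bob", "en")
server.join("chen", "zh")
deliveries = server.speak("alice", b"konnichiwa")
for listener, (text, _speech) in deliveries.items():
    print(listener, text)
```

Because the components are passed in as plain callables, any of the stubs could be replaced by a function that calls a remote service, which is the kind of component swapping the framework is meant to allow.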



          • Published in

            ACM Transactions on Speech and Language Processing, Volume 9, Issue 2
            July 2012
            58 pages
            ISSN:1550-4875
            EISSN:1550-4883
            DOI:10.1145/2287710

            Copyright © 2012 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 2 August 2012
            • Accepted: 1 March 2012
            • Revised: 1 February 2012
            • Received: 1 August 2011


            Qualifiers

            • research-article
            • Research
            • Refereed
