Abstract
Developing a multilingual speech translation system requires constructing automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS) components for all possible source and target languages. If the many ASR, MT, and TTS systems developed independently for different language pairs in different parts of the world could be connected, multilingual speech translation for a multitude of language pairs could be achieved. Yet there is currently no common, flexible framework that brings together heterogeneous speech translation components to provide the entire speech translation process. In this article we therefore propose a distributed architecture framework for multilingual speech translation in which all speech translation components are provided on distributed servers and cooperate over a network. This framework facilitates the connection of different components and functions. To show the overall mechanism, we first present our state-of-the-art technologies for multilingual ASR, MT, and TTS components, and then describe how to combine those systems into the proposed network-based framework. The client applications are implemented on a handheld mobile terminal device, and all data exchanges among client users and spoken language technology servers are managed through a Web protocol. To support multiparty communication, an additional communication server simultaneously distributes the speech translation results from one user to multiple users. Field testing shows that the system is capable of realizing multiparty multilingual speech translation for real-time and location-independent communication.
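The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: every name in it (`asr_server`, `mt_server`, `tts_server`, `CommunicationServer`, `TOY_LEXICON`) is hypothetical, the three "servers" are local stand-in functions rather than networked services, and the Web protocol is not modeled. The sketch only shows the shape of the pipeline: one user's speech passes through ASR, is translated once per listener language, is synthesized, and is then fanned out to the other participants.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stand-ins for remote ASR, MT, and TTS servers.
# In a real deployment each would be a network call to a separate server.

def asr_server(audio: bytes, lang: str) -> str:
    """Pretend recognizer: treats the audio bytes as an already-known transcript."""
    return audio.decode("utf-8")

# Toy translation table (assumption, for illustration only).
TOY_LEXICON = {
    ("en", "ja", "hello"): "konnichiwa",
    ("en", "id", "hello"): "halo",
}

def mt_server(text: str, src: str, tgt: str) -> str:
    """Pretend translator: looks up the toy lexicon, else tags the text."""
    if src == tgt:
        return text
    return TOY_LEXICON.get((src, tgt, text.lower()), f"[{src}->{tgt}] {text}")

def tts_server(text: str, lang: str) -> bytes:
    """Pretend synthesizer: returns a byte string labeled with the language."""
    return f"<audio:{lang}:{text}>".encode("utf-8")

@dataclass
class CommunicationServer:
    """Distributes one user's translated speech to all other users."""
    users: Dict[str, str] = field(default_factory=dict)         # user -> language
    inbox: Dict[str, List[bytes]] = field(default_factory=dict)  # user -> audio clips

    def join(self, user: str, lang: str) -> None:
        self.users[user] = lang
        self.inbox[user] = []

    def speak(self, speaker: str, audio: bytes) -> None:
        src = self.users[speaker]
        text = asr_server(audio, src)                 # speech -> source text
        for user, tgt in self.users.items():
            if user == speaker:
                continue                              # do not echo to the speaker
            translated = mt_server(text, src, tgt)    # source text -> target text
            self.inbox[user].append(tts_server(translated, tgt))  # target text -> speech
```

In this sketch a session is set up by calling `join` for each participant with a language code, after which each `speak` call delivers one synthesized clip to every other participant's inbox. Keeping the three stages as separate functions mirrors the idea that each component could be hosted by a different provider and swapped independently.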
Index Terms
- Distributed speech translation technologies for multiparty multilingual communication