Distributed speech translation technologies for multiparty multilingual communication

Authors:
Sakriani Sakti

Nara Institute of Science and Technology, Japan

Michael Paul

National Institute of Information and Communications Technology, Japan

Andrew Finch

National Institute of Information and Communications Technology, Japan

Xinhui Hu

National Institute of Information and Communications Technology, Japan

Jinfu Ni

National Institute of Information and Communications Technology, Japan

Noriyuki Kimura

National Institute of Information and Communications Technology, Japan

Shigeki Matsuda

National Institute of Information and Communications Technology, Japan

Chiori Hori

National Institute of Information and Communications Technology, Japan

Yutaka Ashikari

National Institute of Information and Communications Technology, Japan

Hisashi Kawai

National Institute of Information and Communications Technology, Japan

Hideki Kashioka

National Institute of Information and Communications Technology, Japan

Eiichiro Sumita

National Institute of Information and Communications Technology, Japan

Satoshi Nakamura

Nara Institute of Science and Technology, Japan; National Institute of Information and Communications Technology, Japan

ACM Transactions on Speech and Language Processing, Volume 9, Issue 2, Article No. 4, pp. 1–27. https://doi.org/10.1145/2287710.2287712

Published: 02 August 2012

Abstract

Developing a multilingual speech translation system requires efforts in constructing automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS) components for all possible source and target languages. If the numerous ASR, MT, and TTS systems for different language pairs developed independently in different parts of the world could be connected, multilingual speech translation systems for a multitude of language pairs could be achieved. Yet, there is currently no common, flexible framework that can provide an entire speech translation process by bringing together heterogeneous speech translation components. In this article we therefore propose a distributed architecture framework for multilingual speech translation in which all speech translation components are provided on distributed servers and cooperate over a network. This framework can facilitate the connection of different components and functions. To show the overall mechanism, we first present our state-of-the-art technologies for multilingual ASR, MT, and TTS components, and then describe how to combine those systems into the proposed network-based framework. The client applications are implemented on a handheld mobile terminal device, and all data exchanges among client users and spoken language technology servers are managed through a Web protocol. To support multiparty communication, an additional communication server is provided for simultaneously distributing the speech translation results from one user to multiple users. Field testing shows that the system is capable of realizing multiparty multilingual speech translation for real-time and location-independent communication.
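
The architecture outlined in the abstract can be illustrated with a minimal in-process sketch. This is not the authors' implementation: `ServerDirectory`, `CommunicationServer`, and `build_demo` are hypothetical names, the "servers" are plain Python callables standing in for networked ASR, MT, and TTS services, and small lookup tables replace real models. The sketch only shows how independently provided components might be chained per language pair and how a communication server might fan one utterance out to several users.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for remote servers; a real deployment would
# reach each of these over a network rather than call them in-process.
AsrServer = Callable[[bytes], str]   # audio -> source-language text
MtServer = Callable[[str], str]      # source text -> target text
TtsServer = Callable[[str], bytes]   # target text -> audio


@dataclass
class ServerDirectory:
    """Maps each language or language pair to the component serving it."""
    asr: Dict[str, AsrServer] = field(default_factory=dict)
    mt: Dict[Tuple[str, str], MtServer] = field(default_factory=dict)
    tts: Dict[str, TtsServer] = field(default_factory=dict)

    def translate_speech(self, audio: bytes, src: str, tgt: str) -> Tuple[str, bytes]:
        """Chain ASR -> MT -> TTS for one source/target language pair."""
        text = self.asr[src](audio)
        # Skip MT when speaker and listener share a language.
        translated = text if src == tgt else self.mt[(src, tgt)](text)
        return translated, self.tts[tgt](translated)


@dataclass
class CommunicationServer:
    """Distributes one speaker's utterance to every other participant."""
    directory: ServerDirectory
    users: Dict[str, str] = field(default_factory=dict)  # user -> language

    def join(self, user: str, lang: str) -> None:
        self.users[user] = lang

    def speak(self, speaker: str, audio: bytes) -> Dict[str, Tuple[str, bytes]]:
        src = self.users[speaker]
        return {
            user: self.directory.translate_speech(audio, src, lang)
            for user, lang in self.users.items()
            if user != speaker
        }


def build_demo() -> CommunicationServer:
    """Wire up toy components (assumed for illustration only)."""
    def toy_asr(audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the audio is its transcript

    def toy_tts(text: str) -> bytes:
        return text.encode("utf-8")   # pretend the text is its waveform

    en_ja = {"hello": "konnichiwa"}
    en_zh = {"hello": "ni hao"}
    directory = ServerDirectory(
        asr={"en": toy_asr},
        mt={
            ("en", "ja"): lambda t: en_ja.get(t, t),
            ("en", "zh"): lambda t: en_zh.get(t, t),
        },
        tts={"en": toy_tts, "ja": toy_tts, "zh": toy_tts},
    )
    hub = CommunicationServer(directory)
    hub.join("alice", "en")
    hub.join("kenji", "ja")
    hub.join("mei", "zh")
    return hub
```

Calling `build_demo().speak("alice", b"hello")` returns one translated text and audio pair for each listener other than the speaker. Keeping the directory separate from the communication server mirrors the separation the abstract describes between the spoken language technology servers and the server that handles multiparty distribution.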

Published in

ACM Transactions on Speech and Language Processing Volume 9, Issue 2
July 2012
58 pages
ISSN:1550-4875
EISSN:1550-4883
DOI:10.1145/2287710
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 2 August 2012
- Accepted: 1 March 2012
- Revised: 1 February 2012
- Received: 1 August 2011
Author Tags
Distributed architecture platforms
machine translation
multiparty multilingual communication
speech recognition
speech-to-speech translation
text-to-speech
Qualifiers
- research-article
- Research
- Refereed
