Using tone information in Cantonese continuous speech recognition

Authors:
Tan Lee, Wai Lau, Y. W. Wong, P. C. Ching

The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong

ACM Transactions on Asian Language Information Processing, Volume 1, Issue 1, pp. 83–102
https://doi.org/10.1145/595576.595581

Published: 01 March 2002

Abstract

In Chinese languages, tones carry important information at various linguistic levels. This research is based on the belief that tone information, if acquired accurately and utilized effectively, contributes to the automatic speech recognition of Chinese. In particular, we focus on the Cantonese dialect, which is spoken by tens of millions of people in Southern China and Hong Kong. Cantonese is well known for its complicated tone system, which makes automatic tone recognition very difficult. This article describes an effective approach to explicit tone recognition of Cantonese in continuously spoken utterances. Tone feature vectors are derived, on a short-time basis, to characterize the syllable-wide patterns of F0 (fundamental frequency) and energy movements. A moving-window normalization technique is proposed to reduce the tone-irrelevant fluctuation of F0 and energy features. Hidden Markov models are employed for context-dependent acoustic modeling of different tones. A tone recognition accuracy of 66.4% has been achieved in the speaker-independent case. The recognized tone patterns are then utilized to assist Cantonese large-vocabulary continuous speech recognition (LVCSR) via a lattice expansion approach. Experimental results show that reliable tone information helps to improve the overall performance of LVCSR.
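The abstract names a moving-window normalization of F0 but does not give its formula. The following is only a minimal sketch of one plausible reading, not the authors' method: it assumes the normalization subtracts a local mean of log-F0, computed over voiced frames in a window centered on the current frame. The function name `moving_window_normalize`, the `half_window` parameter, and its default value are hypothetical.

```python
import math


def moving_window_normalize(f0_hz, half_window=50):
    """Subtract a local mean of log-F0 from each voiced frame.

    f0_hz: per-frame F0 values in Hz; 0.0 or None marks an unvoiced frame.
    half_window: frames on each side of the current frame (assumed value,
        not taken from the article).
    Returns a list holding the normalized log-F0 for voiced frames and
    None for unvoiced frames.
    """
    # Work in the log domain so that a speaker-level scale factor on F0
    # becomes an additive offset that the local mean removes.
    log_f0 = [math.log(v) if v else None for v in f0_hz]

    normalized = []
    for i, value in enumerate(log_f0):
        if value is None:
            normalized.append(None)
            continue
        lo = max(0, i - half_window)
        hi = min(len(log_f0), i + half_window + 1)
        # Only voiced frames contribute to the window mean; the current
        # frame is voiced, so this list is never empty.
        voiced = [x for x in log_f0[lo:hi] if x is not None]
        normalized.append(value - sum(voiced) / len(voiced))
    return normalized
```

Under these assumptions, two contours that differ only by a constant scale factor (for example, a low-pitched and a high-pitched speaker producing the same tone sequence) map to the same normalized values, which is the kind of tone-irrelevant variation the abstract says the technique is meant to reduce. The same windowed mean subtraction could be applied to a log-energy track.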

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition
2. Hardware
  1. Communication hardware, interfaces and storage
    1. Signal processing systems

Published in

ACM Transactions on Asian Language Information Processing Volume 1, Issue 1
March 2002
102 pages
ISSN:1530-0226
EISSN:1558-3430
DOI:10.1145/595576

Copyright © 2002 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Chinese dialects
F0 normalization
knowledge integration
speech recognition
tone recognition