Abstract
In Chinese languages, tones carry important information at various linguistic levels. This research is based on the belief that tone information, if acquired accurately and utilized effectively, can contribute to automatic speech recognition of Chinese. In particular, we focus on the Cantonese dialect, which is spoken by tens of millions of people in Southern China and Hong Kong. Cantonese is well known for its complicated tone system, which makes automatic tone recognition very difficult. This article describes an effective approach to explicit tone recognition of Cantonese in continuously spoken utterances. Tone feature vectors are derived on a short-time basis to characterize the syllable-wide patterns of fundamental frequency (F0) and energy movements. A moving-window normalization technique is proposed to reduce tone-irrelevant fluctuations in the F0 and energy features. Hidden Markov models are employed for context-dependent acoustic modeling of the different tones. A tone recognition accuracy of 66.4% is achieved in the speaker-independent case. The recognized tone patterns are then used to assist Cantonese large-vocabulary continuous speech recognition (LVCSR) via a lattice expansion approach. Experimental results show that reliable tone information helps improve the overall performance of LVCSR.
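The abstract does not say how the moving-window normalization is computed. The sketch below is one plausible reading, not the article's method. It assumes a frame-level feature track (for example log F0, or energy in dB) with unvoiced frames marked `None`, and it assumes that normalization means subtracting the mean of the voiced frames inside a symmetric window centered on each frame. The function name `moving_window_normalize` and the `half_window` parameter are hypothetical.

```python
from typing import List, Optional


def moving_window_normalize(
    track: List[Optional[float]], half_window: int = 50
) -> List[Optional[float]]:
    """Subtract a local mean from every voiced frame of a feature track.

    Hypothetical sketch: the symmetric window, its length, and the use of a
    plain mean over voiced frames are assumptions, not the article's settings.
    Unvoiced frames (None) are left as None and excluded from every mean.
    """
    if half_window < 0:
        raise ValueError("half_window must be non-negative")
    n = len(track)
    normalized: List[Optional[float]] = []
    for i, value in enumerate(track):
        if value is None:
            normalized.append(None)  # keep unvoiced frames unvoiced
            continue
        # Window [i - half_window, i + half_window], clipped to the utterance.
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        voiced = [v for v in track[lo:hi] if v is not None]
        # `voiced` is never empty here, because frame i itself is voiced.
        local_mean = sum(voiced) / len(voiced)
        normalized.append(value - local_mean)
    return normalized


if __name__ == "__main__":
    print(moving_window_normalize([100.0, None, 110.0, 120.0], half_window=1))
```

Running the example prints `[0.0, None, -5.0, 5.0]`. Subtracting a moving mean acts roughly like a high-pass filter. Variations much slower than the window, such as a speaker's overall pitch level or gradual declination, are attenuated, while faster syllable-sized movements are largely preserved. Leaving unvoiced frames out of the mean keeps pitch-tracker gaps from dragging the reference level down.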
Index Terms
- Using tone information in Cantonese continuous speech recognition