
Using tone information in Cantonese continuous speech recognition

Published: 01 March 2002

Abstract

In Chinese languages, tones carry important information at various linguistic levels. This research is based on the belief that tone information, if acquired accurately and utilized effectively, contributes to the automatic speech recognition of Chinese. In particular, we focus on the Cantonese dialect, which is spoken by tens of millions of people in Southern China and Hong Kong. Cantonese is well known for its complicated tone system, which makes automatic tone recognition very difficult. This article describes an effective approach to explicit tone recognition of Cantonese in continuously spoken utterances. Tone feature vectors are derived, on a short-time basis, to characterize the syllable-wide patterns of F0 (fundamental frequency) and energy movements. A moving-window normalization technique is proposed to reduce the tone-irrelevant fluctuation of F0 and energy features. Hidden Markov models are employed for context-dependent acoustic modeling of different tones. A tone recognition accuracy of 66.4% has been achieved in the speaker-independent case. The recognized tone patterns are then utilized to assist Cantonese large-vocabulary continuous speech recognition (LVCSR) via a lattice expansion approach. Experimental results show that reliable tone information helps to improve the overall performance of LVCSR.
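The moving-window normalization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the window definition, so the code below assumes one plausible reading: each value (here, a per-syllable log-F0 value) has the mean of its neighbours within a symmetric window subtracted from it, which removes slow drifts such as sentence-level declination while keeping local tone contrasts. The function name, the `half_width` parameter, and the sample F0 values are all hypothetical and are not taken from the article.

```python
import math


def moving_window_normalize(values, half_width):
    """Subtract from each value the mean of the values within +/- half_width.

    Assumption: a symmetric window truncated at the sequence edges. The
    article's actual window shape and size are not stated in the abstract.
    """
    normalized = []
    n = len(values)
    for i in range(n):
        lo = max(0, i - half_width)          # clip window at the start
        hi = min(n, i + half_width + 1)      # clip window at the end
        window = values[lo:hi]
        local_mean = sum(window) / len(window)
        normalized.append(values[i] - local_mean)
    return normalized


if __name__ == "__main__":
    # Hypothetical per-syllable mean F0 values (Hz) with a falling intonation.
    f0_hz = [220.0, 210.0, 200.0, 190.0, 180.0]
    log_f0 = [math.log(v) for v in f0_hz]
    print(moving_window_normalize(log_f0, half_width=1))
```

Working in the log domain makes the result insensitive to a speaker's overall pitch level: scaling every F0 value by the same factor adds a constant to log-F0, and that constant cancels when the local mean is subtracted.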

