Cross-language spoken document retrieval using HMM-based retrieval model with multi-scale fusion

Authors:
Wai-Kit Lo, Helen Meng, and P. C. Ching
The Chinese University of Hong Kong

ACM Transactions on Asian Language Information Processing, Volume 2, Issue 1, pp. 1–26
https://doi.org/10.1145/964161.964162

Published: 01 March 2003

Abstract

Cross-language spoken document retrieval (CL-SDR) is the automatic retrieval of relevant information from a collection of spoken documents in a language different from that of the queries. With CL-SDR, information sources in other languages become searchable, which significantly increases the number of sources available for retrieval. The HMM-based retrieval model is a probabilistic formulation of the retrieval problem, and its probabilistic nature can be exploited to extend it. Specifically, we have incorporated a translation component to make cross-language information retrieval (CLIR) possible. We have also extended this HMM-based CLIR model to retrieval at subword scales.

In this work, the extended HMM-based retrieval model is applied to an English-Mandarin CL-SDR task: searching a Mandarin spoken document collection with English queries at word and subword scales. Retrieval results obtained at these indexing scales are then fused for multi-scale CL-SDR. Experimental results demonstrate that fusing the word and subword scales improves CL-SDR retrieval performance.
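The abstract does not give the model's equations, so the following is only a minimal Python sketch of the three ideas it names, under stated assumptions: a two-state mixture (document state plus background state) as the HMM-style query likelihood, a translation table folded into the document state, and fusion by a weighted sum of min-max normalized scores. The function names, the mixture weight `doc_weight`, the use of overlapping character bigrams as the subword unit, the fusion rule, and all toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc_units, translation, background, doc_weight=0.7):
    """Log P(query | document) under an assumed two-state mixture.

    translation[q] maps an English query term q to
    {Mandarin indexing unit: P(q | unit)} (toy values, hypothetical).
    background[q] is an assumed general-language probability of q.
    Assumes 0 < doc_weight < 1 so the mixture is never zero.
    """
    counts = Counter(doc_units)
    total = len(doc_units)
    score = 0.0
    for q in query:
        # Document state: sum over document units that can translate to q.
        p_doc = 0.0
        if total:
            p_doc = sum(p * counts[u] / total
                        for u, p in translation.get(q, {}).items())
        # Background state keeps the likelihood non-zero for unmatched terms.
        p = doc_weight * p_doc + (1.0 - doc_weight) * background.get(q, 1e-6)
        score += math.log(p)
    return score


def char_bigrams(words):
    """Overlapping character bigrams of the joined text (one possible subword unit)."""
    text = "".join(words)
    return [text[i:i + 2] for i in range(len(text) - 1)]


def score_collection(query, indexed_docs, translation, background):
    """Score every document in one index (word scale or subword scale)."""
    return {doc_id: query_log_likelihood(query, units, translation, background)
            for doc_id, units in indexed_docs.items()}


def min_max(scores):
    """Rescale scores to [0, 1] so that the two scales are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def fuse(word_scores, subword_scores, word_weight=0.5):
    """Assumed fusion rule: weighted sum of normalized per-scale scores."""
    w, s = min_max(word_scores), min_max(subword_scores)
    return {d: word_weight * w[d] + (1.0 - word_weight) * s[d] for d in w}


# Toy collection: each document is a list of Mandarin words (stand-ins for
# recognizer output). All data here is invented for illustration.
docs = {
    "d1": ["香港", "大学"],
    "d2": ["北京", "天气"],
    "d3": ["香港", "天气"],
}
query = ["hong kong", "weather"]
# In this toy each translation is a two-character word, so the same table
# serves both the word index and the character-bigram index.
translation = {"hong kong": {"香港": 1.0}, "weather": {"天气": 1.0}}
background = {"hong kong": 0.01, "weather": 0.01}

word_index = docs
subword_index = {d: char_bigrams(w) for d, w in docs.items()}

word_scores = score_collection(query, word_index, translation, background)
subword_scores = score_collection(query, subword_index, translation, background)
fused = fuse(word_scores, subword_scores)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

In this toy collection, `d3` is the only document that matches both query terms, so it ranks first at each scale and after fusion. With real recognizer output the two scales would typically disagree on some documents, which is the situation in which the choice of `word_weight` matters.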

Index Terms

1. Information systems
  1. Information systems applications

Published in

ACM Transactions on Asian Language Information Processing Volume 2, Issue 1
March 2003
77 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/964161

Copyright © 2003 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Cross-language information retrieval
Multi-scale data fusion
Spoken document retrieval
Qualifiers
- article