PM-based indexing for Chinese text retrieval
Source: International Workshop on Information Retrieval with Asian Languages
Proceedings of the fifth international workshop on Information retrieval with Asian languages
Hong Kong, China
Pages: 55 - 59  
Year of Publication: 2000
ISBN:1-58113-300-6
Authors
Du Lin  Institute of Software, Chinese Academy of Sciences, Beijing, P.R.China
Zhang Yibo  Institute of Software, Chinese Academy of Sciences, Beijing, P.R.China
Sun Le  Institute of Software, Chinese Academy of Sciences, Beijing, P.R.China
Sun Yufang  Institute of Software, Chinese Academy of Sciences, Beijing, P.R.China
Han Jie  Institute of Software, Chinese Academy of Sciences, Beijing, P.R.China
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGLINK: Hypertext, Hypermedia, and Web
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM Hong Kong Chapter : ACM Hong Kong Chapter Executive Committee
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/355214.355222

ABSTRACT

This paper introduces a novel PM indexing scheme for Chinese text retrieval. Unlike Western languages, Chinese text has no delimiters between words, so indexing is based either on characters or on segmented words. In word-based indexing, out-of-vocabulary words such as proper nouns and domain terminology are often mis-segmented because segmentation dictionaries have limited vocabulary coverage, which impairs query precision. Several indexing and ranking methods, including the novel PM-based ranking, were tested to compare how well they handle new words in Chinese text retrieval. The experiments show that the query precision of the PM + word method is 10% higher than that of word indexing.
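
The contrast between character-based and word-based indexing can be illustrated with a minimal sketch. It does not implement the paper's PM method, which this record does not describe. It assumes overlapping character bigrams as the character-based terms and a greedy forward maximum-matching segmenter as the word-based one. The function names, the toy dictionary, and the sample sentence (with a hypothetical personal name that is missing from the dictionary) are illustrative assumptions only.

```python
def char_bigrams(text):
    """Character-based index terms: every overlapping pair of characters."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def forward_max_match(text, vocab, max_len=4):
    """Word-based index terms: greedy longest match against a dictionary.

    Characters that start no dictionary word are emitted as single-character
    tokens, which is how an out-of-vocabulary word ends up fragmented.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy dictionary (assumption): it lacks the hypothetical name "王小明".
vocab = {"在", "北京", "大学", "北京大学"}
text = "王小明在北京大学"

word_terms = forward_max_match(text, vocab)
bigram_terms = char_bigrams(text)

print(word_terms)    # the name is split into three single characters
print(bigram_terms)  # the name's bigrams survive as index terms
```

In this toy case the word-based index holds no term for the name as a whole, while the bigram index still contains both of the name's bigrams. That is the kind of out-of-vocabulary problem the abstract attributes to limited dictionary coverage.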



