A practical web-based approach to generating topic hierarchy for text segments
Source
Conference on Information and Knowledge Management
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Washington, D.C., USA
SESSION: IR-2 (information retrieval): web information retrieval
Pages: 127 - 136
Year of Publication: 2004
ISBN: 1-58113-874-1
Authors
Shui-Lung Chuang, Institute of Information Science, Academia Sinica, Taiwan, R.O.C.
Lee-Feng Chien, Institute of Information Science, Academia Sinica, Taiwan, R.O.C.
ABSTRACT
It is crucial in many information systems to organize short text segments, such as keywords in documents and queries from users, into a well-formed topic hierarchy. In this paper, we address the problem of generating topic hierarchies for diverse text segments with a general and practical approach that uses the Web as an additional knowledge source. Unlike long documents, short text segments typically do not contain enough information to extract reliable features. This work investigates the possibility of using highly ranked search-result snippets to enrich the representation of text segments. A hierarchical clustering algorithm is then applied to create the hierarchical topic structure of the text segments. Unlike traditional clustering algorithms, which tend to produce cluster hierarchies with a very unnatural shape, the approach tries to produce a more natural and comprehensive hierarchy. Extensive experiments were conducted on text segments from different domains. The results show the potential of the proposed approach, which we believe can benefit many information systems.
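The two steps named in the abstract (enriching each text segment with search-result snippets, then clustering hierarchically) can be illustrated with a minimal sketch. This is not the authors' algorithm: the snippet table `SNIPPETS` is hypothetical hard-coded data standing in for real search results, the TF-IDF weighting and cosine similarity are assumed choices, and plain binary average-link agglomerative clustering is swapped in for the paper's method, which aims at a more natural hierarchy shape.

```python
import math
from collections import Counter

# Hypothetical stand-in for highly ranked search-result snippets; a real
# system would retrieve these from a search engine for each text segment.
SNIPPETS = {
    "python": ["python programming language code", "python code interpreter"],
    "java": ["java programming language code", "java virtual machine code"],
    "jaguar": ["jaguar big cat animal", "jaguar animal habitat jungle"],
    "tiger": ["tiger big cat animal", "tiger animal habitat jungle"],
}

def tfidf_vectors(snippets):
    """Represent each segment by TF-IDF weights over the words of its snippets."""
    counts = {seg: Counter(" ".join(texts).split()) for seg, texts in snippets.items()}
    n = len(counts)
    # Document frequency: in how many segments' snippet sets each word occurs.
    df = Counter(word for c in counts.values() for word in c)
    return {
        seg: {w: tf * math.log(1 + n / df[w]) for w, tf in c.items()}
        for seg, c in counts.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def leaves(tree):
    """List the segments under a node; a node is a str leaf or a (left, right) pair."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])

def average_link(a, b, vectors):
    """Mean pairwise similarity between the segments of two clusters."""
    pairs = [(x, y) for x in leaves(a) for y in leaves(b)]
    return sum(cosine(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)

def build_hierarchy(snippets):
    """Repeatedly merge the two most similar clusters until one tree remains."""
    vectors = tfidf_vectors(snippets)
    clusters = sorted(vectors)
    while len(clusters) > 1:
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: average_link(clusters[p[0]], clusters[p[1]], vectors),
        )
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]

hierarchy = build_hierarchy(SNIPPETS)
print(hierarchy)
```

On this toy data the top-level split separates the two animal segments from the two programming-language segments, because segments in different groups share no snippet words. The resulting tree is strictly binary, which is the kind of shape the abstract describes as unnatural for larger collections.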