Mining topic-specific concepts and definitions on the web
Source International World Wide Web Conference archive
Proceedings of the 12th international conference on World Wide Web
Budapest, Hungary
SESSION: Writing the web
Pages: 251 - 260  
Year of Publication: 2003
ISBN:1-58113-680-3
Authors
Bing Liu  University of Illinois at Chicago, Chicago, IL
Chee Wee Chin  National University of Singapore, Singapore
Hwee Tou Ng  National University of Singapore, Singapore
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/775152.775188

ABSTRACT

Traditionally, when one wants to learn about a particular topic, one reads a book or a survey paper. With the rapid expansion of the Web, learning about a topic in depth from the Web is becoming increasingly important and popular, owing to the Web's convenience and its richness of information. In many cases, learning from the Web may even be essential because, in our fast-changing world, new topics emerge constantly and rapidly, and there is often not enough time for someone to write a book on them. To learn about such emerging topics, one can resort to research papers. However, research papers are often hard for non-researchers to understand, and few research papers cover every aspect of a topic. In contrast, many Web pages contain intuitive descriptions of the topic. To find such Web pages, one typically uses a search engine. However, current search techniques are not designed for in-depth learning. Top-ranking pages from a search engine may not contain any description of the topic. Even if they do, the description is usually incomplete, since the owner of a page is unlikely to have good knowledge of every aspect of the topic. In this paper, we attempt a novel and challenging task: mining topic-specific knowledge on the Web. Our goal is to help people learn about a topic in depth and systematically on the Web. The proposed techniques first identify the sub-topics or salient concepts of the topic, and then find and organize the informative pages that contain definitions and descriptions of the topic and sub-topics, just like those in a book. Experimental results using 28 topics show that the proposed techniques are highly effective.
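
The second step described in the abstract, finding pages that contain definitions of the topic and its sub-topics, can be illustrated with a minimal sketch. The abstract does not specify how definitions are recognized, so everything here is an assumption made for illustration: the cue phrases ("X is a", "X refers to", "is called X"), the function names, and the scoring rule (count of matched sentences per page) are hypothetical and are not the authors' method.

```python
import re


def build_definition_patterns(concept):
    """Compile a few cue-phrase patterns for one concept.

    The cue phrases are illustrative assumptions, not taken from the paper.
    """
    c = re.escape(concept)
    templates = [
        rf"\b{c}\s+(?:is|are)\s+(?:a|an|the)\b",
        rf"\b{c}\s+(?:refers?\s+to|is\s+defined\s+as|means)\b",
        rf"\b(?:is|are)\s+(?:called|known\s+as)\s+{c}\b",
    ]
    return [re.compile(t, re.IGNORECASE) for t in templates]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_definitions(concept, pages):
    """Map each page URL to the sentences that look like definitions of concept.

    `pages` is a dict of URL -> plain text (a hypothetical input format).
    """
    patterns = build_definition_patterns(concept)
    hits = {}
    for url, text in pages.items():
        matched = [
            s for s in split_sentences(text)
            if any(p.search(s) for p in patterns)
        ]
        if matched:
            hits[url] = matched
    return hits


def rank_pages(topic, subtopics, pages):
    """Rank pages by how many definition-like sentences they contain
    for the topic and its sub-topics (ties broken by URL)."""
    scores = {}
    for concept in [topic, *subtopics]:
        for url, sentences in find_definitions(concept, pages).items():
            scores[url] = scores.get(url, 0) + len(sentences)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Given a dictionary of page texts, `rank_pages("data mining", ["clustering"], pages)` returns the pages ordered by the number of definition-like sentences found. A real system would also need the first step (discovering the sub-topics) and far more robust sentence and pattern handling than this sketch provides.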


