ABSTRACT
Traditionally, when one wants to learn about a particular topic, one reads a book or a survey paper. With the rapid expansion of the Web, acquiring in-depth knowledge of a topic from the Web is becoming increasingly important and popular, owing to the Web's convenience and its richness of information. In many cases, learning from the Web may even be essential: in our fast-changing world, new topics emerge constantly and rapidly, and there is often not enough time for anyone to write a book on them. To learn about such emerging topics, one can turn to research papers. However, research papers are often hard for non-researchers to understand, and few of them cover every aspect of a topic. In contrast, many Web pages contain intuitive descriptions of the topic. To find such pages, one typically uses a search engine. However, current search techniques are not designed for in-depth learning. The top-ranking pages returned by a search engine may not contain any description of the topic. Even when they do, the description is usually incomplete, since the owner of a single page is unlikely to have good knowledge of every aspect of the topic. In this paper, we attempt a novel and challenging task: mining topic-specific knowledge on the Web. Our goal is to help people systematically acquire in-depth knowledge of a topic from the Web. The proposed techniques first identify the sub-topics, or salient concepts, of the topic, and then find and organize the informative pages that contain definitions and descriptions of the topic and its sub-topics, much as a book does. Experimental results on 28 topics show that the proposed techniques are highly effective.
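The abstract only outlines the two-stage pipeline (find sub-topics, then find and organize pages that define them). The toy sketch below illustrates that shape under loudly stated assumptions; it is not the authors' method. It assumes that a sub-topic is any stopword-free bigram mentioned on at least `min_pages` of the retrieved pages, and that a page "defines" a concept if it contains a hypothetical cue phrase such as "X is a" or "X refers to". The function names `candidate_subtopics`, `definition_score`, and `organize`, the stopword list, and the cue list are all illustrative inventions.

```python
import re
from collections import Counter

# Hypothetical resources: the abstract does not say which stopwords or
# definition cues (if any) the authors actually use.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "or",
             "for", "as", "by", "on", "with", "that", "this", "from", "be"}
DEFINITION_CUES = ("{c} is a ", "{c} is an ", "{c} is the ",
                   "{c} are ", "{c} refers to ", "{c} is defined as ")


def tokens(text):
    """Lowercase alphanumeric tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_subtopics(pages, topic, min_pages=2):
    """Stopword-free bigrams that appear on at least `min_pages` pages.

    A crude stand-in for salient-concept mining: a phrase counts as a
    sub-topic if many independent pages mention it.
    """
    topic_words = set(tokens(topic))
    doc_freq = Counter()
    for text in pages:
        words = tokens(text)
        # A set, so each page contributes at most 1 to a bigram's count.
        doc_freq.update({
            f"{a} {b}"
            for a, b in zip(words, words[1:])
            if a not in STOPWORDS and b not in STOPWORDS
            and not {a, b} <= topic_words  # skip the topic phrase itself
        })
    frequent = [p for p, n in doc_freq.items() if n >= min_pages]
    # Most widely mentioned first; ties broken alphabetically.
    return sorted(frequent, key=lambda p: (-doc_freq[p], p))


def definition_score(text, concept):
    """Count definition-like cue phrases (e.g. 'X is a ') for the concept."""
    normalized = " ".join(text.lower().split())
    return sum(normalized.count(cue.format(c=concept.lower()))
               for cue in DEFINITION_CUES)


def organize(pages, topic, min_pages=2):
    """Map the topic and each candidate sub-topic to the indices of pages
    that appear to define it, highest definition score first."""
    concepts = [" ".join(tokens(topic))] + candidate_subtopics(pages, topic, min_pages)
    book = {}
    for concept in concepts:
        scored = [(definition_score(text, concept), i) for i, text in enumerate(pages)]
        ranked = [i for s, i in sorted(scored, key=lambda x: (-x[0], x[1])) if s > 0]
        if ranked:
            book[concept] = ranked
    return book
```

In use, `pages` would be the texts of the pages a search engine returns for the topic, and the resulting dictionary plays the role of a rough table of contents: one entry per concept, each pointing to the pages most likely to define it. Frequency counts and fixed cue phrases are chosen here only because they are simple to show; a real system would need much better concept extraction and definition detection.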
CITED BY 14

Hua-Jun Zeng, Qi-Cai He, Zheng Chen, Wei-Ying Ma, Jinwen Ma, Learning to cluster web search results, Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, July 25-29, 2004, Sheffield, United Kingdom

Mike Perkowitz, Matthai Philipose, Kenneth Fishkin, Donald J. Patterson, Mining models of human activities from the web, Proceedings of the 13th international conference on World Wide Web, May 17-20, 2004, New York, NY, USA

Muyuan Wang, Zhiwei Li, Lie Lu, Wei-Ying Ma, Naiyao Zhang, Web object indexing using domain knowledge, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA

Hang Li, Yunbo Cao, Jun Xu, Yunhua Hu, Shenjie Li, Dmitriy Meyerzon, A new approach to intranet search based on information extraction, Proceedings of the 14th ACM international conference on Information and knowledge management, October 31-November 05, 2005, Bremen, Germany