DOI: 10.1145/1367497.1367675
poster

Representing a web page as sets of named entities of multiple types: a model and some preliminary applications

Published: 21 April 2008

Abstract

Most information retrieval applications represent a document as a "bag of words". We instead propose a model that represents a web page as sets of named entities of multiple types. Specifically, four types of named entities are extracted: person, geographic location, organization, and time. The relations among these entities are also extracted, weighted, classified, and marked with labels. On top of this model, we demonstrate some preliminary applications. In particular, we introduce the notion of a person-activity, which contains four elements: person, location, time, and activity. With this notion, and based on a reasonably large set of web pages, we show how one person's activities can be attributed by time and location, which gives a good idea of the mobility of the person in question.
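
To make the data model concrete, the following is a minimal sketch of how a page, its typed entity sets, its labeled relations, and person-activity records might be held in memory. It is not the authors' implementation: the class and field names (`PageModel`, `Relation`, `PersonActivity`, `activities_by_time`), the use of plain strings for entities and times, and all sample values are assumptions made for illustration. Entity extraction itself is out of scope here; the sketch only shows the representation and a simple grouping of one person's activities by time and location.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# The four entity types named in the abstract.
ENTITY_TYPES = ("person", "location", "organization", "time")


@dataclass(frozen=True)
class Relation:
    """A weighted, labeled link between two entities (field names are assumed)."""
    source: str
    target: str
    label: str
    weight: float


@dataclass(frozen=True)
class PersonActivity:
    """The four elements of a person-activity: person, location, time, activity."""
    person: str
    location: str
    time: str
    activity: str


@dataclass
class PageModel:
    """A web page as one set of entity strings per type, plus its relations."""
    url: str
    entities: dict = field(
        default_factory=lambda: {t: set() for t in ENTITY_TYPES}
    )
    relations: list = field(default_factory=list)

    def add_entity(self, entity_type: str, name: str) -> None:
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.entities[entity_type].add(name)


def activities_by_time(activities, person):
    """Group one person's activities by time, as (location, activity) pairs.

    Times are compared as strings, so this assumes a sortable format
    such as "YYYY-MM".
    """
    timeline = defaultdict(list)
    for a in activities:
        if a.person == person:
            timeline[a.time].append((a.location, a.activity))
    return dict(sorted(timeline.items()))


# Hypothetical sample data, invented purely for illustration.
page = PageModel(url="http://example.org/news")
page.add_entity("person", "Alice")
page.add_entity("location", "Beijing")
page.add_entity("time", "2008-04")
page.relations.append(Relation("Alice", "Beijing", "visited", 0.8))

acts = [
    PersonActivity("Alice", "Beijing", "2008-04", "gave a talk"),
    PersonActivity("Alice", "Shanghai", "2008-01", "attended a meeting"),
    PersonActivity("Bob", "Beijing", "2008-04", "gave a talk"),
]
timeline = activities_by_time(acts, "Alice")
print(timeline)
```

Keeping one set per entity type, instead of a single bag of terms, is what lets a query such as "where was this person, and when" be answered by reading the timeline in order.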



Published In

WWW '08: Proceedings of the 17th international conference on World Wide Web
April 2008
1326 pages
ISBN: 9781605580852
DOI: 10.1145/1367497

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. named entity
2. web content mining
3. web page model


Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions (23%)
