Web data cleansing for information retrieval using key resource page selection
Source International World Wide Web Conference archive
Special interest tracks and posters of the 14th international conference on World Wide Web
Chiba, Japan
POSTER SESSION: Posters
Pages: 1136 - 1137  
Year of Publication: 2005
ISBN:1-59593-051-5
Authors
Yiqun Liu  Tsinghua University, Beijing, China P.R.
Canhui Wang  Tsinghua University, Beijing, China P.R.
Min Zhang  Tsinghua University, Beijing, China P.R.
Shaoping Ma  Tsinghua University, Beijing, China P.R.
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1062745.1062906

ABSTRACT

With the explosive growth in the number of pages on the WWW, covering more useful information with limited storage and computation resources is becoming increasingly important in web IR research. Using an analysis of non-content features of web pages, we propose a clustering-based method for selecting high-quality pages from the whole page set. Although the resulting page set contains only 44.3% of the whole collection, it is associated with more than 98% of the links and covers about 90% of the key information. We also examine link properties and retrieval effects. The experimental results show that key resource selection is well suited to data cleansing, and that the resulting page set outperforms the whole collection: it is smaller and gives better retrieval performance.
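The abstract does not name the non-content features or the clustering algorithm used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes two hypothetical, pre-normalized features per page (in-link score and URL depth), uses plain k-means as a stand-in clustering step, and keeps the cluster whose centroid has the highest in-link score. All page names and feature values are invented for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged, assignments unchanged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical pages: (in-link score, URL depth), both scaled to [0, 1].
pages = {
    "site-a/":              (0.90, 0.10),
    "site-a/x/y/z/p1.html": (0.10, 0.90),
    "site-a/docs/":         (0.80, 0.20),
    "site-b/":              (0.85, 0.15),
    "site-b/a/b/c/p2.html": (0.05, 0.80),
}

k = 2
labels, centroids = kmeans(list(pages.values()), k)

# Assumption: the "key resource" cluster is the one with the highest
# mean in-link score (feature index 0).
key_cluster = max(range(k), key=lambda c: centroids[c][0])
selected = [url for url, l in zip(pages, labels) if l == key_cluster]
print(selected)  # ['site-a/', 'site-a/docs/', 'site-b/']
```

In this toy data the selected subset is three of five pages. A real collection would need features measured from the crawl and a principled rule for choosing which clusters to keep.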


