A framework for web table mining
Source: Workshop On Web Information And Data Management
Proceedings of the 4th international workshop on Web information and data management
McLean, Virginia, USA
SESSION: Web mining
Pages: 36 - 42  
Year of Publication: 2002
ISBN:1-58113-593-9
Authors
Yingchen Yang  Simon Fraser University, Burnaby, Canada
Wo-Shun Luk  Simon Fraser University, Burnaby, Canada
Sponsors
ACM: Association for Computing Machinery
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/584931.584940

ABSTRACT

Web table mining is the extraction of information from tables published in web pages as HTML text. Most previous work on this subject uses the HTML tags to discover the components of a table. Our work treats the web as a distinct publication medium in two ways. First, we argue that new types of table format have been developed specifically for the web. Second, we argue that authors use the visual cues embedded in the HTML text to direct the viewer on how to read the contents of a web table properly. We develop a framework for comprehensively analyzing the structural aspects of a web table, within which rules are devised to process and extract attribute-value pairs from the table. This approach to web table mining is validated by good experimental results.
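
To make the phrase "attribute-value pairs" concrete, here is a minimal sketch of the purely tag-based style of extraction that the abstract attributes to previous work. It is not the framework proposed in the paper, which also relies on visual cues. The class and function names (TableRows, attribute_value_pairs) and the sample table are hypothetical, and the sketch assumes a simple table whose first row consists only of <th> cells.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list per <tr>; each entry is (tag, text)
        self._cell = None   # tag of the cell currently being read, or None
        self._text = []     # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self._cell = tag
            self._text = []

    def handle_data(self, data):
        if self._cell is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._cell is not None and tag == self._cell:
            self.rows[-1].append((tag, "".join(self._text).strip()))
            self._cell = None


def attribute_value_pairs(html):
    """Tag-only baseline: a first row made entirely of <th> names the attributes."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not rows or any(tag != "th" for tag, _ in rows[0]):
        return []  # no <th> header row, so this baseline cannot name attributes
    names = [text for _, text in rows[0]]
    return [dict(zip(names, (text for _, text in row))) for row in rows[1:]]


# Made-up sample table, for illustration only.
SAMPLE = """
<table>
  <tr><th>Item</th><th>Qty</th></tr>
  <tr><td>pen</td><td>3</td></tr>
  <tr><td>ink</td><td>1</td></tr>
</table>
"""

records = attribute_value_pairs(SAMPLE)
```

A table that marks its header with bold text or a background colour instead of <th> makes this baseline return nothing, which is the kind of case where the visual cues mentioned in the abstract would be needed.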



