A framework for web table mining
Source
Workshop On Web Information And Data Management
Proceedings of the 4th international workshop on Web information and data management
McLean, Virginia, USA
SESSION: Web mining
Pages: 36-42
Year of Publication: 2002
ISBN: 1-58113-593-9
ABSTRACT
Web table mining is the extraction of information from tables published inside web pages as HTML text. Most previous work on this subject uses the HTML tags to discover the components of a table. Our work treats the web as a distinct publication medium, in two ways. First, we argue that new types of table format have been developed specifically for the web. Second, we argue that authors use the visual cues embedded in the HTML text to direct the viewer on how to read the contents of a web table properly. We develop a framework for comprehensively analyzing the structural aspects of a web table, within which rules are devised to process the table and extract attribute-value pairs from it. This approach to web table mining is validated by good experimental results.
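As a rough illustration of what extracting attribute-value pairs from an HTML table can look like, the sketch below uses Python's standard `html.parser` module. It is not the framework described in the abstract: the class `TableCellParser`, the function `extract_pairs`, and the two layout rules (a first row made up entirely of emphasized cells, or an emphasized first cell in every row) are assumptions made for this example. It treats `<th>` cells and bold text as a crude stand-in for the visual cues the abstract mentions.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect table rows as lists of (text, is_emphasized) cells.

    Hypothetical helper: a cell counts as "emphasized" if it is a <th>
    or contains <b>/<strong>, a crude stand-in for visual cues.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None  # [list of text parts, emphasized flag]

    def _close_cell(self):
        # Finish the open cell, normalizing whitespace in its text.
        if self._cell is not None and self._row is not None:
            text = " ".join("".join(self._cell[0]).split())
            self._row.append((text, self._cell[1]))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = [[], tag == "th"]
        elif tag in ("b", "strong") and self._cell is not None:
            self._cell[1] = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None


def extract_pairs(html):
    """Return a list of attribute-value dicts under two assumed layouts."""
    parser = TableCellParser()
    parser.feed(html)
    rows = [row for row in parser.rows if row]
    if not rows:
        return []
    # Assumed rule 1: an all-emphasized first row holds the attribute names,
    # and every later row is one record.
    if len(rows) > 1 and all(emph for _, emph in rows[0]):
        names = [text for text, _ in rows[0]]
        return [dict(zip(names, (text for text, _ in row))) for row in rows[1:]]
    # Assumed rule 2: an emphasized first cell in every row names the
    # attribute, and the second cell holds its value.
    if all(row[0][1] for row in rows):
        return [{row[0][0]: row[1][0] for row in rows if len(row) > 1}]
    return []


column_wise = (
    "<table><tr><th>Model</th><th>Price</th></tr>"
    "<tr><td>A1</td><td>$10</td></tr>"
    "<tr><td>B2</td><td>$12</td></tr></table>"
)
row_wise = (
    "<table><tr><td><b>Name</b></td><td>Ann</td></tr>"
    "<tr><td><b>Age</b></td><td>30</td></tr></table>"
)
print(extract_pairs(column_wise))
print(extract_pairs(row_wise))
```

The two sample tables are invented for the example. Real web tables with nested tables, spanning cells, or purely stylistic cues would need considerably more analysis than this sketch attempts.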
CITED BY 4
Wolfgang Gatterbauer, Paul Bohunsky, Marcus Herzog, Bernhard Krüpl, Bernhard Pollak, Towards domain-independent information extraction from web tables, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada