Using graph matching techniques to wrap data from PDF documents
Source: International World Wide Web Conference archive
Proceedings of the 15th international conference on World Wide Web
Edinburgh, Scotland
POSTER SESSION: Browsers and UI, web engineering, hypermedia & multimedia, security, and accessibility
Pages: 901-902
Year of Publication: 2006
ISBN: 1-59593-323-9
Authors
Tamir Hassan, Vienna University of Technology, Wien, Austria
Robert Baumgartner, Vienna University of Technology, Wien, Austria
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1135777.1135935

ABSTRACT

Wrapping is the process of navigating a data source, semi-automatically extracting data and transforming it into a form suitable for data processing applications. There are currently a number of established products on the market for wrapping data from web pages. One such product is Lixto [1], which grew out of research performed at our institute.

Our work is concerned with extending the wrapping functionality of Lixto to PDF documents. As the PDF format is relatively unstructured, this is a challenging task. We have developed a method to segment the page into blocks, which are represented as nodes in a relational graph. This paper describes our current research in the use of relational matching techniques on this graph to locate wrapping instances.
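
The idea of matching a small pattern against a graph of page blocks can be illustrated with a minimal Python sketch. It is not the authors' method and not part of Lixto: the `Block` class, the `left_of` and `above` relations, the alignment tolerance `tol`, and the functions `build_graph` and `find_matches` are all hypothetical names chosen for illustration, and the matcher is a plain backtracking search rather than any specific published algorithm.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Block:
    """A text block on a page (hypothetical representation)."""
    name: str
    text: str
    x: float  # left edge
    y: float  # top edge, growing downward


def build_graph(blocks, tol=5.0):
    """Return (source, relation, target) edges between block names.

    Assumption: two blocks are related if they are aligned within `tol`
    units; every aligned pair is linked, not only direct neighbours.
    """
    edges = set()
    for a, b in permutations(blocks, 2):
        if abs(a.y - b.y) <= tol and a.x < b.x:
            edges.add((a.name, "left_of", b.name))
        if abs(a.x - b.x) <= tol and a.y < b.y:
            edges.add((a.name, "above", b.name))
    return edges


def find_matches(blocks, edges, pattern_nodes, pattern_edges):
    """Find all assignments of pattern variables to distinct blocks.

    pattern_nodes maps a variable name to a predicate on block text;
    pattern_edges is a set of (variable, relation, variable) triples.
    """
    variables = list(pattern_nodes)
    results = []

    def extend(assignment):
        if len(assignment) == len(variables):
            results.append(dict(assignment))
            return
        var = variables[len(assignment)]
        for block in blocks:
            if block.name in assignment.values():
                continue  # each block is used at most once per match
            if not pattern_nodes[var](block.text):
                continue
            assignment[var] = block.name
            # Check every pattern edge whose endpoints are both assigned.
            consistent = all(
                (assignment[s], rel, assignment[t]) in edges
                for s, rel, t in pattern_edges
                if s in assignment and t in assignment
            )
            if consistent:
                extend(assignment)
            del assignment[var]

    extend({})
    return results


# Made-up example page: two label/value rows.
blocks = [
    Block("b1", "Price:", 10, 100),
    Block("b2", "EUR 25.00", 80, 100),
    Block("b3", "Title:", 10, 80),
    Block("b4", "Graph Matching", 80, 80),
]
edges = build_graph(blocks)

# Pattern: a block reading "Price:" with some block to its right.
matches = find_matches(
    blocks,
    edges,
    pattern_nodes={"label": lambda t: t == "Price:", "value": lambda t: True},
    pattern_edges={("label", "left_of", "value")},
)
```

In this sketch a wrapping instance is simply one assignment of pattern variables to blocks; on the made-up page above, the only assignment binds `label` to `b1` and `value` to `b2`. A real system would need tolerant matching and richer relations, which this example does not attempt.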


