GoGetIt!: a tool for generating structure-driven web crawlers
Full text: PDF (158 KB)
Source: International World Wide Web Conference archive
Proceedings of the 15th international conference on World Wide Web
Edinburgh, Scotland
POSTER SESSION: Browsers and UI, web engineering, hypermedia & multimedia, security, and accessibility
Pages: 1011-1012
Year of Publication: 2006
ISBN: 1-59593-323-9
Authors
Márcio L. A. Vidal  Universidade Federal do Amazonas, Amazonas, Brazil
Altigran S. da Silva  Universidade Federal do Amazonas, Amazonas, Brazil
Edleno S. de Moura  Universidade Federal do Amazonas, Amazonas, Brazil
João M. B. Cavalcanti  Universidade Federal do Amazonas, Amazonas, Brazil
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10, Downloads (12 Months): 50, Citation Count: 0
DOI: http://doi.acm.org/10.1145/1135777.1135990

ABSTRACT

We present GoGetIt!, a tool for generating structure-driven crawlers that requires minimal effort from users. The tool takes as input a sample page and an entry point to a Web site, and generates a structure-driven crawler based on navigation patterns, i.e., sequences of patterns for the links a crawler has to follow to reach pages structurally similar to the sample page. In our experiments, the structure-driven crawlers generated by GoGetIt! collected all pages matching the given samples, including pages added after the crawlers were generated.
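The abstract describes a navigation pattern as a sequence of patterns for the links a crawler follows from the entry point. The Python sketch below illustrates that idea under loudly stated simplifying assumptions: each per-level pattern is a URL regular expression (the abstract does not specify how GoGetIt! actually represents its patterns), and the "site" is a hypothetical in-memory link graph rather than fetched pages. The function name `follow_navigation_pattern` and all URLs are illustrative only, not the tool's API.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy site: maps each page URL to the URLs it links to.
SiteGraph = Dict[str, List[str]]


def follow_navigation_pattern(site: SiteGraph, entry_point: str,
                              pattern: List[str]) -> Set[str]:
    """Follow a navigation pattern starting at entry_point.

    Assumption: `pattern` is a list of regular expressions, one per
    navigation level. At level i, only links whose URL fully matches
    pattern[i] are followed. The pages reached after the last level
    are treated as the target pages.
    """
    frontier: Set[str] = {entry_point}
    for level_regex in pattern:
        compiled = re.compile(level_regex)
        next_frontier: Set[str] = set()
        for page in frontier:
            # Keep only the outgoing links that match this level's pattern.
            for link in site.get(page, []):
                if compiled.fullmatch(link):
                    next_frontier.add(link)
        frontier = next_frontier
    return frontier


if __name__ == "__main__":
    site: SiteGraph = {
        "/": ["/category/books", "/category/music", "/about"],
        "/category/books": ["/item/1", "/item/2"],
        "/category/music": ["/item/3", "/ads/banner"],
    }
    nav = [r"/category/\w+", r"/item/\d+"]
    print(sorted(follow_navigation_pattern(site, "/", nav)))
    # A page added later is still found, because the pattern describes
    # the kind of link to follow rather than a fixed list of URLs.
    site["/category/books"].append("/item/5")
    print(sorted(follow_navigation_pattern(site, "/", nav)))
```

Because the patterns describe link types rather than specific URLs, rerunning the same pattern after a new item appears picks it up, which loosely mirrors the abstract's observation about pages added after crawler generation.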
