DOI: 10.1145/1166253.1166274
Article

Enabling web browsers to augment web sites' filtering and sorting functionalities

Published: 15 October 2006

Abstract

Existing augmentations of web pages are mostly small cosmetic changes (e.g., removing ads) and minor additions of third-party content (e.g., product prices from competing sites). None leverages the structured data presented in web pages. This paper describes Sifter, a web browser extension that can augment a well-structured web site with advanced filtering and sorting functionality. These added features work inside the site's own pages, preserving the site's presentational style and the user's context. Sifter contains an algorithm that scrapes structured data out of well-structured web pages while usually requiring no user intervention. We tested Sifter on real web sites with real users and found that people could use Sifter to perform sophisticated queries and high-level analyses on sizable data collections on the Web. We propose that web sites can be similarly augmented with other sophisticated data-centric functionality, giving users new benefits over the existing Web.
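
To make the idea of filtering and sorting scraped page data concrete, here is a minimal Python sketch. It is not Sifter's algorithm: the abstract says Sifter detects the repeated structure itself, usually without user intervention, whereas this sketch assumes the structure is already known. The page fragment, the class names (`item`, `title`, `price`), and the helper names are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page fragment with a repeated "item" structure,
# standing in for a well-structured listing page.
PAGE = """
<ul>
  <li class="item"><span class="title">Dune</span><span class="price">$9.99</span></li>
  <li class="item"><span class="title">Emma</span><span class="price">$4.50</span></li>
  <li class="item"><span class="title">Ulysses</span><span class="price">$12.00</span></li>
</ul>
"""


class ItemScraper(HTMLParser):
    """Collects one dict per <li class="item">, keyed by each span's class."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._field = None  # class of the <span> currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "item":
            self.items.append({})
        elif tag == "span" and self.items:
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.items[-1][self._field] = data.strip()


def scrape(html):
    """Turn the assumed markup into records with a numeric price."""
    parser = ItemScraper()
    parser.feed(html)
    for item in parser.items:
        item["price"] = float(item["price"].lstrip("$"))
    return parser.items


def filter_and_sort(items, max_price):
    """Keep items at or below max_price, ordered from cheapest up."""
    cheap = [i for i in items if i["price"] <= max_price]
    return sorted(cheap, key=lambda i: i["price"])


items = scrape(PAGE)
result = filter_and_sort(items, max_price=10.0)
print([i["title"] for i in result])
```

Once the page's items are available as records, a filter or sort that the site itself does not offer becomes a few lines of ordinary data handling. The hard part that the paper addresses, extracting those records automatically, is deliberately left out here.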

    Published In

    UIST '06: Proceedings of the 19th annual ACM symposium on User interface software and technology
    October 2006
    354 pages
    ISBN:1595933131
    DOI:10.1145/1166253
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. DOM
    2. HTML
    3. augment
    4. dynamic query
    5. faceted browsing
    6. filter
    7. sort
    8. tree alignment
    9. web

    Qualifiers

    • Article

    Conference

    UIST06

    Acceptance Rates

    Overall Acceptance Rate 561 of 2,567 submissions, 22%
