DOI: 10.1145/2910896.2910901
research-article

The Dawn of Today's Popular Domains: A Study of the Archived German Web over 18 Years

Published: 19 June 2016

Abstract

The Web has been around and maturing for 25 years. The popular websites of today have undergone vast changes during this period, with a few being there almost since the beginning and many new ones becoming popular over the years. This makes it worthwhile to take a look at how these sites have evolved and what they might tell us about the future of the Web. We therefore embarked on a longitudinal study spanning almost the whole period of the Web, based on data collected by the Internet Archive starting in 1996, to retrospectively analyze how the popular Web as of now has evolved over the past 18 years.
For our study we focused on the German Web, specifically on the top 100 most popular websites in 17 categories. This paper presents a selection of the most interesting findings in terms of volume, size as well as age of the Web. While related work in the field of Web Dynamics has mainly focused on change rates and analyzed datasets spanning less than a year, we looked at the evolution of websites over 18 years. We found that around 70% of the pages we investigated are younger than a year, with an observed exponential growth in age as well as in size up to now. If this growth rate continues, the number of pages from the popular domains will almost double in the next two years. In addition, we give insights into our data set, provided by the Internet Archive, which hosts the largest and most complete Web archive as of today.
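The doubling projection above can be restated as a yearly growth factor. The following is a minimal sketch, assuming growth is a constant exponential and that the page count exactly doubles in two years (the abstract says "almost double", so the true factor would be slightly lower). The function names are hypothetical and do not come from the paper.

```python
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years)


def doubling_time(annual_factor: float) -> float:
    """Years needed for a quantity to double at a constant yearly multiplier."""
    return math.log(2.0) / math.log(annual_factor)


# Assumption: exact doubling over two years, used only for illustration.
factor = annual_growth_factor(2.0, 2.0)
print(f"implied yearly factor: {factor:.3f}")
print(f"doubling time: {doubling_time(factor):.1f} years")
```

Under these assumptions the implied yearly factor is the square root of 2, which corresponds to roughly 41% more pages per year.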

      Published In

      JCDL '16: Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries
      June 2016
      316 pages
      ISBN:9781450342292
      DOI:10.1145/2910896
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. analysis
      2. longitudinal
      3. retrospective
      4. statistics
      5. web dynamics

      Qualifiers

      • Research-article

      Funding Sources

      • European Research Council

      Conference

      JCDL '16
      Acceptance Rates

      JCDL '16 Paper Acceptance Rate 15 of 52 submissions, 29%;
      Overall Acceptance Rate 415 of 1,482 submissions, 28%

      Citations

      Cited By

      • (2022) Modeling clusters from the ground up: A web data approach. Environment and Planning B: Urban Analytics and City Science 50:1 (244-267). DOI: 10.1177/23998083221108185. Online publication date: 17-Jun-2022
      • (2022) Using the Web to Predict Regional Trade Flows: Data Extraction, Modeling, and Validation. Annals of the American Association of Geographers 113:3 (717-739). DOI: 10.1080/24694452.2022.2109577. Online publication date: 19-Oct-2022
      • (2021) FaxPlainAC. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (4823-4827). DOI: 10.1145/3459637.3481985. Online publication date: 26-Oct-2021
      • (2021) How Can an Archive Be Characterized? Linking Theory and Practice of Digital Libraries (118-122). DOI: 10.1007/978-3-030-86324-1_14. Online publication date: 7-Sep-2021
      • (2021) A Holistic View on Web Archives. The Past Web (85-99). DOI: 10.1007/978-3-030-63291-5_8. Online publication date: 1-Jul-2021
      • (2020) Digital economy in the UK: regional productivity effects of early adoption. Regional Studies 55:12 (1924-1938). DOI: 10.1080/00343404.2020.1826420. Online publication date: 17-Nov-2020
      • (2019) Estimating PageRank deviations in crawled graphs. Applied Network Science 4:1. DOI: 10.1007/s41109-019-0201-9. Online publication date: 22-Oct-2019
      • (2018) Micro Archives as Rich Digital Object Representations. Proceedings of the 10th ACM Conference on Web Science (353-357). DOI: 10.1145/3201064.3201110. Online publication date: 15-May-2018
      • (2018) Delusive PageRank in Incomplete Graphs. Complex Networks and Their Applications VII (104-117). DOI: 10.1007/978-3-030-05411-3_9. Online publication date: 2-Dec-2018
      • (2016) Archiving Software Surrogates on the Web for Future Reference. Research and Advanced Technology for Digital Libraries (215-226). DOI: 10.1007/978-3-319-43997-6_17. Online publication date: 10-Aug-2016
