DOI: 10.1145/1149993.1150008
Article

Estimating the evolution of categorized web page populations

Published: 10 July 2006

Abstract

This paper proposes a statistical approach for estimating the evolution of categorized web pages. The proposal is based on the capture-recapture method used in wildlife biology studies, modified with the assumptions and amendments needed to apply the experiments to the web. Web pages are treated as animals, and specific types of e-commerce pages as particular species whose abundance, birth rates, and survival rates are estimated. An artificial classifier capable of categorizing web pages plays the role of the biologist who recognizes the species under study. Finally, a virtual experiment in the e-commerce field was simulated; the results were promising, especially for the estimates of the survival probabilities.
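To make the capture-recapture idea concrete, the following is a minimal sketch of the simplest two-sample case, using Chapman's bias-corrected form of the Lincoln-Petersen estimator. It is not the model described in the abstract: it assumes a closed population (no births or deaths between samples), so it estimates abundance only and says nothing about birth or survival rates. The function names and the idea of treating two crawls as the two capture occasions are illustrative assumptions.

```python
def chapman_estimate(marked_first, caught_second, recaptured):
    """Chapman's bias-corrected Lincoln-Petersen estimate of population size.

    marked_first  -- individuals seen (and marked) in the first sample
    caught_second -- individuals seen in the second sample
    recaptured    -- individuals seen in both samples
    Assumes a closed population, which the paper's setting does not.
    """
    if min(marked_first, caught_second, recaptured) < 0:
        raise ValueError("counts must be non-negative")
    if recaptured > min(marked_first, caught_second):
        raise ValueError("recaptures cannot exceed either sample size")
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


def estimate_from_crawls(first_crawl, second_crawl):
    """Treat two sets of page URLs (hypothetical crawls) as capture occasions."""
    recaptured = len(set(first_crawl) & set(second_crawl))
    return chapman_estimate(len(first_crawl), len(second_crawl), recaptured)


if __name__ == "__main__":
    # Hypothetical data: 9 pages per crawl, 4 of them seen in both.
    first = {f"u{i}" for i in range(9)}
    second = {f"u{i}" for i in range(5, 14)}
    print(estimate_from_crawls(first, second))
```

Estimating birth and survival, as the abstract describes, requires open-population models over more than two sampling occasions; this sketch only shows how overlap between samples carries information about the unseen part of a population.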


Cited By

  • (2010) A new statistical approach to estimate global file populations from local observations in the eDonkey P2P file sharing system. annals of telecommunications - annales des télécommunications, 66:1-2 (5-16). DOI: 10.1007/s12243-010-0202-2. Online publication date: 29-Sep-2010
  • (2006) Adapting user's browsing behavior and web evolution features for effective search in medical portals. Proceedings of the First International Workshop on Semantic Media Adaptation and Personalization (37-42). DOI: 10.1109/SMAP.2006.7. Online publication date: 4-Dec-2006


    Published In

    ICWE '06: Workshop proceedings of the sixth international conference on Web engineering
    July 2006
    156 pages
    ISBN:1595934359
    DOI:10.1145/1149993
    Conference Chairs: Nora Koch, Luis Olsina

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. capture-recapture measurements
    2. web evolution
    3. web page categorization

    Qualifiers

    • Article
