DOI: 10.1145/1935826.1935900
poster

A combined topical/non-topical approach to identifying web sites for children

Published: 09 February 2011

Abstract

Children interact with information services more and more frequently. Online in particular, a great deal of content is not suitable for their age group. Given the growing importance and ubiquity of the Internet, denying children any unsupervised Web access is often not possible. This work presents an automatic method for distinguishing web pages for children from those for adults, in order to improve the performance of child-appropriate web search engines. A range of 80 features based on findings from cognitive science and child psychology is discussed and evaluated. We conducted a large-scale user study on the suitability of web sites and report the insights gained in detail. Finally, a comparison with traditional web classification methods and with human annotator performance shows that our automatic classifier can reach a performance close to that of human agreement.

Published In

WSDM '11: Proceedings of the fourth ACM international conference on Web search and data mining
February 2011
870 pages
ISBN: 9781450304931
DOI: 10.1145/1935826
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. children
  2. classification
  3. filtering
  4. suitability
  5. web search

Qualifiers

  • Poster

Conference

Acceptance Rates

WSDM '11 Paper Acceptance Rate: 83 of 372 submissions, 22%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%

Cited By

  • (2023) Filtering objectionable information access based on click-through behaviours with deep learning methods. Journal of Information Science. DOI: 10.1177/01655515231160041. Online publication date: 7-Mar-2023
  • (2021) Learning to Rank for Educational Search Engines. IEEE Transactions on Learning Technologies, 14:2 (211-225). DOI: 10.1109/TLT.2021.3075196. Online publication date: 1-Apr-2021
  • (2020) Automatic Content Inspection and Forensics for Children Android Apps. IEEE Internet of Things Journal, 7:8 (7123-7134). DOI: 10.1109/JIOT.2020.2982248. Online publication date: Aug-2020
  • (2018) A Safer YouTube Kids: An Extra Layer of Content Filtering Using Automated Multimodal Analysis. Intelligent Systems and Applications (294-308). DOI: 10.1007/978-3-030-01054-6_21. Online publication date: 9-Nov-2018
  • (2016) "Robust statistical methods in web retrieval" by Carsten Eickhoff and Arjen P. de Vries, with Martin Vesely as coordinator. ACM SIGWEB Newsletter, 2016:Winter (1-11). DOI: 10.1145/2857659.2857663. Online publication date: 11-Jan-2016
  • (2016) A Score Fusion Method Using a Mixture Copula. Database and Expert Systems Applications (216-232). DOI: 10.1007/978-3-319-44406-2_16. Online publication date: 6-Aug-2016
  • (2015) An Eye-Tracking Study of Query Reformulation. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (13-22). DOI: 10.1145/2766462.2767703. Online publication date: 9-Aug-2015
  • (2015) Mining browsing behaviors for objectionable content filtering. Journal of the Association for Information Science and Technology, 66:5 (930-942). DOI: 10.1002/asi.23217. Online publication date: 1-May-2015
  • (2014) Children’s Internet Search: Using Roles to Understand Children’s Search Behavior. Synthesis Lectures on Information Concepts, Retrieval, and Services, 6:2 (1-106). DOI: 10.2200/S00591ED1V01Y201408ICR034. Online publication date: 22-Sep-2014
  • (2014) Modelling Complex Relevance Spaces with Copulas. Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management (1831-1834). DOI: 10.1145/2661829.2661925. Online publication date: 3-Nov-2014
