DOI: 10.1145/1557914.1557935
research-article

Relating web pages to enable information-gathering tasks

Published: 29 June 2009

Abstract

We argue that relationships between Web pages are functions of the user's intent. We identify a class of Web tasks, information-gathering tasks, that can be facilitated by providing links to pages related to the page the user is currently viewing. We define three kinds of intentional relationships, corresponding to whether the user is a) seeking sources of information, b) reading pages that provide information, or c) surfing through pages as part of an extended information-gathering process. We show that these three relationships can be mined using a combination of textual and link information, and we provide three scoring mechanisms that correspond to them: SeekRel, FactRel and SurfRel. We build a set of capacitated subnetworks, each corresponding to a particular keyword, and compute scores from flows on these subnetworks. The capacities of the links are derived from the hub and authority values of the nodes they connect, following the work of Kleinberg (1998) on assigning authority to pages in hyperlinked environments. We evaluated our scoring mechanisms by running experiments on four data sets taken from the Web. We present user evaluations of the relevance of the top results returned by our scoring mechanisms and compare them with the top results returned by Google's Similar Pages feature and by the Companion algorithm (Dean and Henzinger, 1999).
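
The general idea of scoring by flow on a capacitated link graph can be illustrated with a minimal sketch. The abstract does not give the actual definitions of SeekRel, FactRel or SurfRel, nor the exact capacity rule, so everything here is an assumption made for illustration: the function names `hits`, `max_flow` and `flow_score` are hypothetical, the capacity of a link (u, v) is taken to be hub(u) × authority(v), and the edge list is assumed to already be the subnetwork for one keyword. Hub and authority values come from the standard HITS power iteration, and the flow is computed with the Edmonds-Karp algorithm.

```python
import math
from collections import defaultdict, deque


def _normalise(scores):
    """Scale a score dict to unit L2 norm (left unchanged if all zero)."""
    norm = math.sqrt(sum(x * x for x in scores.values())) or 1.0
    return {n: x / norm for n, x in scores.items()}


def hits(edges, iterations=50):
    """HITS power iteration over directed edges; returns (hub, authority)."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # authority(v) = sum of hub scores of the pages that link to v
        new_auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_auth[v] += hub[u]
        auth = _normalise(new_auth)
        # hub(u) = sum of authority scores of the pages that u links to
        new_hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_hub[u] += auth[v]
        hub = _normalise(new_hub)
    return hub, auth


def max_flow(capacity, source, sink, eps=1e-12):
    """Edmonds-Karp maximum flow on a dict {(u, v): capacity}."""
    if source == sink:
        return 0.0
    residual = defaultdict(float)
    neighbours = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        neighbours[u].add(v)
        neighbours[v].add(u)  # reverse edge for the residual graph
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in parent and residual[(u, v)] > eps:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Walk back from the sink to collect the path, then augment along it
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        total += bottleneck


def flow_score(edges, source, target):
    """Assumed scoring rule: max flow with capacity hub(u) * authority(v)."""
    hub, auth = hits(edges)
    capacity = {(u, v): hub[u] * auth[v] for u, v in edges}
    return max_flow(capacity, source, target)
```

Under these assumptions, a call such as `flow_score(edges, "pageA", "pageB")` yields a larger value when many high-capacity link paths lead from the first page to the second, which is the intuition behind using flow as a relatedness score.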

References

[1]
Altavista. http://www.altavista.com/.
[2]
A. Aula, N. Jhaveri, and M. Käki. Information search and re-access strategies of experienced Web users. In Proc. 14th Intl. World Wide Web Conference (WWW 2005), 2005.
[3]
N. J. Belkin, R. N. Oddy, and H. M. Brooks. ASK for information retrieval: Part I. Background and theory. J. Doc., 38(2):61--71, 1982.
[4]
N. J. Belkin, R. N. Oddy, and H. M. Brooks. ASK for information retrieval: Part II. Results of a design study. J. Doc., 38(3):145--164, 1982.
[5]
S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, pages 107--117, 1998.
[6]
A. Z. Broder. A taxonomy of Web search. In ACM SIGIR Forum, pages 3--10, 2002.
[7]
S. Chakrabarti, B. Dom, P. Raghavan, S. Rajagopalan, D. Gibson, and J. Kleinberg. Automatic resource compilation by analyzing hyperlink structure and associated text. In Proc. 7th Intl. Conference on the World Wide Web (WWW '98), pages 65--74, 1998.
[8]
V. Cothey. A longitudinal study of World Wide Web users' information-searching behavior. J. ASIST, 53(2):67--78, 2002.
[9]
M. de Kunder. The size of the World Wide Web. http://www.worldwidewebsize.com/. Retrieved on 29th February 2008.
[10]
J. Dean and M. Henzinger. Finding related pages in the World Wide Web. In Proceedings of the 8th WWW Conference, pages 1467--147, 1999.
[11]
C. Fellbaum, editor. Wordnet: An electronic lexical database. Bradford Books, 1998.
[12]
A. J. Ferrari, D. Gourley, K. Johnson, F. C. Knabe, D. Tunkelang, and J. S. Walter. Hierarchical data-driven navigation system and method for information retrieval. U.S. Patent number 7,035,864, April 2006.
[13]
T. H. Haveliwala, A. Gionis, D. Klein, and P. Indyk. Evaluating strategies for similarity search on the Web. In Proc. 11th Intl. Conference on the World Wide Web (WWW 2002), pages 157--163, 2002.
[14]
S.-H. S. Huang, C. H. Molina-Rodriguez, J. U. Quevedo-Torrero, and M. F. Fonseca-Lozada. Exploring similarity among Web pages using the hyperlink structure. In Proc. International Conference on Information Technology: Coding and Computing (ITCC'04), pages 344--348, 2004.
[15]
B. Jansen, D. Booth, and A. Spink. Determining the user intent of Web search engine queries. In Proceedings of the 16th International Conference on World Wide Web, pages 1149--1150, 2007.
[16]
G. Jeh and J. Widom. Simrank: A measure of structural-context similarity. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
[17]
M. Kellar. An Examination of User Behavior during Web Information Tasks. PhD thesis, Dalhousie University, Halifax, Canada, 2007.
[18]
J. Kleinberg. Authoritative sources in a hyperlinked environment. In Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 668--677, 1998.
[19]
S. Lawrence and C. L. Giles. Accessibility of information on the Web. Nature, 400:107--109, 1999.
[20]
T.-P. Liang, H.-J. Lai, and Y.-C. Ku. Personalized content recommendation and user satisfaction: Theoretical synthesis and empirical findings. J. Manage. Inf. Syst., 23(3):45--70, 2007.
[21]
Z. Lin, M. Lyu, and I. King. Pagesim: A novel link-based measure of Web page similarity. In Proceedings of the 15th International Conference on World Wide Web, pages 1019--1020, 2006.
[22]
W. Lu, J. C. M. Janssen, E. E. Milios, N. Japkowicz, and Y. Zhang. Node similarity in the citation graph. Knowl. Inf. Syst., 11(1):105--129, 2007.
[23]
W. Lu, J. C. M. Janssen, E. E. Milios, and N. Japkowicz. Node similarity in networked information spaces. In Proc. Conference of the Centre for Advanced Studies on Collaborative Research, 2001.
[24]
G. Marchionini. Information Seeking in Electronic Environments. Cambridge University Press, 1995.
[25]
Nutch. http://lucene.apache.org/nutch/.
[26]
J. Pitkow and P. Pirolli. Life, death, and lawfulness on the electronic frontier. In Proceedings of ACM SIGCHI Conference on Human Factors in Computing, 1997.
[27]
D. E. Rose. Reconciling information-seeking behavior with search user interfaces for the Web. Journal of the American Society for Information Science and Technology, 57(6):797--799, 2006.
[28]
D. E. Rose and D. Levinson. Understanding user goals in Web search. In Proc. of 13th Intl. conference on World Wide Web (WWW 2004), pages 13--19, 2004.
[29]
A. Spink, D. Wolfram, M. B. Jansen, and T. Saracevic. Searching the Web: The public and their queries. Journal of the American Society for Information Science and Technology, 52(3):226--234, 2001.
[30]
A. Tombros and Z. Ali. Factors affecting Web page similarity. In 27th European Conference on Information Retrieval (ECIR), 2005.
[31]
W. Xi, E. Fox, W. Fan, B. Zhang, Z. Chen, J. Yan, and D. Zhuang. Simfusion: Measuring similarity using unified relationship matrix. In Proceedings of the 28th ACM SIGIR Conference on Research and Development in Information Retrieval, pages 130--137, 2005.
[32]
Yahoo! Content analysis Web services: Term extraction. http://developer.yahoo.com/search/content/V1/termExtraction.html.

Cited By

  • (2020) “Set of Strings” Framework for Big Data Modeling. Introduction to Data Science and Machine Learning. DOI: 10.5772/intechopen.85602. Online publication date: 25-Mar-2020.
  • (2014) Investigating Features in Support of Web Tools for Information Gathering. Proceedings of the 2014 47th Hawaii International Conference on System Sciences, pages 916-923. DOI: 10.1109/HICSS.2014.121. Online publication date: 6-Jan-2014.

Published In

HT '09: Proceedings of the 20th ACM conference on Hypertext and hypermedia
June 2009
410 pages
ISBN:9781605584867
DOI:10.1145/1557914
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. authorities
  2. hubs
  3. information gathering
  4. network flow
  5. related pages
  6. similarity measures

Conference

HT '09
Acceptance Rates

Overall Acceptance Rate 378 of 1,158 submissions, 33%
