Article

Web-page summarization using clickthrough data

Published: 15 August 2005

Abstract

Most previous Web-page summarization methods treat a Web page as plain text. Such methods fail to uncover the full knowledge associated with a Web page that is needed to build a high-quality summary, because many of them do not consider the hidden relationships in the Web. Uncovering this hidden knowledge is important for building good Web-page summarizers. In this paper, we extract extra knowledge from the clickthrough data of a Web search engine to improve Web-page summarization. We first analyze the feasibility of using clickthrough data to enhance Web-page summarization and then propose two adapted summarization methods that take advantage of the relationships discovered from the clickthrough data. For pages that are not covered by the clickthrough data, we design a thematic lexicon approach to generate implicit knowledge for them. Our methods are evaluated on a dataset of manually annotated pages as well as a large dataset crawled from the Open Directory Project website. The experimental results indicate that our proposed summarizer achieves significant improvements over summarizers that do not use the clickthrough data.
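
To make the general idea concrete, the sketch below shows one simple way that query terms from clickthrough data could bias an extractive summarizer. It is not the method proposed in the paper: the function names (`tokenize`, `summarize`), the `click_queries` input, and the `query_weight` value are all assumptions chosen for illustration. Each sentence is scored by the average weight of its terms, where a term counts more if it also appears in queries whose clicks led to the page.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(page_text, click_queries, num_sentences=1, query_weight=2.0):
    """Return the top-scoring sentences of a page, in document order.

    page_text: plain text of the page.
    click_queries: queries whose clicked results pointed at this page
        (hypothetical input format, assumed for this sketch).
    query_weight: extra weight per query occurrence of a term
        (illustrative value, not taken from the paper).
    """
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    page_tf = Counter(tokenize(page_text))
    query_tf = Counter(t for q in click_queries for t in tokenize(q))

    def score(sentence):
        terms = tokenize(sentence)
        if not terms:
            return 0.0
        total = sum(page_tf[t] + query_weight * query_tf[t] for t in terms)
        return total / len(terms)  # average, so long sentences are not favored

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in keep]


page = "Welcome to our site. Read the news here. We sell hiking boots."
print(summarize(page, ["hiking boots"]))
```

With the query "hiking boots", the last sentence receives the highest average term weight and is selected. Averaging over sentence length is one design choice among several; a real system would also need stop-word handling and a more robust sentence splitter.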

Published In

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN:1595930345
DOI:10.1145/1076034
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clickthrough data
  2. generic web-page summarization
  3. latent semantic analysis
  4. thematic lexicon

Qualifiers

  • Article

Conference

SIGIR05

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 15
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 15 Feb 2025

Citations

Cited By

  • (2023) Towards Social Context Summarization with Convolutional Neural Networks. Computational Linguistics and Intelligent Text Processing, pp. 341-353. DOI: 10.1007/978-3-031-23804-8_27. Online publication date: 26-Feb-2023
  • (2021) Unsupervised Summarization Approach With Computational Statistics of Microblog Data. Methodologies and Applications of Computational Statistics for Machine Intelligence, pp. 23-37. DOI: 10.4018/978-1-7998-7701-1.ch002. Online publication date: 2021
  • (2021) WATS-SMS: A T5-Based French Wikipedia Abstractive Text Summarizer for SMS. Future Internet 13:9 (238). DOI: 10.3390/fi13090238. Online publication date: 18-Sep-2021
  • (2021) Using Machine Learning for Web Page Classification in Search Engine Optimization. Future Internet 13:1 (9). DOI: 10.3390/fi13010009. Online publication date: 2-Jan-2021
  • (2020) Transformer-based Summarization by Exploiting Social Information. 2020 12th International Conference on Knowledge and Systems Engineering (KSE), pp. 25-30. DOI: 10.1109/KSE50997.2020.9287388. Online publication date: 12-Nov-2020
  • (2020) Distant Supervision for Keyphrase Extraction using Search Queries. 2020 IEEE Sixth International Conference on Big Data Computing Service and Applications (BigDataService), pp. 70-77. DOI: 10.1109/BigDataService49289.2020.00019. Online publication date: Aug-2020
  • (2020) Improving Search Snippets in Context-Aware Web Search Scenarios. Information Retrieval, pp. 3-16. DOI: 10.1007/978-3-030-56725-5_1. Online publication date: 10-Aug-2020
  • (2018) Exploiting User Posts for Web Document Summarization. ACM Transactions on Knowledge Discovery from Data 12:4 (1-28). DOI: 10.1145/3186566. Online publication date: 8-Jun-2018
  • (2018) Topic and sentiment aware microblog summarization for twitter. Journal of Intelligent Information Systems. DOI: 10.1007/s10844-018-0521-8. Online publication date: 8-Aug-2018
  • (2018) Multi-document Summarization and Opinion Mining Using Stack Decoder Method and Neural Networks. Data Management, Analytics and Innovation, pp. 61-78. DOI: 10.1007/978-981-13-1274-8_5. Online publication date: 8-Sep-2018
