DOI: 10.1145/1149941.1149960
Article

Identifying commented passages of documents using implicit hyperlinks

Published: 22 August 2006

Abstract

This paper addresses the problem of automatically selecting passages of blog posts using readers' comments. The problem is difficult because (i) the textual content of blogs is often noisy, (ii) comments do not always target passages of the posts, and (iii) comments are not equally useful for identifying important passages. We have developed a system for selecting commented passages that takes blog posts and their comments as input and delivers, for each post, the sentences that are the most commented and/or the most discussed. Our approach combines three steps to identify the commented passages of a post. The first step reduces the complexity of processing the contents of posts and comments using heuristics adapted to the language of the blog. The second step finds useful comments and assigns them a degree of relevance using a model that is built automatically and validated by an expert. The third step identifies important passages using the relevant comments. We conducted two experiments to evaluate the usefulness and effectiveness of our approach. The first study shows that in only 50% of the posts does the most commented sentence elicited by our approach correspond to the post extract generated using generic summarization. In the second study, human participants confirmed that, in practice, the selected passages are frequently commented passages.
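
The three steps can be illustrated with a small sketch. The abstract does not specify the heuristics, the relevance model, or the scoring function, so everything below is an assumption made for illustration: a stopword filter stands in for the language-specific heuristics, the share of comment tokens found in the post stands in for the learned relevance model, and relevance-weighted token overlap stands in for passage identification. The names `tokenize`, `comment_relevance`, `rank_commented_sentences`, and `min_relevance` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in",
             "is", "it", "this", "that", "i", "you"}

def tokenize(text):
    """Step 1 (stand-in): lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower())
            if t not in STOPWORDS]

def split_sentences(post):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", post) if s.strip()]

def comment_relevance(comment_tokens, post_vocab):
    """Step 2 (stand-in): fraction of comment tokens that occur in the post."""
    if not comment_tokens:
        return 0.0
    hits = sum(1 for t in comment_tokens if t in post_vocab)
    return hits / len(comment_tokens)

def rank_commented_sentences(post, comments, min_relevance=0.2):
    """Step 3 (stand-in): score sentences by relevance-weighted overlap."""
    sentences = split_sentences(post)
    sent_tokens = [set(tokenize(s)) for s in sentences]
    post_vocab = set().union(*sent_tokens)
    scores = [0.0] * len(sentences)
    for comment in comments:
        tokens = tokenize(comment)
        relevance = comment_relevance(tokens, post_vocab)
        if relevance < min_relevance:
            continue  # comment judged not useful; ignore it
        counts = Counter(tokens)
        for i, vocab in enumerate(sent_tokens):
            # Count comment tokens that also appear in this sentence.
            overlap = sum(counts[t] for t in vocab)
            scores[i] += relevance * overlap
    # Highest-scoring (most commented) sentences first.
    return sorted(zip(sentences, scores), key=lambda p: p[1], reverse=True)

post = ("The new camera has a great lens. Battery life is short. "
        "Shipping was slow.")
comments = ["I agree the battery drains fast, battery life is terrible",
            "Nice post!",
            "My battery also died quickly"]
for sentence, score in rank_commented_sentences(post, comments):
    print(f"{score:.2f}  {sentence}")
```

In this toy input, the comment "Nice post!" shares no vocabulary with the post and is filtered out, which mirrors the observation that comments do not always target passages of the post.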

    Published In

    HYPERTEXT '06: Proceedings of the seventeenth conference on Hypertext and hypermedia
    August 2006
    178 pages
    ISBN:1595934170
    DOI:10.1145/1149941
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. implicit links
    2. passage extraction
    3. weblogs

    Qualifiers

    • Article

    Conference

    HT06
    HT06: 17th Conference on Hypertext and Hypermedia
    August 22 - 25, 2006
    Odense, Denmark

    Acceptance Rates

    Overall Acceptance Rate 378 of 1,158 submissions, 33%

    Cited By

    • (2018) Exploiting User Posts for Web Document Summarization. ACM Transactions on Knowledge Discovery from Data 12:4 (1-28). DOI: 10.1145/3186566. Online publication date: 8-Jun-2018
    • (2017) Intra-relation or inter-relation? Expert Systems with Applications: An International Journal 76:C (71-84). DOI: 10.1016/j.eswa.2017.01.023. Online publication date: 15-Jun-2017
    • (2012) Information Retrieval in the Commentsphere. ACM Transactions on Intelligent Systems and Technology 3:4 (1-21). DOI: 10.1145/2337542.2337553. Online publication date: 1-Sep-2012
    • (2012) Lexicon-based Comments-oriented News Sentiment Analyzer system. Expert Systems with Applications: An International Journal 39:10 (9166-9180). DOI: 10.1016/j.eswa.2012.02.057. Online publication date: 1-Aug-2012
    • (2010) Using Skip Lists for Managing Replying Comments Posted on Internet Discussion Boards. The Journal of the Korea Contents Association 10:8 (38-50). DOI: 10.5392/JKCA.2010.10.8.038. Online publication date: 28-Aug-2010
    • (2010) Semantic tagging and classification of blogs. 2010 International Conference on Computer and Communication Technology (ICCCT) (455-459). DOI: 10.1109/ICCCT.2010.5640490. Online publication date: Sep-2010
    • (2009) Measuring the descriptiveness of web comments. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval (724-725). DOI: 10.1145/1571941.1572097. Online publication date: 19-Jul-2009
    • (2009) Incremental Personalised Summarisation with Novelty Detection. Proceedings of the 8th International Conference on Flexible Query Answering Systems (641-652). DOI: 10.1007/978-3-642-04957-6_55. Online publication date: 3-Nov-2009
    • (2008) Comments-oriented document summarization. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (291-298). DOI: 10.1145/1390334.1390385. Online publication date: 20-Jul-2008
    • (2008) A New Approach to Blog Post Summarization Using Fast Features. Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 02 (8-13). DOI: 10.1109/FSKD.2008.268. Online publication date: 18-Oct-2008