skip to main content
10.1145/1835449.1835696acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
extended-abstract

Investigation on smoothing and aggregation methods in blog retrieval

Published: 19 July 2010 Publication History

Abstract

Recently, user generated data is growing rapidly and becoming one of the most important source of information in the web. Blogosphere (the collection of blogs on the web) is one of the main source of information in this category. In my work for my PhD, I mainly focussed on the blog distillation task which is: given a user query find the blogs that are most related to the query topic [3].
There are some properties of blogs that make blog analysis different from usual text analysis. One of these properties is related to the time stamp assigned to each post; it is possible that the topics of a blog change over the time and this can affect blog relevance to the query. Also each post in a blog can have viewer generated comments that can change the relevance of the blog to the query if these are considered as part of the content of the blog. Another property is related to the meaning of the links between blogs which are different than links between websites. Finally, blog distillation is different from traditional ad-hoc search since the retrieval unit is a blog (a collection of posts), instead of a single document. With this view, blog distillation is similar to the task of resource selection in federated search [1].
Researchers have applied different methods from similar problems to blog distillation like ad-hoc search methods, expert search algorithms or methods from resource selection in distributed information retrieval.
Based on our preliminary experiments, I decided to divide the blog distillation problem into two sub-problems. First of all, I want to use mentioned properties of blogs to retrieve the most relevant posts for a given query. This part is very similar to the ad hoc retrieval. After that, I want to aggregate relevance of posts in each blog and calculate relevance of the blog. This part requires the development of a cross-modal aggregation model that combines the different blog relevance clues found in the blogosphere.
We use structure based smoothing methods for improving posts retrieval. The idea behind these smoothing methods is to change the score of a document based on the score of its similar or related documents. We model the blogosphere as a single graph that represents relations between posts and terms [2]. The idea is that in accordance with the Clustering Hypothesis, related documents should have similar scores for the same query. To model the relatedness between posts, we define a new measure which takes into account both content similarity and temporal distance.
In more recent work, in the aggregation part of the problem, we model each post as evidence about relevance of a blog to the query, and use aggregation methods like Ordered Weighted Averaging operators to combine the evidence. The ordered weighted averaging operator, commonly called OWA operator, was introduced by Yager [4]. OWA provides a parametrized class of mean type aggregation operators, that can generate OR operator (Max), AND operator (Min) and any other aggregation operator between them.
For the next steps, I'm thinking about capturing the temporal properties of the blogs. Bloggers can change their interests over the time or write about different topics periodically. Capturing these changes and using them in the retrieval is one the future woks that I'm interested in. Also, studying the relations between blogs and news and their effect on each other is an interesting problem.

References

[1]
J. L. Elsas, J. Arguello, J. Callan, and J. G. Carbonell. Retrieval and feedback models for blog feed search. In SIGIR, pages 347--354, 2008.
[2]
M. Keikha, M. J. Carman, and F. Crestani. Blog distillation using random walks. In SIGIR, pages 638--639, 2009.
[3]
I. Ounis, M. De Rijke, C. Macdonald, G. Mishne, and I. Soboroff. Overview of the TREC-2006 blog track. In Proceedings of TREC, pages 15--27, 2006.
[4]
R. Yager. On ordered weighted averaging aggregation operators in multicriteria decision making. IEEE Trans. Syst. Man Cybern., 18(1):183--190, 1988.

Index Terms

  1. Investigation on smoothing and aggregation methods in blog retrieval

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
    July 2010
    944 pages
    ISBN:9781450301534
    DOI:10.1145/1835449
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 July 2010

    Check for updates

    Author Tags

    1. blog search
    2. temporal analysis
    3. user generated data

    Qualifiers

    • Extended-abstract

    Conference

    SIGIR '10
    Sponsor:

    Acceptance Rates

    SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 261
      Total Downloads
    • Downloads (Last 12 months)1
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 19 Feb 2025

    Other Metrics

    Citations

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media