DOI: 10.1145/1835449.1835516
research-article

Mining the blogosphere for top news stories identification

Published: 19 July 2010

Abstract

An analysis of query logs from blog search engines shows that news-related queries account for a significant portion of the logs. This raises an interesting research question: can the blogosphere be used to identify important news stories? In this paper, we present novel approaches to identifying important news story headlines from the blogosphere for a given day. The proposed system consists of two components based on the language model framework: the query likelihood and the news headline prior. For the query likelihood, we propose several approaches to estimating the query language model and the news headline language model. We also suggest several criteria for evaluating the news headline prior, that is, the prior belief about the importance or newsworthiness of a news headline for a given day. Experimental results show that our system significantly outperforms a baseline system. Specifically, the proposed approach gives further increases of 2.62% and 10.19% in MAP and P@5, respectively, over the best-performing result of the TREC'09 Top Stories Identification Task.
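
As a rough illustration of how a query likelihood and a headline prior can be combined in the language model framework, the sketch below ranks headlines by log P(headline) + log P(query | headline model). It is a minimal sketch under stated assumptions, not the estimators proposed in the paper: the Dirichlet-smoothed unigram model, the add-one background estimate, the parameter `mu`, and the names `log_query_likelihood` and `rank_headlines` are all hypothetical choices made for illustration. The query is assumed to be a bag of terms drawn from blog posts of the given day.

```python
import math
from collections import Counter


def log_query_likelihood(query_terms, headline_terms, coll_counts, coll_len, mu=10.0):
    """log P(query | headline model) under a Dirichlet-smoothed unigram model.

    Assumption: this smoothing scheme is illustrative only; the abstract does
    not specify how the headline language model is estimated.
    """
    counts = Counter(headline_terms)
    length = len(headline_terms)
    vocab_size = len(coll_counts) + 1  # +1 reserves mass for unseen terms
    total = 0.0
    for term in query_terms:
        # Add-one background estimate, so the probability is never zero.
        p_bg = (coll_counts.get(term, 0) + 1.0) / (coll_len + vocab_size)
        p = (counts[term] + mu * p_bg) / (length + mu)
        total += math.log(p)
    return total


def rank_headlines(query_terms, headlines, priors, mu=10.0):
    """Rank headline ids by log prior + log query likelihood, best first.

    headlines: dict mapping headline id -> list of terms
    priors:    dict mapping headline id -> prior probability in (0, 1]
    """
    coll_counts = Counter(t for terms in headlines.values() for t in terms)
    coll_len = sum(coll_counts.values())
    scored = []
    for hid, terms in headlines.items():
        score = math.log(priors[hid]) + log_query_likelihood(
            query_terms, terms, coll_counts, coll_len, mu
        )
        scored.append((hid, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    headlines = {
        "h1": ["earthquake", "hits", "city"],
        "h2": ["local", "team", "wins"],
    }
    day_terms = ["earthquake", "city"]  # hypothetical terms from one day's posts
    for hid, score in rank_headlines(day_terms, headlines, {"h1": 0.5, "h2": 0.5}):
        print(hid, round(score, 3))
```

With a uniform prior the ranking is driven by the likelihood term alone; a strongly skewed prior can reorder headlines whose likelihoods are close, which is the role the abstract assigns to the news headline prior.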

    Published In

    SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
    July 2010
    944 pages
    ISBN:9781450301534
    DOI:10.1145/1835449
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. blog retrieval
    2. blogosphere
    3. top news stories identification

    Qualifiers

    • Research-article

    Conference

    SIGIR '10

    Acceptance Rates

    SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%
