DOI: 10.1145/1772690.1772740
research-article

Topic initiator detection on the World Wide Web

Published: 26 April 2010

Abstract

In this paper we introduce a new Web mining and search technique: Topic Initiator Detection (TID) on the Web. Given a topic query on the Internet and the resulting collection of time-stamped web documents that contain the query keywords, the task of TID is to automatically return the web document (or its author) that initiated the topic, that is, the first to discuss it.
To address the TID problem, we design a system framework and propose the InitRank (Initiator Ranking) algorithm, which ranks web documents by their likelihood of being the topic initiator. We first extract features from the web documents and design several topic initiator indicators. We then propose a TCL graph, which integrates Time, Content, and Link information, and design an optimization framework over the graph to compute InitRank. Experiments show that, compared with baseline methods such as direct time sorting and the well-known link-based ranking algorithms PageRank and HITS, InitRank achieves the best overall performance, with high effectiveness and robustness. In case studies, we successfully detected (1) the first web document related to a well-known rumor that an Australian product had been banned in the USA and (2) a pre-release report of the IBM and Google cloud computing collaboration that appeared before the official announcement.
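To make the task setup concrete, the following is a minimal sketch of the direct time sorting baseline mentioned above, not of InitRank itself (the abstract does not specify InitRank's indicators or optimization). The names `WebDoc` and `time_sort_baseline` are hypothetical, and plain substring matching is assumed as a stand-in for whatever keyword matching the actual system uses.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class WebDoc:
    """Hypothetical record for one time-stamped web document."""
    url: str
    timestamp: datetime
    text: str


def time_sort_baseline(docs, query):
    """Rank documents that contain every query keyword, earliest first.

    Assumption: a keyword "matches" if it occurs as a case-insensitive
    substring of the document text.
    """
    keywords = query.lower().split()
    matching = [
        d for d in docs
        if all(k in d.text.lower() for k in keywords)
    ]
    # The earliest matching document is the baseline's guess at the initiator.
    return sorted(matching, key=lambda d: d.timestamp)
```

Under this baseline the first element of the returned list is taken as the initiator. It relies entirely on timestamps and ignores content and link information, which is the information the TCL graph is described as integrating.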

Published In

WWW '10: Proceedings of the 19th International Conference on World Wide Web
April 2010
1407 pages
ISBN: 9781605587998
DOI: 10.1145/1772690

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. information retrieval
  2. ranking
  3. topic initiator
  4. web mining

Qualifiers

  • Research-article

Conference

WWW '10: The 19th International World Wide Web Conference
April 26 - 30, 2010
Raleigh, North Carolina, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
