Exploring in the weblog space by detecting informative and affective articles
Source
International World Wide Web Conference
Proceedings of the 16th international conference on World Wide Web
Banff, Alberta, Canada
SESSION: Industrial practice & experience
Pages: 281-290
Year of Publication: 2007
ISBN: 978-1-59593-654-7
Authors
Xiaochuan Ni, Shanghai Jiao-Tong University, Shanghai, China
Gui-Rong Xue, Shanghai Jiao-Tong University, Shanghai, China
Xiao Ling, Shanghai Jiao-Tong University, Shanghai, China
Yong Yu, Shanghai Jiao-Tong University, Shanghai, China
Qiang Yang, Hong Kong University of Science and Technology, Hong Kong
ABSTRACT
Weblogs have become a prevalent medium through which people express themselves and share information. In general, weblog content falls into two genres. The first concerns webloggers' personal feelings, thoughts, or emotions; we call these affective articles. The second concerns technologies and various kinds of informative news; we call these informative articles. In this paper, we present a machine learning method for distinguishing informative from affective articles among weblogs, treating the task as a binary classification problem. Using machine learning approaches, we achieve about 92% on information retrieval performance measures including precision, recall, and F1. We also set up three studies that apply this classification approach in both research and industrial settings. First, it is used to improve the classification of emotions in weblog articles. Second, we develop an intent-driven weblog search engine based on the classification techniques to improve the satisfaction of Web users. Finally, the approach is applied to search for weblogs that contain a large number of informative articles.
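As a rough illustration of the binary classification setup and of the precision, recall, and F1 measures named in the abstract, the following minimal sketch trains a multinomial naive Bayes classifier with add-one smoothing on a toy corpus. The abstract does not state which learning algorithm or features were used, so the choice of classifier, the helper names `train_nb`, `predict_nb`, and `prf1`, and the four toy training sentences are all assumptions made for this example, not the authors' method or data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def train_nb(docs):
    """Collect per-class document and word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior under naive Bayes."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def prf1(gold, pred, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy training data, invented for this sketch.
TRAIN = [
    ("new browser release adds faster javascript engine", "informative"),
    ("company announces search engine update and news", "informative"),
    ("i feel so happy and grateful today", "affective"),
    ("i am sad and lonely tonight", "affective"),
]

model = train_nb(TRAIN)
print(predict_nb(model, "i feel sad today"))
print(predict_nb(model, "search engine news update"))
```

With a real labeled collection, `prf1` would be called on held-out articles, once with each class as the positive label, to obtain figures comparable in kind to those the abstract reports.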
CITED BY
Xiao Ling, Gui-Rong Xue, Wenyuan Dai, Yun Jiang, Qiang Yang, Yong Yu, Can Chinese web pages be classified with English data source?, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China