ABSTRACT
A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this is a substantial source of revenue supporting the web today. Despite the importance of this area, little formal, published research exists. We describe a system that learns how to extract keywords from web pages for advertisement targeting. The system uses a number of features, such as term frequency of each potential keyword, inverse document frequency, presence in meta-data, and how often the term occurs in search query logs. The system is trained with a set of example pages that have been hand-labeled with "relevant" keywords. Based on this training, it can then extract new keywords from previously unseen pages. Accuracy is substantially better than that of several baseline systems.
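As a rough illustration of the four features named in the abstract, the sketch below computes them for a single-word candidate keyword. It is not the authors' implementation: the function name `candidate_features`, the tokenizer, the smoothed IDF formula, the log scaling of query counts, and all example data are assumptions made for illustration. The abstract does not specify the learning algorithm, so the sketch stops at feature extraction; the resulting values would be passed to whatever classifier is trained on the hand-labeled pages.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_features(term, page_text, meta_text, doc_freq, num_docs, query_log_counts):
    """Compute the four abstract-level features for one single-word candidate.

    doc_freq and query_log_counts are hypothetical lookup tables:
    term -> number of documents containing it, and term -> query log count.
    """
    tokens = tokenize(page_text)
    counts = Counter(tokens)
    # Term frequency, normalized by page length.
    tf = counts[term] / len(tokens) if tokens else 0.0
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = math.log((1 + num_docs) / (1 + doc_freq.get(term, 0)))
    # Binary indicator: does the term appear in the page's metadata?
    in_meta = 1.0 if term in tokenize(meta_text) else 0.0
    # Log-scaled count of the term in a search query log.
    log_query_freq = math.log(1 + query_log_counts.get(term, 0))
    return {"tf": tf, "idf": idf, "in_meta": in_meta, "log_query_freq": log_query_freq}


# Toy example with made-up data.
features = candidate_features(
    "camera",
    page_text="Digital camera reviews: the best camera for travel",
    meta_text="camera reviews",
    doc_freq={"camera": 9, "the": 99},
    num_docs=99,
    query_log_counts={"camera": 999},
)
print(features)
```

A term that is frequent on the page, rare across the collection, present in the metadata, and common in query logs scores high on all four features, which is the intuition the abstract's feature list suggests.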
CITED BY 19
Xin Jin, Ying Li, Teresa Mah, Jie Tong, Sensitive webpage classification for content advertising, Proceedings of the 1st international workshop on Data mining and audience intelligence for advertising, p.28-33, August 12, 2007, San Jose, California
Hua Li, Duo Zhang, Jian Hu, Hua-Jun Zeng, Zheng Chen, Finding keyword from online broadcasting content for targeted advertising, Proceedings of the 1st international workshop on Data mining and audience intelligence for advertising, p.55-62, August 12, 2007, San Jose, California
Dou Shen, Toby Walker, Zijian Zheng, Qiang Yang, Ying Li, Personal name classification in web queries, Proceedings of the international conference on Web search and web data mining, February 11-12, 2008, Palo Alto, California, USA
Aris Anagnostopoulos, Andrei Z. Broder, Evgeniy Gabrilovich, Vanja Josifovski, Lance Riedel, Just-in-time contextual advertising, Proceedings of the sixteenth ACM conference on information and knowledge management, November 06-10, 2007, Lisbon, Portugal
Bingjun Sun, Qingzhao Tan, Prasenjit Mitra, C. Lee Giles, Extraction and search of chemical formulae in text documents on the web, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
Xiaoxun Zhang, Xueying Wang, Honglei Guo, Zhili Guo, Xian Wu, Zhong Su, Floatcascade learning for fast imbalanced web mining, Proceedings of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China