Finding advertising keywords on web pages
Full text: PDF (194 KB)
Source: International World Wide Web Conference
Proceedings of the 15th International Conference on World Wide Web
Edinburgh, Scotland
SESSION: Mining the web
Pages: 213-222
Year of Publication: 2006
ISBN: 1-59593-323-9
Authors
Wen-tau Yih, Microsoft Research, Redmond, WA
Joshua Goodman, Microsoft Research, Redmond, WA
Vitor R. Carvalho, Carnegie Mellon University, Pittsburgh, PA
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 35, Downloads (12 Months): 260, Citation Count: 19
DOI: http://doi.acm.org/10.1145/1135777.1135813

ABSTRACT

A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this is a substantial source of revenue supporting the web today. Despite the importance of this area, little formal, published research exists. We describe a system that learns how to extract keywords from web pages for advertisement targeting. The system uses a number of features, such as term frequency of each potential keyword, inverse document frequency, presence in meta-data, and how often the term occurs in search query logs. The system is trained with a set of example pages that have been hand-labeled with "relevant" keywords. Based on this training, it can then extract new keywords from previously unseen pages. Accuracy is substantially better than several baseline systems.
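
The pipeline the abstract outlines (per-candidate features, training on hand-labeled pages, then ranking keywords on unseen pages) can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not name the learning algorithm, so logistic regression trained by stochastic gradient ascent is assumed here as a stand-in. The function names, the log scaling of counts, and all corpus statistics, query counts, and pages are invented for the example.

```python
import math
from collections import Counter


def features(term, page_tokens, meta_terms, doc_freq, n_docs, query_counts):
    """Build a feature vector for one candidate keyword on one page."""
    tf = Counter(page_tokens)[term]
    idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0)))
    return [
        1.0,                                      # bias term
        math.log(1 + tf),                         # term frequency on the page
        idf,                                      # inverse document frequency
        1.0 if term in meta_terms else 0.0,       # presence in page meta-data
        math.log(1 + query_counts.get(term, 0)),  # frequency in a query log
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, x):
    """Estimated probability that the candidate is a relevant keyword."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train(examples, epochs=500, lr=0.1):
    """Fit weights on (feature_vector, label) pairs; the learner is an assumption."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            error = y - predict(weights, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights


def rank_keywords(weights, page_tokens, meta_terms, doc_freq, n_docs, query_counts):
    """Score every distinct term on a page and return terms, best first."""
    scored = [
        (predict(weights, features(t, page_tokens, meta_terms,
                                   doc_freq, n_docs, query_counts)), t)
        for t in set(page_tokens)
    ]
    return [t for _, t in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Toy corpus statistics and query log (invented for illustration).
N_DOCS = 1000
DOC_FREQ = {"the": 1000, "and": 990, "digital": 60, "camera": 50,
            "lens": 30, "boots": 25, "hiking": 20}
QUERY_COUNTS = {"camera": 500, "hiking": 400, "boots": 350,
                "digital": 300, "lens": 200}

# One hand-labeled training page: 1 marks a "relevant" keyword, 0 otherwise.
train_tokens = "the digital camera and the lens the camera".split()
train_meta = {"camera", "digital"}
labels = {"camera": 1, "digital": 1, "lens": 1, "the": 0, "and": 0}
examples = [
    (features(t, train_tokens, train_meta, DOC_FREQ, N_DOCS, QUERY_COUNTS), y)
    for t, y in labels.items()
]
weights = train(examples)

# A previously unseen page.
new_tokens = "the hiking boots and the hiking".split()
ranked = rank_keywords(weights, new_tokens, {"hiking"},
                       DOC_FREQ, N_DOCS, QUERY_COUNTS)
print(ranked)
```

In this toy setup the content words on the unseen page should outrank the function words, because the learned weights favor rare terms that also appear in the query log. A real system would treat multi-word phrases as candidates and train on many labeled pages; both are omitted here for brevity.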


