Sequential cost-sensitive decision making with reinforcement learning
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Edmonton, Alberta, Canada
SESSION: Learning methods
Pages: 259-268
Year of Publication: 2002
ISBN: 1-58113-567-X
Authors
Edwin Pednault  IBM T. J. Watson Research Center, Yorktown Heights, NY
Naoki Abe  IBM T. J. Watson Research Center, Yorktown Heights, NY
Bianca Zadrozny  University of California, San Diego, La Jolla, CA
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
AAAI: American Association for Artificial Intelligence
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/775047.775086

ABSTRACT

Interest in cost-sensitive learning and decision making has been growing across a variety of data mining applications. A number of approaches have been developed that are effective at optimizing cost-sensitive decisions when each decision is considered in isolation. However, the issue of sequential decision making, where the goal is to maximize the total benefits accrued over a period of time rather than the immediate benefits, has rarely been addressed. In the present paper, we propose a novel approach to sequential decision making based on the reinforcement learning framework. Our approach attempts to learn decision rules that optimize a sequence of cost-sensitive decisions so as to maximize the total benefits accrued over time. We use the domain of targeted marketing as a testbed for empirical evaluation of the proposed method. We conducted experiments using approximately two years of monthly promotion data derived from the well-known KDD Cup 1998 donation data set. The experimental results show that the proposed method for optimizing total accrued benefits outperforms the usual targeted-marketing methodology of optimizing each promotion in isolation. We also analyze the behavior of the targeting rules that were obtained and discuss their appropriateness to the application domain.
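
The contrast the abstract draws, between optimizing each promotion in isolation and optimizing total benefits over time, can be illustrated with a minimal sketch. It is not the paper's method or data: the two donor states, the reward figures, and the function names (`myopic_policy`, `q_learning_policy`, `total_profit`) are all hypothetical, and plain tabular Q-learning stands in for whatever learner the paper actually uses.

```python
import random

# Hypothetical toy model (not from the paper):
# state 0 = lapsed donor, state 1 = active donor;
# action 0 = do not mail, action 1 = mail.
# Mailing a lapsed donor loses the mailing cost now but reactivates the donor.
REWARD = {(0, 0): 0.0, (0, 1): -1.0, (1, 0): 0.0, (1, 1): 3.0}
NEXT_STATE = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
STATES, ACTIONS = (0, 1), (0, 1)


def myopic_policy():
    """Choose, in each state, the action with the best immediate profit."""
    return {s: max(ACTIONS, key=lambda a: REWARD[(s, a)]) for s in STATES}


def q_learning_policy(gamma=0.9, alpha=0.5, steps=5000, seed=0):
    """Tabular Q-learning with uniformly random exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = 0
    for _ in range(steps):
        action = rng.choice(ACTIONS)
        nxt = NEXT_STATE[(state, action)]
        # One-step target: immediate profit plus discounted best future value.
        target = REWARD[(state, action)] + gamma * max(q[(nxt, b)] for b in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = nxt
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
    return policy, q


def total_profit(policy, start=0, horizon=24):
    """Sum the profit of following `policy` for `horizon` monthly decisions."""
    state, total = start, 0.0
    for _ in range(horizon):
        action = policy[state]
        total += REWARD[(state, action)]
        state = NEXT_STATE[(state, action)]
    return total


if __name__ == "__main__":
    learned, _ = q_learning_policy()
    print("myopic:", myopic_policy(), total_profit(myopic_policy()))
    print("learned:", learned, total_profit(learned))
```

In this toy model the myopic rule never mails a lapsed donor, because the immediate profit is negative, and so earns nothing over 24 months. The learned rule accepts the one-time loss to reactivate the donor and collects the later donations. The numbers are illustrative only and say nothing about the magnitude of the effect reported in the paper.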


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

[2] S. D. Bay. UCI KDD archive. Department of Information and Computer Sciences, University of California, Irvine, 2000. http://kdd.ics.uci.edu/.

[5] C. Elkan. The foundations of cost-sensitive learning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, Aug. 2001.

[7] L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 1996.

[9] R. Natarajan and E. Pednault. Segmented regression estimators for massive data sets. In Second SIAM International Conference on Data Mining, Arlington, Virginia, 2002. To appear.

[10] G. A. Rummery and M. Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department, 1994. Ph.D. thesis.

[12] J. N. Tsitsiklis and B. Van Roy. An analysis of temporal difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, 1997.

[13] P. Turney. Cost-sensitive learning bibliography. Institute for Information Technology, National Research Council, Ottawa, Canada, 2000. http://extractor.iit.nrc.ca/bibliographies/cost-sensitive.html.

[14] X. Wang and T. Dietterich. Efficient value function approximation using regression trees. In Proceedings of the IJCAI Workshop on Statistical Machine Learning for Large-Scale Optimization, 1999.

[15] C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, 1989.

