ACM Home Page
Please provide us with feedback. Feedback
Experience-efficient learning in associative bandit problems
Full text PdfPdf (201 KB)
Source ACM International Conference Proceeding Series; Vol. 148 archive
Proceedings of the 23rd international conference on Machine learning table of contents
Pittsburgh, Pennsylvania
Pages: 889 - 896  
Year of Publication: 2006
ISBN:1-59593-383-2
Authors
Alexander L. Strehl  Rutgers University, Piscataway, NJ
Chris Mesterharm  Rutgers University, Piscataway, NJ
Michael L. Littman  Rutgers University, Piscataway, NJ
Haym Hirsh  Rutgers University, Piscataway, NJ
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 22,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1143844.1143956
What is a DOI?

ABSTRACT

We formalize the associative bandit problem framework introduced by Kaelbling as a learning-theory problem. The learning environment is modeled as a k-armed bandit where arm payoffs are conditioned on an observable input selected on each trial. We show that, if the payoff functions are constrained to a known hypothesis class, learning can be performed efficiently with respect to the VC dimension of this class. We formally reduce the problem of PAC classification to the associative bandit problem, producing an efficient algorithm for any hypothesis class for which efficient classification algorithms are known. We demonstrate the approach empirically on a scalable concept class.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Abe, N., Biermann, A. W., & Long, P. M. (2003). Reinforcement learning with immediate rewards and linear hypotheses. Algorithmica, 37, 263--293.
 
2
 
3
Berry, D. A., & Fristedt, B. (1985). Bandit problems: Sequential allocation of experiments. London, UK: Chapman and Hall.
 
4
Elkan, C. (2001). The foundations of cost-sensitive learning. IJCAI (pp. 973--978).
 
5
Fiechter, C.-N. (1995). PAC associative reinforcement learning. Unpublished manuscript.
 
6
 
7
Fong, P. W. L. (1995). A quantitative study of hypothesis selection. Proceedings of the Twelfth International Conference on Machine Learning (ICML-95) (pp. 226--234).
 
8
 
9
 
10
Langford, J., & Zadrozny, B. (2005). Estimating class membership probabilities using classifier learners. Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (pp. 198--205).
 
11
12
 
13

Collaborative Colleagues:
Alexander L. Strehl: colleagues
Chris Mesterharm: colleagues
Michael L. Littman: colleagues
Haym Hirsh: colleagues