| Experience-efficient learning in associative bandit problems |
| Full text |
Pdf
(201 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 148
archive
Proceedings of the 23rd international conference on Machine learning
table of contents
Pittsburgh, Pennsylvania
Pages: 889 - 896
Year of Publication: 2006
ISBN:1-59593-383-2
|
|
Authors
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 5, Downloads (12 Months): 22, Citation Count: 0
|
|
|
ABSTRACT
We formalize the associative bandit problem framework introduced by Kaelbling as a learning-theory problem. The learning environment is modeled as a k-armed bandit where arm payoffs are conditioned on an observable input selected on each trial. We show that, if the payoff functions are constrained to a known hypothesis class, learning can be performed efficiently with respect to the VC dimension of this class. We formally reduce the problem of PAC classification to the associative bandit problem, producing an efficient algorithm for any hypothesis class for which efficient classification algorithms are known. We demonstrate the approach empirically on a scalable concept class.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Abe, N., Biermann, A. W., & Long, P. M. (2003). Reinforcement learning with immediate rewards and linear hypotheses. Algorithmica, 37, 263--293.
|
| |
2
|
|
| |
3
|
Berry, D. A., & Fristedt, B. (1985). Bandit problems: Sequential allocation of experiments. London, UK: Chapman and Hall.
|
| |
4
|
Elkan, C. (2001). The foundations of cost-sensitive learning. IJCAI (pp. 973--978).
|
| |
5
|
Fiechter, C.-N. (1995). PAC associative reinforcement learning. Unpublished manuscript.
|
| |
6
|
|
| |
7
|
Fong, P. W. L. (1995). A quantitative study of hypothesis selection. Proceedings of the Twelfth International Conference on Machine Learning (ICML-95) (pp. 226--234).
|
| |
8
|
|
| |
9
|
|
| |
10
|
Langford, J., & Zadrozny, B. (2005). Estimating class membership probabilities using classifier learners. Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (pp. 198--205).
|
| |
11
|
|
 |
12
|
|
| |
13
|
|
|