Asymptotic Bayesian generalization error when training and test distributions are different
Source
ACM International Conference Proceeding Series; Vol. 227
Proceedings of the 24th International Conference on Machine Learning
Corvallis, Oregon
Pages: 1079 - 1086
Year of Publication: 2007
ISBN: 978-1-59593-793-3
Authors
Keisuke Yamazaki, Tokyo Institute of Technology, Midori-ku, Yokohama, Japan
Motoaki Kawanabe, Fraunhofer FIRST, IDA, Berlin, Germany
Sumio Watanabe, Tokyo Institute of Technology
Masashi Sugiyama, Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan
Klaus-Robert Müller, Technical University of Berlin, Berlin, Germany
ABSTRACT
In supervised learning, we commonly assume that training and test data are sampled from the same distribution. In practice this assumption can be violated, and standard machine learning techniques then perform poorly. This paper analyzes and improves the performance of Bayesian estimation when the training and test distributions differ. We formally analyze the asymptotic Bayesian generalization error and establish an upper bound on it under a very general setting. Our key finding is that lower-order terms, which can be ignored in the absence of distribution change, play an important role when the distribution changes. We also propose a novel variant of stochastic complexity that can be used to choose an appropriate model and hyper-parameters under a particular distribution change.