Asymptotic Bayesian generalization error when training and test distributions are different
Source: ACM International Conference Proceeding Series; Vol. 227
Proceedings of the 24th International Conference on Machine Learning
Corvallis, Oregon
Pages: 1079-1086
Year of Publication: 2007
ISBN: 978-1-59593-793-3
Authors
Keisuke Yamazaki  Tokyo Institute of Technology, Midori-ku, Yokohama, Japan
Motoaki Kawanabe  Fraunhofer FIRST, IDA, Berlin, Germany
Sumio Watanabe  Tokyo Institute of Technology
Masashi Sugiyama  Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan
Klaus-Robert Müller  Technical University of Berlin, Berlin, Germany
Sponsor
Machine Learning Journal
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1273496.1273632

ABSTRACT

In supervised learning, we commonly assume that training and test data are sampled from the same distribution. In practice this assumption can be violated, and standard machine learning techniques then perform poorly. This paper focuses on characterizing and improving the performance of Bayesian estimation when the training and test distributions differ. We formally analyze the asymptotic Bayesian generalization error and establish an upper bound on it under a very general setting. Our main finding is that lower-order terms, which can be ignored in the absence of a distribution change, play an important role when the distribution changes. We also propose a novel variant of stochastic complexity that can be used to choose an appropriate model and hyper-parameters under a particular distribution change.
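
For orientation, the quantity the abstract analyzes can be written out as below. This is a sketch in assumed notation, not necessarily the paper's own: q_tr and q_te stand for the training and test distributions, p(y | x, w) for the learning model with parameter w, \varphi(w) for the prior, and D_n = {(x_i, y_i)}, i = 1, ..., n, for a training sample drawn from q_tr.

```latex
% Assumed notation (not taken from the paper):
%   q_tr, q_te   training and test distributions
%   p(y|x,w)     learning model, \varphi(w) prior
%   D_n          training sample of size n drawn i.i.d. from q_tr
\begin{align*}
p(w \mid D_n) &= \frac{\varphi(w) \prod_{i=1}^{n} p(y_i \mid x_i, w)}
                      {\int \varphi(w') \prod_{i=1}^{n} p(y_i \mid x_i, w') \, dw'} \\
p(y \mid x, D_n) &= \int p(y \mid x, w) \, p(w \mid D_n) \, dw \\
G(n) &= \mathbb{E}_{D_n \sim q_{\mathrm{tr}}^{n}}
        \left[ \iint q_{\mathrm{te}}(x, y)
        \log \frac{q_{\mathrm{te}}(y \mid x)}{p(y \mid x, D_n)} \, dx \, dy \right]
\end{align*}
```

When q_tr = q_te, G(n) reduces to the usual Bayesian generalization error, the expected Kullback-Leibler divergence from the true conditional distribution to the Bayesian predictive distribution.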


