ABSTRACT
Variational Autoencoders (VAE) learn a latent representation of image data that allows natural image generation and manipulation. However, they struggle to generate sharp images. To address this problem, we propose a hierarchy of VAEs analogous to a Laplacian pyramid. Each network models a single pyramid level, and is conditioned on the coarser levels. The Laplacian architecture allows for novel image editing applications that take advantage of the coarse to fine structure of the model. Our method achieves lower reconstruction error in terms of MSE, which is the loss function of the VAE and is not directly minimised in our model. Furthermore, the reconstructions generated by the proposed model are preferred over those from the VAE by human evaluators.
- Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). Software available from tensorflow.org.Google Scholar
- Andrew Brock, Theodore Lim, JM Ritchie, and Nick Weston. 2016. Neural photo editing with introspective adversarial networks. arXiv preprint arXiv:1609.07093 (2016).Google Scholar
- Garrison W Cottrell, Paul Munro, and David Zipser. 1987. Learning internal representations from gray-scale images: An example of extensional programming. In Conference of the Cognitive Science Society.Google Scholar
- Emily L Denton, Soumith Chintala, Arthur Szlam, and Rob Fergus. 2015. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. In Advances in Neural Information Processing Systems. 1486--1494. Google ScholarDigital Library
- Jon Gauthier. 2014. Conditional generative adversarial nets for convolutional face generation. Class Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester 2014 (2014), 5.Google Scholar
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. Google ScholarDigital Library
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672--2680. Google ScholarDigital Library
- Ishaan Gulrajani, Kundan Kumar, Faruk Ahmed, Adrien Ali Taiga, Francesco Visin, David Vazquez, and Aaron Courville. 2017. PixelVAE: A Latent Variable Model for Natural Images. In International Conference on Learning Representations (ICLR).Google Scholar
- Matthew D Hoffman and Matthew J Johnson. 2016. ELBO surgery: yet another way to carve up the variational evidence lower bound. In NIPS 2016 Workshop on Advances in Approximate Bayesian Inference.Google Scholar
- Diederik P Kingma and Max Welling. 2014. Auto-encoding variational bayes. International Conference on Learning Representations (2014).Google Scholar
- Alexander Kolesnikov and Christoph H Lampert. 2016. Deep Probabilistic Modeling of Natural Images using a Pyramid Decomposition. arXiv preprint arXiv:1612.08185 (2016).Google Scholar
- A. Lamb, V. Dumoulin, and A. Courville. 2016. Discriminative Regularization for Generative Models. arXiv preprint arXiv:1612.03220 (2016).Google Scholar
- Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, and Ole Winther. 2016. Autoencoding beyond pixels using a learned similarity metric. In Proceedings of The 33rd International Conference on Machine Learning, Vol. 48. JMLR, 1558--1566. Google ScholarDigital Library
- Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2015. Deep Learning Face Attributes in the Wild. In Proceedings of International Conference on Computer Vision (ICCV). Google ScholarDigital Library
- Aaron Van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. 2016. Pixel Recurrent Neural Networks. In Proceedings of The 33rd International Conference on Machine Learning, Vol. 48. 1747--1756. Google ScholarDigital Library
- Patrick Pérez, Michel Gangnet, and Andrew Blake. 2003. Poisson Image Editing. ACM Trans. Graph. 22, 3 (July 2003), 313--318. Google ScholarDigital Library
- Alec Radford, Luke Metz, and Soumith Chintala. 2016. Unsupervised representation learning with deep convolutional generative adversarial networks. In International Conference on Learning Representations (ICLR).Google Scholar
- Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, and Honglak Lee. 2016. Learning What and Where to Draw. In NIPS. Google ScholarDigital Library
- Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen, and Xi Chen. 2016. Improved Techniques for Training GANs. In Advances in Neural Information Processing Systems 29. 2234--2242. Google ScholarDigital Library
- Paul Upchurch, Jacob Gardner, Kavita Bala, Robert Pless, Noah Snavely, and Kilian Weinberger. 2016. Deep Feature Interpolation for Image Content Changes. arXiv preprint arXiv:1611.05507 (2016).Google Scholar
- Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. 2016. Attribute2Image: Conditional Image Generation from Visual Attributes. Proceedings of European Conference on Computer Vision (ECCV) (2016).Google ScholarCross Ref
- Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, and Dimitris Metaxas. 2016. StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks. arXiv preprint arXiv:1612.03242 (2016).Google Scholar
- Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A. Efros. 2016. Generative Visual Manipulation on the Natural Image Manifold. In Proceedings of European Conference on Computer Vision (ECCV).Google Scholar
Index Terms
- Laplacian Pyramid of Conditional Variational Autoencoders
Recommendations
Variational Diffusion Autoencoders with Random Walk Sampling
Computer Vision – ECCV 2020AbstractVariational autoencoders (VAEs) and generative adversarial networks (GANs) enjoy an intuitive connection to manifold learning: in training the decoder/generator is optimized to approximate a homeomorphism between the data distribution and the ...
Learning conditional variational autoencoders with missing covariates
AbstractConditional variational autoencoders (CVAEs) are versatile deep latent variable models that extend the standard VAE framework by conditioning the generative model with auxiliary covariates. The original CVAE model assumes that the data samples ...
Graphical abstractDisplay Omitted
Highlights- An improved learning method for conditional VAEs and Gaussian process prior VAEs.
- The method is designed for non-temporal, temporal, and longitudinal data.
- Used an amortised variational distribution for learning missing auxiliary ...
Adaptive Denoising Autoencoders: A Fine-Tuning Scheme to Learn from Test Mixtures
LVA/ICA 2015: Proceedings of the 12th International Conference on Latent Variable Analysis and Signal Separation - Volume 9237This work aims at a test-time fine-tune scheme to further improve the performance of an already-trained Denoising AutoEncoder DAE in the context of semi-supervised audio source separation. Although the state-of-the-art deep learning-based DAEs show ...
Comments