DOI: 10.1145/3150165.3150172
research-article

Laplacian Pyramid of Conditional Variational Autoencoders

Published: 11 December 2017

ABSTRACT

Variational Autoencoders (VAEs) learn a latent representation of image data that allows natural image generation and manipulation. However, they struggle to generate sharp images. To address this problem, we propose a hierarchy of VAEs analogous to a Laplacian pyramid. Each network models a single pyramid level and is conditioned on the coarser levels. The Laplacian architecture allows for novel image editing applications that take advantage of the coarse-to-fine structure of the model. Our method achieves lower reconstruction error than the VAE in terms of MSE, even though MSE is the loss function of the VAE and is not directly minimised in our model. Furthermore, the reconstructions generated by the proposed model are preferred over those from the VAE by human evaluators.
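
As a rough illustration of the coarse-to-fine decomposition the abstract refers to, the sketch below builds and inverts a Laplacian pyramid in NumPy. It is not the authors' implementation: the box-filter downsampling, the nearest-neighbour upsampling, and all function names are assumptions chosen for brevity. In the proposed model, each residual level would presumably be generated by a VAE conditioned on the coarser levels rather than computed directly from the image.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging each 2x2 block (assumed stand-in for blur + subsample)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition (assumed, for simplicity)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(img, levels):
    """Return [residual_0, ..., residual_{levels-1}, coarsest image], finest first."""
    pyramid = []
    current = img
    for _ in range(levels):
        coarse = downsample(current)
        # Each level stores only the detail lost by going one step coarser.
        pyramid.append(current - upsample(coarse))
        current = coarse
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert build_pyramid: start from the coarsest image and add detail back."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = upsample(current) + residual
    return current

rng = np.random.default_rng(0)
image = rng.random((64, 64))          # hypothetical grayscale image
pyr = build_pyramid(image, levels=3)
shapes = [p.shape for p in pyr]
recon = reconstruct(pyr)
```

Because every level holds only the residual relative to the next coarser level, editing one level changes detail at that scale while leaving coarser structure untouched, which is the property the abstract's editing applications rely on.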


Published in

CVMP '17: Proceedings of the 14th European Conference on Visual Media Production (CVMP 2017)
December 2017
93 pages
ISBN: 9781450353298
DOI: 10.1145/3150165

Copyright © 2017 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

CVMP '17 paper acceptance rate: 10 of 16 submissions, 63%
Overall acceptance rate: 40 of 67 submissions, 60%
