DOI: 10.1145/3123266.3123286
research-article

Fast Deep Matting for Portrait Animation on Mobile Phone

Published: 19 October 2017

Abstract

Image matting plays an important role in image and video editing. However, image matting is inherently ill-posed. Traditional methods usually rely on user interaction, in the form of trimaps or strokes, to resolve this ambiguity, and they cannot run on mobile phones in real time. In this paper, we propose a real-time automatic deep matting approach for mobile devices. By leveraging densely connected blocks and dilated convolution, we design a lightweight fully convolutional network that predicts a coarse binary mask for a portrait image. We further develop a feathering block, which is edge-preserving and matting-adaptive, that learns a guided filter and transforms the binary mask into an alpha matte. Finally, we build an automatic portrait animation system on mobile devices based on this fast deep matting; it requires no interaction and performs real-time matting at 15 fps. Experiments show that the proposed approach achieves results comparable to those of state-of-the-art matting solvers.
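The abstract says only that the feathering block learns a guided filter to turn a coarse binary mask into an alpha matte. For orientation, the sketch below implements the classical, non-learned guided image filter of He, Sun, and Tang (2013), using a grayscale portrait image as the guide and the binary mask as the filter input. It is not the authors' feathering block: there the filtering is learned inside the network, whereas here the per-window linear coefficients `a` and `b` are computed in closed form. The function names (`box_mean`, `multiply`, `guided_filter`), the grayscale guide, the toy images, and the parameters `r` (window radius) and `eps` (regularizer) are hypothetical choices made for illustration.

```python
def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window, clipped at the borders, via an integral image."""
    h, w = len(img), len(img[0])
    # s[y][x] holds the sum of img over rows < y and columns < x.
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h - 1, y + r)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w - 1, x + r)
            total = s[y1 + 1][x1 + 1] - s[y0][x1 + 1] - s[y1 + 1][x0] + s[y0][x0]
            out[y][x] = total / ((y1 - y0 + 1) * (x1 - x0 + 1))
    return out


def multiply(a, b):
    """Pixel-wise product of two equally sized 2-D lists."""
    return [[u * v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def guided_filter(guide, mask, r=2, eps=1e-3):
    """Classical (non-learned) guided filter: q = mean(a) * I + mean(b), clipped to [0, 1].

    guide: grayscale image I with values in [0, 1]; mask: coarse binary mask p.
    """
    h, w = len(guide), len(guide[0])
    mean_i = box_mean(guide, r)
    mean_p = box_mean(mask, r)
    mean_ii = box_mean(multiply(guide, guide), r)
    mean_ip = box_mean(multiply(guide, mask), r)
    a = [[0.0] * w for _ in range(h)]
    b = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Fit p ~ a * I + b in each window by ridge-regularized least squares.
            var_i = mean_ii[y][x] - mean_i[y][x] ** 2
            cov_ip = mean_ip[y][x] - mean_i[y][x] * mean_p[y][x]
            a[y][x] = cov_ip / (var_i + eps)
            b[y][x] = mean_p[y][x] - a[y][x] * mean_i[y][x]
    # Average the coefficients of every window that covers each pixel.
    mean_a = box_mean(a, r)
    mean_b = box_mean(b, r)
    return [
        [min(1.0, max(0.0, mean_a[y][x] * guide[y][x] + mean_b[y][x])) for x in range(w)]
        for y in range(h)
    ]


if __name__ == "__main__":
    # Hypothetical 6x6 guide: dark left half, bright right half.
    image = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
    # Hypothetical coarse mask whose edge is one column off from the image edge.
    coarse = [[0.0] * 2 + [1.0] * 4 for _ in range(6)]
    alpha = guided_filter(image, coarse, r=1, eps=1e-4)
    for row in alpha:
        print(" ".join(f"{v:.2f}" for v in row))
```

Because each output pixel is a local linear function of the guide, a small `eps` makes the filtered mask follow intensity edges in the image, while pixels where the mask disagrees with the image receive fractional alpha values; larger `r` or `eps` gives softer transitions. The learned variant described in the abstract replaces these closed-form coefficients with ones produced by the network.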



    Information

    Published In

    MM '17: Proceedings of the 25th ACM international conference on Multimedia
    October 2017
    2028 pages
    ISBN:9781450349062
    DOI:10.1145/3123266
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. automatic
    2. mobile phone
    3. portrait matting
    4. real-time

    Qualifiers

    • Research-article

    Conference

    MM '17
    MM '17: ACM Multimedia Conference
    October 23-27, 2017
    Mountain View, California, USA

    Acceptance Rates

    MM '17 Paper Acceptance Rate 189 of 684 submissions, 28%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 12
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 16 Jan 2025

    Citations

    Cited By

    • (2024) Human Selective Matting. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3640017. Online publication date: 15-Jan-2024.
    • (2024) VMFormer: End-to-End Video Matting with Transformer. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 6664-6673. DOI: 10.1109/WACV57701.2024.00654. Online publication date: 3-Jan-2024.
    • (2024) Video Instance Matting. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 6654-6663. DOI: 10.1109/WACV57701.2024.00653. Online publication date: 3-Jan-2024.
    • (2024) SDNet: An Extremely Efficient Portrait Matting Model via Self-Distillation. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 5613-5622. DOI: 10.1109/WACV57701.2024.00553. Online publication date: 3-Jan-2024.
    • (2024) Pixel-Level Contrastive Pretrainer for Industrial Image Representation. IEEE Transactions on Instrumentation and Measurement, vol. 73, pp. 1-13. DOI: 10.1109/TIM.2024.3353860. Online publication date: 2024.
    • (2024) End-to-End Human Instance Matting. IEEE Transactions on Circuits and Systems for Video Technology, pp. 1-1. DOI: 10.1109/TCSVT.2023.3306400. Online publication date: 2024.
    • (2024) A Multistage Approach For Object Detection And Efficient Parsing Of Video Content. 2024 International Conference on Knowledge Engineering and Communication Systems (ICKECS), pp. 1-6. DOI: 10.1109/ICKECS61492.2024.10617104. Online publication date: 18-Apr-2024.
    • (2024) Matting Anything. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1775-1785. DOI: 10.1109/CVPRW63382.2024.00184. Online publication date: 17-Jun-2024.
    • (2024) EFormer: Enhanced Transformer Towards Semantic-Contour Features of Foreground for Portraits Matting. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3880-3889. DOI: 10.1109/CVPR52733.2024.00372. Online publication date: 16-Jun-2024.
    • (2023) Test-time Adaptation vs. Training-time Generalization: A Case Study in Human Instance Segmentation using Keypoints Estimation. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), pp. 411-420. DOI: 10.1109/WACVW58289.2023.00045. Online publication date: Jan-2023.
