
Fast Deep Matting for Portrait Animation on Mobile Phone

Authors:

Ming Tang

MM '17: Proceedings of the 25th ACM international conference on Multimedia

Pages 297 - 305

https://doi.org/10.1145/3123266.3123286

Published: 19 October 2017

Abstract

Image matting plays an important role in image and video editing, but its formulation is inherently ill-posed. Traditional methods usually rely on user interaction, in the form of trimaps and strokes, and cannot run in real time on mobile phones. In this paper, we propose a real-time automatic deep matting approach for mobile devices. By leveraging densely connected blocks and dilated convolution, we design a lightweight fully convolutional network that predicts a coarse binary mask for portrait images. A feathering block, which is edge-preserving and matting-adaptive, is further developed to learn the guided filter and transform the binary mask into an alpha matte. Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices; it needs no interaction and achieves real-time matting at 15 fps. Experiments show that the proposed approach achieves results comparable to those of state-of-the-art matting solvers.
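To make the feathering idea concrete, the following is a minimal NumPy sketch of the classic, non-learned guided filter applied to a coarse binary mask, with a grayscale image as the guide. It is not the learned feathering block described in the abstract; the function names `box_mean`, `guided_filter`, and `feather_mask`, the window radius `r`, and the regularizer `eps` are illustrative assumptions.

```python
import numpy as np


def box_mean(x: np.ndarray, r: int) -> np.ndarray:
    """Mean over a (2r+1) x (2r+1) window, using edge-replicated padding."""
    k = 2 * r + 1
    h, w = x.shape
    padded = np.pad(x.astype(np.float64), r, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((h + 2 * r + 1, w + 2 * r + 1), dtype=np.float64)
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    window_sum = (
        ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    )
    return window_sum / (k * k)


def guided_filter(guide: np.ndarray, src: np.ndarray,
                  r: int = 4, eps: float = 1e-3) -> np.ndarray:
    """Classic guided filter: fit src ~ a * guide + b in every local window."""
    guide = guide.astype(np.float64)
    src = src.astype(np.float64)
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    cov_ip = box_mean(guide * src, r) - mean_i * mean_p
    var_i = box_mean(guide * guide, r) - mean_i * mean_i
    a = cov_ip / (var_i + eps)   # local linear gain; eps damps it in flat areas
    b = mean_p - a * mean_i      # local offset
    # Average the per-window coefficients, then apply them to the guide.
    return box_mean(a, r) * guide + box_mean(b, r)


def feather_mask(gray: np.ndarray, binary_mask: np.ndarray,
                 r: int = 4, eps: float = 1e-3) -> np.ndarray:
    """Turn a hard 0/1 mask into a soft matte in [0, 1] that follows image edges.

    `gray` is assumed to be a 2-D grayscale image scaled to [0, 1].
    """
    alpha = guided_filter(gray, binary_mask.astype(np.float64), r, eps)
    return np.clip(alpha, 0.0, 1.0)
```

A larger `r` spreads the transition band over more pixels, while a larger `eps` smooths more aggressively and follows the guide image's edges less closely. In the approach summarized above, the filter is learned rather than fixed by hand as in this sketch.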



Fast Deep Matting for Portrait Animation on Mobile Phone
1. Computing methodologies
  1. Artificial intelligence


Published In

MM '17: Proceedings of the 25th ACM international conference on Multimedia

October 2017

2028 pages

ISBN:9781450349062

DOI:10.1145/3123266

General Chairs:
Qiong Liu (FXPAL, USA)
Rainer Lienhart (Universität Augsburg, Germany)
Haohong Wang (TCL America, USA)

Program Chairs:
Sheng-Wei "Kuan-Ta" Chen (Academia Sinica, Taiwan)
Susanne Boll (University of Oldenburg, Germany)
Phoebe Chen (La Trobe University, Australia)
Gerald Friedland (Lawrence Livermore National Lab, USA)
Jia Li (Google, USA)
Shuicheng Yan (Qihoo 360, China)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Conference

MM '17

Sponsor:

SIGMM

MM '17: ACM Multimedia Conference

October 23 - 27, 2017

Mountain View, California, USA

Acceptance Rates

MM '17 Paper Acceptance Rate 189 of 684 submissions, 28%

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 57
Total Downloads: 457
Downloads (Last 12 months): 12
Downloads (Last 6 weeks): 3

Reflects downloads up to 16 Jan 2025


Citations

Cited By

Liu Q, Meng Q, Lv X, Li Z, Yu W, Zhang S. (2024). Human Selective Matting. ACM Transactions on Multimedia Computing, Communications, and Applications. Online publication date: 15-Jan-2024.
https://doi.org/10.1145/3640017
Li J, Goel V, Ohanyan M, Navasardyan S, Wei Y, Shi H. (2024). VMFormer: End-to-End Video Matting with Transformer. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 6664-6673. Online publication date: 3-Jan-2024.
https://doi.org/10.1109/WACV57701.2024.00654
Li J, Henschel R, Goel V, Ohanyan M, Navasardyan S, Shi H. (2024). Video Instance Matting. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 6654-6663. Online publication date: 3-Jan-2024.
https://doi.org/10.1109/WACV57701.2024.00653
Li Z, Xu B, Xie J, Tang Y, Lu C. (2024). SDNet: An Extremely Efficient Portrait Matting Model via Self-Distillation. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 5613-5622. Online publication date: 3-Jan-2024.
https://doi.org/10.1109/WACV57701.2024.00553
Zhu B, Chen Y, Tang M, Wang J. (2024). Pixel-Level Contrastive Pretrainer for Industrial Image Representation. IEEE Transactions on Instrumentation and Measurement, 73, 1-13. Online publication date: 2024.
https://doi.org/10.1109/TIM.2024.3353860
Liu Q, Zhang S, Meng Q, Zhong B, Liu P, Yao H. (2024). End-to-End Human Instance Matting. IEEE Transactions on Circuits and Systems for Video Technology, 1-1. Online publication date: 2024.
https://doi.org/10.1109/TCSVT.2023.3306400
Sharma G, Jatain D, Malik S, Niranjanamurthy M. (2024). A Multistage Approach For Object Detection And Efficient Parsing Of Video Content. 2024 International Conference on Knowledge Engineering and Communication Systems (ICKECS), 1-6. Online publication date: 18-Apr-2024.
https://doi.org/10.1109/ICKECS61492.2024.10617104
Li J, Jain J, Shi H. (2024). Matting Anything. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1775-1785. Online publication date: 17-Jun-2024.
https://doi.org/10.1109/CVPRW63382.2024.00184
Wang Z, Miao Q, Xi Y, Zhao P. (2024). EFormer: Enhanced Transformer Towards Semantic-Contour Features of Foreground for Portraits Matting. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3880-3889. Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.00372
Azarian K, Das D, Park H, Porikli F. (2023). Test-time Adaptation vs. Training-time Generalization: A Case Study in Human Instance Segmentation using Keypoints Estimation. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), 411-420. Online publication date: Jan-2023.
https://doi.org/10.1109/WACVW58289.2023.00045
