research-article

Attribute-Aware Attention Model for Fine-grained Representation Learning

Authors:
Kai Han, Peking University & Alibaba Group, Beijing, China
Jianyuan Guo, Peking University, Beijing, China
Chao Zhang, Peking University, Beijing, China
Mingjian Zhu, Peking University, Beijing, China
MM '18: Proceedings of the 26th ACM international conference on Multimedia, October 2018, pages 2040–2048. https://doi.org/10.1145/3240508.3240550

Published: 15 October 2018


ABSTRACT

Learning a discriminative fine-grained representation is key to many computer vision applications, such as person re-identification, fine-grained classification, and fine-grained image retrieval. Most previous methods focus on metric learning or ensembles to derive a better global representation, which usually lacks local information. Motivated by these considerations, we propose a novel Attribute-Aware Attention Model ($A^3M$), which learns local attribute representations and a global category representation simultaneously in an end-to-end manner. The proposed model contains two attention modules: the attribute-guided attention module uses attribute information to help select category features in different regions, while the category-guided attention module selects local features of different attributes with the help of category cues. Through this reciprocal attribute-category process, local and global features benefit from each other. As a result, the final feature contains more intrinsic information for image recognition rather than noisy and irrelevant features. Extensive experiments on Market-1501, CompCars, CUB-200-2011, and CARS196 demonstrate the effectiveness of $A^3M$.
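To make the idea of one cue "selecting" features in different regions more concrete, here is a minimal sketch of a generic guided spatial-attention step. It is an illustration under stated assumptions, not the authors' implementation. It assumes that each image region has a D-dimensional feature, that a guidance vector (an attribute embedding for the attribute-guided branch, a category embedding for the category-guided branch) scores each region by dot product, that a softmax over regions turns the scores into weights, and that the attended feature is the weighted sum of region features. The names `guided_attention`, `region_features`, and `guide` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def guided_attention(region_features: List[Vector], guide: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical guided spatial attention (assumption: dot-product scoring).

    region_features: one D-dimensional feature per spatial region (non-empty).
    guide: a D-dimensional guidance vector, e.g. an attribute embedding
           (attribute-guided branch) or a category embedding
           (category-guided branch).
    Returns (attention weights over regions, attended D-dimensional feature).
    """
    # Score each region by its similarity to the guidance vector.
    weights = softmax([dot(f, guide) for f in region_features])
    # Attended feature = attention-weighted sum of the region features.
    dim = len(guide)
    attended = [sum(w * f[d] for w, f in zip(weights, region_features)) for d in range(dim)]
    return weights, attended


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    attribute_guide = [4.0, 0.0]  # hypothetical attribute embedding
    category_guide = [0.0, 1.0]   # hypothetical category embedding
    print(guided_attention(regions, attribute_guide))
    print(guided_attention(regions, category_guide))
```

In this sketch the same routine serves both branches: passing an attribute embedding as `guide` plays the role of attribute-guided attention, and passing a category embedding plays the role of category-guided attention. The exact scoring function, parameterization, and fusion of the two branches in $A^3M$ are not given in the abstract and may differ from this simplification.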

Supplemental Material

fp0235.zip (2.8 MB). The archive contains the supplemental material in PDF form; the source file is appendix.tex.


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object identification


Published in
MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018
2167 pages
ISBN: 9781450356657
DOI: 10.1145/3240508
General Chairs: Susanne Boll (University of Oldenburg, Germany), Kyoung Mu Lee (Seoul National University, Korea), Jiebo Luo (University of Rochester, USA), Wenwu Zhu (Tsinghua University, China)
Program Chairs: Hyeran Byun (Yonsei University, Korea), Chang Wen Chen (State Univ. of New York at Buffalo, USA), Rainer Lienhart (University of Augsburg, Germany), Tao Mei (JD AI, China)
Copyright © 2018 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
attribute-aware attention
deep learning
fine-grained recognition
Acceptance Rates
MM '18 paper acceptance rate: 209 of 757 submissions (28%). Overall acceptance rate: 995 of 4,171 submissions (24%).