DOI: 10.1145/3240508.3240550
research-article

Attribute-Aware Attention Model for Fine-grained Representation Learning

Published: 15 October 2018

ABSTRACT

Learning a discriminative fine-grained representation is a key problem in many computer vision applications, such as person re-identification, fine-grained classification, and fine-grained image retrieval. Most previous methods focus on metric learning or ensembles to derive a better global representation, which usually lacks local information. Based on these considerations, we propose a novel Attribute-Aware Attention Model ($A^3M$), which learns local attribute representations and a global category representation simultaneously in an end-to-end manner. The proposed model contains two attention modules: an attribute-guided attention module uses attribute information to help select category features in different regions, while a category-guided attention module selects local features of different attributes with the help of category cues. Through this reciprocal attribute-category process, the local and global features benefit from each other. As a result, the final feature carries more of the information intrinsic to image recognition and less of the noisy, irrelevant content. Extensive experiments on Market-1501, CompCars, CUB-200-2011 and CARS196 demonstrate the effectiveness of $A^3M$.
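To make the reciprocal attribute-category process more concrete, here is a minimal sketch in plain Python. It is not the authors' formulation: it assumes simple dot-product attention with softmax weights, represents image regions and attributes as plain vectors of the same dimension, and uses the mean attribute vector as the attribute query. The function names (`attribute_guided_attention`, `category_guided_attention`, `a3m_style_feature`) are hypothetical and chosen only for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    """Sum of vectors, each scaled by its attention weight."""
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(vec[d] for vec in vectors) / n for d in range(len(vectors[0]))]


def attribute_guided_attention(regions: List[Vector], attribute_query: Vector) -> Vector:
    """Hypothetical: pool region features into a category feature, weighting
    each region by its dot-product similarity to an attribute query."""
    weights = softmax([dot(attribute_query, r) for r in regions])
    return weighted_sum(weights, regions)


def category_guided_attention(attributes: List[Vector], category_query: Vector) -> Vector:
    """Hypothetical: pool per-attribute features, weighting each attribute by
    its dot-product similarity to a category query."""
    weights = softmax([dot(category_query, a) for a in attributes])
    return weighted_sum(weights, attributes)


def a3m_style_feature(regions: List[Vector], attributes: List[Vector]) -> Vector:
    """Hypothetical reciprocal step: attributes steer the category pooling,
    and the resulting category feature steers the attribute pooling.
    Returns the concatenation [category feature, attribute feature]."""
    attribute_query = mean_vector(attributes)  # assumption: mean attribute as query
    category_feature = attribute_guided_attention(regions, attribute_query)
    attribute_feature = category_guided_attention(attributes, category_feature)
    return category_feature + attribute_feature


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # two toy region features
    attributes = [[2.0, 0.0]]           # one toy attribute feature
    print(a3m_style_feature(regions, attributes))
```

In a trained network, the region vectors would come from a CNN feature map and the queries from learned projections; the sketch only illustrates how each branch's output can steer the other branch's attention.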



    Published in

      ACM Conferences
      MM '18: Proceedings of the 26th ACM international conference on Multimedia
      October 2018
      2167 pages
      ISBN:9781450356657
      DOI:10.1145/3240508

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      MM '18 Paper Acceptance Rate: 209 of 757 submissions, 28%. Overall Acceptance Rate: 995 of 4,171 submissions, 24%.
