research-article

Automatic triage for a photo series

Published: 11 July 2016

Abstract

People often take a series of nearly redundant pictures to capture a moment or scene. However, selecting photos to keep or share from a large collection is a painful chore. To address this problem, we seek a relative quality measure within a series of photos taken of the same scene, which can be used for automatic photo triage. Toward this end, we gather a large dataset composed of photo series distilled from personal photo albums. The dataset contains 15,545 unedited photos organized in 5,953 series. By augmenting this dataset with ground-truth human preferences among photos within each series, we establish a benchmark for measuring how well algorithmic models predict the photos people select. We introduce several new machine-learning approaches for modeling human preference. We also describe applications for the dataset and predictor, including a smart album viewer, automatic photo enhancement, and overviews of video clips.
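As an illustration of how such a benchmark might be scored, the sketch below computes the fraction of human pairwise preferences that a relative quality score reproduces, and then picks the top photo in a series. The abstract does not specify the evaluation protocol, so the data layout, the function names `pairwise_agreement` and `pick_best`, and the toy scores are all assumptions made for this example rather than the authors' method.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout (an assumption, not from the paper): a series is a list
# of photo IDs, and each human judgment is a (preferred_id, other_id) pair
# drawn from the same series.
Series = List[str]
Preference = Tuple[str, str]


def pairwise_agreement(
    preferences: List[Preference],
    score: Callable[[str], float],
) -> float:
    """Fraction of human-preferred pairs that the scorer ranks the same way."""
    if not preferences:
        raise ValueError("no preference pairs to evaluate")
    correct = sum(
        1 for winner, loser in preferences if score(winner) > score(loser)
    )
    return correct / len(preferences)


def pick_best(series: Series, score: Callable[[str], float]) -> str:
    """Triage step: keep the photo with the highest score within a series."""
    return max(series, key=score)


# Toy example with made-up scores (not data from the paper).
toy_scores: Dict[str, float] = {"a": 0.9, "b": 0.4, "c": 0.1}
toy_prefs: List[Preference] = [("a", "b"), ("a", "c"), ("c", "b")]

accuracy = pairwise_agreement(toy_prefs, toy_scores.__getitem__)
best = pick_best(["a", "b", "c"], toy_scores.__getitem__)
```

Because the score is only compared between photos of the same series, it needs to be meaningful only relative to the other photos of that scene, which matches the relative quality measure the abstract describes.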


Supplemental Material

a148.mp4 (MP4, 218.4 MB)


Published in

ACM Transactions on Graphics, Volume 35, Issue 4
July 2016
1396 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2897824

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States
