Abstract
People often take a series of nearly redundant pictures to capture a moment or scene. However, selecting photos to keep or share from a large collection is a painful chore. To address this problem, we seek a relative quality measure within a series of photos taken of the same scene, which can be used for automatic photo triage. To this end, we gather a large dataset composed of photo series distilled from personal photo albums. The dataset contains 15,545 unedited photos organized in 5,953 series. By augmenting this dataset with ground-truth human preferences among photos within each series, we establish a benchmark for measuring the effectiveness of algorithmic models of how people select photos. We introduce several new machine-learning approaches for modeling human preference. We also describe applications for the dataset and predictor, including a smart album viewer, automatic photo enhancement, and overviews of video clips.