research-article

Painting-to-3D model alignment via discriminative visual elements

Authors:

Bryan C. Russell,

Josef Sivic

ACM Transactions on Graphics (TOG), Volume 33, Issue 2

Article No.: 14, Pages 1 - 14

https://doi.org/10.1145/2591009

Published: 08 April 2014

Abstract

This article describes a technique that can reliably align arbitrary 2D depictions of an architectural site, including drawings, paintings, and historical photographs, with a 3D model of the site. This is a tremendously difficult task, as the appearance and scene structure in the 2D depictions can be very different from the appearance and geometry of the 3D model, for example, due to the specific rendering style, drawing error, age, lighting, or change of seasons. In addition, we face a hard search problem: the number of possible alignments of the painting to a large 3D model, such as a partial reconstruction of a city, is huge. To address these issues, we develop a new compact representation of complex 3D scenes. The 3D model of the scene is represented by a small set of discriminative visual elements that are automatically learned from rendered views. Similar to object detection, the set of visual elements, as well as the weights of individual features for each element, are learned in a discriminative fashion. We show that the learned visual elements are reliably matched in 2D depictions of the scene despite large variations in rendering style (e.g., watercolor, sketch, historical photograph) and structural changes (e.g., missing scene parts, large occluders) of the scene. We demonstrate an application of the proposed approach to automatic rephotography to find an approximate viewpoint of historical paintings and photographs with respect to a 3D model of the site. The proposed alignment procedure is validated via a human user study on a new database of paintings and sketches spanning several sites. The results demonstrate that our algorithm produces significantly better alignments than several baseline methods.
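The idea of treating each visual element as a discriminatively weighted detector can be illustrated with a toy sketch. The code below is not the authors' method: it assumes each patch is already summarized by a small feature vector, and it substitutes a simple nearest-mean linear classifier (weights equal to the positive descriptor minus the mean of the negatives) for whatever learning procedure the article actually uses. The function names `learn_element` and `best_match` are hypothetical and introduced here only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def learn_element(positive: Vector,
                  negatives: Sequence[Vector]) -> Tuple[List[float], float]:
    """Toy stand-in for discriminative learning (an assumption, not the paper's solver).

    Returns a linear detector (w, b) whose decision boundary passes through
    the midpoint between the positive patch and the mean negative patch.
    """
    mu = mean_vector(negatives)
    w = [p - m for p, m in zip(positive, mu)]
    midpoint = [(p + m) / 2.0 for p, m in zip(positive, mu)]
    b = -dot(w, midpoint)
    return w, b


def best_match(w: Vector, b: float,
               windows: Sequence[Vector]) -> Tuple[int, float]:
    """Score every candidate window in a depiction and return the best one."""
    scores = [dot(w, x) + b for x in windows]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical descriptors: one patch from a rendered view and two
# "background" patches taken from elsewhere in the rendered views.
element_w, element_b = learn_element([1.0, 0.0, 0.0],
                                     [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

# Candidate windows extracted from a painting (again, made-up numbers).
painting_windows = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
index, score = best_match(element_w, element_b, painting_windows)
```

In this sketch the second window scores highest because it is closest to the positive patch along the directions that separate it from the negatives, which is the intuition behind weighting features by how well they discriminate rather than matching raw appearance.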

Supplementary Material

JPG File (a14-sidebyside.jpg), 6.21 KB

MP4 File (a14-sidebyside.mp4), 34.24 MB

Index Terms

Painting-to-3D model alignment via discriminative visual elements
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
      2. Image and video acquisition
        3D imaging

Information

Published In

ACM Transactions on Graphics Volume 33, Issue 2

March 2014

135 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/2603314

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 April 2014

Accepted: 01 November 2013

Received: 01 September 2013

Published in TOG Volume 33, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

European Institute of Innovation and Technology
Google
MSR-INRIA Laboratory
Agence Nationale de la Recherche
