research-article

Painting-to-3D model alignment via discriminative visual elements

Published: 08 April 2014

Abstract

This article describes a technique that can reliably align arbitrary 2D depictions of an architectural site, including drawings, paintings, and historical photographs, with a 3D model of the site. This is a tremendously difficult task, as the appearance and scene structure in the 2D depictions can be very different from the appearance and geometry of the 3D model, for example, due to the specific rendering style, drawing error, age, lighting, or change of seasons. In addition, we face a hard search problem: the number of possible alignments of a painting to a large 3D model, such as a partial reconstruction of a city, is huge. To address these issues, we develop a new compact representation of complex 3D scenes. The 3D model of the scene is represented by a small set of discriminative visual elements that are automatically learned from rendered views. As in object detection, both the set of visual elements and the weights of individual features for each element are learned discriminatively. We show that the learned visual elements are reliably matched in 2D depictions of the scene despite large variations in rendering style (e.g., watercolor, sketch, historical photograph) and structural changes of the scene (e.g., missing scene parts, large occluders). We demonstrate an application of the proposed approach to automatic rephotography, in which we find an approximate viewpoint of historical paintings and photographs with respect to a 3D model of the site. The proposed alignment procedure is validated via a human user study on a new database of paintings and sketches spanning several sites. The results demonstrate that our algorithm produces significantly better alignments than several baseline methods.
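As a hedged illustration of how a visual element's feature weights can be learned discriminatively, the sketch below assumes that each element is a fixed-length patch descriptor taken from a rendered view (for example, a flattened HOG descriptor) and that its weights come from linear discriminant analysis (LDA) against one shared Gaussian model of generic negative patches, in the spirit of the discriminative decorrelation of Hariharan et al. (2012). The function names (`negative_model`, `train_element`, `score`, `best_match`) and the regularizer `reg` are hypothetical, and this is not presented as the authors' exact training, element-selection, or matching procedure.

```python
# Illustrative sketch (assumption): an LDA-style linear detector for one
# visual element. Descriptors are plain float lists (e.g., flattened HOG);
# all function names here are hypothetical.
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def negative_model(negatives: Sequence[Vector], reg: float = 0.1) -> Tuple[Vector, Matrix]:
    """Mean and regularized covariance of generic negative patches.

    Computed once and shared by every element; reg * I keeps the
    covariance invertible when there are few negatives.
    """
    n, d = len(negatives), len(negatives[0])
    mu = [sum(v[i] for v in negatives) / n for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for v in negatives:
        c = [v[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    for i in range(d):
        cov[i][i] += reg
    return mu, cov


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for k in range(col, d + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * d
    for i in reversed(range(d)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, d))
        x[i] = (m[i][d] - tail) / m[i][i]
    return x


def train_element(positive: Vector, mu: Vector, cov: Matrix) -> Tuple[Vector, float]:
    """w = cov^-1 (positive - mu); the bias puts the boundary midway between them."""
    w = solve(cov, [p - m for p, m in zip(positive, mu)])
    b = -0.5 * sum(wi * (p + m) for wi, p, m in zip(w, positive, mu))
    return w, b


def score(w: Vector, b: float, x: Vector) -> float:
    """Linear detection score of one candidate descriptor x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def best_match(w: Vector, b: float, candidates: Sequence[Vector]) -> Tuple[int, float]:
    """Index and score of the highest-scoring candidate (e.g., painting patches)."""
    scores = [score(w, b, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    negatives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    mu, cov = negative_model(negatives, reg=0.1)
    w, b = train_element([3.0, 3.0], mu, cov)
    print(best_match(w, b, [[0.0, 0.0], [2.9, 3.1], [1.0, 1.0]]))
```

Because the negative mean and covariance do not depend on the element, they are estimated once and reused, so each additional element costs only a single linear solve; this is the usual argument for LDA-style detectors over training a separate SVM per element.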

Supplementary Material

JPG File (a14-sidebyside.jpg)
MP4 File (a14-sidebyside.mp4)

Cited By

  • (2024) A diffusion probabilistic model for traditional Chinese landscape painting super-resolution. Heritage Science 12, 1. DOI: 10.1186/s40494-023-01123-y. Online publication date: 2-Jan-2024.
  • (2024) Learning A Low-Level Vision Generalist via Visual Task Prompt. Proceedings of the 32nd ACM International Conference on Multimedia, 2671--2680. DOI: 10.1145/3664647.3681621. Online publication date: 28-Oct-2024.
  • (2024) ATLoc: Aerial Thermal Images Localization via View Synthesis. IEEE Transactions on Geoscience and Remote Sensing 62, 1--13. DOI: 10.1109/TGRS.2024.3352421. Online publication date: 2024.

Information

Published In

ACM Transactions on Graphics, Volume 33, Issue 2
March 2014
135 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/2603314
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 April 2014
Accepted: 01 November 2013
Received: 01 September 2013
Published in TOG Volume 33, Issue 2

Author Tags

  1. 3D alignment
  2. 3D models
  3. CAD models
  4. historical photographs
  5. paintings
  6. rephotography
  7. sketches

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 35
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 02 Mar 2025.
