research-article

Open access

3D Wikipedia: using online text to automatically label and navigate reconstructed geometry

Authors:

Bryan C. Russell,

Ricardo Martin-Brualla,

Daniel J. Butler,

Steven M. Seitz,

Luke Zettlemoyer

ACM Transactions on Graphics (TOG), Volume 32, Issue 6

Article No.: 193, Pages 1 - 10

https://doi.org/10.1145/2508363.2508425

Published: 01 November 2013

Abstract

We introduce an approach for analyzing Wikipedia and other text, together with online photos, to produce annotated 3D models of famous tourist sites. The approach is completely automated, and leverages online text and photo co-occurrences via Google Image Search. It enables a number of new interactions, which we demonstrate in a new 3D visualization tool. Text can be selected to move the camera to the corresponding objects, 3D bounding boxes provide anchors back to the text describing them, and the overall narrative of the text provides a temporal guide for automatically flying through the scene to visualize the world as you read about it. We show compelling results on several major tourist sites.
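The abstract describes two links between text and geometry. Selecting a phrase moves the camera to its object, and the narrative order of the text drives an automatic fly-through. The Python sketch below illustrates only that navigation bookkeeping, and it rests on stated assumptions. It supposes that an earlier step, not shown here, has already assigned each noun phrase a set of reconstructed 3D points. It summarizes each object with an axis-aligned box and orders the tour by each phrase's first mention in the text. The names `Box3D`, `box_from_points`, `text_anchors`, and `flythrough_order` are hypothetical. First-mention ordering is a simplifying stand-in and is not the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned 3D bounding box (hypothetical stand-in for a labeled object)."""
    lo: Point
    hi: Point

    def center(self) -> Point:
        # Midpoint of the box, used here as the camera's look-at target.
        (x0, y0, z0), (x1, y1, z1) = self.lo, self.hi
        return ((x0 + x1) / 2, (y0 + y1) / 2, (z0 + z1) / 2)


def box_from_points(points: Sequence[Point]) -> Box3D:
    """Tight box around a nonempty set of points assumed to depict one object."""
    xs, ys, zs = zip(*points)
    return Box3D((min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs)))


def text_anchors(text: str, phrase: str) -> List[int]:
    """Character offsets of every case-insensitive mention of `phrase` in `text`,
    i.e. the places a selected 3D box could link back to."""
    lowered, target = text.lower(), phrase.lower()
    offsets, start = [], lowered.find(target)
    while start >= 0:
        offsets.append(start)
        start = lowered.find(target, start + 1)
    return offsets


def flythrough_order(text: str, labels: Dict[str, Box3D]) -> List[Tuple[str, Point]]:
    """Visit labeled objects in the order the text first mentions them.

    Phrases that never occur in `text` are skipped. Phrases sharing a first
    offset fall back to alphabetical order.
    """
    visits = []
    for phrase, box in labels.items():
        anchors = text_anchors(text, phrase)
        if anchors:
            visits.append((anchors[0], phrase, box.center()))
    visits.sort()
    return [(phrase, target) for _, phrase, target in visits]


if __name__ == "__main__":
    article = "The portico leads to the rotunda. The rotunda has an oculus."
    boxes = {
        "oculus": box_from_points([(0, 0, 40), (2, 2, 43)]),
        "portico": box_from_points([(-10, -30, 0), (10, -20, 15)]),
        "rotunda": box_from_points([(-20, -20, 0), (20, 20, 43)]),
    }
    for phrase, target in flythrough_order(article, boxes):
        print(phrase, target)
    # prints:
    # portico (0.0, -25.0, 7.5)
    # rotunda (0.0, 0.0, 21.5)
    # oculus (1.0, 1.0, 41.5)
```

Ordering by first mention needs only one pass over the labels and keeps the tour aligned with reading order. A real viewer would also need camera poses and smooth interpolation between targets, which this sketch leaves out.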

Cited By

Dudai C, Alper M, Bezalel H, Hanocka R, Lang I, Averbuch‐Elor H (2024). HaLo‐NeRF: Learning Geometry‐Guided Semantics for Exploring Unconstrained Photo Collections. Computer Graphics Forum 43:2. Online publication date: 15-Apr-2024.
https://doi.org/10.1111/cgf.15006

Bönisch K, Mehler A, Babbili S, Heinrich Y, Stephan P, Abrami G (2024). Viki LibraRy: collaborative hypertext browsing and navigation in virtual reality. New Review of Hypermedia and Multimedia, 1-31. Online publication date: 24-Oct-2024.
https://doi.org/10.1080/13614568.2024.2383581

Babbili S, Bönisch K, Heinrich Y, Stephan P, Abrami G, Mehler A (2023). Viki LibraRy. Proceedings of the 34th ACM Conference on Hypertext and Social Media, 1-3. Online publication date: 4-Sep-2023.
https://dl.acm.org/doi/10.1145/3603163.3609079

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Reconstruction
    2. Natural language processing
      1. Language resources
  2. Computer graphics
    1. Graphics systems and interfaces
      1. Virtual reality
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction paradigms
      1. Mixed / augmented reality
      2. Virtual reality

Information

Published In

ACM Transactions on Graphics, Volume 32, Issue 6

November 2013

671 pages

ISSN: 0730-0301

EISSN: 1557-7368

DOI: 10.1145/2508363

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Pervasive Computing (ISTC-PC)
Intel Corporation
Division of Information and Intelligent Systems
Google

Bibliometrics & Citations

Article Metrics

Total Citations: 34

Total Downloads: 1,050

Downloads (Last 12 months): 174

Downloads (Last 6 weeks): 15

Reflects downloads up to 02 Mar 2025

Citations

Wu X, Averbuch-Elor H, Sun J, Snavely N (2021). Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 418-427. Online publication date: Oct-2021.
https://doi.org/10.1109/ICCV48922.2021.00048

Lombardo F, Daly M (2021). Analyzing suicide life stories on Wikipedia with Highway_star and other textual visualization tools. SN Social Sciences 1:11. Online publication date: 29-Oct-2021.
https://doi.org/10.1007/s43545-021-00272-w

Du R, Li D, Varshney A, Polys N, McCann M, Liu F, Plesch A (2019). Project Geollery.com: Reconstructing A Live Mirrored World With Geotagged Social Media. Proceedings of the 24th International Conference on 3D Web Technology, 1-9. Online publication date: 26-Jul-2019.
https://dl.acm.org/doi/10.1145/3329714.3338126

Wu K, Li G, Li H, Zhang J, Yu Y (2019). Harvesting Visual Objects from Internet Images via Deep-Learning-Based Objectness Assessment. ACM Transactions on Multimedia Computing, Communications, and Applications 15:3, 1-23. Online publication date: 8-Aug-2019.
https://dl.acm.org/doi/10.1145/3318463

Du R, Li D, Varshney A, Brewster S, Fitzpatrick G, Cox A, Kostakos V (2019). Geollery. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1-13. Online publication date: 2-May-2019.
https://dl.acm.org/doi/10.1145/3290605.3300915

Du R, Li D, Varshney A (2019). Interactive Fusion of 360° Images for a Mirrored World. 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), 900-901. Online publication date: Mar-2019.
https://doi.org/10.1109/VR.2019.8798187

Wang Z, Shi W, Akoglu K, Kotoula E, Yang Y, Rushmeier H (2018). CHER-Ob. Journal on Computing and Cultural Heritage 11:4, 1-22. Online publication date: 19-Nov-2018.
https://dl.acm.org/doi/10.1145/3230673