research-article
Open access

3D Wikipedia: using online text to automatically label and navigate reconstructed geometry

Published: 01 November 2013

Abstract

We introduce an approach for analyzing Wikipedia and other text, together with online photos, to produce annotated 3D models of famous tourist sites. The approach is completely automated, and leverages online text and photo co-occurrences via Google Image Search. It enables a number of new interactions, which we demonstrate in a new 3D visualization tool. Text can be selected to move the camera to the corresponding objects, 3D bounding boxes provide anchors back to the text describing them, and the overall narrative of the text provides a temporal guide for automatically flying through the scene to visualize the world as you read about it. We show compelling results on several major tourist sites.
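
The interactions described above (selecting text to move the camera, and using the order of the narrative as a fly-through guide) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Annotation` record, the substring matching in `find_annotation`, the fixed `view_dir`, and the `distance_factor` heuristic are all assumptions introduced here for illustration, and the sketch presumes the labeled 3D bounding boxes have already been produced by some earlier step.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record linking a text mention to a 3D bounding box."""
    label: str        # object name as it appears in the text
    text_offset: int  # character offset of the first mention in the text
    box_min: Vec3     # axis-aligned bounding box, minimum corner
    box_max: Vec3     # axis-aligned bounding box, maximum corner


def box_center(a: Annotation) -> Vec3:
    """Midpoint of the bounding box."""
    return tuple((lo + hi) / 2.0 for lo, hi in zip(a.box_min, a.box_max))


def box_diagonal(a: Annotation) -> float:
    """Length of the bounding box diagonal."""
    return sqrt(sum((hi - lo) ** 2 for lo, hi in zip(a.box_min, a.box_max)))


def camera_for(a: Annotation,
               view_dir: Vec3 = (0.0, 0.0, 1.0),
               distance_factor: float = 1.5) -> Tuple[Vec3, Vec3]:
    """Return (eye, target): back off from the box center along -view_dir.

    The back-off distance scales with the box diagonal so that larger
    objects are viewed from farther away (an assumed heuristic).
    """
    center = box_center(a)
    d = distance_factor * box_diagonal(a)
    eye = tuple(c - d * v for c, v in zip(center, view_dir))
    return eye, center


def find_annotation(annotations: List[Annotation],
                    selected_text: str) -> Optional[Annotation]:
    """Return the first annotation whose label occurs in the selected text."""
    needle = selected_text.lower()
    for a in annotations:
        if a.label.lower() in needle:
            return a
    return None


def narrative_tour(annotations: List[Annotation]) -> List[Annotation]:
    """Order annotations by where they are first mentioned in the text."""
    return sorted(annotations, key=lambda a: a.text_offset)
```

In a viewer built along these lines, a text selection would be passed to `find_annotation` and the result to `camera_for`, while `narrative_tour` would supply the sequence of camera targets for an automatic fly-through.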

Published In

ACM Transactions on Graphics, Volume 32, Issue 6
November 2013
671 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/2508363
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. 3D visualization
  2. Wikipedia
  3. image-based modeling and rendering
  4. natural language processing

Qualifiers

  • Research-article

Cited By

  • (2024) HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections. Computer Graphics Forum 43:2. DOI: 10.1111/cgf.15006. Online publication date: 15-Apr-2024
  • (2024) Viki LibraRy: collaborative hypertext browsing and navigation in virtual reality. New Review of Hypermedia and Multimedia (1-31). DOI: 10.1080/13614568.2024.2383581. Online publication date: 24-Oct-2024
  • (2023) Viki LibraRy. Proceedings of the 34th ACM Conference on Hypertext and Social Media (1-3). DOI: 10.1145/3603163.3609079. Online publication date: 4-Sep-2023
  • (2021) Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision. 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (418-427). DOI: 10.1109/ICCV48922.2021.00048. Online publication date: Oct-2021
  • (2021) Analyzing suicide life stories on Wikipedia with Highway_star and other textual visualization tools. SN Social Sciences 1:11. DOI: 10.1007/s43545-021-00272-w. Online publication date: 29-Oct-2021
  • (2019) Project Geollery.com: Reconstructing A Live Mirrored World With Geotagged Social Media. Proceedings of the 24th International Conference on 3D Web Technology (1-9). DOI: 10.1145/3329714.3338126. Online publication date: 26-Jul-2019
  • (2019) Harvesting Visual Objects from Internet Images via Deep-Learning-Based Objectness Assessment. ACM Transactions on Multimedia Computing, Communications, and Applications 15:3 (1-23). DOI: 10.1145/3318463. Online publication date: 8-Aug-2019
  • (2019) Geollery. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (1-13). DOI: 10.1145/3290605.3300915. Online publication date: 2-May-2019
  • (2019) Interactive Fusion of 360° Images for a Mirrored World. 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR) (900-901). DOI: 10.1109/VR.2019.8798187. Online publication date: Mar-2019
  • (2018) CHER-Ob. Journal on Computing and Cultural Heritage 11:4 (1-22). DOI: 10.1145/3230673. Online publication date: 19-Nov-2018
