research-article

SemanticPaint: Interactive 3D Labeling and Learning at your Fingertips

Published: 3 November 2015

Abstract

We present a new interactive and online approach to 3D scene understanding. Our system, SemanticPaint, allows users to simultaneously scan their environment whilst interactively segmenting the scene simply by reaching out and touching any desired object or surface. Our system continuously learns from these segmentations and labels new, unseen parts of the environment. Unlike offline systems, where capture, labeling, and batch learning often take hours or even days to perform, our approach is fully online. This provides users with continuous live feedback of the recognition during capture, allowing them to immediately correct errors in the segmentation and/or learning, a feature that has so far been unavailable to batch and offline methods. This leads to models that are tailored or personalized specifically to the user's environments and object classes of interest, opening up the potential for new applications in augmented reality, interior design, and human/robot navigation. It also provides the ability to capture substantial labeled 3D datasets for training large-scale visual recognition systems.

      Published in

        ACM Transactions on Graphics, Volume 34, Issue 5
        October 2015
        188 pages
        ISSN: 0730-0301
        EISSN: 1557-7368
        DOI: 10.1145/2843519

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 3 November 2015
        • Accepted: 1 March 2015
        • Revised: 1 February 2015
        • Received: 1 August 2014

        Qualifiers

        • research-article
        • Research
        • Refereed
