ABSTRACT
We consider the problem of continuous computer-vision-based analysis of video streams from mobile cameras over extended periods. Given its high computational demands, general visual processing must currently be offloaded to the cloud. To reduce mobile battery and bandwidth consumption, recent proposals offload only "interesting" video frames and discard the rest. However, determining what to discard is itself typically a power-hungry computer vision calculation, very often well beyond what most mobile devices can afford on a continuous basis. We present the Glimpse system, a redesign of the conventional mobile video processing pipeline to support such "early discard" flexibly, efficiently, and accurately. Glimpse is a novel architecture that gates wearable vision using low-power vision modalities. To this end, our proposed architecture adds novel sensing, processing, algorithmic, and programming-system components to the camera pipeline. We present a complete implementation and evaluation of our design. In common settings, Glimpse reduces mobile power and data usage by more than an order of magnitude relative to earlier designs, and moves continuous vision on lightweight wearables into the realm of the practical.
Glimpse: A Programmable Early-Discard Camera Architecture for Continuous Mobile Vision