ABSTRACT
The world is full of physical interfaces that are inaccessible to blind people, from microwaves and information kiosks to thermostats and checkout terminals. Blind people cannot independently use such devices without at least first learning their layout, and usually only after labeling them with sighted assistance. We introduce VizLens, an accessible mobile application and supporting backend that can robustly and interactively help blind people use nearly any interface they encounter. VizLens users capture a photo of an inaccessible interface and send it to multiple crowd workers, who work in parallel to quickly label and describe elements of the interface to make subsequent computer vision easier. The VizLens application helps users recapture the interface in the camera's field of view, and uses computer vision to interactively describe the part of the interface beneath their finger (updating 8 times per second). In a study with 10 blind participants, we show that VizLens provides accurate and usable real-time feedback, and that our crowdsourced labeling workflow was fast (8 minutes), accurate (99.7%), and cheap ($1.15). We then explore extensions of VizLens that allow it to (i) adapt to state changes in dynamic interfaces, (ii) combine crowd labeling with OCR technology to handle dynamic displays, and (iii) benefit from head-mounted cameras. VizLens robustly solves a long-standing challenge in accessibility by deeply integrating crowdsourcing and computer vision, and foreshadows a future of increasingly powerful interactive applications that would currently be impossible with either alone.
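The interactive loop the abstract describes, mapping the user's fingertip from the live camera frame back into the crowd-labeled reference image and announcing the label beneath it, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the homography `H` (which in practice would come from feature matching between the live frame and the reference photo) and the crowd-provided labeled regions are assumed inputs, and the function and variable names are hypothetical.

```python
import numpy as np

def project_point(H, point):
    """Map a pixel (x, y) in the live camera frame into
    reference-image coordinates using a 3x3 homography H,
    applying the perspective divide."""
    x, y = point
    v = H @ np.array([x, y, 1.0])
    return v[0] / v[2], v[1] / v[2]

def label_under_finger(H, fingertip, regions):
    """Return the crowd-provided label whose reference-image
    bounding box (x0, y0, x1, y1) contains the projected
    fingertip, or None if the finger is over no labeled element."""
    rx, ry = project_point(H, fingertip)
    for label, (x0, y0, x1, y1) in regions:
        if x0 <= rx <= x1 and y0 <= ry <= y1:
            return label
    return None

# Hypothetical example: identity homography, two labeled buttons.
regions = [("Start", (10, 10, 50, 30)), ("Stop", (60, 10, 100, 30))]
H = np.eye(3)
print(label_under_finger(H, (70, 20), regions))  # -> Stop
```

In a full pipeline, this lookup would run on every processed frame (the abstract reports 8 updates per second), with `H` re-estimated per frame so the system tolerates camera motion.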
Index Terms
- VizLens: A Robust and Interactive Screen Reader for Interfaces in the Real World