DOI: 10.1145/2984511.2984518
Research Article

VizLens: A Robust and Interactive Screen Reader for Interfaces in the Real World

Published: 16 October 2016

Abstract

The world is full of physical interfaces that are inaccessible to blind people, from microwaves and information kiosks to thermostats and checkout terminals. Blind people cannot independently use such devices without at least first learning their layout, and usually only after labeling them with sighted assistance. We introduce VizLens, an accessible mobile application and supporting backend that can robustly and interactively help blind people use nearly any interface they encounter. VizLens users capture a photo of an inaccessible interface and send it to multiple crowd workers, who work in parallel to quickly label and describe elements of the interface to make subsequent computer vision easier. The VizLens application helps users recapture the interface in the field of the camera, and uses computer vision to interactively describe the part of the interface beneath their finger (updating 8 times per second). We show that VizLens provides accurate and usable real-time feedback in a study with 10 blind participants, and our crowdsourced labeling workflow was fast (8 minutes), accurate (99.7%), and cheap ($1.15). We then explore extensions of VizLens that allow it to (i) adapt to state changes in dynamic interfaces, (ii) combine crowd labeling with OCR technology to handle dynamic displays, and (iii) benefit from head-mounted cameras. VizLens robustly solves a long-standing challenge in accessibility by deeply integrating crowdsourcing and computer vision, and foreshadows a future of increasingly powerful interactive applications that are currently impossible with either alone.
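The interactive loop the abstract describes — align the live camera frame to the crowd-labeled reference photo, map the fingertip into reference coordinates, and announce the labeled element under it — can be sketched minimally. This is a hedged illustration, not the authors' implementation: the function names, the dictionary of labeled regions, and the use of a precomputed homography (which VizLens would obtain from feature matching against the reference image) are all assumptions for the example.

```python
import numpy as np

def map_point(H, point):
    """Map an (x, y) point through a 3x3 homography into reference coordinates."""
    x, y = point
    v = H @ np.array([x, y, 1.0])
    return (v[0] / v[2], v[1] / v[2])

def find_label(point, regions):
    """Return the label of the first crowd-drawn region (x, y, w, h) containing point."""
    px, py = point
    for label, (x, y, w, h) in regions.items():
        if x <= px <= x + w and y <= py <= y + h:
            return label
    return None

# Hypothetical crowd-provided labels on the reference photo of a microwave panel.
regions = {
    "start": (10, 10, 40, 20),
    "stop":  (10, 40, 40, 20),
}

H = np.eye(3)  # identity homography, standing in for the alignment result
fingertip_in_frame = (25, 45)
ref_point = map_point(H, fingertip_in_frame)
print(find_label(ref_point, regions))  # -> stop
```

In the real system this lookup would run on every processed frame (the paper reports updates 8 times per second), with the homography re-estimated as the user moves the camera.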

Supplemental Material

uist1584-file3.mp4 (mp4, 86.3 MB)
p651-guo.mp4 (mp4, 178.1 MB)


Published in

UIST '16: Proceedings of the 29th Annual Symposium on User Interface Software and Technology
October 2016
908 pages
ISBN: 9781450341899
DOI: 10.1145/2984511
Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States



        Acceptance Rates

UIST '16 Paper Acceptance Rate: 79 of 384 submissions, 21%
Overall Acceptance Rate: 842 of 3,967 submissions, 21%
