ABSTRACT
The world is full of physical interfaces that are inaccessible to blind people, from microwaves and information kiosks to thermostats and checkout terminals. Blind people cannot independently use such devices without at least first learning their layout, and usually only after labeling them with sighted assistance. We introduce VizLens, an accessible mobile application and supporting backend that can robustly and interactively help blind people use nearly any interface they encounter. VizLens users capture a photo of an inaccessible interface and send it to multiple crowd workers, who work in parallel to quickly label and describe elements of the interface to make subsequent computer vision easier. The VizLens application helps users recapture the interface in the camera's field of view, and uses computer vision to interactively describe the part of the interface beneath their finger (updating 8 times per second). In a study with 10 blind participants, we show that VizLens provides accurate and usable real-time feedback, and that our crowdsourced labeling workflow was fast (8 minutes), accurate (99.7%), and cheap ($1.15). We then explore extensions of VizLens that allow it to (i) adapt to state changes in dynamic interfaces, (ii) combine crowd labeling with OCR technology to handle dynamic displays, and (iii) benefit from head-mounted cameras. VizLens robustly solves a long-standing challenge in accessibility by deeply integrating crowdsourcing and computer vision, and foreshadows a future of increasingly powerful interactive applications that would currently be impossible with either alone.
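The interactive loop the abstract describes, mapping the user's fingertip from the live camera frame back into the crowd-labeled reference image and announcing the label beneath it, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the homography `H` (which in practice would come from feature matching between the live frame and the reference photo) and the crowd-provided labeled regions are assumed inputs, and the function and variable names are hypothetical.

```python
import numpy as np

def project_point(H, point):
    """Map a pixel (x, y) in the live camera frame into
    reference-image coordinates using a 3x3 homography H,
    applying the perspective divide."""
    x, y = point
    v = H @ np.array([x, y, 1.0])
    return v[0] / v[2], v[1] / v[2]

def label_under_finger(H, fingertip, regions):
    """Return the crowd-provided label whose reference-image
    bounding box (x0, y0, x1, y1) contains the projected
    fingertip, or None if the finger is over no labeled element."""
    rx, ry = project_point(H, fingertip)
    for label, (x0, y0, x1, y1) in regions:
        if x0 <= rx <= x1 and y0 <= ry <= y1:
            return label
    return None

# Hypothetical example: identity homography, two labeled buttons.
regions = [("Start", (10, 10, 50, 30)), ("Stop", (60, 10, 100, 30))]
H = np.eye(3)
print(label_under_finger(H, (70, 20), regions))  # -> Stop
```

In a full pipeline, this lookup would run on every processed frame (the abstract reports 8 updates per second), with `H` re-estimated per frame so the system tolerates camera motion.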
Index Terms
- VizLens: A Robust and Interactive Screen Reader for Interfaces in the Real World