
Research on Text Location and Recognition in Natural Images with Deep Learning

Authors:
Ping Zhang, Ziyu Shi, Haichang Gao

Institute of Software Engineering, Xidian University, Xi'an, Shaanxi, P.R. China

ICAAI '18: Proceedings of the 2nd International Conference on Advances in Artificial Intelligence, October 2018, Pages 1–6
https://doi.org/10.1145/3292448.3292452

Published: 06 October 2018

ABSTRACT

Locating and recognizing scene text is a difficult problem in many computer vision applications. Scene text location differs from general object detection and location in that scene text varies widely in aspect ratio, scale, and orientation. In scene text recognition, variation in character types, sharpness, sizes, and font families makes it difficult to recognize the text accurately. In this paper, we address both scene text location and recognition. For location, we apply general object location methods to the scene text location task. For recognition, we use two methods: one based on a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and Connectionist Temporal Classification (CTC), and the other based on a CNN, an RNN, and the attention mechanism. Both perform well in recognition speed and accuracy compared with previous methods.
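To make the CTC-based recognition path more concrete, the following is a minimal sketch of greedy (best-path) CTC decoding, the step that turns per-frame class scores from the RNN into a text string. It is an illustration only, not the authors' implementation: the function name `ctc_greedy_decode`, the toy character set, the blank index of 0, and the score values are all assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_scores, charset, blank=BLANK):
    """Best-path CTC decoding.

    frame_scores: one list of class scores per time step (frame).
    charset: string whose i-th character is the symbol for class i;
             the character at the blank index is only a placeholder.
    """
    # 1. Pick the highest-scoring class in every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    # 2. Merge consecutive repeats of the same class.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop blanks and map the remaining classes to characters.
    return "".join(charset[label] for label in collapsed if label != blank)


# Toy example (hypothetical scores): six frames over the classes "-", "c", "a", "t".
charset = "-cat"
frame_scores = [
    [0.1, 0.7, 0.1, 0.1],    # c
    [0.1, 0.6, 0.2, 0.1],    # c (repeat, merged)
    [0.8, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.7, 0.1],    # a
    [0.1, 0.1, 0.1, 0.7],    # t
    [0.2, 0.1, 0.1, 0.6],    # t (repeat, merged)
]
decoded = ctc_greedy_decode(frame_scores, charset)
print(decoded)
```

The blank symbol is what lets CTC emit genuine double letters: two identical characters separated by a blank frame survive the merge step, whereas adjacent identical frames collapse into one character.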

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Feature selection

Reviews

Reviewer: Mariana Damova

A technical account, this paper reports on experiments carried out with combinations of deep learning techniques. The purpose of the proposed research is to explore and evaluate methods for text location and recognition in images. Two recognition approaches are attempted: 1) convolutional neural networks (CNN), recurrent neural networks (RNN), and connectionist temporal classification (CTC); and 2) CNN, RNN, and attention mechanisms.

The text location method uses Faster R-CNN (a region-based convolutional neural network) and Mask R-CNN. Several text recognition methods are presented: 1) DictNet and CharNet, CNN-based methods; 2) a CNN+RNN+CTC method with two layers of RNN; and 3) a CNN+RNN+attention-based method, including two models of CNN as part of an Inception V3 network. Different endpoints in the network indicate different depths and structures of the CNN model.

Separate experiments are thoroughly conducted and described for both text location and text recognition. Other approaches are discussed, but the proposed text location method is not directly comparable with other state-of-the-art approaches. The proposed text recognition methods are also compared with each other, evaluated, and analyzed. A well-written paper with interesting research results and discussions of related approaches, it is a good read for scholars, students, and professionals interested in deep learning approaches and/or automated text location and recognition in images.
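The attention-based recognizer mentioned in the review differs from the CTC variant in how each output character is produced: at every decoding step, the decoder weights the encoder's feature sequence and reads a context vector from it. The sketch below shows one such step. It assumes dot-product scoring, which the page does not specify, and the names `softmax`, `attend`, `query`, and `encoder_states` are hypothetical; treat it as an illustration rather than the paper's model.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """One attention step (assumed dot-product scoring).

    query: decoder state for the current output character.
    encoder_states: one feature vector per image column / time step.
    Returns the attention weights and the weighted-sum context vector.
    """
    # Score each encoder state against the decoder query.
    scores = [sum(q * h for q, h in zip(query, state))
              for state in encoder_states]
    # Normalize the scores into a probability distribution.
    weights = softmax(scores)
    # The context vector is the weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context


# Toy example with made-up values: three encoder states of dimension 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.0, 3.0]
weights, context = attend(query, encoder_states)
print(weights, context)
```

Because the weights are recomputed for every output character, the decoder can focus on a different image region at each step, which is the main structural difference from the frame-by-frame CTC path.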

Published in

ICAAI '18: Proceedings of the 2nd International Conference on Advances in Artificial Intelligence
October 2018
61 pages
ISBN:9781450365833
DOI:10.1145/3292448

Copyright © 2018 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Computer vision
Scene text
Text location
Text recognition
Qualifiers
- research-article
- Research
- Refereed limited