ABSTRACT
The location and recognition of scene text has been a difficult problem in many computer vision applications. Scene text location differs from general object detection and location because scene text exhibits large variations in aspect ratio, scale, and orientation. Scene text recognition is made difficult by variations in character type, sharpness, size, and font family. In this paper, we address both scene text location and recognition. For location, we apply general object location methods to the scene text location task. For recognition, we use two methods: one based on a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and Connectionist Temporal Classification (CTC), and the other based on a CNN, an RNN, and an attention mechanism. Both methods compare favorably with previous methods in recognition speed and accuracy.
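The two recognition pipelines differ mainly in how the RNN outputs become a character string. The pure-Python sketch below illustrates both decoding ideas under stated assumptions. It is not the paper's implementation: the CNN and RNN that would produce per-frame probabilities and features are omitted. The function names, the convention that class 0 is the CTC blank, and the toy inputs are all hypothetical. The abstract does not specify which attention variant is used, so plain dot-product attention is swapped in here as an illustration.

```python
import math
from typing import List, Optional, Sequence

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path (greedy) CTC decoding.

    frame_probs[t][k] is the probability of class k at time step t;
    class 0 is the blank and class k >= 1 maps to alphabet[k - 1].
    """
    # 1. Take the most likely class at every time step.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Collapse consecutive repeats and 3. drop blanks.
    chars: List[str] = []
    prev: Optional[int] = None
    for k in best_path:
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(query: Sequence[float], keys: Sequence[Sequence[float]]) -> List[float]:
    """Dot-product attention for one decoder step.

    Scores each encoder feature vector against the decoder state (query),
    normalizes the scores with softmax, and returns the weighted sum of the
    feature vectors (the same vectors serve as keys and values here).
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]


if __name__ == "__main__":
    # Hypothetical 6-frame output over classes [blank, 'a', 'b'].
    probs = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.6, 0.3],
        [0.1, 0.2, 0.7],
        [0.3, 0.1, 0.6],
    ]
    print(ctc_greedy_decode(probs, "ab"))  # -> aab
    print(attention_context([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

In the toy example, the blank at frame 3 is what lets CTC emit the repeated "a". Without it, the two runs of "a" would collapse into one. Practical CTC recognizers often replace best-path decoding with beam search, and attention decoders typically use learned additive or bilinear scoring rather than the raw dot product shown here.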
Index Terms
- Research on Text Location and Recognition in Natural Images with Deep Learning