DOI: 10.1145/3292448.3292452
research-article

Research on Text Location and Recognition in Natural Images with Deep Learning

Published: 06 October 2018

ABSTRACT

The location and recognition of scene text has long been a difficult problem in many computer vision applications. Unlike generic object detection and localization, scene text location must cope with large variations in aspect ratio, scale, and orientation. Scene text recognition is made difficult by variations in character type, sharpness, size, and font family. In this paper, we address both scene text location and recognition. For location, we apply general object localization methods to scene text. For recognition, we use two methods: one based on a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and Connectionist Temporal Classification (CTC); the other based on a CNN, an RNN, and an attention mechanism. Compared with previous methods, both perform well in recognition speed and accuracy.
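To illustrate the CTC stage of the first recognition pipeline: the RNN emits, at each horizontal position of the CNN feature map, a probability distribution over the characters plus a special "blank" symbol, and CTC turns that frame sequence into a string by collapsing repeated labels and then removing blanks. The sketch below implements the simplest such decoder, greedy (best-path) decoding, in plain Python. The abstract does not say which decoder the authors used (beam search is a common alternative), and the names `ctc_greedy_decode` and `one_hot`, the blank-at-index-0 convention, and the toy alphabet are illustrative assumptions.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class 0 is the CTC blank; class k >= 1 is alphabet[k - 1]


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy (best-path) CTC decoding (hypothetical helper).

    frame_probs[t][k] is the probability of class k at time step t.
    """
    # 1. Take the most likely class at every time step.
    best_path = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    # 2. Collapse consecutive repeats, then 3. drop blanks.
    chars: List[str] = []
    prev = None
    for k in best_path:
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)


def one_hot(k: int, n: int = 4) -> List[float]:
    """Toy frame distribution that puts most of its mass on class k."""
    return [0.7 if i == k else 0.1 for i in range(n)]


# Best path "a a - a b b - c" (with "-" the blank) decodes to "aabc".
path = [1, 1, 0, 1, 2, 2, 0, 3]
print(ctc_greedy_decode([one_hot(k) for k in path], "abc"))  # aabc
```

The blank between the two `a` frames is what lets the doubled letter survive the collapse step; without it, consecutive `a` frames would merge into a single character.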



    Reviews

    Mariana Damova

    A technical account, this paper reports on experiments with combinations of deep learning techniques. The purpose of the research is to explore and evaluate methods for text location and recognition in images. Two approaches are attempted: 1) convolutional neural networks (CNN), recurrent neural networks (RNN), and connectionist temporal classification (CTC); and 2) CNN, RNN, and attention mechanisms. The text location method uses Faster R-CNN and Mask R-CNN (region-based convolutional neural networks). Several text recognition methods are presented: 1) DictNet and CharNet, which are CNN-based; 2) a CNN+RNN+CTC method with two RNN layers; and 3) a CNN+RNN+attention method that uses two CNN models drawn from an Inception V3 network, where different endpoints in the network correspond to different depths and structures of the CNN model. Separate experiments for text location and text recognition are conducted and described thoroughly. Other approaches are discussed, but the proposed text location method is not directly comparable with other state-of-the-art approaches. The proposed text recognition methods are compared with each other, evaluated, and analyzed. This well-written paper, with interesting research results and discussions of related approaches, is a good read for scholars, students, and professionals interested in deep learning approaches and/or automated text location and recognition in images.
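    The attention-based recognizer replaces CTC with a decoder that, at each output step, weighs the CNN feature columns by their relevance to the current decoder state and summarizes them into a context vector. Neither the abstract nor the review specifies the attention variant, so the following is a minimal sketch of one common form, dot-product attention, in plain Python; `attention_step`, the vector sizes, and the toy inputs are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(
    query: Sequence[float], features: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """One dot-product attention step (hypothetical helper).

    query:    decoder (RNN) hidden state, length d
    features: encoder feature vectors, one per horizontal position of the
              CNN feature map, each of length d
    Returns (weights, context): the attention distribution over positions
    and the attention-weighted sum of the feature vectors.
    """
    # Score each position by its dot product with the decoder state.
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    # Context vector: weighted average of the feature vectors.
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(len(query))
    ]
    return weights, context


# Toy example: three feature columns; the query aligns with columns 0 and 2.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_step([5.0, 0.0], features)
print([round(w, 3) for w in weights])  # [0.498, 0.003, 0.498]
```

    In a full decoder, the context vector would typically be combined with the decoder state to predict the next character, and the step repeated until an end-of-sequence token is produced.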

    Published in

      ACM Other conferences
      ICAAI '18: Proceedings of the 2nd International Conference on Advances in Artificial Intelligence
      October 2018
      61 pages
      ISBN:9781450365833
      DOI:10.1145/3292448

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • research-article
      • Research
      • Refereed limited
