DOI: 10.1145/1180639.1180673
Article

Automatic document orientation detection and categorization through document vectorization

Published: 23 October 2006

Abstract

This paper presents an automatic orientation detection and categorization technique that is capable of detecting the orientation of multilingual documents with arbitrary skew and categorizing document images according to the underlying languages. We carry out orientation detection and categorization through document vectorization, which encodes document orientation and language information and converts each document image into an electronic document vector through the exploitation of the density and distribution of vertical component runs. For each language of interest, a pair of vector templates is first constructed through a training process. Orientation and category of the query image are then determined based on distances between the query document vector and the constructed vector templates. Experiments over 492 testing document images show that the average orientation detection and categorization rates reach up to 97.56% and 99.59%, respectively.
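
The template-matching step described in the abstract can be illustrated with a small sketch. The abstract does not specify how the document vector is built, so everything below is an assumption made for illustration: the toy feature (the share of vertical foreground runs that start in each horizontal band of the image), the Euclidean distance, and the names `vertical_run_vector`, `classify`, and `lang_a` are hypothetical and are not the authors' method.

```python
import numpy as np

def vertical_run_vector(binary, n_bands=4):
    """Toy document vector (assumed feature, not the paper's definition):
    fraction of vertical foreground runs that start in each horizontal band."""
    img = np.asarray(binary, dtype=bool)
    # Shift the image down by one row so each pixel can see the pixel above it.
    above = np.vstack([np.zeros((1, img.shape[1]), dtype=bool), img[:-1]])
    # A vertical run starts at a foreground pixel with background (or the top edge) above.
    starts = img & ~above
    rows = np.nonzero(starts)[0]
    hist, _ = np.histogram(rows, bins=n_bands, range=(0, img.shape[0]))
    total = hist.sum()
    return hist / total if total else hist.astype(float)

def classify(query, templates):
    """Return the (language, orientation) key of the template nearest to `query`.
    Euclidean distance is an assumption; the abstract only says 'distances'."""
    return min(templates, key=lambda key: np.linalg.norm(query - templates[key]))

# Hypothetical usage: one language, a pair of templates (upright and rotated 180 degrees).
page = np.zeros((8, 4), dtype=bool)
page[0, 0] = page[1, 1] = page[2, 2] = True
rotated = page[::-1, ::-1]
templates = {
    ("lang_a", "upright"): vertical_run_vector(page),
    ("lang_a", "rotated_180"): vertical_run_vector(rotated),
}
label = classify(vertical_run_vector(rotated), templates)
```

In a real system the templates would come from the training process the abstract mentions, for example by averaging the vectors of many training pages per language and orientation; here they are computed from a single synthetic image only to keep the sketch self-contained.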


Published In

MM '06: Proceedings of the 14th ACM international conference on Multimedia
October 2006
1072 pages
ISBN:1595934472
DOI:10.1145/1180639

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document image
  2. document orientation detection

Qualifiers

  • Article

Conference

MM06
MM06: The 14th ACM International Conference on Multimedia 2006
October 23 - 27, 2006
Santa Barbara, CA, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

  • (2022) Analysis on Skew Detection and Rectification Techniques for Offline Handwritten Scripts. Inventive Systems and Control, pp. 801-810. DOI: 10.1007/978-981-19-1012-8_57. Online publication date: 2-Aug-2022
  • (2021) Automated Text and Tabular Data Extraction from Scanned Document Images. Data Management, Analytics and Innovation, pp. 169-182. DOI: 10.1007/978-981-16-2934-1_11. Online publication date: 5-Aug-2021
  • (2020) Voting-Based Document Image Skew Detection. Applied Sciences, 10:7, article 2236. DOI: 10.3390/app10072236. Online publication date: 25-Mar-2020
  • (2020) Neural Network-based Efficient Measurement Method on Upside Down Orientation of a Digital Document. Advances in Science, Technology and Engineering Systems Journal, 5:2, pp. 697-702. DOI: 10.25046/aj050286. Online publication date: 2020
  • (2017) Automatic Orientation Correction of AEC Drawing Documents. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), pp. 9-10. DOI: 10.1109/ICDAR.2017.252. Online publication date: Nov-2017
  • (2016) Contour-Based Binary Image Orientation Detection by Orientation Context and Roulette Distance. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E99.A:2, pp. 621-633. DOI: 10.1587/transfun.E99.A.621. Online publication date: 2016
  • (2012) A Photogrammetric Analysis of Cuneiform Tablets for the Purpose of Digital Reconstruction. International Journal of Heritage in the Digital Era, 1:1_suppl, pp. 49-53. DOI: 10.1260/2047-4970.1.0.49. Online publication date: 1-Jan-2012
  • (2012) A Method for Detecting Document Orientation by Using Naïve Bayes Classifier. Proceedings of the 2012 International Conference on Industrial Control and Electronics Engineering, pp. 429-432. DOI: 10.1109/ICICEE.2012.120. Online publication date: 23-Aug-2012
  • (2011) A method for detecting document orientation by using SVM classifier. 2011 International Conference on Multimedia Technology, pp. 47-50. DOI: 10.1109/ICMT.2011.6003081. Online publication date: Jul-2011
  • (2011) Composite Script Identification and Orientation Detection for Indian Text Images. Proceedings of the 2011 International Conference on Document Analysis and Recognition, pp. 294-298. DOI: 10.1109/ICDAR.2011.67. Online publication date: 18-Sep-2011
