Detection and segmentation of tables and math-zones from document images
Source: Symposium on Applied Computing
Proceedings of the 2006 ACM Symposium on Applied Computing
Dijon, France
SESSION: Document engineering (DE)
Pages: 841 - 846  
Year of Publication: 2006
ISBN:1-59593-108-2
Authors
S. Mandal  Bengal Engineering and Science University, Shibpur, Howrah, India
S. P. Chowdhury  Bengal Engineering and Science University, Shibpur, Howrah, India
A. K. Das  Bengal Engineering and Science University, Shibpur, Howrah, India
Bhabatosh Chanda  Indian Statistical Institute, Kolkata, India
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1141277.1141469

ABSTRACT

We propose an algorithm to separate tables and math-zones from document images. The algorithm relies on the spatial characteristics of tables and math-zones in a document. Tables have distinct columns, which implies that the gaps between fields are substantially larger than the gaps between words in text lines. In math-zones, the characters and symbols are less dense than in normal text lines. These deceptively simple observations have led us to design a simple but powerful table and math-zone detection system with low computational cost.
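
The two observations in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes each text line is already segmented and binarized (1 = ink, 0 = background), and every function name, parameter, and threshold below is hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

Line = Sequence[Sequence[int]]  # binary image of one text line, 1 = ink


def gap_widths(line: Line) -> List[int]:
    """Widths of the empty-column runs that lie between inked columns."""
    height = len(line)
    width = len(line[0]) if height else 0
    # A column counts as inked if any pixel in it is set.
    inked = [any(line[y][x] for y in range(height)) for x in range(width)]
    gaps, run, seen_ink = [], 0, False
    for has_ink in inked:
        if has_ink:
            if seen_ink and run > 0:
                gaps.append(run)
            run, seen_ink = 0, True
        elif seen_ink:
            run += 1  # leading and trailing margins are never recorded
    return gaps


def ink_density(line: Line) -> float:
    """Fraction of pixels in the line image that are ink."""
    total = sum(len(row) for row in line)
    return sum(sum(row) for row in line) / total if total else 0.0


def classify_line(line: Line, word_gap: float, text_density: float,
                  gap_factor: float = 3.0, density_factor: float = 0.5) -> str:
    """Label a line 'table', 'math', or 'text'.

    word_gap and text_density describe ordinary text lines in the same
    document; gap_factor and density_factor are assumed thresholds.
    """
    # Observation 1: field gaps in tables are much wider than word gaps.
    if max(gap_widths(line), default=0) >= gap_factor * word_gap:
        return "table"
    # Observation 2: math-zones are less dense than normal text lines.
    if ink_density(line) < density_factor * text_density:
        return "math"
    return "text"


# Toy two-row line images for demonstration only.
text_line = [[1, 1, 0, 1, 1, 0, 1, 1]] * 2
table_line = [[1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1]] * 2
math_line = [[1, 0, 1, 0, 0, 1, 0, 1], [0, 0, 0, 0, 0, 0, 0, 0]]

labels = [classify_line(l, word_gap=1, text_density=ink_density(text_line))
          for l in (text_line, table_line, math_line)]
```

In practice the reference word gap and text density would have to be estimated from the document itself, and a real system would also need to group adjacent candidate lines into zones; neither step is shown here.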


