Detection and segmentation of tables and math-zones from document images
Source
Symposium on Applied Computing
Proceedings of the 2006 ACM symposium on Applied computing
Dijon, France
SESSION: Document engineering (DE)
Pages: 841 - 846
Year of Publication: 2006
ISBN: 1-59593-108-2
Authors
S. Mandal, Bengal Engineering and Science University, Shibpur, Howrah, India
S. P. Chowdhury, Bengal Engineering and Science University, Shibpur, Howrah, India
A. K. Das, Bengal Engineering and Science University, Shibpur, Howrah, India
Bhabatosh Chanda, Indian Statistical Institute, Kolkata, India
ABSTRACT
We propose an algorithm to separate tables and math-zones from document images. The algorithm relies on the spatial characteristics of tables and math-zones in a document. Tables have distinct columns, which implies that the gaps between fields are substantially larger than the gaps between words in text lines. In math-zones, the characters and symbols are less densely packed than in normal text lines. These deceptively simple observations have led us to design a simple but powerful table and math-zone detection system with low computational cost.
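The two observations in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not spell out. It assumes a text line has already been segmented and binarized into a list of rows of 0/1 pixels (1 = ink). The function names, the `word_gap` estimate, and the `gap_factor` and `density_floor` thresholds are all hypothetical values chosen for illustration.

```python
def column_gaps(line):
    """Widths of blank column runs lying between inked columns.

    `line` is a non-empty binary image: a list of equal-length rows, 1 = ink.
    Leading and trailing blank margins are ignored.
    """
    width = len(line[0])
    inked = [any(row[x] for row in line) for x in range(width)]
    gaps, run, seen_ink = [], 0, False
    for has_ink in inked:
        if has_ink:
            if seen_ink and run:
                gaps.append(run)  # a blank run closed by ink on both sides
            run, seen_ink = 0, True
        else:
            run += 1
    return gaps


def ink_density(line):
    """Fraction of pixels in the line image that are ink."""
    total = sum(len(row) for row in line)
    return sum(sum(row) for row in line) / total


def classify_line(line, word_gap, gap_factor=2.0, density_floor=0.3):
    """Label a line using the two cues from the abstract.

    word_gap:      assumed typical inter-word gap (pixels) for body text.
    gap_factor:    hypothetical multiple of word_gap that counts as a field gap.
    density_floor: hypothetical ink density below which a line looks sparse.
    """
    gaps = column_gaps(line)
    if gaps and max(gaps) >= gap_factor * word_gap:
        return "table-candidate"  # some gap is far wider than a word gap
    if ink_density(line) < density_floor:
        return "math-candidate"   # sparse symbols compared with normal text
    return "text"


# Toy one- and two-row "images" for illustration only.
text_row = [[1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1]]
table_row = [[1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1]]
math_rows = [[1, 0, 0, 1, 0, 0, 1],
             [0, 0, 0, 0, 0, 0, 0]]

print(classify_line(text_row, word_gap=2))   # text
print(classify_line(table_row, word_gap=2))  # table-candidate
print(classify_line(math_rows, word_gap=2))  # math-candidate
```

In practice the word-gap estimate and both thresholds would have to be derived from the page itself, and a single wide gap in one line is weak evidence; a real detector would presumably require wide gaps to align vertically across several consecutive lines before declaring a table.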