A ground-truthing engine for proofsetting, publishing, re-purposing and quality assurance
Proceedings of the 2003 ACM symposium on Document engineering
Grenoble, France
SESSION: Document management
Pages: 150 - 152
Year of Publication: 2003
ISBN: 1-58113-724-9
ABSTRACT
We present design strategies, implementation preferences and throughput results obtained in deploying a UI-based ground-truthing engine as the last step in the quality assurance (QA) for the conversion of a large out-of-print book collection into digital form. A series of automated QA steps was first performed on the document. Five distinct zoning analysis options were deployed, and the PDF output they generated was used to regenerate TIFF files for comparison with the originals. Regenerated TIFFs that failed automated QA or a separate visual QA were tagged for ground truthing. Fewer than 3% of the pages in a 1.2 x 10^6-page corpus required ground truthing, resulting in a throughput of 2 x 10^5 "fully proofed" pages per man-week. Among the design advantages crucial for this throughput was the use of the identical zoning engine for the original production workflow and for the ground-truthing engine.
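The figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' implementation: the function name `needs_ground_truthing` and the constant names are hypothetical, and the tagging rule is simply the one stated above (a page is tagged if it fails automated QA or the separate visual QA). The 3% figure is treated as an upper bound, since the abstract reports the fraction as below 3%.

```python
# Figures taken from the abstract; all names here are hypothetical.
CORPUS_PAGES = 1_200_000           # 1.2 x 10^6 pages in the corpus
MAX_FLAGGED_PERCENT = 3            # below 3% of pages needed ground truthing
THROUGHPUT_PER_MAN_WEEK = 200_000  # 2 x 10^5 fully proofed pages per man-week


def needs_ground_truthing(passed_automated_qa: bool, passed_visual_qa: bool) -> bool:
    """Tag a page for ground truthing if it fails either QA check."""
    return not (passed_automated_qa and passed_visual_qa)


# Upper bound on the number of pages sent to the ground-truthing engine.
max_flagged_pages = CORPUS_PAGES * MAX_FLAGGED_PERCENT // 100

# Effort implied by the reported throughput for the whole corpus.
man_weeks = CORPUS_PAGES / THROUGHPUT_PER_MAN_WEEK

print(f"At most {max_flagged_pages} pages flagged for ground truthing")
print(f"Implied effort for the corpus: {man_weeks:.1f} man-weeks")
```

Under these assumptions, at most 36,000 pages reach manual ground truthing, and the reported throughput implies roughly six man-weeks of effort to fully proof the corpus.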