A ground-truthing engine for proofsetting, publishing, re-purposing and quality assurance
Source: Proceedings of the 2003 ACM Symposium on Document Engineering
Grenoble, France
Session: Document management
Pages: 150-152
Year of Publication: 2003
ISBN: 1-58113-724-9
Authors
Steven J. Simske  Hewlett-Packard Labs, Fort Collins, CO
Margaret Sturgill  Hewlett-Packard Labs, Fort Collins, CO
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/958220.958247

ABSTRACT

We present design strategies, implementation preferences and throughput results obtained in deploying a UI-based ground-truthing engine as the last step in the quality assurance (QA) for the conversion of a large out-of-print book collection into digital form. A series of automated QA steps was first performed on the documents. Five distinct zoning analysis options were deployed, and the resulting PDF output was used to regenerate TIFF files for comparison with the originals. Regenerated TIFFs that failed either the automated QA or a separate visual QA were tagged for ground truthing. Fewer than 3% of the pages in a 1.2 × 10⁶-page corpus required ground truthing, resulting in a throughput of 2 × 10⁵ "fully proofed" pages per man-week. Among the design advantages crucial for this throughput was the use of the identical zoning engine for the original production workflow and for the ground-truthing engine.
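The triage step described in the abstract (compare each regenerated TIFF with its original, and tag pages that fail this automated check or a separate visual QA for ground truthing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: pages are modeled as in-memory binary rasters rather than TIFF files, the mismatch metric (fraction of differing pixels) and the `MAX_DIFF_FRACTION` threshold are hypothetical, and the five zoning analysis options are not modeled (one regenerated raster per page). All names (`Page`, `pixel_diff_fraction`, `needs_ground_truthing`, `triage`) are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence

# A binary page raster: rows of 0 (white) / 1 (black) pixels.
# Assumption: real TIFF decoding is out of scope for this sketch.
Raster = Sequence[Sequence[int]]

# Hypothetical tolerance (not from the paper): a regenerated page whose
# fraction of mismatched pixels exceeds this value fails automated QA.
MAX_DIFF_FRACTION = 0.01


def pixel_diff_fraction(original: Raster, regenerated: Raster) -> float:
    """Fraction of pixels that differ between two same-sized binary rasters."""
    if len(original) != len(regenerated) or any(
        len(a) != len(b) for a, b in zip(original, regenerated)
    ):
        raise ValueError("rasters must have identical dimensions")
    total = sum(len(row) for row in original)
    if total == 0:
        return 0.0
    diff = sum(
        a != b
        for row_a, row_b in zip(original, regenerated)
        for a, b in zip(row_a, row_b)
    )
    return diff / total


@dataclass
class Page:
    page_id: str
    original: Raster          # scanned original page
    regenerated: Raster       # page re-rendered from the PDF output
    passed_visual_qa: bool    # outcome of the separate visual QA


def needs_ground_truthing(page: Page, tol: float = MAX_DIFF_FRACTION) -> bool:
    """Tag a page if it fails the automated raster comparison OR visual QA."""
    failed_automated = pixel_diff_fraction(page.original, page.regenerated) > tol
    return failed_automated or not page.passed_visual_qa


def triage(pages: Sequence[Page]) -> list[str]:
    """Return the ids of pages routed to the ground-truthing UI."""
    return [p.page_id for p in pages if needs_ground_truthing(p)]


if __name__ == "__main__":
    clean = [[0, 1], [1, 0]]
    pages = [
        Page("p1", clean, clean, True),             # identical, passes visual QA
        Page("p2", clean, [[1, 1], [1, 0]], True),  # 1 of 4 pixels differs
        Page("p3", clean, clean, False),            # fails visual QA only
    ]
    print(triage(pages))  # ['p2', 'p3']
```

Combining the two checks with OR means either signal alone is enough to send a page to a human operator. For scale, fewer than 3% of the 1.2 × 10⁶-page corpus means fewer than 36,000 pages were tagged for ground truthing.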


