DOI: 10.1145/1284420.1284435
Article

Extracting reusable document components for variable data printing

Published: 28 August 2007

Abstract

Variable Data Printing (VDP) has brought new flexibility and dynamism to the printed page. Every printed instance of a specific class of document can now have different degrees of customized content within the document template.
This flexibility comes at a cost. If every printed page is potentially different from all others, it must be rasterized separately, which is a time-consuming process. Technologies such as PPML (Personalized Print Markup Language) attempt to address this problem by dividing the bitmapped page into components that can be cached at the raster level, thereby speeding up the generation of page instances.
A large number of documents are stored in Page Description Languages at a higher level of abstraction than the bitmapped page. Much of this content could be reused within a VDP environment provided that separable document components can be identified and extracted. These components then need to be individually rasterisable so that each high-level component can be related to its low-level (bitmap) equivalent. Unfortunately, the unstructured nature of most Page Description Languages makes it difficult to extract content easily.
This paper outlines the problems encountered in extracting component-based content from existing page description formats, such as PostScript, PDF and SVG, and how the differences between the formats affect the ease with which content can be extracted. The techniques are illustrated with reference to a tool called COG Extractor, which extracts content from PDF and SVG and prepares it for reuse.

Published In

DocEng '07: Proceedings of the 2007 ACM symposium on Document engineering
August 2007
236 pages
ISBN:9781595937766
DOI:10.1145/1284420

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. PDF
  2. SVG
  3. content extraction
  4. graphic objects
  5. PostScript
  6. variable data printing

Qualifiers

  • Article

Conference

DocEng07: ACM Symposium on Document Engineering
August 28 - 31, 2007
Winnipeg, Manitoba, Canada

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%

Cited By

  • (2018) A Systematic Method on PDF Privacy Leakage Issues. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE), pp. 1020-1029. DOI: 10.1109/TrustCom/BigDataSE.2018.00144. Online publication date: Aug-2018
  • (2014) Advanced authoring of paper-digital systems. Multimedia Tools and Applications 70:2, pp. 1309-1332. DOI: 10.1007/s11042-012-1217-7. Online publication date: 1-May-2014
  • (2012) Optimal pagination and content mapping for customized magazines. Journal of the Brazilian Computer Society 18:4, pp. 331-349. DOI: 10.1007/s13173-012-0066-6. Online publication date: 14-Mar-2012
  • (2010) Lessons from the dragon. Proceedings of the 10th ACM symposium on Document engineering, pp. 65-68. DOI: 10.1145/1860559.1860573. Online publication date: 21-Sep-2010
