DOI: 10.1145/585058.585077
Article

Mapping and displaying structural transformations between XML and PDF

Published: 8 November 2002

ABSTRACT

Documents are often marked up in XML-based tagsets to delineate major structural components such as headings, paragraphs, figure captions and so on, without much regard to their eventual displayed appearance. And yet these same abstract documents, after many transformations and 'typesetting' processes, often emerge in the popular format of Adobe PDF, either for dissemination or archiving.

Until recently PDF has been a totally display-based document representation, relying on the underlying PostScript semantics of PDF. Early versions of PDF had no mechanism for retaining any form of abstract document structure but recent releases have now introduced an internal structure tree to create the so-called 'Tagged PDF'.

This paper describes the development of a plugin for Adobe Acrobat which creates a two-window display. In one window is shown an XML document original and in the other its Tagged PDF counterpart is seen, with an internal structure tree that, in some sense, matches the one seen in XML. If a component is highlighted in either window then the corresponding structured item, with any attendant text, is also highlighted in the other window.

Important applications of correctly Tagged PDF include making PDF documents reflow intelligently on small screen devices and enabling them to be read out in correct reading order, via speech synthesiser software, for the visually impaired. By tracing structure transformation from source document to destination one can implement the repair of damaged PDF structure or the adaptation of an existing structure tree to an incrementally updated document.
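The correspondence between an XML tree and a matching structure tree can be illustrated with a minimal sketch. This is not the plugin described in the abstract and does not use the Acrobat plugin API: `StructNode`, `ROLE_MAP`, `pair_nodes` and `highlight` are hypothetical names, the PDF structure tree is modelled as a plain in-memory object, and the element-to-structure-type mapping is an assumption chosen for the example. The sketch walks both trees in parallel and pairs nodes position by position, so that selecting a node on one side yields its counterpart on the other.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructNode:
    """Hypothetical stand-in for one node of a Tagged PDF structure tree."""
    tag: str
    text: str = ""
    kids: List["StructNode"] = field(default_factory=list)


# Assumed mapping from XML element names to PDF structure types.
ROLE_MAP = {"article": "Document", "heading": "H1", "para": "P"}


def pair_nodes(xml_node, pdf_node, path="0", pairs=None):
    """Walk both trees in parallel, pairing children position by position.

    A pair is recorded only when the XML tag maps to the PDF node's tag;
    a mismatch stops the descent into that subtree.
    """
    if pairs is None:
        pairs = {}
    if ROLE_MAP.get(xml_node.tag) == pdf_node.tag:
        pairs[path] = (xml_node, pdf_node)
        for i, (xml_kid, pdf_kid) in enumerate(zip(list(xml_node), pdf_node.kids)):
            pair_nodes(xml_kid, pdf_kid, f"{path}.{i}", pairs)
    return pairs


def highlight(pairs, path):
    """Return the XML tag, the PDF tag and the attendant text for one pair."""
    xml_node, pdf_node = pairs[path]
    return (xml_node.tag, pdf_node.tag, pdf_node.text)


xml_root = ET.fromstring(
    "<article><heading>Intro</heading><para>First paragraph.</para></article>"
)
pdf_root = StructNode("Document", kids=[
    StructNode("H1", "Intro"),
    StructNode("P", "First paragraph."),
])

pairs = pair_nodes(xml_root, pdf_root)
print(highlight(pairs, "0.1"))  # ('para', 'P', 'First paragraph.')
```

Positional pairing only works when the two trees have the same shape. Trees that have drifted apart, as in the damaged or incrementally updated documents the abstract mentions, would need a tree-differencing step instead.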


Published in

DocEng '02: Proceedings of the 2002 ACM symposium on Document engineering
November 2002
168 pages
ISBN: 1581135947
DOI: 10.1145/585058

Copyright © 2002 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

DocEng '02 paper acceptance rate: 21 of 46 submissions, 46%
Overall acceptance rate: 178 of 537 submissions, 33%
