DOI: 10.1145/1858378.1858426
research-article

A metadata and annotation extractor from PDF document for semantic web

Published: 16 September 2010

ABSTRACT

Research scholars undertake a literature survey to identify a problem they would like to address and its possible solutions. As part of this activity, they download research papers from the internet, read them, and write comments, observations, explanations, or questions either on a separate sheet of paper or on the paper itself. They use these notes and observations to firm up their understanding of the research domain and to define their research problems. These notes and observations are a very valuable knowledge asset for research.

My work is motivated by a desire to capture these notes and observations and to make them available to the community of research scholars, so that others can benefit from them.

In this paper, I present an editor that facilitates authoring annotations on PDF documents. I have designed a DTD (Document Type Definition) for the annotation document. This DTD contains the identity of the annotation Author, the identity of the paper on which the annotation is created, and the Type of annotation, Comment, and Date_time elements. The Type field is an enumeration and may take one of the values "note", "comment", "insert", "help", or "paragraph". "insert" states that the annotation is not on the original PDF document but on another annotation. My tool provides a user-friendly interface to query these annotations on a PDF document and to classify documents on the basis of the number of comments and also the relationships between annotations. My tool also extracts metadata from the PDF document. This metadata includes title, author, keywords, summary, and date_time. The tool has been implemented using the Java PDFBox API.
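
The abstract does not reproduce the DTD itself, so the following is only a minimal sketch of what such an annotation document type might look like, validated with the standard Java XML parser. The element names Author, Type, Comment, and Date_time and the five type values come from the abstract; everything else is an assumption made for illustration: the root element name Annotation, the element name Paper, the choice to carry the enumeration in a `value` attribute (DTDs can enumerate attribute values but not element text), and the sample data.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class AnnotationDtdSketch {

    // Hypothetical DTD, embedded as an internal subset so no external file is needed.
    // Only the field names and the enumeration values are taken from the abstract.
    static final String DTD =
        "<!DOCTYPE Annotation [\n"
      + "  <!ELEMENT Annotation (Author, Paper, Type, Comment, Date_time)>\n"
      + "  <!ELEMENT Author (#PCDATA)>\n"
      + "  <!ELEMENT Paper (#PCDATA)>\n"
      + "  <!ELEMENT Type EMPTY>\n"
      + "  <!ATTLIST Type value (note|comment|insert|help|paragraph) #REQUIRED>\n"
      + "  <!ELEMENT Comment (#PCDATA)>\n"
      + "  <!ELEMENT Date_time (#PCDATA)>\n"
      + "]>\n";

    // Builds a sample annotation document with the given type value (made-up data).
    static String sample(String type) {
        return "<?xml version=\"1.0\"?>\n" + DTD
            + "<Annotation>"
            + "<Author>reader-1</Author>"
            + "<Paper>paper-42.pdf</Paper>"
            + "<Type value=\"" + type + "\"/>"
            + "<Comment>Check the evaluation setup.</Comment>"
            + "<Date_time>2010-09-16T10:00:00</Date_time>"
            + "</Annotation>";
    }

    // Returns true only if the document is well-formed and valid against its DTD.
    static boolean isValid(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // enable DTD validation
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn validity errors into exceptions instead of console messages.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("note:  " + isValid(sample("note")));
        System.out.println("bogus: " + isValid(sample("bogus")));
    }
}
```

In this sketch a type value outside the enumeration fails validation, which is the kind of check a DTD-backed annotation editor could rely on. How the actual tool links an "insert" annotation to the annotation it comments on is not stated in the abstract, so that relationship is left out here.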


Published in

A2CWiC '10: Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India
September 2010
425 pages
ISBN: 9781450301947
DOI: 10.1145/1858378

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery
New York, NY, United States
