DOI: 10.1145/1296951.1296969
Article

Connecting wikis and natural language processing systems

Published: 21 October 2007

ABSTRACT

We investigate the integration of Wiki systems with automated natural language processing (NLP) techniques. The vision is that of a "self-aware" Wiki system reading, understanding, transforming, and writing its own content, as well as supporting its users in information analysis and content development. We provide a number of practical application examples, including index generation, question answering, and automatic summarization, which demonstrate the practicability and usefulness of this idea. A system architecture providing the integration is presented, as well as first results from an initial implementation based on the GATE framework for NLP and the MediaWiki system.
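The read, analyze, and write-back loop behind the index generation example can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the in-memory `WIKI` dictionary stands in for MediaWiki, the toy `analyse` function stands in for a GATE pipeline, and every name in the block (`WIKI`, `STOPWORDS`, `analyse`, `build_index`, `render_index`, `run_index_bot`) is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical in-memory stand-in for a wiki (assumption): a real system
# would read and write pages through MediaWiki, which is not attempted here.
WIKI = {
    "Arch": "The arch carries loads. A vault extends the arch.",
    "Vault": "A vault is a roof form built from stone.",
}

STOPWORDS = {"the", "a", "an", "is", "from", "of", "and"}


def analyse(text):
    """Toy 'NLP service': lowercase word tokens minus stopwords.

    A placeholder for a real pipeline (the paper uses GATE).
    """
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def build_index(pages):
    """Map each term to the sorted list of page titles that contain it."""
    index = defaultdict(set)
    for title, text in pages.items():
        for term in analyse(text):
            index[term].add(title)
    return {term: sorted(titles) for term, titles in sorted(index.items())}


def render_index(index):
    """Render the index as wiki markup: one bullet per term with page links."""
    return "\n".join(
        "* " + term + ": " + ", ".join("[[" + t + "]]" for t in titles)
        for term, titles in index.items()
    )


def run_index_bot(pages, target="Index"):
    """Read all pages except the target, analyse them, write the index back."""
    source = {t: txt for t, txt in pages.items() if t != target}
    pages[target] = render_index(build_index(source))
    return pages[target]
```

Calling `run_index_bot(WIKI)` would add an `Index` page whose bullets link each term to the pages that mention it. Excluding the target page from the input keeps the bot from indexing its own output on a second run.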

              Reviews

              Klaus K. Obermeier

              What if Wikis were freed from their monolithic task of knowledge presentation, and instead were part of information systems (ISs) that could answer questions, summarize articles, and ultimately become "self-aware" of their own content? The authors seek to demonstrate just that, and to provide, at the very least, a plausible road map to such an outcome using state-of-the-art natural language processing (NLP) techniques, along with open-source products such as MediaWiki and the General Architecture for Text Engineering (GATE). The focus of this research is not to discover new NLP techniques, but rather to augment current ones to improve and automate content retrieval, analysis, and text generation. The result of such research is an IS that shows a major improvement in content development, summarization, question answering, and the quality of index generation based on a semantic metalanguage and metatags. The architecture of such an IS may include multiple tiers, starting with a (Web) client interacting with a presentation/interaction layer, which in turn triggers the services run by the NLP system; another tier includes the knowledge base in support of such processes. Using Wikis as part of NLP is exciting since it demonstrates the use of knowledge-based understanding techniques, in particular the potential value of understanding techniques based on sublanguages (after all, each Wiki is created to reflect knowledge in a particular subject area). More importantly, Wikis as part of NLP might open a way to connect the monoliths of knowledge, ultimately tying them all together in one giant semantic Web.

              Online Computing Reviews Service

              • Published in

                WikiSym '07: Proceedings of the 2007 international symposium on Wikis
                October 2007
                190 pages
                ISBN:9781595938619
                DOI:10.1145/1296951

                Copyright © 2007 ACM

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Qualifiers

                • Article

                Acceptance Rates

                Overall Acceptance Rate: 69 of 145 submissions, 48%
