ABSTRACT
We investigate the integration of Wiki systems with automated natural language processing (NLP) techniques. The vision is that of a "self-aware" Wiki system reading, understanding, transforming, and writing its own content, as well as supporting its users in information analysis and content development. We provide a number of practical application examples, including index generation, question answering, and automatic summarization, which demonstrate the practicability and usefulness of this idea. A system architecture providing the integration is presented, as well as first results from an initial implementation based on the GATE framework for NLP and the MediaWiki system.
Index Terms
- Connecting wikis and natural language processing systems