ABSTRACT
Although existing work has explored both information extraction and community content creation, most research has focused on them in isolation. In contrast, we see the greatest leverage in the synergistic pairing of these methods as two interlocking feedback cycles. This paper explores the synergy promised if these cycles can be made to accelerate each other by exploiting the same edits to advance both community content creation and learning-based information extraction. We examine our proposed synergy in the context of Wikipedia infoboxes and the Kylin information extraction system. After developing and refining a set of interfaces that present the verification of Kylin extractions as a non-primary task in the context of Wikipedia articles, we develop an innovative use of Web search advertising services to study people engaged in some other primary task. We demonstrate our proposed synergy by analyzing our deployment from two complementary perspectives: (1) we show that we accelerate community content creation by using Kylin's information extraction to significantly increase the likelihood that a person visiting a Wikipedia article as part of some other primary task will spontaneously choose to help improve the article's infobox, and (2) we show that we accelerate information extraction by using contributions collected from people interacting with our designs to significantly improve Kylin's extraction performance.
Amplifying community content creation with mixed initiative information extraction