ABSTRACT
The Ensemble Portal harvests resources from multiple heterogeneous federated collections. Managing these dynamically growing collections requires an automatic mechanism for categorizing records into their corresponding topics. We propose an approach that uses existing ACM DL metadata to build classifiers for resources harvested in the Ensemble project. We also present our experience using the Amazon Mechanical Turk platform to build ground-truth training data sets from Ensemble collections.
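To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that crowd judgments for a record are combined by simple majority vote and that topics are assigned by a small multinomial Naive Bayes model over record text. The abstract names neither technique, and the function names, topics, and sample records are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def majority_label(votes):
    """Return the most frequent crowd label, or None if empty or tied.

    Assumption: worker votes are combined by plain majority vote.
    """
    counts = Counter(votes).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def train(records):
    """Count documents and words per topic from (text, topic) pairs."""
    topic_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, topic in records:
        topic_docs[topic] += 1
        for tok in tokenize(text):
            word_counts[topic][tok] += 1
            vocab.add(tok)
    return topic_docs, word_counts, vocab


def classify(model, text):
    """Pick the topic with the highest Laplace-smoothed log-probability."""
    topic_docs, word_counts, vocab = model
    total_docs = sum(topic_docs.values())
    best_topic, best_score = None, float("-inf")
    for topic, n_docs in topic_docs.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[topic].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            smoothed = (word_counts[topic][tok] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical labelled metadata standing in for topic-tagged records.
LABELLED = [
    ("sorting algorithms and complexity analysis", "algorithms"),
    ("graph search algorithms shortest path", "algorithms"),
    ("relational database query sql", "databases"),
    ("database transactions and indexing", "databases"),
]
MODEL = train(LABELLED)
```

In this sketch, `classify(MODEL, "sql query indexing")` selects the hypothetical `databases` topic, and `majority_label` returns `None` when the workers tie, which would flag the record for further review rather than forcing a label.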
Index Terms
- Categorization of computing education resources with utilization of crowdsourcing