ABSTRACT
The increased use of machine learning in recent years has led to large volumes of data being labeled manually, via crowdsourcing microtasks completed by human workers. This has brought about dehumanization effects: task requesters overlook the humans behind the tasks, leading to ethical issues (e.g., unfair payment) and to the amplification of human biases, which are transferred into training data and affect machine learning systems deployed in the real world. We propose a framework that allocates microtasks while accounting for human factors of workers, such as demographics and compensation. We deployed our framework on a popular crowdsourcing platform and conducted experiments with 1,919 workers, collecting 160,345 human judgments. By routing microtasks to workers based on demographics and offering appropriate pay, our framework mitigates biases in the contributor sample and increases the hourly pay contributors receive. We discuss potential extensions of the framework and how it can promote transparency in crowdsourcing.
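The routing idea described in the abstract can be illustrated with a minimal sketch: gate each microtask on a fair-pay floor, then route it to a worker only while that worker's demographic group is still under-represented among the judgments collected so far. All names (`should_route`, the group labels, the greedy balancing rule) are illustrative assumptions, not the paper's actual algorithm.

```python
from collections import Counter

def should_route(worker_group, assigned_groups, target_share,
                 min_task_pay, task_pay):
    """Decide whether to route a microtask to a worker.

    worker_group    : demographic group label of the worker (e.g. "f_18_34")
    assigned_groups : group labels of the judgments collected so far
    target_share    : dict mapping group label -> desired fraction of judgments
    min_task_pay    : fair-wage floor; tasks paying below it are never routed
    task_pay        : pay offered for this task
    """
    if task_pay < min_task_pay:
        return False  # ethical floor: never route underpaid tasks
    total = len(assigned_groups)
    if total == 0:
        return True  # no judgments yet; accept any group
    current_share = Counter(assigned_groups)[worker_group] / total
    # route only while this group is still below its target share
    return current_share < target_share.get(worker_group, 0.0)
```

In practice the paper's framework would make such decisions at scale and likely optimize over many workers jointly; this greedy per-task rule is only meant to convey the two constraints (demographic balance and pay) named in the abstract.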
Rehumanized Crowdsourcing: A Labeling Framework Addressing Bias and Ethics in Machine Learning