ABSTRACT
The increased use of machine learning in recent years has led to large volumes of data being labeled manually, via crowdsourcing microtasks completed by human workers. This has brought about dehumanization effects: task requesters overlook the humans behind the tasks, leading to ethical issues (e.g., unfair payment) and to the amplification of human biases, which are transferred into training data and affect machine learning systems deployed in the real world. We propose a framework that allocates microtasks while accounting for human factors of workers, such as demographics and compensation. We deployed our framework on a popular crowdsourcing platform and conducted experiments with 1,919 workers, collecting 160,345 human judgments. By routing microtasks to workers based on demographics and offering appropriate pay, our framework mitigates biases in the contributor sample and increases the hourly pay contributors receive. We discuss potential extensions of the framework and how it can promote transparency in crowdsourcing.
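The routing idea described in the abstract can be illustrated with a minimal sketch: gate each microtask on a fair-pay floor, then route it to a worker only while that worker's demographic group is still under-represented among the judgments collected so far. All names (`should_route`, the group labels, the greedy balancing rule) are illustrative assumptions, not the paper's actual algorithm.

```python
from collections import Counter

def should_route(worker_group, assigned_groups, target_share,
                 min_task_pay, task_pay):
    """Decide whether to route a microtask to a worker.

    worker_group    : demographic group label of the worker (e.g. "f_18_34")
    assigned_groups : group labels of the judgments collected so far
    target_share    : dict mapping group label -> desired fraction of judgments
    min_task_pay    : fair-wage floor; tasks paying below it are never routed
    task_pay        : pay offered for this task
    """
    if task_pay < min_task_pay:
        return False  # ethical floor: never route underpaid tasks
    total = len(assigned_groups)
    if total == 0:
        return True  # no judgments yet; accept any group
    current_share = Counter(assigned_groups)[worker_group] / total
    # route only while this group is still below its target share
    return current_share < target_share.get(worker_group, 0.0)
```

In practice the paper's framework would make such decisions at scale and likely optimize over many workers jointly; this greedy per-task rule is only meant to convey the two constraints (demographic balance and pay) named in the abstract.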
Rehumanized Crowdsourcing: A Labeling Framework Addressing Bias and Ethics in Machine Learning