DOI: 10.1145/3290605.3300773
CHI Conference Proceedings, research article

Rehumanized Crowdsourcing: A Labeling Framework Addressing Bias and Ethics in Machine Learning

Published: 02 May 2019

ABSTRACT

The increased use of machine learning in recent years has led to large volumes of data being manually labeled via crowdsourcing microtasks completed by humans. This has brought about dehumanization effects: task requesters overlook the humans behind the tasks, leading to ethical issues (e.g., unfair payment) and to the amplification of human biases, which are transferred into training data and affect machine learning in the real world. We propose a framework that allocates microtasks while accounting for human factors of workers, such as demographics and compensation. We deployed our framework on a popular crowdsourcing platform and conducted experiments with 1,919 workers, collecting 160,345 human judgments. By routing microtasks to workers based on demographics and appropriate pay, our framework mitigates biases in the contributor sample and increases the hourly pay given to contributors. We discuss potential extensions of the framework and how it can promote transparency in crowdsourcing.
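
The abstract describes the allocation mechanism only at a high level: route microtasks to workers based on demographics and appropriate pay. As a minimal sketch of that idea (not the paper's actual implementation), the Python example below accepts a worker for a task only when their demographic group is still under-represented relative to a target sample composition and the task's pay clears an hourly-wage floor; the class names, the quota rule, and all thresholds are illustrative assumptions.

```python
# Hypothetical sketch, NOT the authors' implementation: route a microtask to a
# worker only if (a) the worker's demographic group is still under-represented
# relative to a target contributor distribution and (b) the task's pay implies
# an hourly wage above a minimum floor. Names and thresholds are assumptions.

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    pay_per_judgment: float        # assumed unit: USD per judgment
    est_seconds_per_judgment: int  # assumed average completion time


@dataclass
class DemographicRouter:
    target_share: dict             # e.g. {"18-30": 0.5, "31+": 0.5}
    min_hourly_wage: float         # e.g. 7.25 (USD/hour)
    assigned: Counter = field(default_factory=Counter)

    def implied_hourly_wage(self, task: Task) -> float:
        # Convert per-judgment pay into an hourly rate at the expected speed.
        return task.pay_per_judgment * 3600.0 / task.est_seconds_per_judgment

    def accept(self, worker_group: str, task: Task) -> bool:
        # Ethics check: reject tasks whose implied wage is below the floor.
        if self.implied_hourly_wage(task) < self.min_hourly_wage:
            return False
        # Bias check: reject workers whose group already meets its target share.
        total = sum(self.assigned.values())
        if total > 0:
            current_share = self.assigned[worker_group] / total
            if current_share >= self.target_share.get(worker_group, 0.0):
                return False
        self.assigned[worker_group] += 1
        return True


if __name__ == "__main__":
    router = DemographicRouter(target_share={"18-30": 0.5, "31+": 0.5},
                               min_hourly_wage=7.25)
    task = Task("sentiment-batch-1", pay_per_judgment=0.06,
                est_seconds_per_judgment=20)
    print(router.accept("18-30", task))  # True: wage 10.8 USD/h, quota open
    print(router.accept("18-30", task))  # False: group already fills the sample
    print(router.accept("31+", task))    # True: rebalances the contributor sample
```

A real deployment would additionally have to handle unknown or self-reported worker demographics and a changing task pool; the sketch only captures the two constraints the abstract highlights, demographic balance and an hourly-pay floor.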


Published in

CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems
May 2019, 9077 pages
ISBN: 9781450359702
DOI: 10.1145/3290605

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

CHI '19 paper acceptance rate: 703 of 2,958 submissions (24%). Overall acceptance rate: 6,199 of 26,314 submissions (24%).
