Research Article
DOI: 10.1145/3306618.3314244

Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products

Published: 27 January 2019

ABSTRACT

Although algorithmic auditing has emerged as a key strategy for exposing systematic biases embedded in software platforms, the real-world impact of these audits remains poorly understood, and scholarship on whether they increase algorithmic fairness and transparency in commercial systems is nascent. To analyze the impact of publicly naming and disclosing performance results of biased AI systems, we investigate the commercial impact of Gender Shades, the first algorithmic audit of gender and skin-type performance disparities in commercial facial analysis models. This paper 1) outlines the audit design and structured disclosure procedure used in the Gender Shades study, 2) presents new performance metrics from targeted companies IBM, Microsoft, and Megvii (Face++) on the Pilot Parliaments Benchmark (PPB) as of August 2018, 3) provides performance results on PPB for non-target companies Amazon and Kairos, and 4) explores differences in company responses, as shared through corporate communications, that contextualize differences in performance on PPB. Within 7 months of the original audit, we find that all three targets released new API versions. All targets reduced accuracy disparities between males and females and between darker- and lighter-skinned subgroups, with the most significant update occurring for the darker-skinned female subgroup, which underwent a 17.7% to 30.4% reduction in error between audit periods. Minimizing these disparities led to a 5.72% to 8.3% reduction in overall error on PPB for target corporation APIs. The overall performance of non-targets Amazon and Kairos lags significantly behind that of the targets, with overall error rates of 8.66% and 6.60%, and darker-skinned female error rates of 31.37% and 22.50%, respectively.
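To make the reported figures concrete, here is a minimal sketch of how the intersectional error rates behind such an audit can be computed. This is an illustration, not the study's actual audit code: the DataFrame columns (skin_type, gender, predicted_gender) and all sample values are hypothetical stand-ins for PPB annotations and commercial API outputs.

```python
import pandas as pd

# Hypothetical per-image audit records. In a study of this kind, each row
# would be a PPB image annotated with binary gender and skin type, paired
# with the gender label returned by a commercial API in one audit period.
results = pd.DataFrame({
    "skin_type":        ["darker", "darker", "lighter", "lighter", "darker", "lighter"],
    "gender":           ["female", "female", "male",    "female",  "male",   "male"],
    "predicted_gender": ["male",   "female", "male",    "female",  "male",   "female"],
})

def subgroup_error_rates(df: pd.DataFrame) -> pd.Series:
    """Misclassification rate for each intersectional (skin type, gender) subgroup."""
    errors = df["predicted_gender"] != df["gender"]
    return errors.groupby([df["skin_type"], df["gender"]]).mean()

def error_reduction(before: float, after: float) -> float:
    """Absolute drop in a subgroup's error rate between two audit periods."""
    return before - after

print(subgroup_error_rates(results))
# A darker-skinned female error rate falling from, say, 0.35 in the first
# audit to 0.17 in the second gives error_reduction(0.35, 0.17) = 0.18, an
# 18-percentage-point reduction of the kind summarized in the abstract.
```

The sketch treats reductions as absolute percentage-point differences in error between audit periods, computed per subgroup and overall, which is how the abstract's figures are most naturally read.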


Published in

AIES '19: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society
January 2019, 577 pages
ISBN: 9781450363242
DOI: 10.1145/3306618
Copyright © 2019 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
