ABSTRACT
Although algorithmic auditing has emerged as a key strategy to expose systematic biases embedded in software platforms, we struggle to understand the real-world impact of these audits: scholarship on whether algorithmic audits increase fairness and transparency in commercial systems is nascent. To analyze the impact of publicly naming and disclosing performance results of biased AI systems, we investigate the commercial impact of Gender Shades, the first algorithmic audit of gender and skin type performance disparities in commercial facial analysis models. This paper 1) outlines the audit design and structured disclosure procedure used in the Gender Shades study, 2) presents new performance metrics from targeted companies IBM, Microsoft, and Megvii (Face++) on the Pilot Parliaments Benchmark (PPB) as of August 2018, 3) provides performance results on PPB by non-target companies Amazon and Kairos, and 4) explores differences in company responses, as shared through corporate communications, that contextualize differences in performance on PPB. Within 7 months of the original audit, we find that all three targets released new API versions. All targets reduced accuracy disparities between males and females and between darker- and lighter-skinned subgroups, with the largest improvement occurring for the darker-skinned female subgroup, which saw a 17.7%–30.4% reduction in error between audit periods. Minimizing these disparities led to a 5.72%–8.3% reduction in overall error on PPB for target corporation APIs. The overall performance of non-targets Amazon and Kairos lags significantly behind that of the targets, with error rates of 8.66% and 6.60% overall, and error rates of 31.37% and 22.50% for the darker female subgroup, respectively.
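The subgroup-level error metrics above can be sketched as follows. This is a minimal illustration of how per-subgroup error rates and the disparity between best- and worst-served subgroups might be computed; the subgroup names, predictions, and labels below are illustrative placeholders, not data from the study.

```python
def error_rate(predictions, labels):
    """Fraction of misclassified examples."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)

def subgroup_errors(records):
    """records: iterable of (subgroup, prediction, label) tuples.
    Returns a dict mapping each subgroup to its error rate."""
    groups = {}
    for group, pred, label in records:
        groups.setdefault(group, []).append((pred, label))
    return {g: error_rate([p for p, _ in pairs], [y for _, y in pairs])
            for g, pairs in groups.items()}

# Toy audit data: one subgroup is misclassified more often than another.
records = [
    ("darker_female", "male", "female"),    # misclassified
    ("darker_female", "female", "female"),  # correct
    ("lighter_male", "male", "male"),       # correct
    ("lighter_male", "male", "male"),       # correct
]
errors = subgroup_errors(records)
# Accuracy disparity: gap between worst and best subgroup error rates.
disparity = max(errors.values()) - min(errors.values())
```

An audit comparing two API versions would run this computation on each version's predictions over the same benchmark and report the change in both overall error and the subgroup disparity.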