ABSTRACT
Although algorithmic auditing has emerged as a key strategy to expose systematic biases embedded in software platforms, we struggle to understand the real-world impact of these audits: scholarship on whether algorithmic audits increase fairness and transparency in commercial systems is nascent. To analyze the impact of publicly naming and disclosing performance results of biased AI systems, we investigate the commercial impact of Gender Shades, the first algorithmic audit of gender and skin type performance disparities in commercial facial analysis models. This paper 1) outlines the audit design and structured disclosure procedure used in the Gender Shades study, 2) presents new performance metrics from targeted companies IBM, Microsoft, and Megvii (Face++) on the Pilot Parliaments Benchmark (PPB) as of August 2018, 3) provides performance results on PPB by non-target companies Amazon and Kairos, and 4) explores differences in company responses, as shared through corporate communications, that contextualize differences in performance on PPB. Within 7 months of the original audit, we find that all three targets released new API versions. All targets reduced accuracy disparities between males and females and between darker- and lighter-skinned subgroups, with the largest improvement occurring for the darker-skinned female subgroup, which saw a 17.7%–30.4% reduction in error between audit periods. Minimizing these disparities led to a 5.72%–8.3% reduction in overall error on PPB for target corporation APIs. The overall performance of non-targets Amazon and Kairos lags significantly behind that of the targets, with error rates of 8.66% and 6.60% overall, and error rates of 31.37% and 22.50% for the darker female subgroup, respectively.
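The subgroup-level error metrics above can be sketched as follows. This is a minimal illustration of how per-subgroup error rates and the disparity between best- and worst-served subgroups might be computed; the subgroup names, predictions, and labels below are illustrative placeholders, not data from the study.

```python
def error_rate(predictions, labels):
    """Fraction of misclassified examples."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)

def subgroup_errors(records):
    """records: iterable of (subgroup, prediction, label) tuples.
    Returns a dict mapping each subgroup to its error rate."""
    groups = {}
    for group, pred, label in records:
        groups.setdefault(group, []).append((pred, label))
    return {g: error_rate([p for p, _ in pairs], [y for _, y in pairs])
            for g, pairs in groups.items()}

# Toy audit data: one subgroup is misclassified more often than another.
records = [
    ("darker_female", "male", "female"),    # misclassified
    ("darker_female", "female", "female"),  # correct
    ("lighter_male", "male", "male"),       # correct
    ("lighter_male", "male", "male"),       # correct
]
errors = subgroup_errors(records)
# Accuracy disparity: gap between worst and best subgroup error rates.
disparity = max(errors.values()) - min(errors.values())
```

An audit comparing two API versions would run this computation on each version's predictions over the same benchmark and report the change in both overall error and the subgroup disparity.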