Model Cards for Model Reporting

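Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, Timnit Gebru

FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency (January 29-31, 2019, Atlanta, GA, USA), January 2019, Pages 220–229
https://doi.org/10.1145/3287560.3287596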

Published: 29 January 2019

ABSTRACT

Trained machine learning models are increasingly used to perform high-impact tasks in areas such as law enforcement, medicine, education, and employment. In order to clarify the intended use cases of machine learning models and minimize their usage in contexts for which they are not well suited, we recommend that released models be accompanied by documentation detailing their performance characteristics. In this paper, we propose a framework that we call model cards, to encourage such transparent model reporting. Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type [15]) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information. While we focus primarily on human-centered machine learning models in the application fields of computer vision and natural language processing, this framework can be used to document any trained machine learning model. To solidify the concept, we provide cards for two supervised models: One trained to detect smiling faces in images, and one trained to detect toxic comments in text. We propose model cards as a step towards the responsible democratization of machine learning and related artificial intelligence technology, increasing transparency into how well artificial intelligence technology works. We hope this work encourages those releasing trained machine learning models to accompany model releases with similar detailed evaluation numbers and other relevant documentation.
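
The disaggregated and intersectional evaluation described above can be illustrated with a small sketch. This is not the authors' evaluation code: the record fields (`age`, `sex`, `label`, `prediction`), the function name, and the toy data are all hypothetical, and false positive and false negative rates are used here only as example metrics. The sketch reports each metric per single attribute value and per pair of attribute values.

```python
from collections import defaultdict
from itertools import combinations


def disaggregated_error_rates(records, attributes):
    """Report false positive/negative rates per group and per pair of groups.

    records: list of dicts holding a binary 'label', a binary 'prediction',
             and one key per attribute name (all names are illustrative).
    attributes: attribute names to slice by; every single attribute and
                every pair of attributes (intersectional groups) is covered.
    """
    slices = [(a,) for a in attributes] + list(combinations(attributes, 2))
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "pos": 0, "neg": 0})

    for record in records:
        for keys in slices:
            # A group is identified by its (attribute, value) pairs.
            group = tuple((k, record[k]) for k in keys)
            c = counts[group]
            if record["label"] == 1:
                c["pos"] += 1
                if record["prediction"] == 0:
                    c["fn"] += 1
            else:
                c["neg"] += 1
                if record["prediction"] == 1:
                    c["fp"] += 1

    report = {}
    for group, c in counts.items():
        report[group] = {
            # A rate is undefined (None) when the group has no such examples.
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
            "n": c["pos"] + c["neg"],
        }
    return report


# Toy data, invented purely for illustration.
records = [
    {"age": "young", "sex": "f", "label": 1, "prediction": 1},
    {"age": "young", "sex": "f", "label": 0, "prediction": 1},
    {"age": "young", "sex": "m", "label": 0, "prediction": 0},
    {"age": "old", "sex": "f", "label": 1, "prediction": 0},
    {"age": "old", "sex": "m", "label": 1, "prediction": 1},
    {"age": "old", "sex": "m", "label": 0, "prediction": 0},
]

report = disaggregated_error_rates(records, ("age", "sex"))
for group, metrics in sorted(report.items()):
    print(group, metrics)
```

Reporting the group size `n` next to each rate matters in practice, since intersectional slices can be very small and their rates correspondingly noisy.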

References

Avrio AI. 2018. Avrio AI: AI Talent Platform. (2018). https://www.goavrio.com/
Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine Bias. (2016). https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Emily M. Bender and Batya Friedman. 2018. Data Statements for NLP: Toward Mitigating System Bias and Enabling Better Science. Transactions of the ACL (TACL) (2018).
Joy Buolamwini. 2016. How I'm fighting Bias in Algorithms. (2016). https://www.ted.com/talks/joy_buolamwini_how_i_m_fighting_bias_in_algorithms#t-63664
Joy Buolamwini and Timnit Gebru. 2018. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (Proceedings of Machine Learning Research), Sorelle A. Friedler and Christo Wilson (Eds.), Vol. 81. PMLR, New York, NY, USA, 77–91. http://proceedings.mlr.press/v81/buolamwini18a.html
Alexandra Chouldechova. 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data 5, 2 (2017), 153–163.
Federal Trade Commission. 2016. Big Data: A Tool for Inclusion or Exclusion? Understanding the Issues. (2016). https://www.ftc.gov/reports/big-data-tool-inclusion-or-exclusion-understanding-issues-ftc-report
Kimberle Crenshaw. 1989. Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. U. Chi. Legal F. (1989), 139.
Black Desi. 2009. HP computers are racist. (2009). https://www.youtube.com/watch?v=t4DT3tQqgRM
William Dieterich, Christina Mendoza, and Tim Brennan. 2016. COMPAS Risk Scales: Demonstrating Accuracy Equity and Predictive Parity. (2016). https://www.documentcloud.org/documents/2998391-ProPublica-Commentary-Final-070616.html
Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. 2018. Measuring and Mitigating Unintended Bias in Text Classification. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (2018).
Cynthia Dwork. 2008. Differential Privacy: A Survey of Results. In Theory and Applications of Models of Computation, Manindra Agrawal, Dingzhu Du, Zhenhua Duan, and Angsheng Li (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 1–19.
Entelo. 2018. Recruitment Software | Entelo. (2018). https://www.entelo.com/
Daniel Faggella. 2018. Follow the Data: Deep Learning Leads the Transformation of Enterprise - A Conversation with Naveen Rao. (2018).
Thomas B Fitzpatrick. 1988. The validity and practicality of sun-reactive skin types I through VI. Archives of dermatology 124, 6 (1988), 869–871.
Food and Drug Administration. 1989. Guidance for the Study of Drugs Likely to Be Used in the Elderly. (1989).
U.S. Food and Drug Administration. 2013. FDA Drug Safety Communication: Risk of next-morning impairment after use of insomnia drugs; FDA requires lower recommended doses for certain drugs containing Zolpidem (Ambien, Ambien CR, Edluar, and Zolpimist). (2013). https://web.archive.org/web/20170428150213/https://www.fda.gov/drugs/drugsafety/ucm352085.htm
IIHS (Insurance Institute for Highway Safety: Highway Loss Data Institute). 2003. Special Issue: Side Impact Crashworthiness. Status Report 38, 7 (2003).
Institute for the Future and Omidyar Network's Tech and Society Solutions Lab. 2018. Ethical OS. (2018). https://ethicalos.org/
Clare Garvie, Alvaro Bedoya, and Jonathan Frankle. 2016. The Perpetual Line-Up. (2016). https://www.perpetuallineup.org/
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna M. Wallach, Hal Daumé III, and Kate Crawford. 2018. Datasheets for Datasets. CoRR abs/1803.09010 (2018). http://arxiv.org/abs/1803.09010
Google. 2018. Responsible AI Practices. (2018). https://ai.google/education/responsible-ai-practices
Gooru. 2018. Navigator for Teachers. (2018). http://gooru.org/about/teachers
Cyril Goutte and Eric Gaussier. 2005. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In European Conference on Information Retrieval. Springer, 345–359.
Collins GS, Reitsma JB, Altman DG, and Moons KM. 2015. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. Annals of Internal Medicine 162, 1 (2015), 55–63.
Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of Opportunity in Supervised Learning. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 3315–3323. http://papers.nips.cc/paper/6374-equality-of-opportunity-in-supervised-learning.pdf
Michael Hind, Sameep Mehta, Aleksandra Mojsilovic, Ravi Nair, Karthikeyan Natesan Ramamurthy, Alexandra Olteanu, and Kush R. Varshney. 2018. Increasing Trust in AI Services through Supplier's Declarations of Conformity. CoRR abs/1808.07261 (2018).
Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia Chmielinski. 2018. The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards. CoRR abs/1805.03677 (2018). http://arxiv.org/abs/1805.03677
Ideal. 2018. AI For Recruiting Software | Talent Intelligence for High-Volume Hiring. (2018). https://ideal.com/
DrivenData Inc. 2018. An Ethics Checklist for Data Scientists. (2018). http://deon.drivendata.org/
Jigsaw. 2017. Conversation AI Research. (2017). https://conversationai.github.io/
Jigsaw. 2017. Perspective API. (2017). https://www.perspectiveapi.com/
B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas, and R. Sayres. 2018. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). ICML (2018).
Brendan F. Klare, Mark J. Burge, Joshua C. Klontz, Richard W. Vorder Bruegge, and Anil K. Jain. 2012. Face recognition performance: Role of demographic information. IEEE Transactions on Information Forensics and Security 7, 6 (2012), 1789–1801.
Der-Chiang Li, Susan C Hu, Liang-Sian Lin, and Chun-Wu Yeh. 2017. Detecting representative data and generating synthetic samples to improve learning accuracy with imbalanced data sets. PloS one 12, 8 (2017), e0181853.
Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2015. Deep Learning Face Attributes in the Wild. In Proceedings of International Conference on Computer Vision (ICCV).
Shira Mitchell, Eric Potash, and Solon Barocas. 2018. Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions. arXiv:1811.07867 (2018).
Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, and Kedar Dhamdhere. 2018. Did the Model Understand the Question? Proceedings of the Association for Computational Linguistics (2018).
AI Now. 2018. Litigating Algorithms: Challenging Government Use Of Algorithmic Decision Systems. AI Now Institute.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 311–318.
Inioluwa Raji. 2018. Black Panther Face Scorecard: Wakandans Under the Coded Gaze of AI. (2018).
Microsoft Research. 2018. Project InnerEye - Medical Imaging AI to Empower Clinicians. (2018). https://www.microsoft.com/en-us/research/project/medical-image-analysis/
Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70. PMLR, Sydney, Australia.
Digital Reasoning Systems. 2018. AI-Enabled Cancer Software | Healthcare AI: Digital Reasoning. (2018). https://digitalreasoning.com/solutions/healthcare/
Turnitin. 2018. Revision Assistant. (2018). http://turnitin.com/en_us/what-we-offer/revision-assistant
Shannon Vallor, Brian Green, and Irina Raicu. 2018. Ethics in Technology Practice: An Overview. (22 June 2018). https://www.scu.edu/ethics-in-technology-practice/overview-of-ethics-in-tech-practice/
Lucy Vasserman, John Li, CJ Adams, and Lucas Dixon. 2018. Unintended bias and names of frequently targeted groups. Medium (2018). https://medium.com/the-false-positive/unintended-bias-and-names-of-frequently-targeted-groups-8e0b81f80a23
Sahil Verma and Julia Rubin. 2018. Fairness Definitions Explained. (2018).
Joz Wang. 2010. Flickr Image. (2010). https://www.flickr.com/photos/jozjozjoz/3529106844
Amy Westervelt. 2018. The medical research gender gap: how excluding women from clinical trials is hurting our health. (2018).
Mingyuan Zhou, Haiting Lin, S Susan Young, and Jingyi Yu. 2018. Hybrid sensing face detection and registration for low-light and unconstrained conditions. Applied optics 57, 1 (2018), 69–78.

Published in

FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency
January 2019
388 pages
ISBN: 9781450361255
DOI: 10.1145/3287560

Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ML model evaluation
datasheets
disaggregated evaluation
documentation
ethical considerations
fairness evaluation
model cards
Qualifiers
- research-article
- Research
- Refereed limited