You may have heard of racial profiling in AI within law enforcement or unrestricted recommender systems working in the favor of pro-genocide propaganda machines in Myanmar. As newsworthy as these headlines are, we’ve noticed these massive issues with AI too late. There are millions more subtle impacts of AI going largely unnoticed by the public, waiting to turn into another disaster.
While the “move fast and break things” mantra ushered in key advancements in AI tech, it came at the expense of key processes that ensure correctness and ethical practice. For instance, it took several years before researchers stopped throwing the “black box” label on AI and devoted time to investigating what was actually inside that box. With a technology so elusive and difficult to understand, it may be easy for Google to blame racist search results on “the algorithm.” Yet AI engineers understand that there are still key nuances of AI models, including technical limitations and known data biases, that can inform decisions on fair use. Unfortunately, documentation is still lacking, and with it transparency and accountability.
Software engineering, on the other hand, has been around for decades. As a field, SWE has already matured techniques and best practices around documentation, testing, and monitoring. These practices ensure the resilience of software systems and applications; specifically, they ensure that end customers do not receive half-baked products that fail to function as intended. For accountability, software companies specified SLAs and provided terms and documentation that outlined use cases and expected results. The AI equivalents are still in their infancy, if not totally ignored.
In the name of speed, AI research has conferred “state of the art” status based on single statistical metrics. Prestigious challenges like ImageNet, SQuAD, and the Alexa challenge gamified AI research, rewarding the teams that achieved the highest numerical benchmarks, such as accuracy. The faster a research group reached 0.1 percentage points higher on a challenge, the more they could publish and win from million-dollar prize pools. Absent from all of these challenges was a requirement to examine other key human aspects of the models, such as accuracy by race or other trade-offs. Unless we are more transparent with documentation and other vital evaluation criteria for AI, unintended consequences will only become more frequent.
To promote Model Cards, an emerging documentation paradigm that offers one way to introduce transparency in AI, I created a prototype of AICards, a tool that guides practitioners in creating, sharing, and discovering model details in an easy and presentable manner. Although AICards alone won’t fix AI, it is a stepping stone toward tools and processes that make this transparency and documentation easier for everyone.
Documentation, especially documentation that confronts societal impacts, is hard to write. However, we need transparency so that fewer unintended consequences make it through and hurt democracy, undermine privacy, or perpetuate racial inequity.
The Problem with Model Cards
Model Cards were initially proposed by Margaret Mitchell and Timnit Gebru, former top research scientists at Google who were both controversially terminated in the past few months. The framework is simple: label AI models in much the same way that nutrition facts provide crucial information about food. Instead of calories, researchers and engineers could summarize population-based effects, ethical limitations, and critical quantitative analyses so that a model can be assessed quickly and widely. Optimally, every researcher and every model would include a section of their paper or code describing the human impacts of the models in simple language. The intention was that if engineers incorporated Model Cards into their work, they could better anticipate higher-order effects while opening the conversation on model impacts to targeted groups, end users, and other engineers.
After its publication three years ago, the paper sparked rapid uptake in corporate and academic circles, including within Google teams. For instance, I started seeing lectures in my Stanford AI courses emphasizing the role of Model Cards. However, despite their favorable reception, Model Cards have seen little use in the general AI community and have largely remained an afterthought. For example, OpenAI’s GPT-3, released in 2020, did not have a Model Card in its GitHub repository until four months after its release. The majority of projects that have Model Cards have also largely missed the spirit of the framework: ethical limitations, stakeholders, or population-based effects are typically missing from the documentation, particularly in projects from large firms. Of the Model Cards that have been implemented, each one has a different hierarchy of information and a different length. As a result, comparing models from card to card is impossible.
For AI researchers and engineers, transforming abstract ethics questions and frameworks into actions leaves much to individual interpretation. For practitioners without prior experience, the learning curve can make Model Cards difficult to adopt. This is also likely why the Python-based Model Card Toolkit released last year by Google hasn’t (yet) moved the needle on documentation norms. Model Cards are a clever documentation paradigm, but a developer or researcher should not have to know Python or HTML, or be familiar with the nuances of Model Cards, to give non-technical ethics input on their models.
Past that, conflicting implementations of ethical principles and techniques without a shared set of norms make discussions and documentation on AI disorganized and incomparable, even if guided by a central framework. Therefore, although many AI practitioners have intentions to be careful about ethics, the ethical principles and frameworks set forth fail to reach the intended objective.
With these problems in mind, I’d like to introduce how AICards are meant to be used. I’ll start with the engineers and researchers who will create the AICard, then I’ll discuss how users would discover and act on these AICards.
Card Creator Design
Outside of machine learning and AI, the general principles described by ethics frameworks are most effectively operationalized through defined processes and deliberate guidance. For example, checklists and defined processes such as flowcharts have been useful in healthcare for complex treatment decisions and in academic institutions for ethics review. They are effective because they distill complex, high-level principles into rapid, foolproof actions.
Applied to user interfaces, design patterns such as process funnels and stepped interactive wizards have been highly effective in directing users through an opinionated process flow to accomplish an end goal. Instead of exposing the entire checklist to a decision maker, UIs have more flexibility to direct and focus user attention and thought on a sequence of steps to nudge effective responses.
AICards leverages checklist principles in its UI for enhanced guidance around the Model Cards framework. The creation flow is designed with a carefully selected sequence of questions, and it provides examples to guide practitioners in generating useful documentation. Each step of the process trains researchers on what they should focus on when providing inputs. Below is a description and order of questions based on research and user interviews.
These questions were heavily based on the original questions from the Model Cards paper. We modified wording and broke down complex, high-level concepts into multiple cards based on initial user feedback. For example, instead of asking about “Factors” from the original Model Cards paper, we broke this concept down into “Group Impacts” and “Limitations,” adding examples and other prompting language to retain user focus and guide state of mind. At every step, there are examples of good responses for cases where it may be unclear what should be documented or elaborated.
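As a rough sketch of how such a guided, one-question-at-a-time flow could be modeled (this is not the actual AICards source; the type names, prompts, and example responses below are assumptions made for illustration, and only the step titles “Group Impacts” and “Limitations” come from the text above):

```typescript
// Hypothetical model of a stepped card-creation flow.
// Names and prompt wording are illustrative, not taken from the AICards codebase.
interface CardStep {
  id: string;
  title: string;
  prompt: string;  // the question shown to the practitioner
  example: string; // a sample of a good response shown next to the prompt
}

// Answers are keyed by step id.
type CardAnswers = Record<string, string>;

const steps: CardStep[] = [
  {
    id: "groupImpacts",
    title: "Group Impacts",
    prompt: "Which groups of people could this model affect differently?",
    example: "Accuracy was only evaluated on English-language text.",
  },
  {
    id: "limitations",
    title: "Limitations",
    prompt: "Where should this model not be used?",
    example: "Not validated for medical or legal decisions.",
  },
];

// Returns the first step that has no non-empty answer yet,
// or undefined once every step has been answered.
function nextStep(all: CardStep[], answers: CardAnswers): CardStep | undefined {
  return all.find((step) => (answers[step.id] ?? "").trim() === "");
}
```

A UI built on this shape would show only the step returned by `nextStep`, together with its example, instead of exposing the whole checklist at once.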
A key part of integrating AICards into workflows is ensuring that they are easily discoverable. This means that integrating with other information systems or catalogs will be crucial to getting the ethics documentation front and center. AICards was designed to abstract the complexity of portraying, viewing, and discovering the card to reduce friction for anyone who wanted to check out an AICard.
Specifically, AICards generates several kinds of content from the card an engineer or researcher has created. The output is designed to be functional, so that information gets out to end users, and aesthetically pleasing, so that the engineer doesn’t need to focus on styling. To make cards simpler to read, we render definitions of technical terms and related jargon on hover and provide a predictable information hierarchy across all cards to help with comparisons. With a predictable structure of ordered headings similar to those found in the AICard creation workflow, users were able to find relevant information on limitations much more quickly. The information hierarchy was derived from the Model Cards paper and initial user research, but over time these standards should change in response to expert input if AICards gains more adoption.
In developer communities, GitHub and its READMEs are the primary ways that projects get discovered and used. If anything, GitHub READMEs are usually the only thing that other users and developers read before integrating a particular model or piece of code into their projects. Therefore, AICards generates a badged, abridged version of each card to display as highlighted documentation that links back to the full documentation. Moreover, since it is common for people to place documentation next to their code in repositories, AICards adopts the same convention by providing a well-known file format to include in the repository. As AICards matures, the badge can potentially serve as certification that a standard of ethical documentation has been reached.
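To illustrate why a fixed information hierarchy helps with comparison, here is a minimal sketch, assuming a card is just a set of named text sections. The section list is hypothetical (“Overview” is invented for the example; “Group Impacts” and “Limitations” come from the creation flow), and the real AICards renderer may work differently:

```typescript
// Hypothetical fixed heading order shared by every rendered card.
const SECTION_ORDER = ["Overview", "Group Impacts", "Limitations"] as const;

type Section = (typeof SECTION_ORDER)[number];

// A card may leave some sections unanswered.
type Card = Partial<Record<Section, string>>;

// Renders every heading in the same order for every card, so that the
// same information always appears in the same place. Missing or blank
// sections are made visible instead of being silently dropped.
function renderCard(card: Card): string {
  return SECTION_ORDER
    .map((heading) => `${heading}\n${card[heading]?.trim() || "(not documented)"}`)
    .join("\n\n");
}
```

Because missing sections still appear under their heading, two cards rendered this way can be read side by side, and a gap in the documentation is itself visible to the reader.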
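One way the badge idea could look in practice is sketched below. This is an assumption-laden illustration rather than the AICards implementation: the function names, the default label, and the URLs are hypothetical, and the only thing taken as given is the common Markdown convention of wrapping a badge image in a link.

```typescript
// Shortens a summary to at most `max` characters, ending with "..."
// when text had to be cut. Hypothetical helper for the abridged card.
function abridge(text: string, max: number): string {
  if (text.length <= max) return text;
  return text.slice(0, Math.max(0, max - 3)).trimEnd() + "...";
}

// Builds a Markdown badge: an image (badgeUrl) that links to the full card (cardUrl).
// Both URLs are supplied by the caller; none are real AICards endpoints.
function badgeMarkdown(cardUrl: string, badgeUrl: string, label = "AICard"): string {
  return `[![${label}](${badgeUrl})](${cardUrl})`;
}

// Combines the badge and an abridged summary into a snippet for a README.
function readmeSnippet(summary: string, cardUrl: string, badgeUrl: string): string {
  return `${badgeMarkdown(cardUrl, badgeUrl)}\n\n${abridge(summary, 140)}`;
}
```

A project maintainer would paste the resulting snippet near the top of the README, so that the link to the full card is the first thing a reader sees.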
The AICards project is a starting point for creating tools that make it easier for everyone to participate in AI discussions. More feedback is needed to better fit AICards into individual developer or corporate workflows to enhance usage and spread awareness of the framework. Although I think AICards has a place in helping with better ethics documentation for AI models, many other sources of bias (such as dataset bias) need to be documented, along with the development of the other best practices in testing that have been left behind in AI as a field.
More importantly, as a community, we need to figure out how to change norms within the AI community to focus more on documentation and related practices. For example, Stanford AI classes should mandate that final projects come with some form of model card, even if not using the AICards interface. Additionally, leaders in AI research need to start including ethics documentation and compliance in the definition of “state of the art.” In this transition, I hope that AICards can be helpful.
Ultimately, the AICards project contributes to increasing conversations among targeted groups, users, AI practitioners, and more so that we can better anticipate unintended consequences and be accountable for adverse effects of AI.
AICards is open-source software, written in TypeScript and React with an otherwise simple stack. It is on GitHub. If you want to work on this with me, do contact me at email@example.com. Let’s start a conversation.
This project would not be possible without the following amazing people: Matthew Jörke, Dr. James Landay, Jasmine Sun, and Jessica Dai. Thanks for the encouragement, mentorship, and ideas I could never come up with myself.
- AICards Website: https://algocards.netlify.app/
- AICards Paper: https://docs.google.com/document/d/1CKgBDmwjhOSkJuvtOnTkhEU-1qtcT2dJPqcfUAgxGQ8/edit?usp=sharing
- AICards Repo: https://github.com/natel9178/algocards