Date: 2024-02-22

As a leader in trustworthy artificial intelligence, IBM has experience in developing governance frameworks that guide responsible use of AI in alignment with client organizations’ values. IBM also has its own frameworks for use of AI within IBM itself, informing policy positions such as the use of facial recognition technology.

AI tools are now used in national security and to help protect against data breaches and cyberattacks. But AI also supports other strategic goals of the DoD. It can augment the workforce, making personnel more effective and helping them reskill. It can help create resilient supply chains to support soldiers, sailors, airmen and marines in roles of warfighting, humanitarian aid, peacekeeping and disaster relief.

The DoD’s Chief Digital and Artificial Intelligence Office (CDAO) includes five ethical principles as part of its responsible AI toolkit: responsible, equitable, traceable, reliable, and governable. Based on the US military’s existing ethics framework, these principles are grounded in the military’s values and help uphold its commitment to responsible AI.

There must be a concerted effort to make these principles a reality through consideration of the functional and non-functional requirements in the models and the governance systems around those models. Below, we provide broad recommendations for the operationalization of the CDAO’s ethical principles.

1. Responsible

“DoD personnel will exercise appropriate levels of judgment and care, while remaining responsible for the development, deployment, and use of AI capabilities.”

Everyone agrees that AI models should be developed by personnel who are careful and considerate, but how can organizations nurture people to do this work? We recommend:

Fostering an organizational culture that recognizes the sociotechnical nature of AI challenges. This must be communicated from the outset, and there must be a recognition of the practices, skill sets and thoughtfulness that need to be put into models and their management to monitor performance.

Detailing ethics practices throughout the AI lifecycle, corresponding to business (or mission) goals, data preparation and modeling, evaluation and deployment. The CRISP-DM model is useful here. IBM’s Scaled Data Science Method, an extension of CRISP-DM, offers governance across the AI model lifecycle informed by collaborative input from data scientists, industrial-organizational psychologists, designers, communication specialists and others. The method merges best practices in data science, project management, design frameworks and AI governance. Teams can easily see and understand the requirements at each stage of the lifecycle, including documentation, who they need to talk to or collaborate with, and next steps.

Providing interpretable AI model metadata (for example, as factsheets) specifying accountable persons, performance benchmarks (compared to human performance), data and methods used, audit records (date and auditor), and audit purpose and results.

Note: These measures of responsibility must be interpretable by AI non-experts (without “mathsplaining”).
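To make the factsheet recommendation above more concrete, here is a minimal sketch of what such metadata might look like in code. It is not IBM’s factsheet format or any official DoD schema: the class names, field names and sample values are all hypothetical, chosen only to mirror the items listed above (accountable persons, a comparison with human performance, data and methods, and audit records). The summary method deliberately avoids jargon so that a non-expert can read it.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AuditRecord:
    # Hypothetical audit entry: when, by whom, why, and what was found.
    audited_on: date
    audited_by: str
    purpose: str
    result: str


@dataclass
class ModelFactsheet:
    # Hypothetical factsheet fields mirroring the recommendation above.
    model_name: str
    accountable_persons: list[str]
    model_accuracy: float            # share of correct outputs, 0.0 to 1.0
    human_baseline_accuracy: float   # same measure for people doing the task
    data_sources: list[str]
    methods: list[str]
    audits: list[AuditRecord] = field(default_factory=list)

    def plain_language_summary(self) -> str:
        """Describe the model in terms a non-expert can follow."""
        lines = [
            f"Model: {self.model_name}",
            f"Accountable: {', '.join(self.accountable_persons)}",
            f"Correct {self.model_accuracy:.0%} of the time; people doing "
            f"the same task are correct {self.human_baseline_accuracy:.0%} "
            "of the time.",
        ]
        if self.audits:
            latest = max(self.audits, key=lambda a: a.audited_on)
            lines.append(
                f"Last audited {latest.audited_on.isoformat()} "
                f"by {latest.audited_by}: {latest.result}"
            )
        else:
            lines.append("Not yet audited.")
        return "\n".join(lines)


# Illustrative values only; none of these come from a real system.
factsheet = ModelFactsheet(
    model_name="example-triage-model",
    accountable_persons=["J. Doe"],
    model_accuracy=0.91,
    human_baseline_accuracy=0.88,
    data_sources=["curated-reports-2023"],
    methods=["gradient-boosted trees"],
    audits=[AuditRecord(date(2024, 1, 15), "Review team",
                        "Quarterly accuracy check", "Passed")],
)
print(factsheet.plain_language_summary())
```

Keeping the plain-language summary next to the raw fields is one way to serve both auditors and non-expert readers from the same record.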

2. Equitable

“The Department will take deliberate steps to minimize unintended bias in AI capabilities.”

Everyone agrees that use of AI models should be fair and not discriminate, but how does this happen in practice? We recommend:

Establishing a center of excellence to give diverse, multidisciplinary teams a community for applied training to identify potential disparate impact.

Using auditing tools to reflect the bias exhibited in models. If the reflection aligns with the values of the organization, transparency surrounding the chosen data and methods is key. If the reflection does not align with organizational values, then this is a signal that something must change. Discovering and mitigating potential disparate impact caused by bias involves far more than examining the data the model was trained on. Organizations must also examine people and processes involved. For example, have appropriate and inappropriate uses of the model been clearly communicated?

Measuring fairness and making equity standards actionable by providing functional and non-functional requirements for varying levels of service.

Using design thinking frameworks to assess unintended effects of AI models, determine the rights of the end users and operationalize principles. It’s essential that design thinking exercises include people with widely varied lived experiences—the more diverse the better.
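As an illustration of measuring fairness in a way that can be turned into a requirement, the sketch below computes one commonly used metric, the disparate impact ratio: each group’s rate of favorable outcomes divided by the rate for a reference group. The article does not prescribe a particular metric, so this choice is an assumption, as are the function names, the sample data and the 0.8 threshold (borrowed from the “four-fifths” rule of thumb used in US employment contexts). A real equity standard would set its own metric and threshold.

```python
from collections import defaultdict


def selection_rates(outcomes, groups):
    """Share of favorable outcomes (1) per group."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for outcome, group in zip(outcomes, groups):
        totals[group] += 1
        favorable[group] += int(outcome == 1)
    return {g: favorable[g] / totals[g] for g in totals}


def disparate_impact_ratios(outcomes, groups, reference_group):
    """Each group's selection rate divided by the reference group's rate."""
    rates = selection_rates(outcomes, groups)
    reference_rate = rates[reference_group]
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return {
        g: rate / reference_rate
        for g, rate in rates.items()
        if g != reference_group
    }


# Illustrative data only: 1 = favorable model decision, 0 = unfavorable.
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

THRESHOLD = 0.8  # assumed threshold, for illustration only

for group, ratio in disparate_impact_ratios(outcomes, groups, "a").items():
    status = "review needed" if ratio < THRESHOLD else "within threshold"
    print(f"group {group}: ratio {ratio:.2f} ({status})")
```

A number like this is only a starting signal. As noted above, the people and processes around the model need examination too.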

3. Traceable

“The Department’s AI capabilities will be developed and deployed such that relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods applicable to AI capabilities, including with transparent and auditable methodologies, data sources, and design procedure and documentation.”

Operationalize traceability by providing clear guidelines to all personnel using AI:

Always make clear to users when they are interfacing with an AI system.

Provide content grounding for AI models. Empower domain experts to curate and maintain trusted sources of data used to train models, because a model’s output reflects the data it was trained on.

IBM and its partners can provide AI solutions with comprehensive, auditable content grounding, which is imperative for high-risk use cases.

Capture key metadata to render AI models transparent and keep track of model inventory. Make sure that this metadata is interpretable and that the right information is exposed to the appropriate personnel. Data interpretation takes practice and is an interdisciplinary effort. At IBM, our Design for AI group aims to educate employees on the critical role of data in AI (among other fundamentals) and donates frameworks to the open-source community.

Make this metadata easily findable by people (ultimately at the source of output).

Include a human in the loop, because AI should augment and assist humans. This allows humans to provide feedback as AI systems operate.

Create processes and frameworks to assess disparate impact and safety risks well before the model is deployed or procured. Designate accountable people to mitigate these risks.
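Several of the guidelines above (disclosing that the user is dealing with AI, grounding output in trusted sources, and keeping a human in the loop) can be combined in a single response-handling step. The following is a minimal sketch under stated assumptions, not a description of any IBM product or DoD system: the class names, the disclosure wording, the confidence score and the 0.8 review threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical wording; an organization would define its own disclosure text.
AI_DISCLOSURE = "This response was generated by an AI system."


@dataclass
class ModelOutput:
    text: str
    confidence: float     # assumed to be a score between 0.0 and 1.0
    sources: list[str]    # identifiers of the trusted documents used


@dataclass
class PreparedResponse:
    text: str
    disclosure: str
    sources: list[str]
    needs_human_review: bool


def prepare_response(output: ModelOutput,
                     review_threshold: float = 0.8) -> PreparedResponse:
    """Attach the AI disclosure and sources, and flag output for review.

    Output is routed to a person when it cites no trusted source or when
    its confidence falls below the (assumed) threshold.
    """
    ungrounded = not output.sources
    low_confidence = output.confidence < review_threshold
    return PreparedResponse(
        text=output.text,
        disclosure=AI_DISCLOSURE,
        sources=list(output.sources),
        needs_human_review=ungrounded or low_confidence,
    )


# Illustrative usage with made-up values.
grounded = prepare_response(
    ModelOutput("Part is in stock.", 0.95, ["supply-manual-7"]))
ungrounded = prepare_response(ModelOutput("Part is in stock.", 0.95, []))
print(grounded.needs_human_review, ungrounded.needs_human_review)
```

Returning the sources alongside the text keeps the metadata at the point of output, where users can find it.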

4. Reliable

“The Department’s AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness of such capabilities will be subject to testing and assurance within those defined uses across their entire life cycles.”

Organizations must document well-defined use cases and then test for compliance. Operationalizing and scaling this process requires strong cultural alignment so practitioners adhere to the highest standards even without constant direct oversight. Best practices include:

Establishing communities that constantly reaffirm why fair, reliable outputs are essential. Many practitioners earnestly believe that having the best intentions is enough to prevent disparate impact. This is misguided. Applied training by highly engaged community leaders who make people feel heard and included is critical.

Building reliability testing rationales around the guidelines and standards for data used in model training. The best way to make this real is to offer examples of what can happen when this scrutiny is lacking.

Limiting user access to model development, while gathering diverse perspectives at the outset of a project to reduce the risk of introducing bias.

Performing privacy and security checks along the entire AI lifecycle.

Including measures of accuracy in regularly scheduled audits. Be unequivocally forthright about how model performance compares to that of a human being. If the model fails to provide an accurate result, detail who is accountable for that model and what recourse users have. (This should all be baked into the interpretable, findable metadata.)
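The audit step described above can be sketched as a small function that scores the model and human reviewers against the same verified answers and records who is accountable and what recourse users have. This is a sketch only, assuming that a set of verified answers exists for the audit period; the function names, dictionary keys and sample values are hypothetical.

```python
def accuracy(predictions, truth):
    """Share of predictions that match the verified answers."""
    if not truth or len(predictions) != len(truth):
        raise ValueError("predictions and truth must be non-empty "
                         "and of equal length")
    matches = sum(p == t for p, t in zip(predictions, truth))
    return matches / len(truth)


def build_audit_entry(model_predictions, human_predictions, truth,
                      accountable, recourse):
    """Compare model and human accuracy on the same cases."""
    model_acc = accuracy(model_predictions, truth)
    human_acc = accuracy(human_predictions, truth)
    return {
        "model_accuracy": model_acc,
        "human_accuracy": human_acc,
        # Negative means the model did worse than people on these cases.
        "model_minus_human": model_acc - human_acc,
        "accountable": accountable,
        "recourse": recourse,
    }


# Illustrative data only.
truth = [1, 0, 1, 1]
model_predictions = [1, 0, 0, 1]
human_predictions = [1, 0, 1, 1]

entry = build_audit_entry(
    model_predictions, human_predictions, truth,
    accountable="J. Doe",
    recourse="Request a manual review from the model owner.",
)
print(entry)
```

Reporting the difference with its sign, rather than only the model’s score, makes it harder to overstate how the model compares with people.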

5. Governable

“The Department will design and engineer AI capabilities to fulfill their intended functions while possessing the ability to detect and avoid unintended consequences, and the ability to disengage or deactivate deployed systems that demonstrate unintended behavior.”

Operationalization of this principle requires: