How to enable trustworthy AI with the right data fabric solution

Organizations increasingly depend on artificial intelligence (AI) and machine learning (ML) to assist humans in decision making. It’s how top organizations improve customer interactions and accelerate time-to-market for goods and services. But these organizations need to be able to trust their AI/ML models before those models can be operationalized and used in crucial business processes. Trustworthy AI has become a requirement for the successful adoption of AI in the industry.

These days, if an AI model makes a biased, unfair decision involving the health, wealth or well-being of humans, an organization can hit the news for the wrong reasons. Alongside the significant brand reputation risk, there’s also a growing set of data and AI regulations across the world and across industries, like the upcoming European Union AI Act, that companies must adhere to.

Examine the following checklist for grading the trustworthiness of any AI model:

Fairness: Can you confirm that the machine learning model does not systematically disadvantage any group of people relative to another, based on factors like gender, orientation, age or ethnicity?
Explainability: Can you explain why the model made a certain decision? For instance, if someone applies for a loan, the bank should be able to clearly explain why that person was rejected or approved.
Privacy: Are the right rules and policies in place for various people to access the data at different stages of the AI lifecycle?
Robustness: Does the model behave consistently as conditions change? Is it scalable? How do you account for drifting data patterns?
Transparency: Do you have all the facts relevant to the usage of the model? Are they captured throughout different stages of the lifecycle and readily available (much like a nutrition label)?
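
The fairness question in the checklist above can be made measurable. The following is a minimal sketch, not a complete fairness audit: it computes per-group approval rates and the disparate impact ratio between two groups. The group labels, the loan decisions and the function names are all hypothetical, and the 0.8 cutoff is only the commonly cited "four-fifths" rule of thumb, not a threshold taken from this article.

```python
from collections import defaultdict


def selection_rates(groups, outcomes):
    """Return the share of favorable outcomes (1) for each group."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, outcome in zip(groups, outcomes):
        totals[group] += 1
        favorable[group] += outcome
    return {g: favorable[g] / totals[g] for g in totals}


def disparate_impact(groups, outcomes, unprivileged, privileged):
    """Ratio of the unprivileged group's rate to the privileged group's rate."""
    rates = selection_rates(groups, outcomes)
    if rates[privileged] == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return rates[unprivileged] / rates[privileged]


# Hypothetical loan decisions: 1 = approved, 0 = rejected.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]

ratio = disparate_impact(groups, outcomes, unprivileged="b", privileged="a")
# Assumed rule of thumb: a ratio below 0.8 is worth investigating.
needs_review = ratio < 0.8
```

A single ratio like this is only a starting point. It says nothing about why the rates differ, which is where the explainability and transparency questions come in.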

How a data fabric enables trustworthy AI

Before you can trust an AI model and its insights, you need to be able to trust the data that’s being used. The right data fabric solution will naturally support these pillars and help you build trustworthy AI models. Consider these three crucial steps in the lifecycle of building out your next AI or machine learning model or improving a current one.

1. Comprehensive, trusted data sets

First things first: you need access and insight into all relevant data.

Research shows that up to 68% of data is not analyzed in most organizations. But successful AI implementations require connection to high quality, accurate data that’s ready for self-service consumption by the right stakeholders. Without the ability to aggregate data from disparate internal and external sources (on-premises, public or private clouds), you’ll have an inferior AI model, simply because you don’t have all the information you need.

Second, you need to make sure that the data itself can be trusted. There are two factors in a trusted data set:

Do you have the right rules and policies for who can access and use data?
Do you understand bias that exists in the data, and do you have the right guardrails to use that data for building and training models?

2. Guardrails during model building, deployment, management and monitoring

According to Gartner, 53% of AI and ML projects are stuck in pre-production phases. You can operationalize your AI by looking at all stages of the AI lifecycle. Automated, integrated data science tools help build, deploy, and monitor AI models. This approach helps ensure transparency and accountability at each stage of the model lifecycle. But to do so, you also need to ensure guardrails for fairness, robustness, fact collection and more, throughout each stage of the model lifecycle.

Often data scientists aren’t thrilled with the prospect of generating all the documentation necessary to meet ethical and regulatory standards. This is where technology such as IBM FactSheets can help, by reducing the manual labor needed to capture metadata and other facts about a model across stages of the AI lifecycle. With AI governance solutions, a data scientist using standard, open Python libraries and frameworks can have facts about model building and training collected automatically.

Similarly, facts can be collected while the model is in the testing and validation stages. All this information is incorporated into end-to-end workflows to ensure the team meets ethical and regulatory standards.
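
To illustrate the general idea of automatic fact collection, here is a minimal sketch using only the Python standard library. It is not the IBM FactSheets API; the `capture_facts` decorator, the `FACTS` list and the stand-in `train` and `validate` functions are hypothetical names invented for this example, and the metric values are placeholders.

```python
import functools
import json
import time

# Hypothetical in-memory store; a real setup would write to a governed catalog.
FACTS = []


def capture_facts(stage):
    """Record parameters, result and duration of a lifecycle step."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**params):
            started = time.time()
            result = func(**params)
            FACTS.append({
                "stage": stage,
                "step": func.__name__,
                "params": params,
                "result": result,
                "duration_s": round(time.time() - started, 3),
            })
            return result
        return wrapper
    return decorator


@capture_facts(stage="training")
def train(learning_rate, epochs):
    # Stand-in for real training; the metric below is a placeholder.
    return {"accuracy": 0.9}


@capture_facts(stage="validation")
def validate(threshold):
    # Stand-in for a real validation check.
    return {"passed": True}


train(learning_rate=0.01, epochs=5)
validate(threshold=0.8)

# The collected facts can be serialized for review or audit.
report = json.dumps(FACTS, indent=2)
```

The point of the decorator is that the data scientist keeps writing ordinary functions while the facts accumulate as a side effect, which is the kind of manual effort such tooling aims to remove.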

3. Processes that provide AI governance

In most organizations there are a number of data science tools, making it difficult to govern and manage information, let alone adhere to increasingly strict security, compliance and governance regulations. You can use automated, scalable AI governance to drive consistent, repeatable processes designed to increase model transparency and ensure both traceability and accountability. You can improve collaboration, compare model predictions, quantify model risk and optimize model performance, identify and mitigate bias, reduce risks like drift and decrease the need for model retraining.
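
Drift, mentioned above as one of the risks to reduce, can be monitored with a simple statistic. The sketch below computes the population stability index (PSI) between a baseline sample and a recent sample of one numeric feature. PSI is one common choice rather than a method prescribed by this article; the function name, bin count and sample data are hypothetical, and the 0.25 cutoff is a widely used rule of thumb, not a fixed standard.

```python
import math


def psi(expected, actual, bins=10):
    """Population stability index of `actual` against the `expected` baseline."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the baseline range
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values at training time and in production.
baseline = list(range(100))
shifted = [v + 50 for v in baseline]

stable_score = psi(baseline, baseline)
drift_score = psi(baseline, shifted)
# Assumed rule of thumb: a PSI above 0.25 suggests a significant shift.
drift_detected = drift_score > 0.25
```

Running a check like this on a schedule, and recording the score alongside other model facts, is one way to make drift monitoring a repeatable process rather than an ad hoc investigation.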

Ultimately, data management and providing users access to the right data at the right time are at the core of successful AI and AI governance. A data fabric architecture helps you accomplish this by minimizing data integration complexities and simplifying data access across an organization to facilitate self-service data consumption. With IBM Cloud Pak® for Data, you can formalize a workflow that allows different teams to interact with your model at various stages. It’s not just about granting proper access to data science teams. Your model risk management team, IT operations team and line-of-business employees also need appropriate access.

You can also handle different data sets and sources, from training data to payload data to ground truth data, with the right levels of privacy and governance around them. Critically, you can automate the capture of metadata from each data set and model and keep it in a central catalog. Using IBM Cloud Pak for Data, you can do this at scale with consistency and apply it to models that have been built using open-source or third-party tools.

Better data-driven decision making with AI and AI governance

The potential advantage of AI is reflected in the strategy trends of industry leaders. By 2023, it’s estimated that 60% of enterprise intelligence initiatives will be business-specific, shortening the data-to-decisions time frame by 30%, driving higher agility and resiliency. But to cement this data-driven trust with clients, it’s crucial that proper controls are in place across the AI lifecycle, especially when AI is used in critical situations.

Author

John J Thomas

Vice President & Distinguished Engineer, IBM Expert Labs
