Python client samples for model evaluations
Review and run sample Jupyter Notebooks that use the Python client library for model evaluations to demonstrate features and tasks.
To use a sample notebook to demonstrate features and tasks with the Python client, you should be comfortable coding in a Jupyter Notebook. A Jupyter Notebook is a web-based environment for interactive computing: you can run small pieces of code that process your data and immediately view the results. With sample Jupyter Notebooks, you can complete tutorials for tasks such as building, training, and deploying models and configuring model evaluations.
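Each sample notebook starts from an initialized Python client. The following is a minimal setup sketch, assuming the `ibm-watson-openscale` package and an IBM Cloud API key; the credential shown is a placeholder, not a value from the samples:

```python
# Minimal client setup sketch. Assumes the ibm-watson-openscale
# package (pip install ibm-watson-openscale). The API key is a
# placeholder that you replace with your own credential.
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient

authenticator = IAMAuthenticator(apikey="YOUR_API_KEY")  # placeholder
client = APIClient(authenticator=authenticator)

# Print the client version to confirm that the connection works.
print(client.version)
```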
Sample notebooks
View or run the following Jupyter notebooks to learn how to complete different tasks; a minimal sketch of triggering an on-demand evaluation with the Python client follows the table.
Sample name | Tasks demonstrated |
---|---|
Working with Watson Machine Learning | Train, create, and deploy a German Credit Risk model, configure model evaluations to monitor that deployment, and inject seven days of historical records and measurements for viewing in the Insights dashboard. |
Working with SPSS Collaboration and Deployment services | Log payload data for a model that is deployed on a custom model-serving engine. |
Batch Processing: Apache Spark on Cloud Pak for Data with IBM Analytics Engine | Enable quality and drift monitoring and run on-demand evaluations with IBM Analytics Engine. |
Batch Processing: Remote Spark | Enable quality and drift monitoring and run on-demand evaluations with Remote Spark. |
OpenScale Model Risk Governance with OpenPages Integration on IBM Cloud Pak for Data | Integrate your model evaluations with IBM OpenPages and set up an end-to-end risk management solution. |
OpenScale Model Risk Management on IBM Cloud Pak for Data | Set up a model risk management solution. |
Indirect bias and active debiasing on IBM Cloud Pak for Data | Configure fairness evaluations to determine indirect bias. |
Adversarial Robustness Metrics for image models | Use the Adversarial Robustness Toolkit (ART) to evaluate the robustness of image models. |
Prompt template evaluation for RAG tasks with watsonx.governance | Create a prompt template asset for the RAG task and configure evaluations in watsonx.governance projects and spaces. |
Design time notebook for Multi Lingual support of Generative AI Quality metrics for IBM WatsonX.governance | Demonstrate generative AI quality evaluation results for a prompt template in Japanese. |
Retrieval and answer quality metrics computation using LLM as Judge in IBM watsonx.governance for RAG task | Use an LLM as a judge to calculate retrieval and answer quality metrics for responses that are generated for RAG tasks. |
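Several of the samples above, such as the batch-processing notebooks, trigger on-demand evaluations. As a rough sketch, continuing from the client in the setup example and assuming a hypothetical monitor instance ID that earlier notebook cells would create, an on-demand run looks something like this:

```python
# Continues from the client created in the setup sketch above.
# MONITOR_INSTANCE_ID is hypothetical; in the sample notebooks it
# comes from earlier cells that configure the monitor.
MONITOR_INSTANCE_ID = "your-monitor-instance-id"

# Trigger an evaluation and block until it completes.
run = client.monitor_instances.run(
    monitor_instance_id=MONITOR_INSTANCE_ID,
    background_mode=False,
).result

# Display the metrics that the run produced.
client.monitor_instances.show_metrics(monitor_instance_id=MONITOR_INSTANCE_ID)
```

The sample notebooks wrap these calls with the setup steps (data mart, service provider, and subscription creation) that produce the real IDs.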
Next steps
- To learn more about using notebook editors, see Notebooks.
- To learn more about working with notebooks, see Coding and running notebooks.
- To learn more about authenticating in a notebook, see Authenticating.