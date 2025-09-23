Another innovation by the IBM team is the “Evaluation Studio.” This feature provides two key capabilities:

- Prompt optimization, by comparing different versions of a prompt side by side
- Experiment tracking for agents

Evaluation Studio helps developers evaluate different versions of a prompt on a dataset and compare the results in an intuitive user interface. It also supports custom ranking: users define their own ranking scheme by selecting metrics and assigning each one a weight based on its importance. This makes it easier to optimize a prompt that will be used in a tool or agent.
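To make the idea of a weighted custom ranking concrete, here is a minimal sketch in plain Python. It is not the watsonx.governance API; the class and function names (`PromptResult`, `weighted_score`, `rank_prompts`), the metric `answer_relevance` and all scores are hypothetical. It assumes every metric is scored so that higher is better, and computes each prompt's rank score as a weighted average of its metrics.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    """Hypothetical evaluation result for one prompt version."""
    name: str
    metrics: dict[str, float]  # metric name -> score (assumed: higher is better)


def weighted_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the selected metrics; weights express importance."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(metrics[name] * w for name, w in weights.items()) / total_weight


def rank_prompts(results: list[PromptResult],
                 weights: dict[str, float]) -> list[PromptResult]:
    """Sort prompt versions from best to worst under the custom ranking."""
    return sorted(results, key=lambda r: weighted_score(r.metrics, weights),
                  reverse=True)


# Illustrative, made-up scores for two prompt versions.
results = [
    PromptResult("prompt_v1", {"faithfulness": 0.80, "answer_relevance": 0.70}),
    PromptResult("prompt_v2", {"faithfulness": 0.90, "answer_relevance": 0.60}),
]
# Faithfulness is treated as three times as important as relevance.
weights = {"faithfulness": 3.0, "answer_relevance": 1.0}

ranked = rank_prompts(results, weights)
for r in ranked:
    print(r.name, round(weighted_score(r.metrics, weights), 3))
```

With these weights, `prompt_v2` scores (0.9 × 3 + 0.6 × 1) / 4 = 0.825 and ranks above `prompt_v1` at 0.775, even though `prompt_v1` has the better relevance score. Changing the weights changes which version wins, which is the point of letting users define the scheme.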

In watsonx.governance, Evaluation Studio also supports experiment tracking, a powerful tool for building better agentic AI systems. You can quickly set up experiments, try different variants of an agent, and tag each one with details such as the model, retriever or prompt used. Side-by-side comparisons of latency, cost and quality metrics (such as faithfulness) make it easy to see what works best. Importantly, the platform helps you save the exact code for each run, which frees developers from storing every version themselves and lets them focus on building and improving the agent.
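The following is a rough, hypothetical sketch of what such experiment tracking involves, written in plain Python rather than against watsonx.governance or any IBM SDK. `Run`, `Experiment`, `log_run`, `compare` and `best`, as well as the tag values and numbers, are invented for illustration. Each run records its tags, its latency, cost and faithfulness, and a snapshot of its code together with a SHA-256 hash, so that two runs can be told apart by the exact code they used.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Run:
    """One hypothetical agent run with its tags, metrics and code snapshot."""
    variant: str
    tags: dict[str, str]   # e.g. model, retriever, prompt used for this variant
    latency_s: float
    cost_usd: float
    faithfulness: float    # quality metric, higher is better
    code: str              # exact source used for this run
    code_sha256: str = field(init=False)

    def __post_init__(self) -> None:
        # Fingerprint the code so identical and differing versions are easy to spot.
        self.code_sha256 = hashlib.sha256(self.code.encode("utf-8")).hexdigest()


class Experiment:
    """Collects runs and offers a simple side-by-side comparison."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.runs: list[Run] = []

    def log_run(self, run: Run) -> None:
        self.runs.append(run)

    def compare(self) -> str:
        """Render all runs as a plain-text table."""
        lines = [f"{'variant':<10}{'latency_s':>10}{'cost_usd':>10}{'faith':>7}  tags"]
        for r in self.runs:
            tags = ", ".join(f"{k}={v}" for k, v in sorted(r.tags.items()))
            lines.append(f"{r.variant:<10}{r.latency_s:>10.2f}"
                         f"{r.cost_usd:>10.4f}{r.faithfulness:>7.2f}  {tags}")
        return "\n".join(lines)

    def best(self) -> dict[str, str]:
        """Best variant on each dimension (lower latency/cost, higher quality)."""
        return {
            "lowest_latency": min(self.runs, key=lambda r: r.latency_s).variant,
            "lowest_cost": min(self.runs, key=lambda r: r.cost_usd).variant,
            "highest_faithfulness": max(self.runs, key=lambda r: r.faithfulness).variant,
        }


# Illustrative, made-up runs for two agent variants.
exp = Experiment("rag-agent")
exp.log_run(Run("agent_a", {"model": "model-x", "retriever": "bm25", "prompt": "v1"},
                latency_s=1.8, cost_usd=0.0040, faithfulness=0.82,
                code="def agent(q):\n    return answer_v1(q)\n"))
exp.log_run(Run("agent_b", {"model": "model-y", "retriever": "dense", "prompt": "v2"},
                latency_s=2.6, cost_usd=0.0025, faithfulness=0.91,
                code="def agent(q):\n    return answer_v2(q)\n"))

print(exp.compare())
print(exp.best())
```

In this toy data no single variant wins everywhere: `agent_a` is faster, while `agent_b` is cheaper and more faithful. Keeping the tags, metrics and code snapshot together per run is what makes that trade-off visible and reproducible.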