IBM named a Leader in the IDC MarketScape: Worldwide GenAI Evaluation Technology Products 2025 Vendor Assessment


Authors

Maryam Ashoori

VP of Product and Engineering, watsonx.governance

IBM

Manish Bhide

Distinguished Engineer and CTO, watsonx.governance

IBM

Sahiba Pahwa

Product Marketing, watsonx.governance

IBM

IBM has been named a Leader in the IDC MarketScape: Worldwide GenAI Evaluation Technology Products 2025 Vendor Assessment.

We believe this recognition reflects the growing impact and continuous innovation of IBM watsonx.governance—and IBM’s commitment to meeting modern demands for trusted, scalable and responsible AI.

“Enterprises that have a diverse technology environment may find that IBM represents a neutral supplier—it’s not tied to a particular cloud service, for instance. Also, enterprises that value the broader set of adjacent IBM offerings, including the automated documentation, guardrails, and security offerings, should consider IBM,” says the IDC MarketScape report.

The IDC MarketScape vendor analysis model is designed to provide an overview of the competitive fitness of technology suppliers in a given market. The research methodology utilizes a rigorous scoring methodology based on both qualitative and quantitative criteria that results in a single graphical illustration of each supplier’s position within a given market. The Capabilities score measures supplier product, go-to-market and business execution in the short term. The Strategy score measures alignment of supplier strategies with customer requirements in a 3-5-year timeframe. Supplier market share is represented by the size of the icons.

The five categories of the IDC MarketScape methodology

IDC MarketScape’s rigorous evaluation framework provides an objective, third-party assessment that organizations can trust when making gen AI model evaluation technology decisions.

The framework considers the following five categories:

  1. Customer satisfaction: Direct interviews with customers provided real-world insights into implementation success, ROI and ongoing support, covering not just vendors’ current offerings but also their vision and roadmap for addressing emerging risk challenges.
  2. Functionality or offering: The report assesses neutrality, such as the ability to evaluate models regardless of the tools used to build them or where they run, as well as customization of dashboards and metrics.
  3. Technological innovation: Vendors were assessed on differentiated capabilities or offerings that deliver notable value to customers.
  4. Range of services: The report considered the breadth of capabilities, including RAG evaluation, with special attention to the ability to evaluate agents.
  5. Portfolio: Special attention was given to the number of offerings adjacent to model tuning, such as production monitoring, model governance and model security, and to whether the evaluation tools integrate seamlessly across the lifecycle from development to production.

IBM watsonx.governance strengths 

What we believe are IBM’s strengths:

1. Streamlined model development and governance

IBM watsonx.governance offers a unified approach to managing the entire AI lifecycle, from development to deployment. With a guided questionnaire, users can define business problems, discover and identify potential risks, and uncover mitigation strategies.

These risk dimensions are mapped to metrics that can be used during evaluation. Moreover, this integrated process automatically extracts metadata during evaluation, stores it in a centralized fact sheet, and provides a transparent record of the application development process, including information on the model, prompt templates and more. By incorporating evaluation technology into this tightly integrated lifecycle, which includes documentation, we differentiate ourselves as an ideal solution for enterprise users.

2. Automatic identification of risk at run time with dynamic dashboards

By integrating risk data, risk and control assessments, internal and external loss events and key risk indicators or metrics, teams can gain a comprehensive view of their risk posture across the enterprise. This can help enterprises automatically identify risks as they arise, in real time. Additionally, IBM watsonx.governance provides an automatic risk rating, giving risk teams a clear and objective assessment of the risk level. Dynamic dashboards and charts facilitate swift identification, measurement, monitoring and analysis, while automated alerts enable prompt remediation when risk thresholds are breached.
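To make the threshold-and-alert mechanics concrete, here is a minimal sketch of the general pattern; the metric names, thresholds and alert format are illustrative assumptions, not the watsonx.governance API:

```python
# Minimal sketch of threshold-based risk alerting; all names here are
# hypothetical illustrations, not the watsonx.governance API.

RISK_THRESHOLDS = {
    "context_relevance": 0.70,   # alert if the score falls below this
    "faithfulness": 0.80,
    "pii_detection_rate": 0.01,  # alert if the rate rises above this
}

def check_risk_posture(metrics: dict[str, float]) -> list[str]:
    """Return a list of alert messages for breached thresholds."""
    alerts = []
    for name, value in metrics.items():
        threshold = RISK_THRESHOLDS.get(name)
        if threshold is None:
            continue
        # Quality metrics breach below the threshold;
        # rate-style metrics breach above it.
        breached = (value > threshold) if name.endswith("_rate") else (value < threshold)
        if breached:
            alerts.append(f"ALERT: {name}={value:.2f} breached threshold {threshold:.2f}")
    return alerts

if __name__ == "__main__":
    for msg in check_risk_posture({"context_relevance": 0.55, "pii_detection_rate": 0.03}):
        print(msg)
```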

3. Out-of-the-box metrics

With IBM watsonx.governance, users have access to a wide range of pre-built metrics for evaluating AI system performance and effectiveness. These include metrics for drift identification, model performance and other key areas:

  • HAP (hate, abuse and profanity)
  • PII (personally identifiable information)
  • Prompt injection
  • Context relevance
  • Faithfulness
  • Answer similarity
  • Answer relevance
  • Hit rate
  • Average precision
  • Reciprocal rank
  • Unsuccessful requests
  • And many more

Together, these metrics provide a comprehensive framework for evaluating AI system performance and effectiveness. Additionally, users can create custom metrics to tailor their evaluations to specific business requirements and risk profiles.
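As a sketch of what a custom metric might look like alongside the pre-built ones, consider a simple unsuccessful-requests measure; the record schema and function below are illustrative assumptions, not the product’s metric interface:

```python
# Illustrative sketch of a custom evaluation metric; the record schema
# and metric function are assumptions, not the watsonx.governance API.

def unsuccessful_request_rate(records: list[dict]) -> float:
    """Fraction of requests where the model declined or failed to answer."""
    refusals = ("i don't know", "i cannot answer")
    failed = sum(
        1 for r in records
        if not r["answer"].strip() or r["answer"].lower().startswith(refusals)
    )
    return failed / len(records) if records else 0.0

records = [
    {"question": "What is our refund policy?", "answer": "Refunds within 30 days."},
    {"question": "Who won in 2031?", "answer": "I don't know."},
]
print(f"unsuccessful_requests: {unsuccessful_request_rate(records):.2f}")  # 0.50
```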

4. Efficient and agile agent optimization and experimentation

Another innovation by the IBM team is the “Evaluation Studio.” This feature provides two key capabilities:

  1. Prompt optimization, by comparing different versions of a prompt side by side, and
  2. Experiment tracking for agents

Evaluation Studio helps developers evaluate different versions of a prompt on a dataset and compare the results in an intuitive user interface. It also supports custom ranking, where users define a ranking scheme by selecting metrics and assigning them weights based on importance. This helps users easily optimize a prompt that is to be used in a tool or agent.
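Under the hood, this kind of custom ranking reduces to a weighted score over the selected metrics. A minimal sketch, with illustrative metric names, weights and scores:

```python
# Sketch of the weighted custom-ranking idea: score each prompt variant
# by a weighted sum of its metric values, then rank. All values here
# are illustrative, not Evaluation Studio's internals.

WEIGHTS = {"faithfulness": 0.5, "answer_relevance": 0.3, "context_relevance": 0.2}

variants = {
    "prompt_v1": {"faithfulness": 0.82, "answer_relevance": 0.75, "context_relevance": 0.90},
    "prompt_v2": {"faithfulness": 0.88, "answer_relevance": 0.71, "context_relevance": 0.85},
}

def weighted_score(metrics: dict[str, float]) -> float:
    return sum(WEIGHTS[m] * metrics[m] for m in WEIGHTS)

ranking = sorted(variants, key=lambda v: weighted_score(variants[v]), reverse=True)
for rank, name in enumerate(ranking, start=1):
    print(rank, name, round(weighted_score(variants[name]), 3))
```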

Evaluation Studio also supports experiment tracking, a powerful tool for building better agentic AI systems. You can quickly set up experiments, try different variants of the agent and tag them with details like the model, retriever or prompt you used. Side-by-side comparisons based on latency, cost and quality (such as faithfulness) make it easy to see what works best. Importantly, the platform saves the exact code for each run, freeing developers from storing each version manually and letting them focus on building and improving the agent.
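As a rough illustration of what a side-by-side comparison of tagged runs can look like (the run records, tags and fields below are illustrative, not Evaluation Studio’s actual data model):

```python
# Illustrative sketch of comparing tagged agent experiment runs side
# by side; the record structure is an assumption, not the Evaluation
# Studio data model.

runs = [
    {"tag": {"model": "model-a", "retriever": "bm25"},
     "latency_s": 1.8, "cost_usd": 0.004, "faithfulness": 0.81},
    {"tag": {"model": "model-b", "retriever": "dense"},
     "latency_s": 2.6, "cost_usd": 0.009, "faithfulness": 0.90},
]

print(f"{'model':10} {'retriever':10} {'latency':>8} {'cost':>8} {'faith':>6}")
for run in runs:
    t = run["tag"]
    print(f"{t['model']:10} {t['retriever']:10} "
          f"{run['latency_s']:>8.1f} {run['cost_usd']:>8.4f} {run['faithfulness']:>6.2f}")
```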

5. In-the-loop evaluators: A key market differentiator

The IBM watsonx.governance solution supports out-of-the-box, decorator-based, in-the-loop evaluators, which set a new standard for agent governance by giving customers the ability to evaluate metrics and use them to decide the agent execution flow. IBM watsonx.governance also supports offline agent evaluation via agent evaluators, which help evaluate AI agents on test data as they are built. Key features include:

  • In-the-loop evaluators: In-the-loop evaluators can be used to compute a metric whenever a tool or node in a LangGraph agent is executed. They can compute diverse metrics such as context relevance, faithfulness, tool-calling hallucination and more, and the agent execution flow can be adjusted based on the computed values. For example, if the context relevance in an agentic RAG application is low, there is no point in generating an answer from the fetched context; the agent flow can instead be altered to skip the answer generation node and respond directly to the user, as sketched in the example below.
  • Ease of use: The typical way to use in-the-loop evaluators is to add custom code as a node in the LangGraph application. The in-the-loop evaluators from watsonx.governance are available as Python decorators, which makes them very easy to use when building an agentic AI application.

This innovative tool offers unparalleled visibility and control over agent performance, enabling customers to optimize their workflows and drive better outcomes.
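To show the shape of the decorator-based, in-the-loop pattern, here is a minimal sketch built on LangGraph; the `with_context_relevance` decorator and its toy word-overlap scoring are hypothetical stand-ins for the watsonx.governance evaluators:

```python
# Sketch of the in-the-loop evaluator pattern in a LangGraph RAG agent.
# The evaluator decorator and its scoring logic are hypothetical
# stand-ins, not the watsonx.governance decorators.
from functools import wraps
from typing import TypedDict

from langgraph.graph import StateGraph, END

class RAGState(TypedDict):
    question: str
    context: str
    answer: str
    context_relevance: float

def with_context_relevance(node_fn):
    """Hypothetical in-the-loop evaluator: score the fetched context
    after the node runs and store the metric in the agent state."""
    @wraps(node_fn)
    def wrapper(state: RAGState) -> RAGState:
        state = node_fn(state)
        # Toy relevance score: word overlap between question and context.
        q = set(state["question"].lower().split())
        c = set(state["context"].lower().split())
        state["context_relevance"] = len(q & c) / max(len(q), 1)
        return state
    return wrapper

@with_context_relevance
def retrieve(state: RAGState) -> RAGState:
    state["context"] = "The refund policy allows returns within 30 days of purchase."
    return state

def generate(state: RAGState) -> RAGState:
    state["answer"] = f"Based on policy: {state['context']}"
    return state

def fallback(state: RAGState) -> RAGState:
    state["answer"] = "I could not find relevant information for that question."
    return state

def route_on_relevance(state: RAGState) -> str:
    # Alter the agent flow based on the computed metric: skip answer
    # generation entirely when the fetched context is not relevant.
    return "generate" if state["context_relevance"] >= 0.2 else "fallback"

graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.add_node("fallback", fallback)
graph.set_entry_point("retrieve")
graph.add_conditional_edges("retrieve", route_on_relevance,
                            {"generate": "generate", "fallback": "fallback"})
graph.add_edge("generate", END)
graph.add_edge("fallback", END)
app = graph.compile()

print(app.invoke({"question": "What is the refund policy?",
                  "context": "", "answer": "", "context_relevance": 0.0}))
```

The same pattern extends to other metrics: any evaluator that writes a score into the agent state can drive a conditional edge.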

Supporting the evolving needs of AI/ML Ops teams: The future roadmap

To further empower AI/ML Ops teams, IBM is committed to driving innovation with a pipeline of new features. Upcoming releases will bring additional agentic governance features, such as:

  1. Advanced production monitoring for agentic AI: IBM watsonx.governance will offer continuous oversight of agentic applications, initiating alerts when any specified metric exceeds its predefined limits. This feature ensures proactive management and timely intervention for maintaining optimal and trusted AI performance.
  2. Governed Agentic Catalog: This will allow users to add governance to the process of adding tools and agents to a central catalog. This will help enterprises ensure that only trusted tools and agents are made available to their developers.

Governance is no longer a barrier defined by compliance and audit. It’s now an enabler for scale, empowering teams to build gen AI systems that are robust, transparent and ready for enterprise deployment. Governance is about building AI agents, applications and models that are efficient, safe and trustworthy from the ground up.

As gen AI continues to evolve, watsonx.governance enables teams to move fast with confidence, transparency and control. Our approach to evaluation focuses on real-time risk management, automated experiment management and tracking, and transparency at every stage. Built with real-world complexity in mind, watsonx.governance helps teams scale responsibly, reduce risk and unlock the full potential of gen AI without slowing them down.

Download the excerpt

Learn more about IBM watsonx.governance

Try product for free