watsonx.ai Runtime service plans
You use watsonx.ai Runtime resources, which are measured in capacity unit hours (CUH), when you train AutoAI models, run machine learning models, or score deployed models. You use watsonx.ai Runtime resources, measured by tokens consumed or at an hourly rate, when you run inferencing services with foundation models. This topic describes the various plans you can choose, what services are included, and how computing resources are calculated.
watsonx.ai Runtime in Cloud Pak for Data as a Service and watsonx
The watsonx.ai Runtime plan includes details for watsonx.ai. Watsonx.ai is a studio of integrated tools for working with generative AI, powered by foundation models, and machine learning models. If you are using Cloud Pak for Data as a Service, then the details for working with foundation models and metering prompt inferencing using Resource Units do not apply to your plan.
For more information on watsonx.ai, see:
- Overview of IBM watsonx.ai
- Comparison of IBM watsonx and Cloud Pak for Data as a Service
- Signing up for IBM watsonx.ai
If you are using both watsonx and Cloud Pak for Data as a Service in your account, you can switch between the two platforms.
Choosing a watsonx.ai Runtime plan
watsonx.ai Runtime plans govern how you are billed for models you train and deploy with watsonx.ai Runtime and for prompts you use with foundation models. You must select a plan for each watsonx.ai Runtime instance you create. Choose a plan based on your needs:
- Lite is a free plan with limited capacity. Choose this plan if you are evaluating watsonx.ai Runtime and want to try out the capabilities. The Lite plan does not support running a foundation model tuning experiment on watsonx.
- Essentials is a pay-as-you-go plan that gives you the flexibility to build, deploy, and manage models to match your needs.
- Standard is a high-capacity enterprise plan that is designed to support all of an organization's AI needs. The Standard plan incurs a monthly instance fee of $1,050 USD that includes a block of 2500 capacity unit hours (CUH). Any CUH usage above this amount and all other usage is metered on a pay-as-you-go basis at the Standard plan rate.
  Note: The instance fee for the watsonx.ai Runtime Standard plan is billed regardless of CUH usage. For example, if you only consume resource units, you are still charged the instance fee. The fee is pro-rated if the plan is canceled.
- HIPAA-Ready is an enterprise plan that is designed to provide generative AI capabilities and machine learning assets that comply with Health Insurance Portability and Accountability Act of 1996 (HIPAA) security and privacy rule requirements. The HIPAA-Ready plan incurs a monthly instance fee of $1,800 USD. All usage is metered on a pay-as-you-go basis at the same rate as the Standard plan.
  Note: The instance fee for the watsonx.ai Runtime HIPAA-Ready plan is billed regardless of CUH usage. For example, if you only consume resource units, you are still charged the instance fee. The fee is pro-rated if the plan is canceled.
For plan details and pricing, see IBM Cloud catalog: watsonx.ai Runtime.
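As a rough illustration of how the Standard plan charge is put together, the following Python sketch adds the fixed instance fee to any CUH used beyond the included block. The fee and the included CUH come from the plan description above. The per-CUH rate and the `other_usage_usd` parameter are hypothetical placeholders, so look up the actual rates in the IBM Cloud catalog. Pro-rating on cancellation is not modeled.

```python
STANDARD_INSTANCE_FEE_USD = 1050.0  # monthly instance fee stated in the plan description
STANDARD_INCLUDED_CUH = 2500.0      # CUH block included in the instance fee


def standard_plan_monthly_cost(cuh_used, cuh_rate_usd, other_usage_usd=0.0):
    """Estimate one month's Standard plan charge.

    cuh_rate_usd is a hypothetical per-CUH price (an assumption, not a published rate).
    other_usage_usd stands for charges that are not measured in CUH,
    such as resource units, hourly rates, or page rates.
    """
    if cuh_used < 0 or cuh_rate_usd < 0 or other_usage_usd < 0:
        raise ValueError("inputs must be non-negative")
    # Only CUH beyond the included block is metered pay-as-you-go.
    overage_cuh = max(0.0, cuh_used - STANDARD_INCLUDED_CUH)
    # The instance fee applies even when no CUH is consumed.
    return STANDARD_INSTANCE_FEE_USD + overage_cuh * cuh_rate_usd + other_usage_usd


if __name__ == "__main__":
    # 2600 CUH at an assumed 0.50 USD per CUH: 100 CUH of overage.
    print(standard_plan_monthly_cost(2600, 0.50))
```

With zero CUH used, the function still returns the instance fee, which mirrors the note that the fee is billed regardless of CUH usage.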
How resource consumption is tracked
For metering and billing purposes, machine learning models and deployments or foundation models are measured with these charge metrics:
- Capacity Unit Hour (CUH) measures compute resource consumption per unit hour for usage and billing purposes. CUH measures all watsonx.ai Runtime activity except for foundation model inferencing.
- Resource Unit (RU) measures foundation model inference consumption. Inferencing is the process of calling the foundation model to generate output in response to a prompt. Each RU equals 1,000 tokens. A token is a basic unit of text (typically 4 characters or 0.75 words) used in the input or output for a foundation model prompt. For details on tokens, see Tokens and tokenization.
- Hour rate is used to calculate charges for custom foundation models that you import into watsonx.ai and deploy. The rate is based on configuration size and is charged for the duration of the model deployment.
- Page rate is used to calculate charges for document text classification and extraction. The page rate is set by plan.
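The relationship between tokens and Resource Units can be sketched in a few lines of Python. The conversion of 1,000 tokens per RU and the rule of thumb of about 4 characters per token come from the definitions above. The function names are illustrative, and the sketch returns fractional RUs because this topic does not say how partial RUs are rounded for billing. Actual token counts depend on the model's tokenizer.

```python
TOKENS_PER_RU = 1000        # each RU equals 1,000 tokens
CHARS_PER_TOKEN = 4         # rule of thumb only; real tokenizers vary


def tokens_to_resource_units(tokens):
    """Convert a token count (input plus output) to Resource Units."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / TOKENS_PER_RU


def estimate_tokens_from_chars(char_count):
    """Rough token estimate from a character count (an approximation)."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return char_count / CHARS_PER_TOKEN


if __name__ == "__main__":
    prompt_chars = 4000
    tokens = estimate_tokens_from_chars(prompt_chars)
    print(tokens, tokens_to_resource_units(tokens))
```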
What is measured for resource consumption?
Resources, whether measured with capacity unit hours (CUH) or resource units (RU), are consumed for running assets, not for working in tools. That is, there is no consumption charge for defining an experiment in AutoAI, but there is a charge for running the experiment to train the experiment pipelines. Similarly, there is no charge for creating a deployment space or defining a deployment job, but there is a charge for running a deployment job or inferencing against a deployed asset. Assets that run continuously, such as Jupyter notebooks, RStudio assets, Bash scripts, and custom model deployments, consume resources for as long as they are active.
watsonx.ai Runtime plan details
The Lite plan provides enough free resources for you to evaluate the capabilities of watsonx.ai. You can then choose a paid plan that matches the needs of your organization, based on plan features and capacity.
| Plan features | Lite | Essentials | Standard |
|---|---|---|---|
| watsonx.ai Runtime usage in CUH | 20 CUH per month | CUH billing based on CUH rate multiplied by hours of consumption | 2500 CUH per month |
| Foundation model inferencing in tokens or Resource Units (RU) | 300,000 tokens per month | Billed for usage (1000 tokens = 1 RU) | Billed for usage (1000 tokens = 1 RU) |
| Max parallel Decision Optimization batch jobs per deployment | 2 | 5 | 100 |
| Deployment jobs retained per space | 100 | 1000 | 3000 |
| Deployment time to idle | 1 day | 3 days | 3 days |
| Rate limit per plan ID | 2 inference requests per second | 8 inference requests per second | 8 inference requests per second |
| Support for custom foundation models | Not available | Not available | Billed hourly by configuration |
| Document text classification and extraction | 100 pages per month | Billed per page | Billed per page |
| Foundation model tuning | Not available | Tuning billed at 43 CUH per hour; inferencing billed for token usage | Tuning billed at 43 CUH per hour; inferencing billed for token usage |
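To make the CUH figures in the table concrete, the following Python sketch multiplies a CUH rate by hours of consumption, using the tuning rate of 43 CUH per hour as the worked case. It also estimates how many tuning hours would fit in the 2500 CUH block of the Standard plan if nothing else consumed CUH that month, which is a simplifying assumption. The function names are illustrative only.

```python
TUNING_CUH_PER_HOUR = 43       # tuning rate from the plan table
STANDARD_INCLUDED_CUH = 2500   # monthly CUH block on the Standard plan


def cuh_consumed(cuh_rate_per_hour, hours):
    """CUH consumption is the CUH rate multiplied by hours of consumption."""
    if cuh_rate_per_hour < 0 or hours < 0:
        raise ValueError("inputs must be non-negative")
    return cuh_rate_per_hour * hours


def tuning_cuh(hours):
    """CUH consumed by a tuning experiment that runs for the given hours."""
    return cuh_consumed(TUNING_CUH_PER_HOUR, hours)


def tuning_hours_in_block(included_cuh=STANDARD_INCLUDED_CUH):
    """Tuning hours that fit in a CUH block, assuming no other CUH usage."""
    return included_cuh / TUNING_CUH_PER_HOUR


if __name__ == "__main__":
    print(tuning_cuh(2.5))                      # CUH for a 2.5-hour tuning run
    print(round(tuning_hours_in_block(), 1))    # hours covered by the block
```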
watsonx.ai Runtime HIPAA-Ready plan details
With the watsonx.ai Runtime HIPAA-Ready plan, IBM introduces controls to meet security and privacy rule requirements that are commensurate with the Health Insurance Portability and Accountability Act of 1996. These requirements include the appropriate administrative, physical, and technical safeguards that are required of Business Associates in 45 CFR Part 160 and Subparts A and C of Part 164.
The watsonx.ai Runtime HIPAA-Ready plan is not available in your IBM Cloud account by default.
To sign up for the HIPAA-Ready plan, IBM requires that you agree to the terms of the Business Associate Addendum (BAA) with IBM for your IBM Cloud account. The BAA outlines both IBM's responsibilities and your responsibilities to maintain HIPAA compliance.
To get access to the HIPAA-Ready plan in the IBM Cloud catalog:
- Log in to your IBM Cloud account in the Dallas (US South) region.
- Click Manage > Account and then Account settings.
- Copy and save the IBM Cloud account ID.
- Contact IBM Sales and provide your account ID to an associate to enable the HIPAA-Ready plan in the IBM Cloud catalog in your account.
- You will be notified after the plan is enabled in your account. You can then access the HIPAA-Ready plan from the IBM Cloud catalog.
After you have the HIPAA-Ready plan set up in your account, you can then proceed with setting up the watsonx platform for your organization. For details, see Setting up the IBM watsonx platform for administrators.
HIPAA-Ready plan usage guidelines
The following capabilities and restrictions apply when you work with a watsonx.ai Runtime instance created under the HIPAA-Ready plan:
- Service instance
  - You must create a new watsonx.ai Runtime instance under the HIPAA-Ready plan to use HIPAA-compliant features in your account.
  - You can use the HIPAA-Ready watsonx.ai Runtime instance with a watsonx.ai Studio instance created under the Professional plan only.
  - You cannot downgrade a watsonx.ai Runtime instance from the HIPAA-Ready plan to any other plan. If you want to stop using the HIPAA-Ready plan, you must delete the watsonx.ai Runtime instance.
  - You cannot upgrade an existing watsonx.ai Runtime instance provisioned under a different plan to the HIPAA-Ready plan.
  - For other services, you must determine if the service is HIPAA-compliant and whether you need to reprovision the service to use with the HIPAA-Ready watsonx.ai Runtime instance.
- Projects and spaces
  - You must create a new blank project or space to associate with the HIPAA-Ready watsonx.ai Runtime instance. You should not import an existing project or space.
  - A HIPAA-compliant project or space created in the watsonx platform cannot be used in the Cloud Pak for Data as a Service platform if you are using both platforms in your account.
  - You cannot use HIPAA-compliant projects and spaces with projects and spaces that are not compliant with HIPAA. You cannot promote project components that are not compliant with HIPAA into HIPAA-compliant spaces.
  - You cannot remove the HIPAA-Ready watsonx.ai Runtime instance from a project or space.
  - You can replace an existing HIPAA-Ready watsonx.ai Runtime instance in a project or space with another instance created under the HIPAA-Ready plan only.
When you delete a HIPAA-Ready watsonx.ai Runtime instance, all data in the instance will be deleted by IBM, in compliance with the standard HIPAA process. You cannot export data from a HIPAA-compliant project. Contact IBM Support to export your data and assets as a zip file before you delete the instance.
Plan features
The HIPAA-Ready plan provides the following features and capacity limits:
- Inferencing foundation model and prompt template deployments. Inference is billed based on token usage or Resource Units (RU). 1 RU is equal to 1000 tokens.
- 8 inference requests per second
- Text embeddings and text reranking APIs
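A client that sends many inference requests may want to pace itself to stay within the limit of 8 requests per second. The following Python sketch spaces calls evenly, one every 1/8 of a second. Even spacing is a conservative assumption, since this topic does not describe how the service enforces the limit. The `RequestThrottle` class is hypothetical and is not part of any IBM SDK. The clock and sleep functions are injectable so that the behavior can be checked without waiting.

```python
import time


class RequestThrottle:
    """Client-side pacing: allow at most max_per_second calls, evenly spaced."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request may be sent; return seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # Reserve the next slot one interval after this call starts.
        self._next_allowed = now + self.min_interval
        return delay


if __name__ == "__main__":
    throttle = RequestThrottle(max_per_second=8)  # plan limit from the list above
    for i in range(3):
        throttle.wait()
        print("send inference request", i)  # placeholder for a real API call
```

For the Lite plan limit of 2 requests per second, the same class can be created with `max_per_second=2`.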
The following features and tools are not supported with the HIPAA-Ready plan:
- Foundation model tuning
- AutoAI
- Batch deployments
- Online deployments
- Decision Optimization
- SPSS Modeler
- RStudio
- Synthetic Data Generator
- AI services
- Agent templates and Agent Lab
- Custom foundation models
- Deploy on demand foundation models
- AI guardrails with the text generation API
- Text extraction and classification APIs
- Vector index
- Data Refinery flows
Learn more
- Billing details for machine learning assets
- Billing details for generative AI assets
- For the list of supported foundation models and their prices, see Supported foundation models.
- For the list of supported encoder models and their prices, see Supported encoder models.
- For more information on tracking computing resource allocation and consumption, see Runtime usage.
- IBM Cloud catalog: watsonx.ai Runtime