Creating a REST proxy for watsonx.ai with the AI Gateway

Create a REST proxy that uses the AI Gateway to control access to AI models.

Before you begin

Complete the tasks in Prerequisites for using the AI Gateway to ensure that you can authenticate with IBM Cloud and access watsonx.ai.

About this task

The AI Gateway is only supported with OpenAPI 3.0 APIs.

Procedure

  1. In the navigation pane, click Develop icon.

  2. On the Develop page, click Add > API.

  3. On the Select API type page, click the OpenAPI 3.0 tab.

  4. In the Create section, select AI gateway, then click Next.

  5. On the Create API from AI gateway page, use the Platform field to select watsonx as the AI service that the new API will use.

  6. Use the "Info" section to provide basic information about the API, and then click Next:
    • The OpenAPI version is based upon the step 3 selection where you clicked on OpenAPI 3.0.

    • Title: The Title can include special characters but should be kept short so that it can be easily displayed in the user interface.

    • The Name is filled in for you based on the title. The value is a single string that is used to identify the API in developer toolkit CLI commands.

    • Version: Accept the default value or modify it as needed. The version corresponds to the value of the info.version property of the OpenAPI definition. The version.release.modification version numbering scheme is recommended; for example 1.0.0.

    • Base path: Accept the default value or modify it as needed. The API's "base path" is the server URL, which is used to determine the full URL endpoint for the calling the API, taking into account any vanity endpoint configuration in the catalog in which the API is published. For an API that is enforced by the DataPower API Gateway, you only need to provide the base path value. In addition:
      • Do not include the host name or any additional segments for paths or operations
      • Do not include special characters
      • Begin the URL with a forward slash ( / ) even if it is otherwise empty

    • Description: The optional description helps to identify the API.
  7. Provide information for authenticating with watsonx.ai, and then click Next:
    • API key: The default value is $(watsonx-ai-apikey).
      Note: Do not enter the actual API key value in this field. First, create a catalog property named watsonx-ai-apikey and set its value to the API key required for accessing watsonx.ai services. For information on creating catalog specific property values, see Defining Catalog specific property values.

      The API key allows the API to authenticate with IBM Cloud Authentication, and is required for access to the watsonx.ai service.

    • Region: Select the IBM Cloud region where watsonx.ai requests are sent.

    • Operation version: Accept the default value or provide the appropriate Active Version Date of the watsonx.ai API, using the format yyyy-mm-dd.

      To see the available version dates, refer to Active Version Dates in the watsonx.ai API documentation. For information on API versioning in watsonx.ai, see Versioning in the same documentation.

  8. Provide information about how the API accesses watsonx.ai to submit requests:
    • Project ID: Provide either the ID of the watsonx.ai project used for resolving API requests, or a $(name of a catalog property) whose value is the ID of the watsonx.ai project.

    • Exposed paths: Accept the default list of exposed paths, or select only the paths that you want your API to access. The exposed paths define which watsonx.ai operations are included in the generated API.

    • Enable response caching: Response caching is enabled by default to optimize API performance; however, you can disable it if needed for your API. If you use response caching, you can specify the duration of the cache in the Cache TTL in seconds field.

      When response caching is enabled, then when a request is sent to the watsonx.ai service, the response cache is inspected to determine if the request payload has an associated cached response. If so, that response and its associated HTTP response headers are placed into the DataPower API Gateway context message (which by default is named message). The Output Message property in the policy UI can be modified after the API has been generated if a different message is needed.

      If there is no cached response, the request is passed to the watsonx.ai service, and the response is cached for subsequent operations using the time-to-live specified in the Cache TTL in seconds property.

      Cache TTL in seconds: If you enable response caching, configure the duration of the cache by accepting the default value or selecting (or typing) a new value. The minimum duration is 60 seconds and the maximum duration is 86400 seconds (1 day) -- any value outside of that range will fail validation when the API is published, even if the value is accepted in this field.

  9. In the Rate Limiting section, optionally configure rate limiting, and then click Next:
    1. Select Create product if you want to generate an API Connect product that controls rate limiting for the API.

      In API Connect, a product serves as a container for an API and its associated usage plans, which define rate limits. Setting the product rate limits here creates a plan that contains the required watson-ai-default and watson-ai-infer-text assembly rate limits and specifies the limits that you defined.

      If you choose not to generate a product automatically, you must create one as explained in Creating a custom product for a watsonx.ai API.

      Attention: A product that is created later with the auto-publish feature (when publishing your API) does not include the required plan for using the AI service. Either create the product now, or create a custom product before you publish the API.
    2. Configure the types of rate limiting to enforce on API calls:
      Note: If you choose not to specify the values of a rate limit, then default values for that rate limit will be assigned.
      • Set rate limit : (Time based) Accept the default rate limit or configure a new limit based on the number of API requests sent within a specified period of time. This type of rate limit does not use tokens.

      • Set AI token limit : (Token based) Accept the default rate limit or configure a new limit based on the number of cost-based tokens used within a specified period of time. Token-based rate limiting uses the /text-generate operation to track token usage.

        The AI token limit applies rate limiting based on the token usage. The rate limit determines how many tokens are allowed to pass thru the gateway within a specified period of time.

  10. Review the Summary page to verify that the API has no issues, and then click Done.

  11. Edit the new API and add policies and logic constructs that control the API's workflow.

    The created API will contain the watsonx.ai invoke policies for the exposed paths that were requested. Their properties can be edited as needed.