Getting started with watsonx.ai lightweight engine

Applies to: 5.0.1 or later

After you install the watsonx.ai lightweight engine and add foundation models to your cluster, you can work with the models by using the API.

Accessing the web client

You can perform administrative tasks such as monitoring platform resource use from the web client.

To access the web client for watsonx.ai lightweight engine, you need to get details about the installed service.
  1. Use the following command to get details for the service:
    cpd-cli manage get-cpd-instance-details \
    --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
    --get_admin_initial_credentials=true
    The output includes the following information:
    • Base URL: example.clustername.cluster.domain
    • User name: cpadmin
    • Password: <password>
  2. Save the service details as environment variables so you can reference the details later.
    export CPDUSER=cpadmin
    export CPDUSER_PASSWORD=<password>
    export CPD_URL=example.clustername.cluster.domain
  3. To access the web client, open a URL with the following syntax in a browser:
    https://${CPD_URL}/zen
For more information about administrative tasks, see Monitoring the platform.

To perform generative AI tasks, such as inferencing foundation models or using embedding models to vectorize text, you use the IBM watsonx.ai API.

Generating API credentials

Get API credentials that you can specify when you make requests to the watsonx.ai lightweight engine service to show that you are authorized to use the available methods. To generate credentials, complete the following steps:
  1. Generate an API key in one of the following ways:
    • From the web client user interface.

      For more information, see Generating API keys for authentication.

    • Programmatically, by completing the following steps.
      1. Use the service details that you retrieved earlier to create a bearer token.

        For details, see the Get authorization token API in the IBM Cloud Pak for Data platform API documentation.

        curl --request POST \
          --url "https://${CPD_URL}/icp4d-api/v1/authorize" \
          --header "Content-Type: application/json" \
          --data "{
          \"username\":\"${CPDUSER}\",
          \"password\": \"${CPDUSER_PASSWORD}\"
        }"
        Copy the bearer token from the JSON that is returned.
      2. In the following request, replace <TOKEN> with the copied token to use as the bearer token.

        For details, see the Get API key API in the IBM Cloud Pak for Data platform API documentation.

        curl -s -X GET \
            --url "https://${CPD_URL}/usermgmt/v1/user/apiKey" \
            --header "Authorization: Bearer <TOKEN>"
        Note: When you use the usermgmt/v1/user/apiKey endpoint, a new API key is created. If a key already exists, the existing key expires.

        Copy the API key that is generated.

  2. For the IBM watsonx.ai REST API, use the API key to request new bearer tokens as needed.

    If you generated the API key programmatically, note that this request uses the same ${CPD_URL}/icp4d-api/v1/authorize endpoint that was used earlier, but this time you submit the API key instead of a password to generate the bearer token. A Python sketch of this token exchange follows these steps.

    For details, see the Get authorization token API in the IBM Cloud Pak for Data platform API documentation.

    curl --request POST \
      --url "https://${CPD_URL}/icp4d-api/v1/authorize" \
      --header "Content-Type: application/json" \
      --data "{
      \"username\":\"${CPDUSER}\",
      \"api_key\": \"<APIKEY>\"
    }"
    Copy the bearer token from the access_token section of the JSON response.
  3. When you submit requests, include the bearer token that was returned in the previous step.
    curl --request POST \
      --url "https://${CPD_URL}/ml/v1/text/generation?version=2023-05-29" \
      --header "Accept: application/json" \
      --header "Authorization: Bearer <TOKEN>" \
      --header "Content-Type: application/json" \
        ...
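
If you prefer to script this flow, the following minimal Python sketch performs the same token exchange by using the requests package (an assumption; any HTTP client works). The get_bearer_token helper name is hypothetical, and certificate verification is disabled only to mirror the curl -k examples in this topic; enable verification in production.

import os
import requests

CPD_URL = os.environ["CPD_URL"]
CPDUSER = os.environ["CPDUSER"]

def get_bearer_token(api_key):
    # Exchange an API key for a bearer token at the authorize endpoint.
    response = requests.post(
        f"https://{CPD_URL}/icp4d-api/v1/authorize",
        json={"username": CPDUSER, "api_key": api_key},
        verify=False,  # mirrors curl -k on clusters with self-signed certificates
    )
    response.raise_for_status()
    data = response.json()
    # The token is returned in the access_token field, as noted in step 2;
    # fall back to the token field, which some releases use instead.
    return data.get("access_token") or data["token"]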

Inferencing a foundation model programmatically

Submit a REST API request by using the IBM watsonx.ai API to inference a foundation model programmatically.

You must add the curated or custom foundation models that you want to use for text generation before you can use the API to inference them. See Adding foundation models or Adding custom foundation models.

Get a list of the available foundation models
curl -k -X GET \
--url "https://${CPD_URL}/ml/v1/foundation_model_specs?version=2024-07-23&limit=50" \
--header "Authorization: Bearer <TOKEN>"
Any custom foundation models that you added are available from the same endpoint as any curated foundation models that you added to the service. For more information, see Foundation model specs.
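
If you are scripting these calls, the following sketch reuses CPD_URL and the hypothetical get_bearer_token helper from the sketch in Generating API credentials and prints the ID of each returned model. The resources and model_id field names follow the Foundation model specs reference.

import requests

token = get_bearer_token("<APIKEY>")
response = requests.get(
    f"https://{CPD_URL}/ml/v1/foundation_model_specs",
    params={"version": "2024-07-23", "limit": 50},
    headers={"Authorization": f"Bearer {token}"},
    verify=False,
)
response.raise_for_status()
# Each entry in resources describes one available foundation model.
for spec in response.json()["resources"]:
    print(spec["model_id"])
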
Inference a foundation model
curl -k -X POST \
--url "https://${CPD_URL}/ml/v1/text/generation?version=2024-07-23" \
--header "Authorization: Bearer <TOKEN>" \
--header "Content-Type: application/json" \
--data '{
    "model_id": "ibm/granite-13b-chat-v2",
    "input": "Tell me about mortgage insurance."}'
Inference a custom foundation model
curl -k -X POST \
--url "https://${CPD_URL}/ml/v1/text/generation?version=2024-07-23" \
--header "Authorization: Bearer <TOKEN>" \
--header "Content-Type: application/json" \
--data '{
    "model_id": "tiiuae/falcon-7b",
    "input": "Tell me about mortgage insurance."}'
For more information about the text generation method, see Text generation.
Attention: Omit the project_id that is shown in the API reference examples. Projects are not used in the watsonx.ai lightweight engine.
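
If you call the API from Python rather than curl, a small wrapper such as the following hypothetical generate_text function submits the same request. The function name and shape are illustrative only; the evaluation procedure later in this topic refers to such a method as <your-text-generation-method>.

import requests

def generate_text(cpd_url, payload, token):
    # Submit a text generation request and return the parsed JSON response.
    # payload is a dict such as {"model_id": ..., "input": ...}; omit
    # project_id, because projects are not used in the lightweight engine.
    response = requests.post(
        f"https://{cpd_url}/ml/v1/text/generation",
        params={"version": "2024-07-23"},
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
        verify=False,  # mirrors curl -k; enable verification in production
    )
    response.raise_for_status()
    return response.json()

The generated text is returned in the results[0].generated_text field of the response, along with the input_token_count and generated_token_count values.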

Supported REST API methods

In addition to text generation, you can use the following IBM watsonx.ai REST API method from the watsonx.ai lightweight engine:
  • Text embeddings
    You must add embedding models before you can vectorize text by using the API. See Adding foundation models. An illustrative request sketch follows this list.
    Attention: Omit the project_id that is shown in the API reference examples. Projects are not used in the watsonx.ai lightweight engine.
For more information, see REST API.
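
As an illustration, the following sketch vectorizes a line of text. The endpoint path and body fields follow the Text embeddings entry in the API reference, the model ID shown is an assumption (substitute an embedding model that you added), and token and CPD_URL come from the earlier sketches.

import requests

response = requests.post(
    f"https://{CPD_URL}/ml/v1/text/embeddings",
    params={"version": "2024-07-23"},
    headers={"Authorization": f"Bearer {token}"},
    json={
        "model_id": "ibm/slate-30m-english-rtrvr",  # example embedding model
        "inputs": ["Tell me about mortgage insurance."],
        # Omit project_id; projects are not used in the lightweight engine.
    },
    verify=False,
)
response.raise_for_status()
embedding = response.json()["results"][0]["embedding"]  # list of floats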

Using the Python library

You can use the IBM watsonx.ai Python library to work with foundation models that you deploy from the watsonx.ai lightweight engine.

For more information, see Python library.
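
As a starting point, connecting with the library typically looks like the following minimal sketch. The instance_id and version values shown are assumptions for a Cloud Pak for Data deployment; confirm the exact Credentials arguments for your release in the Python library documentation.

import os
from ibm_watsonx_ai import APIClient, Credentials

# Connection sketch; the instance_id and version values are assumptions.
credentials = Credentials(
    url=f"https://{os.environ['CPD_URL']}",
    username="cpadmin",
    api_key="<APIKEY>",
    instance_id="openshift",
    version="5.0",
)
client = APIClient(credentials)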

Evaluating a prompt template

You can inference a foundation model that is hosted in the watsonx.ai lightweight engine with various inputs and store the model output in a CSV file. You can then import the CSV file to a cluster where watsonx.governance is installed and evaluate the model output as a detached prompt template.

The high-level steps to evaluate a prompt template are as follows:
  1. Create a CSV file named prompt_data.csv to use as the starting point. Define values for the city_name prompt variable that the code cycles through; the other columns start empty and are filled in later:
      city_name,generated_text,input_token_count,generated_token_count
      New York City,,,
      London,,,
      Tokyo,,,
      Stockholm,,,
  2. Generate a bearer token that you can specify with REST requests.

    See Credentials for programmatic access.

  3. Define a method that submits a POST request to the text generation method of the IBM watsonx.ai API. In the example in Step 4, this method is referred to as `<your-text-generation-method>`; the generate_text sketch in Inferencing a foundation model programmatically shows one possible implementation.

    See Inferencing a foundation model programmatically.

  4. Use the pandas library to work with the structured data in the CSV file. Add the model output to the generated_text column, and the token counts to the input_token_count and generated_token_count columns. The following code example inferences the foundation model for each input and adds the generated output and token counts to the CSV file.
    import pandas as pd
    
    # Load the prompt variables that you defined in Step 1.
    test_df = pd.read_csv("prompt_data.csv")
    token = "<specify-your-token>"  # bearer token from Step 2
    
    generated_text = []
    input_token_count = []
    generated_token_count = []
    for city in test_df["city_name"]:
        payload = {
            "model_id": "mistralai/mixtral-8x7b-instruct-v01",
            "input": f"Describe the must-see attractions for visiting {city} as a tourist."
        }
        # Submit the request by using the method that you defined in Step 3.
        scored_response = <your-text-generation-method>(CPD_URL, payload, token)
        generated_text.append(scored_response["results"][0]["generated_text"])
        input_token_count.append(scored_response["results"][0]["input_token_count"])
        generated_token_count.append(scored_response["results"][0]["generated_token_count"])
    
    # Add the model output and token counts as columns and save the results.
    test_df["generated_text"] = generated_text
    test_df["input_token_count"] = input_token_count
    test_df["generated_token_count"] = generated_token_count
    test_df.to_csv("custom_detached_test_prompt_data.csv", index=False)
    test_df.head()

    You can import the generated CSV file into a full installation of the watsonx.governance service and evaluate the model by following the instructions in the Evaluating detached prompt templates in projects procedure.