Working with the watsonx.ai lightweight engine
You can work with foundation models in the lightweight installation mode by using the API. The watsonx.ai lightweight engine does not have a dedicated user interface. However, you can perform administrative tasks, such as monitoring platform resource usage, from the IBM Software Hub platform web client.
Generating API credentials
To perform generative AI tasks, such as inferencing foundation models or using embedding models to vectorize text, you use the watsonx API. Get API credentials that you can specify when you make requests to the watsonx.ai lightweight engine service to show that you are authorized to use the available methods.
To generate credentials, complete the following steps:
- To get an API key, generate the key in one of the following ways:
  - From the web client user interface. For more information, see Generating API keys for authentication.
  - Programmatically, by completing the following steps:
    1. Use the service details that you retrieved earlier to create a bearer token. For details, see Get authorization token.

       ```sh
       curl --request POST \
         --url "https://${CPD_URL}/icp4d-api/v1/authorize" \
         --header "Content-Type: application/json" \
         --data "{ \"username\": \"${CPDUSER}\", \"password\": \"${CPDUSER_PASSWORD}\" }"
       ```

       Copy the bearer token from the JSON that is returned.
    2. Replace `<TOKEN>` with the copied token to use as the bearer token in the following request. For details, see Get API key.

       ```sh
       curl -s -X GET \
         --url "${CPD_URL}/usermgmt/v1/user/apiKey" \
         --header "Authorization: Bearer <TOKEN>"
       ```

       Note: When you use the `usermgmt/v1/user/apiKey` endpoint, a new API key is created. If a key already exists, the existing key expires.

       Copy the API key that is generated.
- For the watsonx.ai REST API, use the API key to request new bearer tokens as needed. If you generated the API key programmatically, note that you use the same `${CPD_URL}/icp4d-api/v1/authorize` endpoint as before, but this time you submit the API key instead of a password to generate the bearer token. For details, see Get authorization token.

  ```sh
  curl --request POST \
    --url "https://${CPD_URL}/icp4d-api/v1/authorize" \
    --header "Content-Type: application/json" \
    --data "{ \"username\": \"${CPDUSER}\", \"api_key\": \"<APIKEY>\" }"
  ```

  Copy the bearer token from the `access_token` section of the JSON response.
- When you submit requests, include the bearer token that was returned in the previous step.

  ```sh
  curl --request POST \
    --url "https://${CPD_URL}/ml/v1/text/generation?version=2023-05-29" \
    --header "Accept: application/json" \
    --header "Authorization: Bearer <TOKEN>" \
    --header "Content-Type: application/json" \
    ...
  ```
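The token exchange described in these steps can be sketched in Python by using only the standard library. The function names and the example host in this sketch are illustrative, not part of the watsonx.ai API; the endpoint and request body match the curl examples.

```python
import json
import urllib.request

def build_token_request(cpd_url, username, api_key):
    """Build the HTTP request that exchanges an API key for a bearer token."""
    body = json.dumps({"username": username, "api_key": api_key}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{cpd_url}/icp4d-api/v1/authorize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def get_bearer_token(cpd_url, username, api_key):
    """POST the request and return the access_token field of the JSON reply."""
    with urllib.request.urlopen(build_token_request(cpd_url, username, api_key)) as resp:
        return json.load(resp)["access_token"]
```

Because bearer tokens expire, a script typically calls a helper like `get_bearer_token` once per session and refreshes the token when requests start returning authorization errors.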
Inferencing a foundation model
Submit a REST API request by using the watsonx.ai API to inference a foundation model programmatically. You must add the curated or custom foundation models that you want to use for text generation before you can use the API to inference them.
Get a list of the available foundation models
Any custom foundation models that you added are accessible from the same endpoint as any curated foundation models that you added to the service. For more information, see Foundation model specs.
```sh
curl -k -X GET \
  --url "https://${CPD_URL}/ml/v1/foundation_model_specs?version=2024-07-23&limit=50" \
  --header "Authorization: Bearer <TOKEN>"
```
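If you script this call, you can pull the model IDs out of the response. This sketch assumes that the response JSON lists models in a `resources` array whose entries carry a `model_id` field; check the Foundation model specs reference and adjust if your response differs.

```python
def list_model_ids(specs_json):
    """Extract model IDs from a foundation_model_specs response.

    Assumes the shape {"resources": [{"model_id": ...}, ...]}.
    """
    return [entry["model_id"] for entry in specs_json.get("resources", [])]

# Example response fragment with two models added to the service
sample = {"resources": [{"model_id": "ibm/granite-13b-chat-v2"},
                        {"model_id": "tiiuae/falcon-7b"}]}
print(list_model_ids(sample))  # ['ibm/granite-13b-chat-v2', 'tiiuae/falcon-7b']
```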
Inference a foundation model
```sh
curl -k -X POST \
  --url "https://${CPD_URL}/ml/v1/text/generation?version=2024-07-23" \
  --header "Authorization: Bearer <TOKEN>" \
  --header "Content-Type: application/json" \
  --data "{
    \"model_id\": \"ibm/granite-13b-chat-v2\",
    \"input\": \"Tell me about mortgage insurance.\"
  }"
```
Inference a custom foundation model
```sh
curl -k -X POST \
  --url "https://${CPD_URL}/ml/v1/text/generation?version=2024-07-23" \
  --header "Authorization: Bearer <TOKEN>" \
  --header "Content-Type: application/json" \
  --data "{
    \"model_id\": \"tiiuae/falcon-7b\",
    \"input\": \"Tell me about mortgage insurance.\"
  }"
```
For more information about the text generation method, see Text generation.
Supported REST API methods
In addition to text generation, you can use the following watsonx.ai REST API method from the watsonx.ai lightweight engine:
- Text embeddings. You must add embedding models before you can vectorize text by using the API.
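As a sketch, a text embeddings request body pairs a `model_id` with a list of `inputs`, which you POST to the `/ml/v1/text/embeddings` endpoint with the same bearer-token headers as the text generation examples. The helper name and the model ID below are illustrative; substitute an embedding model that you added to the service.

```python
import json

def build_embeddings_payload(model_id, texts):
    """Build the JSON body for a text embeddings request."""
    return {"model_id": model_id, "inputs": list(texts)}

payload = build_embeddings_payload(
    "ibm/slate-30m-english-rtrvr",  # example only: use an embedding model you added
    ["What is mortgage insurance?"],
)
print(json.dumps(payload))
```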
Using the Python library
You can use the IBM watsonx.ai Python library to work with foundation models that you deploy from the watsonx.ai lightweight engine.
For more information, see Using the Python library from the watsonx.ai lightweight engine.
Evaluating a prompt template
You can inference a foundation model that is hosted in the watsonx.ai lightweight engine with various inputs and store the model output in a CSV file. You can then import the CSV file to a cluster where watsonx.governance is installed and evaluate the model output as a detached prompt template.
The high-level steps to evaluate a prompt template are as follows:
- Create a CSV file named `prompt_data.csv` to use as the starting point. Define values for the `city_name` prompt variable to cycle through from the code.

  |   | city_name     | generated_text | input_token_count | generated_token_count |
  |---|---------------|----------------|-------------------|-----------------------|
  | 0 | New York City |                |                   |                       |
  | 1 | London        |                |                   |                       |
  | 2 | Tokyo         |                |                   |                       |
  | 3 | Stockholm     |                |                   |                       |
- Generate a bearer token that you can specify with REST requests. See Credentials for programmatic access.
- Define a method for submitting a POST request to the text generation method of the watsonx.ai API. In the example in Step 4, this method is referred to as `<your-text-generation-method>`. See Inferencing a foundation model programmatically.
- Use the `pandas` library to work with structured data in a CSV file. Add the model output to the `generated_text` column, and the token counts to the `input_token_count` and `generated_token_count` columns. For each input that is submitted, the following code example adds the generated output and token counts to the CSV file.
```python
import pandas as pd

test_df = pd.read_csv("prompt_data.csv")

token = "<specify-your-token>"  # bearer token from Step 2

generated_text = []
input_token_count = []
generated_token_count = []

# Inference the foundation model once for each city_name value
for city in test_df["city_name"]:
    payload = {
        "model_id": "mistralai/mixtral-8x7b-instruct-v01",
        "input": f"Describe the must-see attractions for visiting {city} as a tourist."
    }
    scored_response = <your-text-generation-method>(CPD_URL, payload, token)
    generated_text.append(scored_response["results"][0]["generated_text"])
    input_token_count.append(scored_response["results"][0]["input_token_count"])
    generated_token_count.append(scored_response["results"][0]["generated_token_count"])

# Record the model output and token counts, then write the evaluation file
test_df["generated_text"] = generated_text
test_df["input_token_count"] = input_token_count
test_df["generated_token_count"] = generated_token_count

test_df.to_csv("custom_detached_test_prompt_data.csv", index=False)
test_df.head()
```
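A helper that stands in for `<your-text-generation-method>` might look like the following sketch, which uses only the Python standard library. The function names are illustrative; the endpoint, headers, and payload shape match the curl examples in Inferencing a foundation model.

```python
import json
import urllib.request

def build_generation_request(cpd_url, payload, token, version="2024-07-23"):
    """Build the HTTP request for the text generation endpoint."""
    return urllib.request.Request(
        url=f"https://{cpd_url}/ml/v1/text/generation?version={version}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def generate_text(cpd_url, payload, token):
    """POST the payload and return the parsed JSON response."""
    with urllib.request.urlopen(build_generation_request(cpd_url, payload, token)) as resp:
        return json.load(resp)
```

In the evaluation code above, `generate_text` would be called in place of `<your-text-generation-method>(CPD_URL, payload, token)`.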
You can import the generated CSV file into a full installation of the watsonx.governance service and evaluate the model by following the instructions in the Evaluating detached prompt templates in projects procedure.
Parent topic: Developing generative AI solutions