wx-ai text tokenize
Shows how a model converts the provided input into tokens. Tokenization splits text into words or subwords, which are then converted to IDs through a look-up table (the vocabulary). Tokenization allows the model to have a reasonable vocabulary size.
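To make the look-up step concrete, here is a minimal sketch in POSIX shell. It is only an illustration: the vocabulary, the IDs, and the `vocab_id` helper are all made up, and a real model's tokenizer uses a much larger vocabulary and its own splitting rules.

```shell
#!/bin/sh
# Toy vocabulary (made-up IDs): maps each known subword to an integer ID.
# ID 0 stands in for "unknown token".
vocab_id() {
  case "$1" in
    tag)  echo 101 ;;
    line) echo 102 ;;
    *)    echo 0 ;;
  esac
}

# "tagline" is not in the toy vocabulary as a single word, so it is
# represented here by the two subwords "tag" and "line".
ids=""
for token in tag line; do
  ids="$ids$(vocab_id "$token") "
done
ids=${ids% }   # drop the trailing space

echo "tagline -> $ids"
```

Because rare words can be assembled from a small set of subwords, the vocabulary does not need one entry for every possible word.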
Syntax
cpdctl wx-ai text tokenize \
--input INPUT \
--model-id MODEL-ID \
[--cpd-scope CPD-SCOPE] \
[--parameters PARAMETERS | --parameters-return-tokens PARAMETERS-RETURN-TOKENS] \
[--project-id PROJECT-ID] \
[--space-id SPACE-ID]
Options
| Option | Description |
|---|---|
| --cpd-scope (string) | The IBM Software Hub space, project, or catalog scope. For example, cpd://default-context/spaces/7bccdda4-9752-4f37-868e-891de6c48135. |
| --input (string) | The input string to tokenize. Required. |
| --model-id (string) | The ID of the model to use for tokenization. Required. |
| --parameters | The parameters for text tokenization. This JSON option can instead be provided by setting individual fields with other options, and it is mutually exclusive with those options. Provide a JSON string, or specify a JSON file to read from by providing a file path that begins with a @. |
| --parameters-return-tokens (Boolean) | If this option is true, the tokens are also returned in the response. The default value is false. |
| --project-id (string) | The project that contains the resource. Either space_id or project_id must be given. The maximum length is 36 characters. |
| --space-id (string) | The space that contains the resource. Either space_id or project_id must be given. The maximum length is 36 characters. |
Examples
cpdctl wx-ai text tokenize \
--model-id google/flan-ul2 \
--input 'Write a tagline for an alumni association: Together we' \
--space-id exampleString \
--project-id 12ac4cf1-252f-424b-b52d-5cdd9814987f \
--parameters '{"return_tokens": true}'
Alternatively, granular options are available for the sub-fields of JSON string options:
cpdctl wx-ai text tokenize \
--model-id google/flan-ul2 \
--input 'Write a tagline for an alumni association: Together we' \
--space-id exampleString \
--project-id 12ac4cf1-252f-424b-b52d-5cdd9814987f \
--parameters-return-tokens true
Example output
The response contains the token count and, if requested, the tokens.
{
  "model_id" : "google/flan-ul2",
  "result" : {
    "token_count" : 11,
    "tokens" : [ "Write", "a", "tag", "line", "for", "an", "alumni", "associ", "ation:", "Together", "we" ]
  }
}
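A common use of the token count is to check whether a prompt fits within a limit before sending it for generation. The following is a minimal sketch in POSIX shell that works on a copy of the example response rather than on a live call. The `get_token_count` helper and the `max_tokens` budget are hypothetical names chosen for illustration, and the sketch assumes that the command output is JSON in the shape shown in the example.

```shell
#!/bin/sh
# Sample response copied from the example output.
response='{
  "model_id" : "google/flan-ul2",
  "result" : {
    "token_count" : 11,
    "tokens" : [ "Write", "a", "tag", "line", "for", "an", "alumni", "associ", "ation:", "Together", "we" ]
  }
}'

# Hypothetical helper: print the first integer that follows "token_count".
# A sed pattern is enough for this flat field; a real JSON parser is more robust.
get_token_count() {
  sed -n 's/.*"token_count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

token_count=$(printf '%s\n' "$response" | get_token_count)

# Hypothetical budget: the maximum number of input tokens you want to allow.
max_tokens=512

if [ "$token_count" -le "$max_tokens" ]; then
  echo "OK: $token_count tokens (limit $max_tokens)"
else
  echo "Too long: $token_count tokens (limit $max_tokens)"
fi
```

In a real script, the `response` variable would be filled from the output of the `cpdctl wx-ai text tokenize` command instead of a hard-coded sample.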