wx-ai deployment
text-generate
Infers the next tokens for a given deployed model with a set of parameters. If a serving_name is used, it must match the serving_name that is returned in the inference section when the deployment was created.
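For example, the same deployment might be called either by its deployment_id or by its serving_name; both identifiers below are hypothetical:
cpd-cli wx-ai deployment text-generate \
--id-or-name 0a1b2c3d-4e5f-6789-abcd-ef0123456789 \
--input 'how far is paris from bangalore:\n'
cpd-cli wx-ai deployment text-generate \
--id-or-name classification \
--input 'how far is paris from bangalore:\n'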
Syntax
cpd-cli wx-ai deployment text-generate \
--id-or-name=<id-or-name> \
[--input=<input>] \
[--moderations=<moderations>] \
[--parameters=<parameters> | --parameters-decoding-method=<parameters-decoding-method> \
    --parameters-include-stop-sequence=<parameters-include-stop-sequence> \
    --parameters-length-penalty=<parameters-length-penalty> \
    --parameters-max-new-tokens=<parameters-max-new-tokens> \
    --parameters-min-new-tokens=<parameters-min-new-tokens> \
    --parameters-prompt-variables=<parameters-prompt-variables> \
    --parameters-random-seed=<parameters-random-seed> \
    --parameters-repetition-penalty=<parameters-repetition-penalty> \
    --parameters-return-options=<parameters-return-options> \
    --parameters-stop-sequences=<parameters-stop-sequences> \
    --parameters-temperature=<parameters-temperature> \
    --parameters-time-limit=<parameters-time-limit> \
    --parameters-top-k=<parameters-top-k> \
    --parameters-top-p=<parameters-top-p> \
    --parameters-truncate-input-tokens=<parameters-truncate-input-tokens> \
    --parameters-typical-p=<parameters-typical-p>]
Options
Table 1: Command options
| Option | Description |
|---|---|
| --id-or-name | The id_or_name can be either the deployment_id that identifies the deployment, or a serving_name that allows a predefined URL to be used to post a prediction. |
| --input | The prompt to generate completions. Note: The method tokenizes the input internally. |
| --moderations | Properties that control the moderations, for uses such as hate and profanity (HAP) and personally identifiable information (PII) filtering. This list can be extended with new types of moderations. |
| --parameters | The template properties if this request refers to a prompt template. This JSON option can instead be provided by setting individual fields with the other parameters options; it is mutually exclusive with those options. See the sketch after this table for one way to build the JSON value. |
| --parameters-decoding-method | Represents the strategy that is used for picking the tokens during generation of the output text. |
| --parameters-include-stop-sequence | Pass false to omit matched stop sequences from the end of the output text. The default is true, meaning that the output ends with the stop sequence text when matched. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
| --parameters-length-penalty | Can be used to exponentially increase the likelihood of the text generation terminating once a specified number of tokens have been generated. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
| --parameters-max-new-tokens | The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model that is used. |
| --parameters-min-new-tokens | The minimum number of new tokens to be generated. If stop sequences are given, they are ignored until the minimum number of tokens is generated. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. The minimum value is 0. |
| --parameters-prompt-variables | The prompt variables. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
| --parameters-random-seed | The random number generator seed to use in sampling mode, for experimental repeatability. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. The minimum value is 1. |
| --parameters-repetition-penalty | The penalty applied to tokens that have already been generated or that belong to the context. The value 1.0 means that there is no penalty. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
| --parameters-return-options | Properties that control what is returned. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
| --parameters-stop-sequences | One or more strings that cause the text generation to stop when they are produced as part of the output. Stop sequences that are encountered before the minimum number of tokens has been generated are ignored. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. The maximum length is 6 items. The minimum length is 0 items. |
| --parameters-temperature | A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output. Values greater than 1.0 flatten the probability distribution, resulting in "more random" output. A value of 1.0 has no effect. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
| --parameters-time-limit | The time limit in milliseconds. If generation is not completed within this time, it stops and the text generated so far is returned along with the TIME_LIMIT stop reason. |
| --parameters-top-k | The number of highest-probability vocabulary tokens to keep for top-k filtering. Applies only to sampling mode: when decoding_method is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
| --parameters-top-p | Similar to top_k, except that the candidates for the next token are the most likely tokens with probabilities that add up to at least top_p (also known as nucleus sampling). A value of 1.0 is equivalent to disabled. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
| --parameters-truncate-input-tokens | The maximum number of input tokens that are accepted. Use this option to avoid requests failing because the input is longer than the configured limits. If the input must be shortened, it is truncated from the start (on the left), so the end of the input remains the same. If this value exceeds the model's maximum sequence length (refer to the model documentation for this value), the call fails if the total number of tokens exceeds the maximum sequence length. Zero means don't truncate. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
| --parameters-typical-p | Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If less than 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher is kept for generation. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
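Because --parameters takes a single JSON string, it can be easier to build the value in a shell variable first. A minimal sketch, assuming a deployment with the hypothetical serving_name classification; all parameter fields shown are taken from the examples below:
PARAMS='{
  "decoding_method": "greedy",
  "max_new_tokens": 100,
  "min_new_tokens": 5,
  "stop_sequences": ["fail"]
}'
cpd-cli wx-ai deployment text-generate \
--id-or-name classification \
--input 'how far is paris from bangalore:\n' \
--parameters "$PARAMS"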
Examples
cpd-cli wx-ai deployment text-generate \
--id-or-name classification \
--input 'how far is paris from bangalore:\n' \
--parameters '{"decoding_method": "greedy", "length_penalty": {"decay_factor": 2.5, "start_index": 5}, "max_new_tokens": 100, "min_new_tokens": 5, "random_seed": 1, "stop_sequences": ["fail"], "temperature": 1.5, "time_limit": 600000, "top_k": 50, "top_p": 0.5, "repetition_penalty": 1.5, "truncate_input_tokens": 0, "return_options": {"input_text": true, "generated_tokens": true, "input_tokens": true, "token_logprobs": true, "token_ranks": true, "top_n_tokens": 2}, "include_stop_sequence": true, "typical_p": 0.5, "prompt_variables": {}}' \
--moderations '{"hap": {"input": {"enabled": true, "threshold": 0}, "output": {"enabled": true, "threshold": 0}, "mask": {"remove_entity_value": false}}, "pii": {"input": {"enabled": true, "threshold": 0}, "output": {"enabled": true, "threshold": 0}, "mask": {"remove_entity_value": false}}, "input_ranges": [{"start": 0, "end": 0}]}'
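If the command writes the service's JSON response to stdout, the generated text can be pulled out with a tool such as jq. This is a sketch, not a documented output contract: the field path results[0].generated_text follows the watsonx.ai text generation response schema and should be verified against the actual output:
cpd-cli wx-ai deployment text-generate \
--id-or-name classification \
--input 'how far is paris from bangalore:\n' \
--parameters '{"decoding_method": "greedy", "max_new_tokens": 100}' \
| jq -r '.results[0].generated_text'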
Alternatively, instead of passing a single --parameters JSON string, granular options are available for its individual sub-fields:
cpd-cli wx-ai deployment text-generate \
--id-or-name classification \
--input 'how far is paris from bangalore:\n' \
--moderations '{"hap": moderationHapPropertiesModel, "pii": moderationPiiPropertiesModel, "input_ranges": [moderationTextRangeModel]}' \
--parameters-decoding-method greedy \
--parameters-include-stop-sequence true \
--parameters-length-penalty '{"decay_factor": 2.5, "start_index": 5}' \
--parameters-max-new-tokens 30 \
--parameters-min-new-tokens 5 \
--parameters-prompt-variables '{}' \
--parameters-random-seed 1 \
--parameters-repetition-penalty 1.5 \
--parameters-return-options '{"input_text": true, "generated_tokens": true, "input_tokens": true, "token_logprobs": true, "token_ranks": true, "top_n_tokens": 2}' \
--parameters-stop-sequences fail \
--parameters-temperature 1.5 \
--parameters-time-limit 600000 \
--parameters-top-k 50 \
--parameters-top-p 0.5 \
--parameters-truncate-input-tokens 0 \
--parameters-typical-p 0.5
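Note that sampling parameters such as temperature, top-k, top-p, typical-p, and random-seed take effect only when the decoding method is sample; with greedy decoding, as above, they have no influence on the output. A sketch of a sampling call, again with a hypothetical deployment name:
cpd-cli wx-ai deployment text-generate \
--id-or-name classification \
--input 'how far is paris from bangalore:\n' \
--parameters-decoding-method sample \
--parameters-max-new-tokens 80 \
--parameters-random-seed 1 \
--parameters-temperature 0.7 \
--parameters-top-k 50 \
--parameters-top-p 0.9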