wx-ai deployment text-generate-stream
Infers the next tokens for a given deployed model with a set of parameters. This operation returns the output tokens as a stream of events. If a serving_name is used, then it must match the serving_name that is returned in the inference section when the deployment was created.
Syntax
cpdctl wx-ai deployment text-generate-stream \
--id-or-name ID-OR-NAME \
[--accept ACCEPT] \
[--input INPUT] \
[--parameters PARAMETERS | \
--parameters-decoding-method PARAMETERS-DECODING-METHOD \
--parameters-length-penalty PARAMETERS-LENGTH-PENALTY \
--parameters-max-new-tokens PARAMETERS-MAX-NEW-TOKENS \
--parameters-min-new-tokens PARAMETERS-MIN-NEW-TOKENS \
--parameters-random-seed PARAMETERS-RANDOM-SEED \
--parameters-stop-sequences PARAMETERS-STOP-SEQUENCES \
--parameters-temperature PARAMETERS-TEMPERATURE \
--parameters-time-limit PARAMETERS-TIME-LIMIT \
--parameters-top-k PARAMETERS-TOP-K \
--parameters-top-p PARAMETERS-TOP-P \
--parameters-repetition-penalty PARAMETERS-REPETITION-PENALTY \
--parameters-truncate-input-tokens PARAMETERS-TRUNCATE-INPUT-TOKENS \
--parameters-return-options PARAMETERS-RETURN-OPTIONS \
--parameters-include-stop-sequence PARAMETERS-INCLUDE-STOP-SEQUENCE \
--parameters-typical-p PARAMETERS-TYPICAL-P \
--parameters-prompt-variables PARAMETERS-PROMPT-VARIABLES] \
[--moderations MODERATIONS]
Options
Option | Description |
---|---|
--id-or-name (string) | The deployment_id that identifies the deployment, or the serving_name of the deployment. |
--accept (string) | The type of the response, for example application/json. |
--input (string) | The prompt to generate completions from. This command tokenizes the input internally. Do not leave any trailing spaces. |
--parameters | The template properties if this request refers to a prompt template. This JSON option can instead be provided by setting individual fields with the granular options below; it is mutually exclusive with those options. Provide the value as a JSON string, or specify a JSON file to read it from by providing a file path (see the file-based example after this table). |
--moderations | Properties that control the moderations, for usages such as hate and profanity (HAP) and personally identifiable information (PII) filtering. Provide the value as a JSON string, or specify a JSON file to read it from by providing a file path. The Examples section shows the format of the moderations object. |
--parameters-decoding-method (string) | Represents the strategy that is used for picking the tokens during generation of the output text. |
--parameters-include-stop-sequence (Boolean) | Pass false to omit matched stop sequences from the end of the output text. |
--parameters-length-penalty | Can be used to exponentially increase the likelihood of the text generation terminating when a specified number of tokens have been generated. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. Provide the value as a JSON string, or specify a JSON file to read it from by providing a file path. |
--parameters-max-new-tokens (int64) | The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model being used. |
--parameters-min-new-tokens (int64) | If stop sequences are given, they are ignored until the minimum number of tokens is generated. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
--parameters-prompt-variables (string) | The prompt variables. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. Provide the value as a JSON string, or specify a JSON file to read it from by providing a file path. |
--parameters-random-seed (int64) | Random number generator seed to use in sampling mode for experimental repeatability. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
--parameters-repetition-penalty (float64) | Represents the penalty for penalizing tokens that have already been generated or belong to the context. The value 1.0 means that there is no penalty. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
--parameters-return-options | Properties that control what is returned. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. Provide the value as a JSON string, or specify a JSON file to read it from by providing a file path. |
--parameters-stop-sequences (string) | Stop sequences are one or more strings that cause the text generation to stop when they are produced as part of the output. Stop sequences that are encountered before the minimum number of tokens has been generated are ignored. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
--parameters-temperature (float64) | A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output. Values greater than 1.0 flatten the probability distribution, resulting in "more random" output. A value of 1.0 has no effect. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
--parameters-time-limit (int64) | Time limit in milliseconds. If generation does not complete within this time, it stops and the text generated so far is returned along with the TIME_LIMIT stop reason. |
--parameters-top-k (int64) | The number of highest-probability vocabulary tokens to keep for top-k filtering. Applies only for sampling mode. When the decoding method is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
--parameters-top-p (float64) | Similar to top_k, except the candidates for the next token are the most likely tokens with probabilities that add up to at least top_p. Also known as nucleus sampling. A value of 1.0 is equivalent to disabled. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
--parameters-truncate-input-tokens (int64) | Represents the maximum number of input tokens accepted. Use this option to avoid requests failing because the input is longer than the configured limits. If the input is truncated, the start of the input (on the left) is removed so that the end of the input remains the same. |
--parameters-typical-p (float64) | Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If less than 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher is kept for generation. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. |
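For instance, the parameters payload from the first example in the Examples section can be stored in a file (for example, parameters.json) and passed by file path instead of inline JSON. The following sketch assumes the file path is supplied with the @ prefix used by other cpdctl JSON options; check the command's --help output for the exact convention in your cpdctl version:
# sketch only: assumes @<file> is accepted for JSON option values
cpdctl wx-ai deployment text-generate-stream \
--id-or-name classification \
--input exampleString \
--parameters @parameters.json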
Examples
cpdctl wx-ai deployment text-generate-stream \
--id-or-name classification \
--input exampleString \
--parameters '{"decoding_method": "greedy", "length_penalty": {"decay_factor": 2.5, "start_index": 5}, "max_new_tokens": 30, "min_new_tokens": 5, "random_seed": 1, "stop_sequences": ["fail"], "temperature": 1.5, "time_limit": 600000, "top_k": 50, "top_p": 0.5, "repetition_penalty": 1.5, "truncate_input_tokens": 0, "return_options": {"input_text": true, "generated_tokens": true, "input_tokens": true, "token_logprobs": true, "token_ranks": true, "top_n_tokens": 2}, "include_stop_sequence": true, "typical_p": 0.5, "prompt_variables": {}}' \
--moderations '{"hap": {"input": {"enabled": true, "threshold": 0}, "output": {"enabled": true, "threshold": 0}, "mask": {"remove_entity_value": false}}, "pii": {"input": {"enabled": true, "threshold": 0}, "output": {"enabled": true, "threshold": 0}, "mask": {"remove_entity_value": false}}, "input_ranges": [{"start": 0, "end": 0}]}' \
--accept application/json
Alternatively, granular options are available for the sub-fields of JSON string options:
cpdctl wx-ai deployment text-generate-stream \
--id-or-name classification \
--accept application/json \
--input exampleString \
--moderations '{"hap": moderationHapPropertiesModel, "pii": moderationPiiPropertiesModel, "input_ranges": [moderationTextRangeModel]}' \
--parameters-decoding-method greedy \
--parameters-length-penalty '{"decay_factor": 2.5, "start_index": 5}' \
--parameters-max-new-tokens 30 \
--parameters-min-new-tokens 5 \
--parameters-random-seed 1 \
--parameters-stop-sequences fail \
--parameters-temperature 1.5 \
--parameters-time-limit 600000 \
--parameters-top-k 50 \
--parameters-top-p 0.5 \
--parameters-repetition-penalty 1.5 \
--parameters-truncate-input-tokens 0 \
--parameters-return-options '{"input_text": true, "generated_tokens": true, "input_tokens": true, "token_logprobs": true, "token_ranks": true, "top_n_tokens": 2}' \
--parameters-include-stop-sequence true \
--parameters-typical-p 0.5 \
--parameters-prompt-variables '{}'
Example output
The generated text from the model along with other details.
[ {
"model_id" : "google/flan-ul2",
"created_at" : "2023-07-21T19:17:36.673Z",
"results" : [ {
"generated_text" : "",
"generated_token_count" : 4,
"input_token_count" : 0,
"stop_reason" : "eos_token"
} ]
}, {
"model_id" : "google/flan-ul2",
"created_at" : "2023-07-21T19:17:36.647Z",
"results" : [ {
"generated_text" : " km",
"generated_token_count" : 3,
"input_token_count" : 0,
"stop_reason" : "not_finished"
} ]
}, {
"model_id" : "google/flan-ul2",
"created_at" : "2023-07-21T19:17:36.647Z",
"results" : [ {
"generated_text" : "4,000",
"generated_token_count" : 2,
"input_token_count" : 0,
"stop_reason" : "not_finished"
} ]
} ]
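When --accept application/json is used and the events are returned as the JSON array shown above, the individual text fragments can be extracted with a small amount of post-processing. The following jq pipeline is only a sketch; it assumes the command writes that array to standard output and reuses the illustrative input from the earlier examples:
# sketch only: assumes the JSON event array is written to stdout
cpdctl wx-ai deployment text-generate-stream \
--id-or-name classification \
--input exampleString \
--accept application/json \
| jq -r '.[].results[].generated_text'
This prints each generated_text fragment on its own line.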