wx-ai deployment text-generate

Infers the next tokens for a given deployed model with a set of parameters. If a serving_name is used, it must match the serving_name that is returned in the inference section when the deployment was created.

Syntax

cpd-cli wx-ai deployment text-generate \
--id-or-name=<id-or-name> \
--parameters-include-stop-sequence=<parameters-include-stop-sequence> \
--parameters-length-penalty=<parameters-length-penalty> \
--parameters-max-new-tokens=<parameters-max-new-tokens> \
--parameters-min-new-tokens=<parameters-min-new-tokens> \
--parameters-prompt-variables=<parameters-prompt-variables> \
--parameters-random-seed=<parameters-random-seed> \
--parameters-repetition-penalty=<parameters-repetition-penalty> \
--parameters-return-options=<parameters-return-options> \
--parameters-stop-sequences=<parameters-stop-sequences> \
--parameters-temperature=<parameters-temperature> \
--parameters-time-limit=<parameters-time-limit> \
--parameters-top-k=<parameters-top-k> \
--parameters-top-p=<parameters-top-p> \
--parameters-truncate-input-tokens=<parameters-truncate-input-tokens> \
--parameters-typical-p=<parameters-typical-p> \
[--input=<input>] \
[--moderations=<moderations>] \
[--parameters=<parameters> | --parameters-decoding-method=<parameters-decoding-method>]

Options

Table 1: Command options
Option Description
--id-or-name The id_or_name can be either the deployment_id that identifies the deployment or a serving_name that allows a predefined URL to be used to post a prediction.
Status
Required.
Syntax
--id-or-name=<id-or-name>
Default value
No default.
Input type
string
--input The prompt to generate completions. Note: The method tokenizes the input internally.
Status
Optional.
Syntax
--input=<input>
Default value
No default.
Input type
string
--moderations Properties that control the moderations, for uses such as hate and profanity (HAP) filtering and personally identifiable information (PII) filtering. This list can be extended with new types of moderations.
Status
Optional.
Syntax
--moderations=<moderations>
Default value
No default.
Input type
string
Valid values
Provide a JSON string option or specify a JSON file to read from by providing a file path option that begins with a @, for example --moderations=@path/to/file.json. The following example shows the format of the Moderations object.

{
  "hap" : {
    "input" : {
      "enabled" : true,
      "threshold" : 0
    },
    "output" : {
      "enabled" : true,
      "threshold" : 0
    },
    "mask" : {
      "remove_entity_value" : false
    }
  },
  "pii" : {
    "input" : {
      "enabled" : true,
      "threshold" : 0
    },
    "output" : {
      "enabled" : true,
      "threshold" : 0
    },
    "mask" : {
      "remove_entity_value" : false
    }
  },
  "input_ranges" : [ {
    "start" : 0,
    "end" : 0
  } ]
}
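
Because the Moderations object can get long, it is often easier to keep it in a file and pass the file path with the @ prefix. The following is a minimal sketch: the file name moderations.json is arbitrary, and the cpd-cli invocation is shown as a comment because it requires a configured deployment.

```shell
# Write the Moderations object from the example above to a file.
cat > moderations.json <<'EOF'
{
  "hap": {
    "input": { "enabled": true, "threshold": 0 },
    "output": { "enabled": true, "threshold": 0 },
    "mask": { "remove_entity_value": false }
  },
  "pii": {
    "input": { "enabled": true, "threshold": 0 },
    "output": { "enabled": true, "threshold": 0 },
    "mask": { "remove_entity_value": false }
  },
  "input_ranges": [ { "start": 0, "end": 0 } ]
}
EOF

# Pass the file with the @ prefix instead of an inline JSON string:
# cpd-cli wx-ai deployment text-generate \
#     --id-or-name=<id-or-name> \
#     --moderations=@moderations.json
```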
--parameters The template properties if this request refers to a prompt template. This JSON option can instead be provided by setting individual fields with other options. It is mutually exclusive with those options.
Status
Optional.
Syntax
--parameters=<parameters>
Default value
No default.
Input type
string
Valid values
Provide a JSON string option or specify a JSON file to read from by providing a file path option that begins with a @, for example --parameters=@path/to/file.json.
--parameters-decoding-method Represents the strategy that is used for picking the tokens during generation of the output text.
Status
Optional.
Syntax
--parameters-decoding-method=<parameters-decoding-method>
Default value
No default.
Input type
string
--parameters-include-stop-sequence Pass false to omit matched stop sequences from the end of the output text. The default is true, meaning that the output ends with the stop sequence text when matched. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.
Status
Required.
Syntax
--parameters-include-stop-sequence=<parameters-include-stop-sequence>
Default value
True.
Input type
Boolean
--parameters-length-penalty Use this option to exponentially increase the likelihood of the text generation terminating after a specified number of tokens have been generated. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.
Status
Required.
Syntax
--parameters-length-penalty=<parameters-length-penalty>
Default value
No default.
--parameters-max-new-tokens The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model that is used.
Status
Required.
Syntax
--parameters-max-new-tokens=<parameters-max-new-tokens>
Input type
int64
Default value
No default.
--parameters-min-new-tokens The minimum number of new tokens to be generated. If stop sequences are given, they are ignored until the minimum number of tokens is generated. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. The minimum value is 0.
Status
Required.
Syntax
--parameters-min-new-tokens=<parameters-min-new-tokens>
Input type
int64
Default value
The default value is 0.
--parameters-prompt-variables The prompt variables. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.
Status
Required.
Syntax
--parameters-prompt-variables=<parameters-prompt-variables>
Input type
string
Default value
No default.
Valid values
Provide a JSON string option or specify a JSON file to read from by providing a file path option that begins with a @, for example --parameters-prompt-variables=@path/to/file.json.
--parameters-random-seed Random number generator seed to use in sampling mode for experimental repeatability. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. The minimum value is 1.
Status
Required.
Syntax
--parameters-random-seed=<parameters-random-seed>
Input type
int64
Default value
No default.
--parameters-repetition-penalty Represents the penalty for penalizing tokens that have already been generated or belong to the context. The value 1.0 means that there is no penalty. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.
Status
Required.
Syntax
--parameters-repetition-penalty=<parameters-repetition-penalty>
Input type
float64
Default value
The default value is 1. The maximum value is 2. The minimum value is 1.
--parameters-return-options Properties that control what is returned. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.
Status
Required.
Syntax
--parameters-return-options=<parameters-return-options>
Default value
No default.
Valid values
Provide a JSON string option or specify a JSON file to read from by providing a file path option that begins with a @, for example --parameters-return-options=@path/to/file.json.
--parameters-stop-sequences Stop sequences are one or more strings that cause the text generation to stop when they are produced as part of the output. Stop sequences that are encountered before the minimum number of tokens has been generated are ignored. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option. The maximum length is 6 items. The minimum length is 0 items.
Status
Required.
Syntax
--parameters-stop-sequences=<parameters-stop-sequences>
Input type
string
Default value
No default.
Valid values
Provide a JSON string option or specify a JSON file to read from by providing a file path option that begins with a @, for example --parameters-stop-sequences=@path/to/file.json.
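
As a sketch of the file-based form, the stop sequences can be written as a JSON array (at most 6 items) and passed with the @ prefix. The file name stop-sequences.json is arbitrary, the single "fail" entry mirrors the example later in this topic, and the cpd-cli invocation is commented out because it requires a configured deployment.

```shell
# Write a JSON array of stop sequences to a file.
cat > stop-sequences.json <<'EOF'
["fail"]
EOF

# Pass the file with the @ prefix:
# cpd-cli wx-ai deployment text-generate \
#     --id-or-name=<id-or-name> \
#     --parameters-stop-sequences=@stop-sequences.json
```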
--parameters-temperature A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output. Values greater than 1.0 flatten the probability distribution, resulting in "more random" output. A value of 1.0 has no effect. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.
Status
Required.
Syntax
--parameters-temperature=<parameters-temperature>
Input type
float64
Default value
The default value is 1. The maximum value is 2. The minimum value is 0.
--parameters-time-limit Time limit in milliseconds - if not completed within this time, generation stops. The text generated so far is returned along with the TIME_LIMIT stop reason.
Status
Required.
Syntax
--parameters-time-limit=<parameters-time-limit>
Input type
int64
Default value
No default.
--parameters-top-k The number of highest probability vocabulary tokens to keep for top-k-filtering. Applies only for sampling mode. When decoding_strategy is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.
Status
Required.
Syntax
--parameters-top-k=<parameters-top-k>
Input type
int64
Default value
No default. The maximum value is 100. The minimum value is 1.
--parameters-top-p Similar to top_k except the candidates to generate the next token are the most likely tokens with probabilities that add up to at least top_p. Also known as nucleus sampling. A value of 1.0 is equivalent to disabled. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.
Status
Required.
Syntax
--parameters-top-p=<parameters-top-p>
Input type
float64
Default value
The default value is 1. The maximum value is 1. The value must be greater than 0.
--parameters-truncate-input-tokens Represents the maximum number of input tokens accepted. Use this option to avoid requests failing because the input is longer than the configured limits. If the input is truncated, the start of the input (on the left) is removed, so the end of the input remains the same. If this value exceeds the model's maximum sequence length (refer to the model documentation for this value), the call fails when the total number of tokens exceeds the maximum sequence length. Zero means do not truncate. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.
Status
Required.
Syntax
--parameters-truncate-input-tokens=<parameters-truncate-input-tokens>
Input type
int64
Default value
No default. The minimum value is 0.
--parameters-typical-p Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If less than 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher are kept for generation. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.
Status
Required.
Syntax
--parameters-typical-p=<parameters-typical-p>
Input type
float64
Default value
No default. The maximum value is 1. The value must be greater than 0.

Examples

cpd-cli wx-ai deployment text-generate \
    --id-or-name classification \
    --input 'how far is paris from bangalore:\n' \
    --parameters '{"decoding_method": "greedy", "length_penalty": {"decay_factor": 2.5, "start_index": 5}, "max_new_tokens": 100, "min_new_tokens": 5, "random_seed": 1, "stop_sequences": ["fail"], "temperature": 1.5, "time_limit": 600000, "top_k": 50, "top_p": 0.5, "repetition_penalty": 1.5, "truncate_input_tokens": 0, "return_options": {"input_text": true, "generated_tokens": true, "input_tokens": true, "token_logprobs": true, "token_ranks": true, "top_n_tokens": 2}, "include_stop_sequence": true, "typical_p": 0.5, "prompt_variables": {}}' \
    --moderations '{"hap": {"input": {"enabled": true, "threshold": 0}, "output": {"enabled": true, "threshold": 0}, "mask": {"remove_entity_value": false}}, "pii": {"input": {"enabled": true, "threshold": 0}, "output": {"enabled": true, "threshold": 0}, "mask": {"remove_entity_value": false}}, "input_ranges": [{"start": 0, "end": 0}]}'
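
The long inline --parameters string above can also be read from a file with the @ syntax described under Valid values. The following is a sketch that reuses the same parameter values; the file name parameters.json is arbitrary, and the cpd-cli invocation is commented out because it requires a configured deployment.

```shell
# Write the parameters JSON from the example above to a file.
cat > parameters.json <<'EOF'
{
  "decoding_method": "greedy",
  "length_penalty": { "decay_factor": 2.5, "start_index": 5 },
  "max_new_tokens": 100,
  "min_new_tokens": 5,
  "random_seed": 1,
  "stop_sequences": ["fail"],
  "temperature": 1.5,
  "time_limit": 600000,
  "top_k": 50,
  "top_p": 0.5,
  "repetition_penalty": 1.5,
  "truncate_input_tokens": 0,
  "return_options": {
    "input_text": true,
    "generated_tokens": true,
    "input_tokens": true,
    "token_logprobs": true,
    "token_ranks": true,
    "top_n_tokens": 2
  },
  "include_stop_sequence": true,
  "typical_p": 0.5,
  "prompt_variables": {}
}
EOF

# Pass the file with the @ prefix instead of an inline JSON string:
# cpd-cli wx-ai deployment text-generate \
#     --id-or-name classification \
#     --input 'how far is paris from bangalore:\n' \
#     --parameters=@parameters.json
```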

Alternatively, granular options are available for the sub-fields of JSON string options:

cpd-cli wx-ai deployment text-generate \
    --id-or-name classification \
    --input 'how far is paris from bangalore:\n' \
    --moderations '{"hap": {"input": {"enabled": true, "threshold": 0}, "output": {"enabled": true, "threshold": 0}, "mask": {"remove_entity_value": false}}, "pii": {"input": {"enabled": true, "threshold": 0}, "output": {"enabled": true, "threshold": 0}, "mask": {"remove_entity_value": false}}, "input_ranges": [{"start": 0, "end": 0}]}' \
    --parameters-decoding-method greedy \
    --parameters-include-stop-sequence true \
    --parameters-length-penalty '{"decay_factor": 2.5, "start_index": 5}' \
    --parameters-max-new-tokens 30 \
    --parameters-min-new-tokens 5 \
    --parameters-prompt-variables '{}' \
    --parameters-random-seed 1 \
    --parameters-repetition-penalty 1.5 \
    --parameters-return-options '{"input_text": true, "generated_tokens": true, "input_tokens": true, "token_logprobs": true, "token_ranks": true, "top_n_tokens": 2}' \
    --parameters-stop-sequences fail \
    --parameters-temperature 1.5 \
    --parameters-time-limit 600000 \
    --parameters-top-k 50 \
    --parameters-top-p 0.5 \
    --parameters-truncate-input-tokens 0 \
    --parameters-typical-p 0.5