wx-ai deployment text-generate-stream

Infers the next tokens for a given deployed model with a set of parameters. This operation returns the output tokens as a stream of events. If a serving_name is used, it must match the serving_name that was returned in the inference section when the deployment was created.

Syntax

cpdctl wx-ai deployment text-generate-stream \
--id-or-name ID-OR-NAME \
[--accept ACCEPT] \
[--input INPUT] \
[--parameters PARAMETERS | \
--parameters-decoding-method PARAMETERS-DECODING-METHOD \
--parameters-length-penalty PARAMETERS-LENGTH-PENALTY \
--parameters-max-new-tokens PARAMETERS-MAX-NEW-TOKENS \
--parameters-min-new-tokens PARAMETERS-MIN-NEW-TOKENS \
--parameters-random-seed PARAMETERS-RANDOM-SEED \
--parameters-stop-sequences PARAMETERS-STOP-SEQUENCES \
--parameters-temperature PARAMETERS-TEMPERATURE \
--parameters-time-limit PARAMETERS-TIME-LIMIT \
--parameters-top-k PARAMETERS-TOP-K \
--parameters-top-p PARAMETERS-TOP-P \
--parameters-repetition-penalty PARAMETERS-REPETITION-PENALTY \
--parameters-truncate-input-tokens PARAMETERS-TRUNCATE-INPUT-TOKENS \
--parameters-return-options PARAMETERS-RETURN-OPTIONS \
--parameters-include-stop-sequence PARAMETERS-INCLUDE-STOP-SEQUENCE \
--parameters-typical-p PARAMETERS-TYPICAL-P \
--parameters-prompt-variables PARAMETERS-PROMPT-VARIABLES] \
[--moderations MODERATIONS]

Options

Table 1: Command options
Option Description
--id-or-name (string)

The id_or_name can be either the deployment_id that identifies the deployment or a serving_name that allows a predefined URL to be used to post a prediction.

--accept (string)

The type of the response: application/json or text/event-stream. A character encoding can be specified by including a charset parameter, for example text/event-stream;charset=utf-8.

Allowable values are application/json, text/event-stream.
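
For example, to request the response as a stream of server-sent events with an explicit character encoding (the deployment name classification and the input string are taken from the examples later in this section):

cpdctl wx-ai deployment text-generate-stream \
    --id-or-name classification \
    --input exampleString \
    --accept 'text/event-stream;charset=utf-8'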

--input (string)

The prompt to generate completions from. The command tokenizes the input internally; do not leave any trailing spaces.
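
A prompt that contains spaces or shell metacharacters must be quoted so that the shell passes it as a single argument. For example (the prompt text is illustrative):

cpdctl wx-ai deployment text-generate-stream \
    --id-or-name classification \
    --input 'Classify the following review as positive or negative: great product'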

--parameters

The template properties if this request refers to a prompt template. This JSON option can instead be provided by setting individual fields with other options. It is mutually exclusive with those options.

Provide a JSON string option or specify a JSON file to read from by providing a file path option that begins with a @, for example --parameters=@path/to/file.json.
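
For example, a parameters file might contain the following (the values are illustrative; the fields are the same ones accepted in the JSON string form shown in the examples later in this section):

{
  "decoding_method" : "greedy",
  "max_new_tokens" : 30,
  "min_new_tokens" : 5,
  "stop_sequences" : [ "fail" ],
  "time_limit" : 600000
}

Pass the file with --parameters=@path/to/file.json.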

--moderations

Properties that control the moderations, for uses such as hate and profanity (HAP) filtering and personally identifiable information (PII) filtering. This list can be extended with new types of moderations.

Provide a JSON string option or specify a JSON file to read from by providing a file path option that begins with a @, for example --moderations=@path/to/file.json.

The following example shows the format of the Moderations object.

{
  "hap" : {
    "input" : {
      "enabled" : true,
      "threshold" : 0
    },
    "output" : {
      "enabled" : true,
      "threshold" : 0
    },
    "mask" : {
      "remove_entity_value" : false
    }
  },
  "pii" : {
    "input" : {
      "enabled" : true,
      "threshold" : 0
    },
    "output" : {
      "enabled" : true,
      "threshold" : 0
    },
    "mask" : {
      "remove_entity_value" : false
    }
  },
  "input_ranges" : [ {
    "start" : 0,
    "end" : 0
  } ]
}
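
The same object can be stored in a file and referenced by path. For example (moderations.json is a hypothetical file name for a file containing the object shown above):

cpdctl wx-ai deployment text-generate-stream \
    --id-or-name classification \
    --input exampleString \
    --moderations=@moderations.json
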
--parameters-decoding-method (string)

Represents the strategy that is used for picking the tokens during generation of the output text.
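
For example, the granular options can be combined to run in sampling mode with a fixed seed for repeatable output (the parameter values are illustrative; sample and greedy are the two strategies referenced in this section):

cpdctl wx-ai deployment text-generate-stream \
    --id-or-name classification \
    --input exampleString \
    --parameters-decoding-method sample \
    --parameters-temperature 0.7 \
    --parameters-top-k 50 \
    --parameters-top-p 0.5 \
    --parameters-random-seed 1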

--parameters-include-stop-sequence (Boolean)

Pass false to omit matched stop sequences from the end of the output text. The default is true, meaning that the output ends with the stop sequence text when matched. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

The default value is true.
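
For example, to stop generation at the string fail but omit it from the returned text:

cpdctl wx-ai deployment text-generate-stream \
    --id-or-name classification \
    --input exampleString \
    --parameters-stop-sequences fail \
    --parameters-include-stop-sequence false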

--parameters-length-penalty

Use this option to exponentially increase the likelihood of the text generation terminating after a specified number of tokens have been generated. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

Provide a JSON string option or specify a JSON file to read from by providing a file path option that begins with a @, for example --parameters-length-penalty=@path/to/file.json.
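
For example, the length penalty can also be supplied inline as a JSON string (the field values are taken from the full example later in this section):

--parameters-length-penalty '{"decay_factor": 2.5, "start_index": 5}'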

--parameters-max-new-tokens (int64)

The maximum number of new tokens to be generated. The maximum supported value for this field depends on the model being used.

--parameters-min-new-tokens (int64)

The minimum number of new tokens to be generated. If stop sequences are given, they are ignored until the minimum number of tokens is generated. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

The default value is 0. The minimum value is 0.

--parameters-prompt-variables (string)

The prompt variables. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

Provide a JSON string option or specify a JSON file to read from by providing a file path option that begins with a @, for example --parameters-prompt-variables=@path/to/file.json.
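
For example, if the deployed prompt template defines a variable named city (a hypothetical variable name), supply it inline as:

--parameters-prompt-variables '{"city": "Paris"}'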

--parameters-random-seed (int64)

Random number generator seed to use in sampling mode for experimental repeatability. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

The minimum value is 1.

--parameters-repetition-penalty (float64)

Represents the penalty for penalizing tokens that have already been generated or belong to the context. The value 1.0 means that there is no penalty. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

The default value is 1. The maximum value is 2. The minimum value is 1.

--parameters-return-options

Properties that control what is returned. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

Provide a JSON string option or specify a JSON file to read from by providing a file path option that begins with a @, for example --parameters-return-options=@path/to/file.json.
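
For example, to return the input text and per-token log probabilities along with the generated text (the field names are taken from the full example later in this section):

--parameters-return-options '{"input_text": true, "generated_tokens": true, "token_logprobs": true}'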

--parameters-stop-sequences (string)

Stop sequences are one or more strings that cause the text generation to stop when they are produced as part of the output. Stop sequences that are encountered before the minimum number of tokens is generated are ignored. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

The maximum length is 6 items. The minimum length is 0 items.

--parameters-temperature (float64)

A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output. Values greater than 1.0 flatten the probability distribution, resulting in "more random" output. A value of 1.0 has no effect. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

The default value is 1. The maximum value is 2. The minimum value is 0.

--parameters-time-limit (int64)

Time limit in milliseconds. If generation is not completed within this time, it stops, and the text that was generated so far is returned along with the TIME_LIMIT stop reason.

--parameters-top-k (int64)

The number of highest probability vocabulary tokens to keep for top-k filtering. Applies only in sampling mode. When decoding_method is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

The maximum value is 100. The minimum value is 1.

--parameters-top-p (float64)

Similar to top_k except the candidates to generate the next token are the most likely tokens with probabilities that add up to at least top_p. Also known as nucleus sampling. A value of 1.0 is equivalent to disabled. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

The default value is 1. The maximum value is 1. The value must be greater than 0.

--parameters-truncate-input-tokens (int64)

Represents the maximum number of input tokens that are accepted. Use this option to avoid requests failing because the input is longer than the configured limits. If the input is truncated, the start of the input (on the left) is removed, so the end of the input remains the same. If this value exceeds the model's maximum sequence length (refer to the model documentation for this value), the call fails when the total number of tokens exceeds the maximum sequence length. Zero means do not truncate. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

The minimum value is 0.
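
For example, to keep only the last 512 input tokens (an illustrative value) and truncate anything before them:

--parameters-truncate-input-tokens 512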

--parameters-typical-p (float64)

Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If less than 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher are kept for generation. This option provides a value for a sub-field of the JSON option 'parameters'. It is mutually exclusive with that option.

The maximum value is 1. The value must be greater than 0.

Examples

cpdctl wx-ai deployment text-generate-stream \
    --id-or-name classification \
    --input exampleString \
    --parameters '{"decoding_method": "greedy", "length_penalty": {"decay_factor": 2.5, "start_index": 5}, "max_new_tokens": 30, "min_new_tokens": 5, "random_seed": 1, "stop_sequences": ["fail"], "temperature": 1.5, "time_limit": 600000, "top_k": 50, "top_p": 0.5, "repetition_penalty": 1.5, "truncate_input_tokens": 0, "return_options": {"input_text": true, "generated_tokens": true, "input_tokens": true, "token_logprobs": true, "token_ranks": true, "top_n_tokens": 2}, "include_stop_sequence": true, "typical_p": 0.5, "prompt_variables": {}}' \
    --moderations '{"hap": {"input": {"enabled": true, "threshold": 0}, "output": {"enabled": true, "threshold": 0}, "mask": {"remove_entity_value": false}}, "pii": {"input": {"enabled": true, "threshold": 0}, "output": {"enabled": true, "threshold": 0}, "mask": {"remove_entity_value": false}}, "input_ranges": [{"start": 0, "end": 0}]}' \
    --accept application/json

Alternatively, granular options are available for the sub-fields of JSON string options:

cpdctl wx-ai deployment text-generate-stream \
    --id-or-name classification \
    --accept application/json \
    --input exampleString \
    --moderations '{"hap": moderationHapPropertiesModel, "pii": moderationPiiPropertiesModel, "input_ranges": [moderationTextRangeModel]}' \
    --parameters-decoding-method greedy \
    --parameters-length-penalty '{"decay_factor": 2.5, "start_index": 5}' \
    --parameters-max-new-tokens 30 \
    --parameters-min-new-tokens 5 \
    --parameters-random-seed 1 \
    --parameters-stop-sequences fail \
    --parameters-temperature 1.5 \
    --parameters-time-limit 600000 \
    --parameters-top-k 50 \
    --parameters-top-p 0.5 \
    --parameters-repetition-penalty 1.5 \
    --parameters-truncate-input-tokens 0 \
    --parameters-return-options '{"input_text": true, "generated_tokens": true, "input_tokens": true, "token_logprobs": true, "token_ranks": true, "top_n_tokens": 2}' \
    --parameters-include-stop-sequence true \
    --parameters-typical-p 0.5 \
    --parameters-prompt-variables '{}'

Example output

The generated text from the model along with other details.

[ {
  "model_id" : "google/flan-ul2",
  "created_at" : "2023-07-21T19:17:36.673Z",
  "results" : [ {
    "generated_text" : "",
    "generated_token_count" : 4,
    "input_token_count" : 0,
    "stop_reason" : "eos_token"
  } ]
}, {
  "model_id" : "google/flan-ul2",
  "created_at" : "2023-07-21T19:17:36.647Z",
  "results" : [ {
    "generated_text" : " km",
    "generated_token_count" : 3,
    "input_token_count" : 0,
    "stop_reason" : "not_finished"
  } ]
}, {
  "model_id" : "google/flan-ul2",
  "created_at" : "2023-07-21T19:17:36.647Z",
  "results" : [ {
    "generated_text" : "4,000",
    "generated_token_count" : 2,
    "input_token_count" : 0,
    "stop_reason" : "not_finished"
  } ]
} ]