Enabling voice capabilities in the embedded agent

You can enable voice input and output in your embedded agent so users can interact with the agent using spoken commands and audio responses. Voice capabilities improve accessibility and support more natural conversational experiences in embedded applications.

You configure and select a voice for the agent in Voice modality. The selected voice then appears as a toggle in the Embedded agent channel, where you turn it on to enable voice.

Before you begin

Make sure that you complete the following prerequisites:

  • Create an agent or open an existing agent.

  • Configure voice settings for the agent. For setup instructions, see Configuring voice settings for agents.

  • Select a voice in Voice modality so that a voice is available for the agent. For more information, see Selecting the voice in the agent.

  • Deploy the agent if you plan to enable voice in the Live environment.

Enabling voice capabilities

Follow these steps to enable voice features for the embedded agent:

  1. Go to your agent’s configuration page.

  2. Select Channels > Embedded agent.

  3. Select the environment where you want to enable voice (Draft or Live). Use the Draft environment to test voice behavior in Preview. Use the Live environment to make voice available to users in the embedded chat after you deploy the agent. Changes in Draft do not affect Live until you deploy.

  4. Turn on the toggle for the selected voice to enable voice.

Voice support becomes available in the embedded chat widget after you enable voice in the Live environment.

Security considerations

Voice capabilities in embedded chat work with both security-enabled and security-disabled modes:

  • Security enabled: Voice audio streams are authenticated through the existing JWT mechanism. The audio session is tied to the user identity established in the JWT, so only authorized users can access voice features. For details on configuring security, see Securing the embedded chat.

  • Security disabled: Voice capabilities work in an anonymous mode, allowing users to interact without authentication. Use this mode only when anonymous access is explicitly required and no sensitive data is exposed. For more information, see Security modes.

When security is enabled, voice authentication follows the same security model as text-based chat interactions: the JWT is used to validate requests and maintain session integrity.
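
The two security modes can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names (sign_jwt, verify_jwt, open_voice_session), the HS256 shared secret, and the sub and exp claims are assumptions chosen so that the example runs with the standard library only. The signing algorithm, key handling, and claim names that the embedded chat actually expects are described in Securing the embedded chat.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWT segments use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that the JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Build an HS256-signed JWT (illustrative; the algorithm is an assumption)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_jwt(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature and expiry are valid, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims


def open_voice_session(
    token: Optional[str],
    secret: bytes,
    security_enabled: bool,
    now: Optional[float] = None,
) -> str:
    """Hypothetical gate: return the identity that a voice session is tied to."""
    if not security_enabled:
        # Security disabled: anonymous mode, no token is checked.
        return "anonymous"
    if token is None:
        raise PermissionError("a JWT is required when security is enabled")
    claims = verify_jwt(token, secret, now=now)
    if "sub" not in claims:
        raise PermissionError("token carries no user identity")
    return claims["sub"]
```

In this sketch, a token signed with the expected secret and not yet expired yields the user identity from its sub claim, and the same call with security disabled returns "anonymous" without inspecting any token. The same token check serves voice and text requests alike, which is the sense in which voice follows the text-based chat security model.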

What to do next

  • Use Preview to test the voice experience before deploying.

  • Update the web page where the agent is embedded, if needed, to ensure that the latest configuration is applied.

  • For more customization options, including layout, styling, and security setup, see Integrating agents with web applications.