Watson Generated Captions for Video

This article explains how to use IBM Watson Speech to Text to automatically generate closed captions for videos in IBM Video Streaming. It covers how to enable caption generation at the channel or video level, monitor processing status, and publish or download generated captions. The guide also highlights best practices for improving caption accuracy, including audio quality considerations, and explains how to edit captions when needed. Additionally, it outlines supported languages and key configuration options to ensure captions are properly generated and accessible to viewers.

Closed captions are an essential part of the video experience. They improve accessibility for viewers who are deaf or hard of hearing, non-native speakers, and those who prefer watching video without sound.

With the IBM Watson Speech to Text feature, you can automatically generate captions for videos in your IBM Video Streaming account quickly and easily.

How to Enable Watson Speech to Text

To generate captions, you must first set the language of your content. You can do this in any of three ways:

1. During Channel Creation

When creating a new channel, you will be prompted to enter the channel title and select a language from a dropdown menu.

2. For an Existing Channel

  1. Go to the Channel dropdown menu in the left sidebar.
  2. Select Caption Settings.
  3. Click Change to select your channel language.
  4. Optionally, enable Auto-publish generated captions so that captions become visible to viewers as soon as they are generated.

3. For Individual Videos

  1. Go to Videos in your dashboard.
  2. Select the video and click Edit.
  3. Choose the appropriate language from the dropdown menu.

Managing Generated Captions

After enabling caption generation, you can monitor the progress and manage captions:

  1. Go to Videos from your dashboard.
  2. Select a video and click Edit.
  3. Open the Closed Captions tab.

Caption generation takes roughly as long as the video runs. For example, a 45-minute video typically takes about 45 minutes to process.

Once processing is complete, you will receive an email notification and can:

  • Download captions as a .VTT file
  • Publish or Unpublish captions for viewer visibility
  • Use Settings to enable automatic publishing
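
A downloaded .VTT file is plain text in the WebVTT format, so it can be reviewed with a short script before publishing. The following is a minimal sketch, assuming a standard WebVTT layout (a WEBVTT header, then cue blocks separated by blank lines). The names Cue and parse_cues are hypothetical helpers for illustration and are not part of IBM Video Streaming.

```python
import re
from dataclasses import dataclass

# Matches a WebVTT timing line such as "00:00:01.000 --> 00:00:04.000".
# The hours component is optional in WebVTT ("00:01.000" is also valid).
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def _seconds(hours, minutes, seconds, millis):
    """Convert timestamp parts (hours may be None) to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cues(vtt_text: str) -> list[Cue]:
    """Return the cues found in a WebVTT document, in file order."""
    cues = []
    normalized = vtt_text.replace("\r\n", "\n").strip()
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                parts = match.groups()
                # Everything after the timing line is the cue text.
                text = " ".join(part.strip() for part in lines[index + 1:])
                cues.append(Cue(_seconds(*parts[:4]), _seconds(*parts[4:]), text))
                break
    return cues
```

To use it on a downloaded file, read the file as UTF-8 text, pass the text to parse_cues, and then print or scan the cue text for misrecognized words.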

Improving Caption Quality

Caption accuracy depends heavily on audio quality. Best results are achieved when:

  • There is a single speaker
  • The speaker talks in their native language at a normal pace
  • Audio quality is clear with minimal background noise

Accuracy tends to drop when there are multiple speakers, brand names, or technical terminology.

To correct errors:

  1. Download the generated .VTT file.
  2. Edit it using a text editor or caption editor.
  3. Re-upload the corrected file using the Add Captions button.
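
When the same brand name or technical term is misrecognized throughout a video, the editing step can be partly automated. The following is a minimal sketch, assuming a standard WebVTT file. The function correct_vtt and the entries in CORRECTIONS are hypothetical examples, not recognition errors reported by IBM. The sketch changes cue text only and leaves the header, timing lines, and blank lines untouched.

```python
import re

# Hypothetical corrections: misrecognized phrase -> intended term.
CORRECTIONS = {
    "what son": "Watson",
    "i b m": "IBM",
}


def correct_vtt(vtt_text: str, corrections: dict[str, str]) -> str:
    """Apply whole-word, case-insensitive replacements to caption text."""
    corrected = []
    for line in vtt_text.splitlines():
        # Skip the WEBVTT header, timing lines, and blank separator lines.
        if line.startswith("WEBVTT") or "-->" in line or not line.strip():
            corrected.append(line)
            continue
        for wrong, right in corrections.items():
            pattern = r"\b" + re.escape(wrong) + r"\b"
            # A function replacement inserts `right` literally, so any
            # backslashes in it are not treated as escape sequences.
            line = re.sub(pattern, lambda _m, r=right: r, line, flags=re.IGNORECASE)
        corrected.append(line)
    return "\n".join(corrected) + "\n"
```

Write the returned text to a new .vtt file and review it before re-uploading, because automatic replacement can also change phrases that were recognized correctly.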

Supported Languages

Generated captions are available for the following languages:

  • Arabic
  • Chinese
  • Czech
  • Dutch (Belgium, Netherlands)
  • English (Australia, India, UK, US)
  • French (France, Canada)
  • German
  • Hindi
  • Italian
  • Japanese
  • Korean
  • Portuguese (Brazil)
  • Spanish (Spain, Latin America, Mexico)

If you select a language outside this list, captions may not generate properly. Additional languages are being added over time.

Additional Notes

Default languages will appear at the top of the language selector for quick access.

To customize the appearance of captions in the player, refer to: Customizing captions for Live and Recorded/Uploaded videos

Document Information

Modified date:
08 May 2026

UID

ibm17272456