Developing VoiceXML applications: Using the IBM Voice Toolkit for WebSphere Studio

Write, modify, and test your own application

Mark Weber (weberm@us.ibm.com), Senior Software Engineer, IBM, Software Group
Mark Weber is a Senior Software Engineer in the e-business Architecture and Integration Consulting team within Developer Relations. He provides technical consulting, architecture, and skills enablement for IBM Business Partners building solutions using IBM software. The software products and technologies that Mark focuses on include the WebSphere family of products, J2EE, storage management software, and DB/2, with a special emphasis on Pervasive Computing and Wireless, including WebSphere Everyplace Access and the WebSphere Voice family of products. You can contact him at weberm@us.ibm.com.

Summary:  In this first part of a series on voice application development, Mark Weber covers the basics of voice applications based on the VoiceXML language. He provides an overview of the Voice Toolkit for WebSphere Studio, and uses the toolkit and WebSphere Voice Server SDK to develop a VoiceXML application. He uses features such as VoiceXML editing, reusable dialog components, JSGF grammars, custom dictionary and pronunciation entries, and prerecorded prompts to build the application. Details on how to test the application in the Voice Toolkit for WebSphere Studio, and how to deploy and run it outside of the toolkit in a production environment, are also provided.

Date:  01 Apr 2003
Level:  Introductory


Introduction

This article introduces the basic concepts of voice application development using the Voice Toolkit for WebSphere Studio and the VoiceXML programming language. I'll explore major features of the toolkit, and explain how to write and modify a simple application.

I'll focus on developing a VoiceXML application modeled on the Shipping Info sample from the Voice Toolkit's reusable dialog components (RDCs), which are part of the Voice Toolkit samples. (See Resources to download the toolkit.) The Voice Toolkit for WebSphere Studio lets you create VoiceXML applications using a combination of:

  • Manually written VoiceXML code
  • The RDCs
  • A variety of wizard-based tools for custom grammars, dictionary entries, pronunciation entries, prerecorded dialog prompts, and interfaces to Web application components such as servlets, JavaBeans, JavaServer Pages (JSP) components, and SQL Queries.

Overview of Voice Toolkit for WebSphere Studio

With the Voice Toolkit for WebSphere Studio you can:

  • Create voice projects
  • Create VoiceXML files
  • Visually edit VoiceXML files and dialogs using the VoiceXML editor
  • Take advantage of RDCs
  • Create custom VoiceXML grammars
  • Create custom dictionary entries
  • Create custom vocabulary pronunciations using the Pronunciation Builder
  • Record custom audio prompts and add prerecorded audio prompts to a VoiceXML application
  • Invoke Web applications with servlets or JavaBeans components, and provide data from VoiceXML applications to Web applications
  • Get results of Web application processing in VoiceXML applications
  • Test applications.

The Shipping Info sample gathers customer information needed prior to processing an order. Users must provide their name, shipping address, telephone number, credit card number, and delivery method. A customer can optionally provide an e-mail address to receive notifications of specials, promotions, and so on.

The RDCs provide subdialogs and templates that are combinations of pre-written VoiceXML code, grammars, and ECMAScript files that perform discrete functions by prompting for and verifying specific types of input. Examples of subdialogs and templates include number, confirmation (yes/no), currency, street, date, time, name, address, and credit card information. The Shipping Info VoiceXML application uses the name, address, creditcardinfo, confirmation, emailaddress, and telephone RDCs.

All subdialogs have an associated ECMAScript object file. ECMAScript is the standardized scripting language on which JavaScript is based, and is the official client-side scripting language of VoiceXML. In the RDCs, ECMAScript object files are used to pass customizable parameters, such as prompt text or grammar selections, to the VoiceXML subdialogs and templates. The ECMAScript files are also used to simulate a database, so you can see how the values of variables set in VoiceXML forms can be used by real-world applications.


Major features of VoiceXML

VoiceXML

VoiceXML is an industry standard defined by the VoiceXML Forum, of which AT&T, Lucent, IBM, and Motorola are founding members. It has been accepted for submission by the World Wide Web Consortium (W3C) as a standard for voice markup on the Web.

Before discussing the Voice Toolkit for WebSphere Studio in detail, I'll first review some major features of the VoiceXML language. VoiceXML is an XML-based markup language for creating distributed voice applications that users can access from any telephone, by speaking and by pressing keys on the telephone keypad.

VoiceXML provides the benefits of using a standard markup style language and Web server-side logic to deliver content and access to Web applications through a telephone interface. The VoiceXML applications you create can interact with your existing back-end business data and logic, much as HTML and JSP pages can. For instance, Java servlets can be invoked and passed parameter values from a VoiceXML document, and results can be sent to be dynamically spoken to the user from a VoiceXML document.

VoiceXML supports dialogs that feature spoken input, DTMF (telephone keypad) input, recording of spoken input, synthesized speech output (text-to-speech, or TTS), recorded audio output, telephony features such as call transfer and disconnect, validation of spoken user responses using grammars, and dialog flow control.

VoiceXML dialogs

VoiceXML documents are composed primarily of top-level elements called dialogs. The two types of dialogs defined in the language are <form> and <menu>. Forms let users provide voice or DTMF input by responding to one or more <field> elements. Each field can contain one or more <prompt> elements that guide users to provide the desired input. Prompts also provide a count attribute to vary the prompt text based on the number of times that the prompt has been played. Fields can also specify a type attribute, or a <grammar> or <dtmf> element to define the valid input values for the field, and any <catch> elements necessary to process the events that might occur. Fields may also contain <filled> elements, which specify code to execute when a value is assigned to a field. You can reset one or more form fields using the <clear> element.
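As a minimal sketch of how these elements fit together, the form below asks for a package size. The form id, field name, grammar choices, and prompt wording are illustrative assumptions and are not part of the Shipping Info sample. The inline grammar type shown is an assumption about how the browser expects JSGF to be identified.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="GetPackageSize">
    <field name="size">
      <!-- count selects a different prompt each time the field is revisited -->
      <prompt count="1">What size is your package?</prompt>
      <prompt count="2">Please say small, medium, or large.</prompt>
      <!-- Inline grammar listing the valid responses (hypothetical choices) -->
      <grammar type="application/x-jsgf">small | medium | large</grammar>
      <!-- Handles the event thrown when the caller says nothing -->
      <catch event="noinput">
        I didn't hear anything.
        <reprompt/>
      </catch>
      <!-- Runs once the field has been assigned a value -->
      <filled>
        You said <value expr="size"/>.
      </filled>
    </field>
  </form>
</vxml>
```

A <menu> dialog follows the same pattern, but offers a fixed list of choices that each lead to another dialog instead of filling a field.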

VoiceXML grammars

VoiceXML provides grammars to specify valid selections in response to a prompt. The Voice Toolkit for WebSphere Studio provides a Grammar Editor to help construct VoiceXML grammars. When the Grammar Editor launches, it opens a skeleton grammar file, and you can enter the valid selections for the prompt responses in the grammar panel. Formats for VoiceXML grammars include Java Speech Grammar Format (JSGF), BNF, and XML formats. JSGF is the most popular grammar format; it provides formatting and selection syntax elements as follows:

  • (parentheses) to group words that the user must say
  • [brackets] to identify optional words
  • The "|" symbol to separate alternative selections
  • {braces} to specify application-specific information for a selection, which can be used in places such as VoiceXML <if> statements
  • "<" and ">" to name rules, which other rules can reference to compose compound grammar elements.

Below is an example of a JSGF grammar for specifying valid responses to a "Delivery Method" prompt in the ShippingInfo VoiceXML application.

public <deliverymethod> = [<beginning>] <type> [<end>];
<beginning> = [PLEASE] USE  | I WOULD LIKE | I NEED;
<end> = PLEASE | IS OK;
<type> = standard {std-shipping} | express {express-shipping} | express 
   plus {overnight-shipping};
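To show where the {braces} values surface, here is a hedged sketch of a field that uses two of the delivery alternatives inline and tests the result with <if>. It assumes that the text in braces becomes the field's value when that alternative is recognized. The field name, prompt wording, and grammar type string are assumptions made for illustration.

```xml
<field name="deliverymethod">
  <prompt>Which delivery method would you like?</prompt>
  <grammar type="application/x-jsgf">
    standard {std-shipping} | express {express-shipping}
  </grammar>
  <filled>
    <!-- Assumes the field value is the text given in braces -->
    <if cond="deliverymethod == 'express-shipping'">
      Express delivery selected.
    <else/>
      Standard delivery selected.
    </if>
  </filled>
</field>
```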

VoiceXML field types

When accepting user input, you can restrict a field to a specific type of data and store its value in a VoiceXML variable. This is done by using the type attribute of the <field> element to specify a built-in grammar for one of the base types. The type attribute also determines how the value will be spoken if it is subsequently used in a prompt. For example, a field type of digits indicates that the input will be spoken or keyed digits. The result is stored as a string, and rendered as digits, such as 123, not "one hundred twenty-three." Below is an example of a field that uses the type attribute.

<field name="ticket_num" type="digits">
  <prompt>Read the number from your ticket.</prompt>
  <help>The number is to the lower left.</help>
  <filled>
    <assign name="ticket_number" expr="ticket_num"/>
  </filled>
</field>


VoiceXML built-in data types that can be used in field elements include: boolean, date, digits, currency, number, phone, and time. The currency type supports standard ISO currency names, and defaults to the local operating system's currency for the system the Voice Server is running on.
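As a further sketch, the form below uses two of the other built-in types. The form id, field names, and prompts are hypothetical. It relies on a boolean field yielding an ECMAScript true or false value, which can be tested directly in a cond attribute.

```xml
<form id="ContactPreferences">
  <!-- boolean: a yes/no answer; the field value is true or false -->
  <field name="wants_email" type="boolean">
    <prompt>Would you like to receive e-mail about promotions?</prompt>
  </field>
  <!-- phone: a spoken or keyed telephone number -->
  <field name="contact_phone" type="phone">
    <prompt>What is your telephone number?</prompt>
    <filled>
      <if cond="wants_email">
        We will send promotions by e-mail.
      </if>
      I heard <value expr="contact_phone"/>.
    </filled>
  </field>
</form>
```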

VoiceXML events

VoiceXML defines a mechanism for handling events not capable of being handled by simple forms. Events are thrown by the voice browser for a variety of exception-type occurrences, such as when the user does not respond, doesn't respond with a proper selection, requests help, and so on. The voice browser also throws events if it finds a syntax error in the VoiceXML document, cannot access VoiceXML elements contained in missing external VoiceXML documents, and so forth. VoiceXML elements can specify catch elements that an application can use to handle these events instead of the default behavior, as described below. The built-in VoiceXML catch elements that are the default handlers for common events are:

  • The <error> element catches all events of type error:
    <error>An error has occurred -- please call again later.<exit/></error>
  • The <help> element can provide help text when the user asks for help:
    <help>Please say one of the valid selections.</help>
  • The <noinput> element handles the case where the user does not respond:
    <noinput>I didn't hear anything, please try again.</noinput>
  • The <nomatch> element handles a response that does not match the active grammar:
    <nomatch>Your response wasn't a known city.</nomatch>

These elements all take two optional attributes: count, which specifies how many times the event must have occurred before this element catches it, and cond, a condition that must evaluate to true for this element to catch the event.
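A minimal sketch of both attributes in use follows. The form, the is_new_caller flag, the grammar, and the prompt wording are all hypothetical. The idea is that the first handler covers the first two silences and the handler with the higher count takes over from the third.

```xml
<form id="GetCity">
  <!-- Hypothetical flag used by the cond attribute below -->
  <var name="is_new_caller" expr="true"/>
  <field name="city">
    <prompt>Which city are you shipping to?</prompt>
    <grammar type="application/x-jsgf">austin | boston | chicago</grammar>
    <!-- Used for the first and second silence -->
    <noinput count="1">
      I didn't hear anything.
      <reprompt/>
    </noinput>
    <!-- Used from the third silence onward -->
    <noinput count="3">
      Sorry, I still can't hear you. Please call again later.
      <exit/>
    </noinput>
    <!-- Applies only while the condition is true -->
    <nomatch cond="is_new_caller">
      Please say Austin, Boston, or Chicago.
      <reprompt/>
    </nomatch>
  </field>
</form>
```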

Building on the "digits" field example above, the <filled> action can use a VoiceXML <if> statement to test whether the response contains the required number of digits. If it does not, the user hears the error message and the field is reset to undefined, so the form prompts for it again. A nomatch event typically also causes a reprompt, but VoiceXML catch elements can be used to take other actions when a nomatch event (or, in general, any other VoiceXML event) occurs. Below is a VoiceXML field example showing the length test:

<field name="ticket_num" type="digits">
  <prompt>Read the 12 digit number from your ticket.</prompt>
  <help>The 12 digit number is to the lower left.</help>
  <filled>
    <if cond="ticket_num.length != 12">
      <prompt>Sorry, I didn't hear exactly 12 digits.</prompt>
      <assign name="ticket_num" expr="undefined"/>
    </if>
  </filled>
</field>

VoiceXML subdialogs

The <subdialog> element is a form item that creates a separate execution context to gather information and return it to the form. If your form requires prompts or computation that do not involve user input (for example, welcome information), you can use the <block> element. A block can also contain the <submit> element, which specifies the next URI to visit after the user has completed all the fields in the form. You can also jump to another form item in the current form, another dialog in the current document, or another document using the <goto> element.
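The sketch below shows <block>, <goto>, and <submit> working together. The form ids and the servlet URL (www.example.com) are placeholders invented for illustration, not part of the Shipping Info sample.

```xml
<form id="Welcome">
  <!-- A block plays a prompt without collecting any input -->
  <block>
    Welcome to the shipping information line.
    <goto next="#SendTicket"/>
  </block>
</form>

<form id="SendTicket">
  <field name="ticket_num" type="digits">
    <prompt>Read the number from your ticket.</prompt>
  </field>
  <!-- Visited after the field is filled; sends ticket_num to the server -->
  <block>
    <submit next="http://www.example.com/servlet/order" namelist="ticket_num"/>
  </block>
</form>
```

The namelist attribute lists the variables whose values are sent with the request, which is how data gathered by voice can reach a servlet.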


Using the VoiceXML editor

The VoiceXML editor is automatically launched when creating a new VoiceXML file or editing an existing VoiceXML file in the Voice Toolkit for WebSphere Studio. To enter VoiceXML code, you can write the code yourself or use the toolkit's built-in RDCs and Content Assist feature. In this example, I'll use a combination of writing my own code and importing subdialogs and templates from the RDCs.

To use the Content Assist feature of the toolkit, press Ctrl+Space inside any VoiceXML tag or element (such as block, form, prompt, and so on) to bring up Content Assist for a list of VoiceXML tags valid within that element, or VoiceXML tag attributes valid inside the current VoiceXML tag. See Figure 1 for an example of adding a VoiceXML form element.


Figure 1. Adding VoiceXML form element with the VoiceXML Editor's Content Assist

To add a form with a block to prompt for the caller's name, press Ctrl+Space. The Content Assist pop-up list of VoiceXML tags will appear. You can use Content Assist anywhere in your voice application; the pop-up list is dynamic and shows only the tags that are valid for the VoiceXML element you are adding them to. Scroll through the tags and double-click the <T>Form tag. Provide a name for this form, so it can be returned to later in the application if necessary, by entering the text id="GetUserName" inside the form tag. Between the starting and ending <form> tags, press Ctrl+Space and select the <T>Block tag. Enter your prompt text in the body of the Block element. For example, the code for a partially completed form will appear as follows:

<form id="GetUserName">
  <block>
    Answer the following questions to provide your name. Say skip if any part
    of the name is not needed.
  </block>
</form>


Adding and customizing RDCs

You can use the IBM RDCs to add common functions to your VoiceXML file. The RDCs included in the Voice Toolkit for WebSphere Studio are a basic set of subdialogs and templates that provide VoiceXML source code for common functions, letting you quickly and easily add these functions to your applications. The RDCs wizard, as shown in Figure 2, provides a simple interface for importing these components (subdialogs or templates) into your project, customizing them with application-specific prompts and grammar selections, and generating the VoiceXML code that calls the component.


Figure 2. Adding an RDC to the VoiceXML application

In the GetUserName form, after the ending block tag, right-click and select Add Dialog Component, which will bring up the Reusable Dialog Component Wizard. In the Component Type dialog, select the Template radio button, select the "name" Template, and press Finish. The template is a set of dialogs that contain prebuilt prompts and grammars to gather elements of a full name such as first name, last name, title, and so on. The caller's responses, both the text strings and the recorded utterances, are assigned to variables within the dialogs. To access these variables in your application, you can uncomment and modify the following lines that were automatically placed in your application:

<!-- <value expr="name.returnFullName"/> -->
<!-- <value expr="name.returnFirstName"/> -->
<!-- <value expr="name.returnFirstNameUtterance"/> -->

Simply uncommenting these lines will only replay the spoken response (returnFirstNameUtterance) or the TTS interpretation of the response (returnFirstName). To assign these values to a variable in your application so that they can be used in other ways, such as being sent to Web applications as parameters, modify the code as follows:

<assign name="CustomerName" expr="name.returnFullName"/>


This assigns the value of the response for the full name to the VoiceXML variable CustomerName in your application.

The code for the completed form now is:

<form id="GetUserName">
  <block>
    Answer the following questions to provide your name. Say skip if any part
    of the name is not needed.
  </block>
  <!-- The subdialog is a form item, so it follows the block instead of nesting inside it -->
  <subdialog name="name" src="reusable_comp/templates/name/en_US/name.vxml">
    <filled>
      <!-- Store the caller's full name for use elsewhere in the application -->
      <assign name="CustomerName" expr="name.returnFullName"/>
    </filled>
  </subdialog>
</form>
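One detail the listing leaves implicit is where CustomerName is declared. As a minimal sketch, assuming the value should be readable from other forms in the same document (such as a later confirmation form), the variable could be declared at document scope with <var>. The surrounding <vxml> skeleton is abbreviated for illustration.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- Document-scope variable: visible to every form in this document -->
  <var name="CustomerName"/>

  <form id="GetUserName">
    <!-- block and name subdialog from the listing above go here -->
  </form>
</vxml>
```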

The example above used an RDC of type Template. Templates are composed of individual subdialogs, which can also be added to your application on their own and which let you easily customize the prompts and grammars that each contains. For example, if you want to add a prompt to ask callers whether the information they provided was correct, you can do so with the Confirmation RDC:

  1. Right-click within a form element to bring up the context menu and select Add Dialog Component. This brings up the Reusable Dialog Component Wizard.
  2. In the Component Type dialog, select the Subdialog radio button, select the confirmation Subdialog, and click Next. The Customize Parameters window of the RDC wizard appears.
  3. Check the checkbox for paramPromptQuestion and replace the default prompt with your customized prompt, such as 'Is this information correct?'. Be sure to include single quotes, as shown in Figure 3 below.
  4. Click Next. Note that you can also easily customize the help messages and noinput messages, as shown in Figure 4. Click Finish.

Figure 3. Adding and customizing an RDC in the VoiceXML application

Figure 4. Customizing an RDC in the VoiceXML application

Three additional lines are added to the start of the file to reference the ECMAScript file for the subdialog, so the subsequent lines in the VoiceXML file are renumbered. As described earlier, every subdialog has an ECMAScript object file. In the RDCs, these files are used to pass customizable parameters, such as prompt text, parameter length restrictions, dialog options, and grammar selections, to the VoiceXML subdialogs. You can also see how additional applications can be integrated with the VoiceXML application by invoking scripts and passing parameters.

At this point in the confirmation dialog, it is appropriate to check whether the caller responded positively. If the caller confirmed that the information was correct, control passes to the next part of the application; otherwise, it passes to a different form that asks for the correct information. VoiceXML provides <if> and <goto> tags for this purpose. You can add the code to perform this type of flow control and conditional navigation using Content Assist in the Voice Toolkit for WebSphere Studio. The code for this type of form would appear as:

<form id="AccountConfirmation">
  <block>You provided the following information:
    <break msecs="300"/>
    Name:
    <value expr="CustomerName"/>
    <break msecs="300"/>
  </block>
  <subdialog name="confirmation"
      src="reusable_comp/subdialogs/confirmation/en_US/confirmation.vxml">
    <param name="paramSubdialogObj" expr="objConfirmation"/>
    <param name="paramPromptQuestion" expr="'Is this information correct?'"/>
    <filled>
      <!-- Confirmed: thank the caller and end the session -->
      <if cond="confirmation.returnConfirmation == true">
        Thank you. Your order will be processed for delivery.
        <exit/>
      </if>
      <!-- Not confirmed: go to the form that collects corrected information -->
      <goto next="#CorrectInfo"/>
    </filled>
  </subdialog>
</form>
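The <goto> above targets a form named CorrectInfo that is not shown in the listing. A minimal sketch of what such a form might look like follows. Its contents are an assumption; here it simply tells the caller what will happen and returns to the GetUserName form built earlier.

```xml
<form id="CorrectInfo">
  <block>
    Let's collect your name again.
    <!-- Return to the form that gathers the caller's name -->
    <goto next="#GetUserName"/>
  </block>
</form>
```

A fuller version could ask which item was wrong and jump only to the form that collects that item.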



Testing the VoiceXML application

You can test the VoiceXML application in several ways -- within the Voice Toolkit for WebSphere Studio, in text or audio mode, and outside of the toolkit using the Voice Server SDK.

Testing the application in audio mode

The IBM WebSphere Voice Server SDK must be installed to test the VoiceXML application in audio mode. You must also have a microphone and speakers (or microphone headset) installed. You might want to set the audio level before testing by clicking the Windows Start button, selecting Programs > IBM WebSphere Voice Server SDK > Audio Setup, and following the onscreen instructions.

To test the VoiceXML application in audio mode:

  1. Select File > Save all to make sure you have saved any changes in all of your files. Only saved versions of files are used in the test.
  2. Select your VoiceXML file in the Navigator Window of the Voice Toolkit for WebSphere Studio.
  3. From the toolkit's menu bar, select Run > Run in audio mode. Once you hear a prompt from the browser, interact with the application by saying phrases in response to dialogs you want to test. Continue saying test phrases until you finish your test.
  4. To stop the browser, click on the command window so it has keyboard focus, and press Ctrl+C.

Testing the application in text mode

Testing VoiceXML applications in audio mode can be tedious if there are many dialogs and prompts, responses are not recognized by the speech recognition engine, or grammar entries are unknown to the Voice Server's dictionary. Testing in text mode can provide a more efficient method to verify the basic navigation, control flow, event handling, and other critical elements of the application. (You should always, however, test in audio mode before deploying an application to your production environment.)

To test the VoiceXML application in text mode:

  1. Select File > Save all to make sure you have saved any changes in all of your files. Only saved versions of files are used in the test.
  2. Select your VoiceXML file in the Navigator Window of the Voice Toolkit for WebSphere Studio.
  3. From the toolkit's menu bar, select Run > Run in text mode. A command window will appear where the Voice Server engine will load and initialize the application. Textual prompts will appear in this command window, as shown in Figure 5 below. You can either type in your responses to these prompts or use the DTMF simulator, where appropriate.
  4. To stop the browser, press Ctrl+C.

Figure 5. Testing the VoiceXML application in text mode

Testing the application outside the toolkit in the Voice Server SDK

To use the Shipping Info VoiceXML application with the IBM WebSphere Voice Server SDK outside the toolkit, you must export your VoiceXML Application and all of the files and resources it references (reusable dialogs, ECMAScript files, external grammar files, prerecorded audio files, and so on) from the toolkit. To do this:

  1. Select the ShippingInfo project in the Navigator window of the Voice Toolkit for WebSphere Studio, and select File > Export.
  2. Select File system as the export destination type, select the ShippingInfo project and its checkbox, then click Select all.
  3. Enter the destination location (directory) to export the files to, and click Finish.

You can then run the batch file provided in the WebSphere Voice Server SDK for testing a VoiceXML application (include the double quotes):

"%ibmvs%\bin\vsaudio_en_us"  <path>shippinginfo.vxml


where <path> is the location where you exported all of the project files from the Voice Toolkit, and %ibmvs% is the system environment variable that is automatically set to the install location of the Voice Server SDK when it is installed.


Creating and tuning pronunciations

The speech recognizer that is part of the IBM WebSphere Voice Server SDK has a built-in dictionary of many thousands of words, but not all possible words. In most cases this does not matter, because the Voice Server's browser can automatically generate recognition and TTS pronunciations for words with unknown pronunciations. These automatically generated pronunciations are usually correct, but sometimes they are not. When the WebSphere Voice Server encounters a word not in its built-in dictionary, it must dynamically synthesize the approximate pronunciation using the TTS engine. This can slow down your application if it contains many words unknown to the built-in Voice Server dictionary.

Creating custom predefined entries with pronunciations for unknown words can make your VoiceXML application more acceptable to users, and improve performance. The pronunciation builder features are available only if the WebSphere Voice Server SDK is installed. The VoiceXML 1.0 specification does not provide tags for pronunciation. Two extensions to the official VoiceXML 1.0 specification have been provided in the Voice Toolkit for WebSphere Studio and the Voice Server to support tags for pronunciation:

<word>
Specifies the pronunciation for a single dictionary entry. Also specifies the intended pronunciation of the new dictionary entry by using one of two available attributes: pronunciation or sounds-like.
  • Pronunciation - A pronunciation of the word specified in International Phonetic Alphabet (IPA) format. If you have the WebSphere Voice Server SDK installed, you can create and edit pronunciations by selecting the desired phonemes in the IPA Composer of the Voice Toolkit's Pronunciation Builder. The Pronunciation text field of the Pronunciation dialog shows the code point representations for the IPA characters.
  • Sounds-like - A pronunciation of the word specified in phonetic format -- an alternative spelling that indicates how the word should be pronounced. For example, in English, a sounds-like spelling of "I triple E" may be specified for a spelling of IEEE. For Japanese and Chinese, the sounds-like attribute would specify a phonetic dialect spelling such as Hiragana, Pin Yin, or Zhu Yin.
<ibmlexicon>
A container for one or more <word> tags. The element and the words it contains are associated with your specific VoiceXML document, and its runtime scope is limited to the application using the VoiceXML document.

Because the <word> and <ibmlexicon> tags are specific to the WebSphere Voice Server, you should only use these tags in VoiceXML applications that will be run using the IBM WebSphere Voice Server.
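As a rough sketch of what the generated markup might look like, the fragment below defines sounds-like pronunciations for two words mentioned in this article. The sounds-like attribute is named in the description above, but the spelling attribute used to identify the word is an assumption; check the markup that the Pronunciation Builder actually generates before relying on it.

```xml
<!-- Assumption: the word being defined is given in a "spelling" attribute -->
<ibmlexicon>
  <word spelling="EZShip" sounds-like="Easy Ship"/>
  <word spelling="IEEE" sounds-like="I triple E"/>
</ibmlexicon>
```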

Creating a pronunciation entry in the toolkit for unknown words

To create a new pronunciation entry for a word that cannot be properly pronounced by the Voice Server or Voice Server SDK's TTS engine, use the following steps:

  1. In the Voice Toolkit for WebSphere Studio, open the VoiceXML file or the JSGF grammar file that contains the word to provide a pronunciation for. From the Voice Toolkit menu bar, select Pronunciation > Verify Pronunciations. If the Pronunciation option is not available in the toolkit menu bar, open the file in VoiceXML or Grammar editor, select the word, right-click it and select Compose Pronunciation, and proceed to step 4.
  2. To review any words that the system has identified as having an unknown recognition pronunciation, click the Unknown Pronunciations tab (next to the Outline tab in the lower left corner of the toolkit). This search procedure checks only the words that the system identifies as requiring a pronunciation for recognition (for example, words that appear in menus or grammars). Words not requiring recognition, such as words in prompts or blocks that are just played back via TTS, are not automatically searched. They can still have custom pronunciations defined in the toolkit. The word that needs a custom TTS pronunciation cannot be searched for; it must be selected explicitly, making sure to select only the word and not any of the surrounding punctuation.
  3. To check and create a pronunciation, right-click the word in the Unknown Pronunciations list. From the contextual menu that appears, select Compose Pronunciations.
  4. The Pronunciation Builder dialog appears, as shown in Figure 6, with the target word filled in and the default pronunciation already generated.
  5. Click Apply, and a file selection dialog will appear. Select your toolkit's project name in the first window, and in the File: prompt enter the name of the VoiceXML file where the <word> and <ibmlexicon> tags will be generated by the Voice Toolkit for WebSphere Studio.
  6. To create a custom pronunciation, you can enter phonetic text in the Sounds Like Pronunciation box as shown in Figure 7, or use the IPA Composer by pressing Show IPA Composer. It is much easier to use the Sounds Like Pronunciation option, since there is an obvious phonetic pronunciation for this new word. Press Sounds-like pronunciation and enter text in the Sounds Like Pronunciation box to use for pronouncing the new word (for instance, "Easy Ship" as the sounds-like pronunciation for "EZShip"), and press Apply.
  7. A file selection dialog appears. Select your project name and the pronunciation's VoiceXML filename in the next prompts. The <ibmlexicon> and <word> VoiceXML tags for the pronunciation are generated in this VoiceXML file.

Figure 6. Adding a custom pronunciation for a dictionary entry

Figure 7. Adding a sounds-like pronunciation for a dictionary entry

If you prefer to use the IPA composer to provide a custom pronunciation for your unknown word, select IPA Composer on the Compose Pronunciation dialog and select the individual phonetic elements of the word's pronunciation, as shown in Figure 8.


Figure 8. Using the IPA Composer to define a pronunciation for a dictionary entry

Adding a custom prerecorded prompt

Synthesized speech (TTS) is useful as a placeholder during application development, and when the data to be spoken is not known in advance or is dynamically generated or retrieved from external data sources or applications, making it impossible to record. Professionally recorded prompts can provide high-quality recorded speech that has natural pronunciation and prosody. The Voice Toolkit for WebSphere Studio includes a basic audio recorder for this purpose. The basic recorder is not a substitute for a professional digital recorder, but can be useful for recording and testing recorded prompts that are placeholders for professionally recorded audio files. Using the following steps, we will record an audio file to use for one of the prompts we built previously.

  1. Select File > New > Audio File. Select the ShippingInfo voice project, enter myprompt.au for the audio file name, and click Finish.
  2. As shown in Figure 9, click Record (the microphone icon) to begin recording your audio. Say the phrase for your prompt. When finished, click Stop. The tool automatically saves the audio file in your voice project. To listen to the recording, click Play.
  3. To use the audio file, return to your VoiceXML file in the Voice Toolkit VoiceXML editor. In the block where the prerecorded prompt will be used, replace the prompt text with an audio tag whose src attribute specifies the URI of the audio file that contains the prerecorded prompt. For example, <audio src="myprompt.au"/>. You can also provide default text in the audio tag in case the prerecorded audio file cannot be played (for instance, if it is not found on the Voice Server system), as in this example:
    <audio src="myprompt.au">
        Text that the Voice Server's TTS engine will use if it
        cannot access the audio source file
    </audio>
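
To show where the audio tag sits in a complete document, here is a minimal sketch of a VoiceXML file that plays the recorded prompt and falls back to synthesized speech. The form id, the fallback wording, and the version and namespace on the vxml element are assumptions made for illustration; keep whatever values your existing VoiceXML file already declares.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The version and xmlns values are assumptions: use the ones your existing file declares -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- "welcome" is a hypothetical form id used only for this example -->
  <form id="welcome">
    <block>
      <prompt>
        <!-- Plays myprompt.au; the enclosed text is spoken by the TTS engine
             only if the audio file cannot be played -->
        <audio src="myprompt.au">
          Welcome to the shipping information service.
        </audio>
      </prompt>
    </block>
  </form>
</vxml>
```

Keeping the fallback text identical to the recorded phrase means callers hear the same wording whether or not the audio file is available.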


Figure 9. Recording a custom audio prompt

Summary

This article showed you how to get started developing VoiceXML applications. I provided a brief look at the functions provided by the WebSphere Voice Server, its architecture, the VoiceXML language, and the Voice Toolkit for WebSphere Studio. I then gave you a detailed walkthrough of developing a VoiceXML application with the Voice Toolkit for WebSphere Studio, using some of the basic capabilities of the toolkit and VoiceXML. In future articles I'll explore more advanced capabilities, such as accessing Web applications, integrating JavaBeans components into the application, integrating access to databases, and debugging the application with the voice application debugger in the Voice Toolkit for WebSphere Studio.



ArticleTitle=Developing VoiceXML applications: Using the IBM Voice Toolkit for WebSphere Studio
publish-date=04012003