Reusable Dialog Components

Mainstreaming speech-enabled Web applications

Juan Huerta (huerta@us.ibm.com), Research Staff Member, T. J. Watson Research Center, IBM
David Lubensky (davidlu@us.ibm.com), Manager, Advanced Conversational Services, T. J. Watson Research Center, IBM
David Nahamoo (nahamoo@us.ibm.com), DGM Human Language Technologies, T. J. Watson Research Center, IBM
Roberto Pieraccini (rpieracc@us.ibm.com), Multilingual NL components, T. J. Watson Research Center, IBM
T.V. Raman (tvraman@us.ibm.com), Speech/Voice Recognition Research Human Language Technologies, T. J. Watson Research Center, IBM
Charlie Wiecha (wiecha@us.ibm.com), Senior Manager, Interaction Middleware and Standards for Portal Server, T. J. Watson Research Center, IBM

Summary:  Speech application development is evolving to dynamically generated VoiceXML. Now companies can cost-effectively add speech to Web apps and not sacrifice the quality of the resulting Voice User Interface. Reusable Dialog Components, a component framework based on JavaServer Pages, are central to this evolution. Explore this roadmap for driving down the overall cost of creating, deploying, and managing speech solutions. Also, learn how complex speech applications built with today's technologies can interoperate with speech-enabled Web applications for a smooth transition and a seamless user experience.

Date:  07 Oct 2004
Level:  Introductory

The speech-technology industry took its first step toward the adoption of a Web programming model by standardizing VoiceXML, Version 2.0. First-generation voice-enabled Web applications were mostly built of static VoiceXML pages.

The next step is a move to complex applications deployed on standard Web servers and implemented through programs that deliver dynamically generated VoiceXML markup. Bringing speech-enabled Web applications into the mainstream requires uniform programming models for creating and deploying them.

Jumpstart the RDC effort

IBM is spearheading this effort with the initial donation of a set of reference Reusable Dialog Components (RDCs) and the supporting framework integrated within the Java 2 Platform, Enterprise Edition (J2EE) and JavaServer Pages (JSP) programming models. The reference implementation of this framework is available as open source through the Apache Jakarta Taglibs project (see Resources).

The initial set of components and the underlying framework is the start of a community effort toward the evolution of a common programming model for voice interaction based on J2EE and JSP technology. With this framework, developers can avoid compartmentalizing speech into its own application-development niche. Members of the community supporting this initiative can use the framework and reference implementation according to their business models.

Unifying the visual and voice Web can lead to a common framework that consists of collecting and presenting information. Visual Web applications perform user interaction through widgets assembled on HTML pages. Specialized components that deal with specific types of data, such as money, time, dates, and addresses, help reduce the cost of implementing complex interactive applications and greatly accelerate the development of visual Web applications through the process of customization and reuse.

On the visual Web, creating sophisticated user interaction is mediated by component libraries that ease the generation of complex HTML pages. The move to dynamically generated VoiceXML requires similar component libraries that capture best practices in Voice User Interface (VUI) design. Mainstreaming voice access to the Web changes today's practice of developing entire speech applications to a model where voice access is achieved by replacing the visual view layer with a high-quality VUI. In this model, you develop Web applications using standard application frameworks such as Struts; you achieve voice access by creating appropriate views that are assembled from a set of reusable and configurable components. You need to create such components within a framework that encourages interoperability across components to help unify the speech applications market.

Whereas the visual Web can rely on a persistent visual display backed by error-free user input, the speech medium is temporal and nonpersistent. Speech interaction is characterized by a sequence of turns in which requests or pieces of information are spoken alternately by the system and by the user. Although it is advancing at a fast pace, speech-recognition technology is still error-prone and needs to be backed up by confirmation, correction, and reprompting. With prepackaged dialog components, Web developers can more efficiently handle these aspects of conversational interaction and ease the overall task of speech enablement.
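The interplay of confirmation and reprompting can be made concrete with a small decision rule. The following is a minimal sketch, not part of the RDC framework or of VoiceXML: the class name TurnPolicy, the two confidence thresholds, and the attempt limit are all assumptions chosen for illustration. It shows one plausible way a component might decide what to do with a recognition result on each turn.

```java
/**
 * Hypothetical illustration only; TurnPolicy is not an RDC or VoiceXML API.
 * Decides the next dialog action from a recognizer confidence score
 * (assumed to lie between 0.0 and 1.0) and the number of attempts so far.
 */
final class TurnPolicy {
    enum Action { ACCEPT, CONFIRM, REPROMPT, GIVE_UP }

    private final double acceptThreshold;   // at or above: take the value as is
    private final double confirmThreshold;  // at or above: ask "Did you say ...?"
    private final int maxAttempts;          // attempts allowed before giving up

    TurnPolicy(double acceptThreshold, double confirmThreshold, int maxAttempts) {
        if (confirmThreshold > acceptThreshold) {
            throw new IllegalArgumentException("confirmThreshold must not exceed acceptThreshold");
        }
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.acceptThreshold = acceptThreshold;
        this.confirmThreshold = confirmThreshold;
        this.maxAttempts = maxAttempts;
    }

    /** attempt is the 1-based count of attempts made so far, including this one. */
    Action next(double confidence, int attempt) {
        if (confidence >= acceptThreshold) {
            return Action.ACCEPT;
        }
        if (confidence >= confirmThreshold) {
            return Action.CONFIRM;
        }
        // Too uncertain to confirm: ask again unless the attempts are used up.
        return attempt < maxAttempts ? Action.REPROMPT : Action.GIVE_UP;
    }
}
```

A prepackaged component could hide a rule of this kind behind configuration, so that a Web developer only tunes thresholds rather than writing the turn-handling logic.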

For effective use by nonspeech specialists, speech components must embed much of the specific knowledge that enables the creation of high quality speech interfaces. Thus, you must incorporate grammars, prompts, confirmation, and correction strategies into these components. You must also ensure that the components are sufficiently configurable to allow reuse within a wide range of applications. Finally, you should be able to put together sophisticated components from simpler ones.

The Reusable Dialog Component (RDC) framework embodies all of these features. RDCs are interoperable components within the J2EE and JSP framework that offer a means to package speech-specific knowledge. Each RDC is composed of a data model, speech-specific assets such as grammars and prompts, configuration files, and the dialog logic needed to collect a piece of information. The component implementation generates the VoiceXML that performs the VUI. A developer writes an application by instantiating these components and specifying their run-time behaviors through component attributes and configuration files. The data model is where components store the values collected from the user interaction; components also handle data validation and normalization.
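To illustrate the role of the data model, here is a minimal sketch of a bean that stores a collected value and performs normalization and validation. It is not taken from the RDC reference implementation: the class DigitStringModel, its minLength and maxLength properties (standing in for component attributes), and the setRawInput method are hypothetical names introduced for this example.

```java
import java.io.Serializable;

/**
 * Hypothetical data model for a component that collects a string of digits.
 * Not an RDC class. A real JavaBean would normally be public; it is
 * package-private here only to keep the sketch in a single file.
 */
class DigitStringModel implements Serializable {
    private static final long serialVersionUID = 1L;

    private int minLength = 1;                  // assumed configuration attribute
    private int maxLength = Integer.MAX_VALUE;  // assumed configuration attribute
    private String value;                       // normalized value, or null if invalid
    private boolean valid;

    public DigitStringModel() { }

    public int getMinLength() { return minLength; }
    public void setMinLength(int minLength) { this.minLength = minLength; }

    public int getMaxLength() { return maxLength; }
    public void setMaxLength(int maxLength) { this.maxLength = maxLength; }

    public String getValue() { return value; }
    public boolean isValid() { return valid; }

    /** Normalizes a raw recognition result, then validates it against the length limits. */
    public void setRawInput(String raw) {
        // Normalization: drop whitespace and dashes that a recognizer might insert.
        String normalized = (raw == null) ? "" : raw.replaceAll("[\\s-]", "");
        // Validation: digits only, and length within the configured range.
        boolean ok = normalized.matches("\\d+")
                && normalized.length() >= minLength
                && normalized.length() <= maxLength;
        this.value = ok ? normalized : null;
        this.valid = ok;
    }
}
```

With minLength and maxLength both set to 5, the input "1 05-98" is normalized to "10598" and accepted, while "1234" is rejected and leaves the value null.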

Component data models are implemented as JavaBeans. Each component implements a set of tasks including data collection, confirmation, validation, and disambiguation. Component authors can provide custom implementations for all of these tasks. Atomic RDCs collect simple data values such as a time, the name of a place, or an alphanumeric string; you can put atoms together to form composite RDCs. You also can aggregate composite and atomic RDCs to form more complex components. The resulting composite RDCs are structured in the same way as atomic ones: they have a data model, implement sets of tasks, and include speech-specific assets, and their behavior is specified by attributes and configuration files. The framework provides a container tag to facilitate the construction of composite RDCs. The container implementation invokes a pluggable dialog-management strategy that controls the activation of the constituent RDCs. The framework provides a default directed-dialog strategy that a developer can override.
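The relationship between a container, its constituent components, and a pluggable dialog-management strategy can be sketched in plain Java. This is an illustration under stated assumptions, not the framework's container tag or its API: every type below (DialogComponent, DialogStrategy, DirectedDialogStrategy, AtomicComponent, CompositeComponent) is a hypothetical name, and the directed-dialog rule shown (activate the first child that has not yet collected its value) is one simple reading of the term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical common contract for atomic and composite components. */
interface DialogComponent {
    String id();
    boolean isDone();
}

/** Hypothetical pluggable strategy: picks which child to activate next. */
interface DialogStrategy {
    Optional<DialogComponent> nextActive(List<DialogComponent> children);
}

/** Assumed directed-dialog rule: visit children in order, one at a time. */
final class DirectedDialogStrategy implements DialogStrategy {
    @Override
    public Optional<DialogComponent> nextActive(List<DialogComponent> children) {
        return children.stream().filter(c -> !c.isDone()).findFirst();
    }
}

/** An atom is done once it holds a collected value. */
final class AtomicComponent implements DialogComponent {
    private final String id;
    private String value;

    AtomicComponent(String id) { this.id = id; }

    void collect(String value) { this.value = value; }

    @Override public String id() { return id; }
    @Override public boolean isDone() { return value != null; }
}

/** A composite has the same contract as an atom, so composites can nest. */
final class CompositeComponent implements DialogComponent {
    private final String id;
    private final List<DialogComponent> children = new ArrayList<>();
    private DialogStrategy strategy = new DirectedDialogStrategy(); // default, overridable

    CompositeComponent(String id) { this.id = id; }

    CompositeComponent add(DialogComponent child) {
        children.add(child);
        return this;
    }

    void setStrategy(DialogStrategy strategy) {
        this.strategy = Objects.requireNonNull(strategy);
    }

    Optional<DialogComponent> nextActive() {
        return strategy.nextActive(children);
    }

    @Override public String id() { return id; }
    @Override public boolean isDone() {
        return children.stream().allMatch(DialogComponent::isDone);
    }
}
```

Because CompositeComponent implements the same interface as AtomicComponent, a composite can be added to another composite, and a developer could swap in a different DialogStrategy without touching the children.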

Remove the cost from development of speech solutions

Building on standardized programming models creates the opportunity to develop mainstream tools for speech enablement. This section outlines the roadmap for how IBM sees today's world of speech-oriented applications and the evolution toward a world where speech enablement is just another aspect of overall application development.

Adopt the Web programming model for voice interaction

The speech-technology industry took one of its first steps toward integrating voice interaction with mainstream applications when it adopted VoiceXML and the associated Web programming model built around HTTP and distributed resources that are identified through URLs. This adoption allowed the speech-technology industry to move away from speech applications written as executable programs that link directly to the underlying speech engines. Today you can develop voice applications using standards-compliant VoiceXML, Version 2.0, which avoids tying the final application to any specific vendor's engine application programming interfaces (APIs).

From static to dynamic VoiceXML

To continue this evolution means creating Web applications that emit standards-compliant VoiceXML. This follows the same evolutionary pattern as seen on the visual Web, where static HTML pages have been replaced over time by server-side Web application frameworks that emit HTML. The creation of standardized Web programming models that abstract the details of back-end integration, as well as the underlying business logic that determines the transitions among different stages in an application, has facilitated server-side deployment of Web applications. These standardized models help developers integrate user tasks into ever-larger applications. As the speech-enabled Web evolves in an analogous manner, voice application development moves from today's voice-specific programming model and associated tools to one in which voice interaction is authored as a specialized view that binds to a common underlying Web application.
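As a rough illustration of what "dynamically generated VoiceXML" means on the server side, the sketch below builds a small VoiceXML 2.0 document from data available at request time. It is an assumption-laden example rather than the RDC implementation: the class VoiceXmlMenu and its render and escape methods are hypothetical, and a real application would more likely produce this markup from a JSP page than from string concatenation.

```java
import java.util.List;

/**
 * Hypothetical helper, not part of any framework: renders a VoiceXML 2.0
 * form with one field whose choices come from run-time data.
 */
final class VoiceXmlMenu {
    private VoiceXmlMenu() { }

    /** Escapes the characters that are significant in XML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * fieldName should be a valid ECMAScript variable name, because VoiceXML
     * exposes the field value under that name; this sketch does not check it.
     */
    static String render(String fieldName, String prompt, List<String> options) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<vxml version=\"2.0\" xmlns=\"http://www.w3.org/2001/vxml\">\n");
        sb.append("  <form>\n");
        sb.append("    <field name=\"").append(escape(fieldName)).append("\">\n");
        sb.append("      <prompt>").append(escape(prompt)).append("</prompt>\n");
        for (String option : options) {
            // Each option contributes both a spoken choice and its grammar.
            sb.append("      <option>").append(escape(option)).append("</option>\n");
        }
        sb.append("    </field>\n");
        sb.append("  </form>\n");
        sb.append("</vxml>\n");
        return sb.toString();
    }
}
```

The list of options could come from a database query or a back-end service, which is exactly what a static VoiceXML page cannot accommodate.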

Tools for voice enablement

Tools for speech-enabling Web applications can integrate seamlessly with mainstream Web application tools. An example is the Struts builder available within the IBM WebSphere® Studio Application Developer tool, as shown in Figure 1.


Figure 1. The Struts builder

With this Struts builder, speech specialists can focus on the task of creating high-quality voice user interaction without having to develop the complete application. These VUI components can incorporate best practices of VUI design and help ensure that speech-enabling Web applications do not sacrifice the quality of the user experience. Finally, during this transition period, you can still integrate existing speech-enabled applications created within today's voice-centric programming models into the overall application flow by using the underlying Web framework defined by HTTP. As an example, a voice-enabled financial portal created by binding a VUI to an underlying Web application might choose to invoke a pre-existing speech bank application through a URL, or more generally, as a Web service. (Struts allows the separation of the presentation layer from the underlying application flow. To produce the voice view, you can voice-enable Struts applications by replacing visual-view JSP pages with RDC-based JSP pages.)

The goal: Drive cost out of voice applications

When the transition to speech-enabling Web applications is complete, IBM expects the overall cost of voice-enablement to be significantly reduced from today's levels. Each link in the overall end-to-end value chain of speech application deployment can focus on a specific core competency.


Value propositions and business opportunities

Next, we outline how the mainstreaming of speech solutions by adding speech-enablement to the overall portfolio of Web technologies creates new business opportunities for different segments of the speech industry. The end-to-end value chain that makes up the creation, deployment, and delivery of voice applications comprises several parts. At present, vendors play in more than one part of this value chain -- some of them in at least two or three neighboring sectors. IBM's long-term goal is to help each class of vendors focus on their particular core competencies, while relying on interoperability that comes from using standards.

Voice platform vendors

The momentum behind VoiceXML Version 2.0 has created rapid growth in the software industry, and IBM expects this trend to be enhanced by the speech-enablement of J2EE Web applications using a standardized programming model that provides robust access while controlling total cost of ownership (TCO). The ability of the mainstream Web programmer to generate high-quality VUIs expressed in VoiceXML can significantly enhance the value of robust VoiceXML browsers.

Hosting

A standardized deployment environment based on the widely used and tested J2EE Web application architecture helps control the overall cost of hosting and maintaining speech-enabled applications.

Speech-recognition and text-to-speech (TTS) engines

J2EE Web developers can leverage the evolution of speech technologies to deliver on-demand spoken access to Web services. This can increase market demand for speech technology, which can become part of the standard assets for Web applications. Engine vendors might be enticed to add advanced functionality and improve their core technologies to support advanced requirements defined by component creators and Web developers.

Development tools

Tools that are consistent with interoperable components encourage developers to create libraries of speech-enabling building blocks. These libraries can lead to rapid application development (RAD) and free developers to focus on more-sophisticated user interactions.

Enterprises and service providers

As developers bring speech to standard J2EE Web applications, using the widely available skill set of J2EE and JSP Web development, they can add spoken access to businesses quickly and cost-effectively, and help control TCO.

Application developers

As developers create dynamic voice access to Web applications and services, a standardized Web programming model and associated tools help reduce the cost of developing on-demand, voice-enabled solutions. Speech-enabling J2EE applications through JSP technology and dialog components can create demand for application development services based on this standard programming model.


In conclusion

Speech-recognition technology is mature, and mainstream deployment of speech solutions can drive down costs in key areas like customer care. To reduce the cost of creating, managing, and deploying mainstream speech applications, developers must build on standardized Web-programming models. This can turn speech-enablement into yet another access channel to mainstream Web applications. To enable this evolution without sacrificing the overall quality of the user experience requires the packaging of speech-interaction expertise into standardized components that can be integrated into mainstream Web development environments.


Resources
