Everyone needs a good calendar tool during these busy times, so why not a voice-enabled one? With VoiceXML, you can create a calendar that you can manipulate using your own speech. Along the way you will also learn to:
- Create a menu-based application
- Accept input
- Write the input to a script for further processing
- Read a data file and output VXML
Voice, and audio in general, is more and more popular on the Web. Examples include the plethora of music and webcasts currently available online. This series shows several ways to combine voice and XML to develop the following useful applications:
- Part 1: a voice-enabled RSS reader.
- Part 2: a voice-enabled calendar.
- Part 3: a voice-enabled blogging and Twitter application.
- Part 4: a voice-enabled Yahoo search application.
For your calendar workflow you will have a very simple structure with, at least initially, just two options:
- List your existing appointments
- Add an appointment
For convenience, you'll provide a third option that doesn't need a special type of handler. When the user speaks "finish", the call will then disconnect.
Figure 1 shows the basic layout of the main menu with the two options.
Figure 1. Main application menu
As you can see in Figure 1, this is a relatively straightforward system. If you say "diary," you move to the application that reads out all of the current diary entries. If you say "appointment," you move to the application that accepts the input for a new appointment.
For the storage of the calendar information you'll use an XML document, and that means that you need to be able to dynamically output the content of the XML document as VXML. To accept the appointment information—that is, the day, month, year and appointment type—you can model that entirely in VXML, although you will need a dynamic component to save the input.
For that latter option, the structure itself is simple; it needs to accept from the user—by speech or Dual Tone Multi-Frequency (DTMF)—the date and time information. Figure 2 shows the detail of the structure.
Figure 2. Accepting the appointment selection
Now look at the implementation of this system.
The main calendar menu is a simple VXML file that provides the main options as a starting point for the rest of the application. You might do this several ways, but for this example you'll use the option tag. This can set a value into an associated field automatically and will accept a single word as the audio to be recognized and a DTMF tone from the phone pad.
This allows the user to speak the command or use the keypad to make the selection, so you have to provide a prompt that spells out the combination of selection options available.
Once the field is populated with the selection of the user, an if block selects the next step. In the case of the first two options, that means linking to a different URL. For the final option, the disconnect tag hangs up the phone.
Listing 1 shows the VXML of the main menu.
Listing 1. The main menu of the Calendar application
<?xml version="1.0" encoding ="UTF-8"?>
<!DOCTYPE vxml PUBLIC "-//W3C//DTD VOICEXML 2.1//EN"
"http://www.w3.org/TR/voicexml21/vxml.dtd">
<vxml version="2.1">
<form id="MM">
<field name="FMM">
<prompt>
Please choose.
<break strength="medium"/>
Press one or say diary for your current diary,
press two or say appointment to add a new appointment.
Say finish if you want to finish the session.
</prompt>
<option value="diary" dtmf="1">
diary
</option>
<option value="appointment" dtmf="2">
appointment
</option>
<option value="finish" dtmf="0">
finish
</option>
<filled>
<if cond="FMM =='diary'">
<goto next="dumpdate.cgi"/>
<elseif cond="FMM =='appointment'"/>
<goto next="entry.vxml"/>
<elseif cond="FMM =='save'"/>
<goto next="savedate.cgi"/>
<elseif cond="FMM =='finish'"/>
<prompt>Thank you</prompt>
<disconnect/>
</if>
</filled>
</field>
</form>
</vxml>
|
At this stage you are not concerned with the identity of the user, but in an enterprise environment you'll want to add a level of identification that allows a user to enter a unique identification number (or say their name), and a PIN or password to validate the entry. From then on, you can pass the ID number between VoiceXML and the dynamic scripts.
Also, although you'll use XML as the storage format of the information here, with a unique ID available you could just as easily use that ID to extract the information directly from a database.
With those considerations aside, let's examine the VXML to accept a new appointment entry.
To add an appointment to the existing file, you must first get the information from the user about the date, time and the appointment type that they want to add.
Again, you can accept this information in many ways, but for the first five fields (day, month, year, hour, minute) you ultimately need to capture numeric information.
The field tag accepts a type attribute that specifies the type of information that you want to store, and that also determines the grammar rules used to recognize the input. Some input types are predefined. One of those is the digits type, which accepts a specification for the minimum and maximum number of digits: <field name="Day" type="digits?minlength=1;maxlength=2">.
Note that these are digits, not numbers, so when speaking a value the user must say the digits. Although this is mildly less flexible, it ensures that you get the right information, especially when specifying the year, which can potentially be spoken several ways.
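To make the length limits concrete, here is a minimal Python sketch of the check that a type such as digits?minlength=1;maxlength=2 implies. It is only an illustration: the voice browser performs this matching itself, and the function names parse_digits_type and accepts are hypothetical, not part of VoiceXML.

```python
import re

def parse_digits_type(type_spec):
    """Split 'digits?minlength=1;maxlength=2' into {'minlength': 1, 'maxlength': 2}."""
    name, _, params = type_spec.partition("?")
    if name != "digits":
        raise ValueError("this sketch only understands the digits type")
    limits = {}
    for pair in params.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            limits[key] = int(value)
    return limits

def accepts(type_spec, recognized):
    """Return True when the recognized digit string fits the declared limits."""
    limits = parse_digits_type(type_spec)
    # Only a run of digits qualifies; a word such as "twelve" does not.
    if not re.fullmatch(r"[0-9]+", recognized):
        return False
    if len(recognized) < limits.get("minlength", 1):
        return False
    if "maxlength" in limits and len(recognized) > limits["maxlength"]:
        return False
    return True
```

With the Day field's type, a recognized value of "12" passes while "123" is too long; with the Year field's type, "07" is too short.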
You can re-use this process for all five of the numeric values you need, as shown here in Listing 2. For each field definition, the embedded prompt will be read out and the system will wait for the appropriate response before it moves on to the next field. Most VoiceXML browsers include some automated elements that will re-prompt the user if the input does not match the expected data types.
Listing 2. Accepting input values
<?xml version="1.0" encoding ="UTF-8"?>
<!DOCTYPE vxml PUBLIC "-//W3C//DTD VOICEXML 2.1//EN"
"http://www.w3.org/TR/voicexml21/vxml.dtd">
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
<form id="MyForm">
<prompt>Adding a new appointment.<break time="1000"/></prompt>
<field name="Day" type="digits?minlength=1;maxlength=2">
<prompt>Say the day of the month (using digits). For example, for 12th say one,
two.</prompt>
</field>
<field name="Month" type="digits?minlength=1;maxlength=2">
<prompt>Say the month (in numbers)</prompt>
</field>
<field name="Year" type="digits?minlength=4;maxlength=4">
<prompt>Say the year (in numbers, using four digits)</prompt>
</field>
<field name="Hour" type="digits?minlength=1;maxlength=2">
<prompt>Say the hour (using the 24-hour clock)</prompt>
</field>
<field name="Minute" type="digits?minlength=1;maxlength=2">
<prompt>Say the minutes</prompt>
</field>
|
For the type of the meeting, you need to specify a list of accepted words or phrases that the voice browser can identify when the user speaks them. You can choose between two ways to specify the grammar rules: one is based on a text format, the other on an XML specification.
You can easily embed the text specification into a VXML document by using a CDATA block. The format lets you specify the words that the user is permitted to speak and the value that each recognized word writes into the VoiceXML field. For example, if you want to accept the spoken word "meeting" and then assign the corresponding word to the field you would use this: [meeting] {<TypeOfMeeting "meeting">}.
You can add further options, as shown here in Listing 3, where you provide dentist, doctor and party options.
Listing 3. Accepting a meeting type
<field name="TypeOfMeeting">
<prompt>Say the type of appointment. Options are meeting, dentist,
doctor, party. </prompt>
<grammar type="text/gsl">
<![CDATA[[
[meeting] {<TypeOfMeeting "meeting">}
[dentist] {<TypeOfMeeting "dentist">}
[doctor] {<TypeOfMeeting "doctor">}
[party] {<TypeOfMeeting "party">}
]]]>
</grammar>
</field>
|
In Listing 3, note that the target field for the destination of the information was specified, as well as the eventual data to be written into the field.
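If the list of appointment types lives in a script rather than in a static file, the grammar body can be generated. The following Python sketch assembles rules in the same shape as Listing 3. The helper name build_gsl_grammar is hypothetical, and the sketch assumes the words contain no characters that need escaping.

```python
def build_gsl_grammar(field, choices):
    """Return the grammar body for a list of (spoken word, stored value) pairs."""
    lines = ["["]
    for word, value in choices:
        # Each rule names the target field and the value written into it.
        lines.append('[%s] {<%s "%s">}' % (word, field, value))
    lines.append("]")
    return "\n".join(lines)

MEETING_TYPES = [
    ("meeting", "meeting"),
    ("dentist", "dentist"),
    ("doctor", "doctor"),
    ("party", "party"),
]

grammar_body = build_gsl_grammar("TypeOfMeeting", MEETING_TYPES)
```

The resulting string would still need to be wrapped in the CDATA block and grammar tag shown in Listing 3.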
Once the user supplies the information required to fill out all of the fields, the filled block of the VoiceXML is called. For your application, you'll use it to re-specify what the date, time and type are for the appointment.
When speaking specific data types it can help to tell the text-to-speech (TTS) system of the voice browser the type of information to convert into speech. For example, if you consider a typical date, "17/5/2007", it might be read out as "seventeen-slash-five-slash-two thousand and seven". Read that way, it is meaningless. But, if the TTS system identifies it as a date, it can instead read the string as "seventeenth of May two thousand and seven".
The say-as tag tells the TTS parser how to speak an item: its interpret-as attribute indicates the data type, and its format attribute indicates the order of the information. For example, "dmy" would indicate that the date was in the format of the day, followed by the month, and then the year.
You can apply the same tag to speak the time (which will make it identify times such as 11:30 as "half past eleven", rather than just bare numbers). You can see the VXML to generate this description in Listing 4.
Listing 4. Re-confirming the output
<filled>
<prompt>
You have specified a date of:
<say-as interpret-as="date" format="dmy">
<value expr="Day"/>/<value
expr="Month"/>/<value expr="Year"/>
</say-as>
At:
<say-as interpret-as="time" format="hm">
<value expr="Hour"/>:<value expr="Minute"/>
</say-as>
<break/>
</prompt>
<prompt>Appointment type of
<value expr="TypeOfMeeting"/>
</prompt>
|
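To make the "dmy" ordering concrete, here is a rough Python sketch of how the three digit fields combine into the string placed inside say-as, and of how a date-aware reader might interpret that string. The function names are hypothetical, and a real TTS engine chooses its own wording. The sketch only shows that the same characters carry a different meaning once the order of the parts is known.

```python
from datetime import date

MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)

def dmy_text(day, month, year):
    """Join the recognized digit strings in day/month/year order."""
    return "%d/%d/%d" % (int(day), int(month), int(year))

def spoken_date(dmy):
    """Interpret a d/m/y string as a calendar date and name the month."""
    day, month, year = (int(part) for part in dmy.split("/"))
    value = date(year, month, day)  # raises ValueError for an impossible date
    return "%d %s %d" % (value.day, MONTH_NAMES[value.month - 1], value.year)
```

For example, spoken_date(dmy_text("17", "5", "2007")) returns "17 May 2007".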
Finally, with all of the information relayed, you need to send the user input to the script for handling. The script will then save the information within the XML calendar file that stores all of the appointments.
The submit tag within VXML can be used to exchange field information between VXML and scripts that can handle the information. The namelist attribute should contain a space-separated list of the fields that you want to supply.
The voice browser then translates this into a standard HTTP field/value string that you can parse using the same methods as you would use for any parameters supplied to a script from a form within a standard HTML page. Listing 5 shows the VXML required to send the response information from the variables to the script that will save your appointment.
Listing 5. Submitting the input field data to an external script
<submit
next="savedate.cgi"
namelist="Day Month Year Hour Minute TypeOfMeeting"
method="post"/>
</filled>
</form>
</vxml>
|
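To see what the receiving script has to deal with, the Python sketch below builds the kind of field/value string that a form post carries for the fields in the namelist, and then parses it back. The sample values are made up for illustration, and the exact request a given voice browser sends may differ in detail.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical values for the six fields named in the namelist attribute.
fields = {
    "Day": "17",
    "Month": "5",
    "Year": "2007",
    "Hour": "11",
    "Minute": "30",
    "TypeOfMeeting": "meeting",
}

# The encoded form body, one name=value pair per field.
body = urlencode(fields)

# On the script side, the same string parses back into named parameters.
params = {name: values[0] for name, values in parse_qs(body).items()}
```

Here body is the string Day=17&Month=5&Year=2007&Hour=11&Minute=30&TypeOfMeeting=meeting, which is the same shape an HTML form would produce.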
You can improve on the interaction with this VoiceXML application. For example, to make input easier, you might write a grammar rule for the month so that the user can speak the name of the month rather than its number (see Listing 6).
Listing 6. Grammar rule for the month
<grammar type="text/gsl">
<![CDATA[[
[january] {<Month "1">}
[february] {<Month "2">}
[march] {<Month "3">}
...
[december] {<Month "12">}
]]]>
</grammar>
|
Another improvement is to restart the input process if the user chooses not to submit the information. Validation is best handled by a script, which can be more selective about the input values and check them against known values. Those responsibilities, and those of saving the appointment, are part of the script.
The appointments will be saved in an XML file. The script receives them from the VoiceXML form, which submits them as standard HTTP parameters. The sample script uses Perl to read the information and the XML::DOM module to load the existing XML, add the new appointment and then write out the updated file, although you can just as easily use a Java™, Python or PHP script to handle the input.
The format of the XML file is shown in Listing 7.
Listing 7. Format of the XML diary file
<diary>
<meeting date="26/3/2007" time="12:30" type="Party"/>
</diary>
|
Everything is self-contained in a meeting tag, with the attributes storing the actual data. This simplifies the writing and updating of the information. To ensure that you receive valid data, and don't write bad information to the XML, you access the parameters and set the ok variable accordingly.
You might, at this point, also validate the supplied date by trying to create a DateTime object from the supplied data, and set the ok variable to false if that fails. A false value for the ok variable triggers the script to output a VXML fragment that redirects the user to the entry.vxml file, which asks the user to re-specify their appointment information. If the information is valid, you update the XML diary file and write out a VXML fragment that redirects the user to the main menu.
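The date check described above can be sketched in a few lines. The version below uses Python's datetime module rather than Perl's DateTime, purely as an illustration. The function name appointment_ok and the plain dictionary of parameters are assumptions, not part of the sample code.

```python
from datetime import datetime

REQUIRED = ("Day", "Month", "Year", "Hour", "Minute", "TypeOfMeeting")

def appointment_ok(params):
    """Return True only if every field is present and the date and time exist."""
    # A missing or empty field fails immediately.
    if any(not params.get(name) for name in REQUIRED):
        return False
    try:
        # Building the object rejects values such as 31/2 or an hour of 25.
        datetime(
            int(params["Year"]), int(params["Month"]), int(params["Day"]),
            int(params["Hour"]), int(params["Minute"]),
        )
    except ValueError:
        return False
    return True
```

A script would output the "try again" fragment whenever this returns False, and save the appointment otherwise.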
Listing 8 shows the entire script.
Listing 8. Saving the appointment to your XML file
#!/usr/bin/perl
use CGI qw/:standard/;
use XML::DOM;
print header(-type => 'text/xml');
my $ok = 1;
foreach my $param (qw/Day Month Year Hour Minute TypeOfMeeting/)
{
$ok = 0 if (!defined(param($param)));
}
if ($ok)
{
my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile ("dates.xml");
my $meeting = $doc->createElement('meeting');
$meeting->setAttribute('date', sprintf('%s/%s/%s',
param('Day'),
param('Month'),
param('Year')));
$meeting->setAttribute('time', sprintf('%s:%s',
param('Hour'),
param('Minute')));
$meeting->setAttribute('type', param('TypeOfMeeting'));
my $diary = $doc->getElementsByTagName ("diary")->item(0);
$diary->appendChild($meeting);
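# Write the updated document back; stop if the file cannot be opened
open(my $out, '>', 'dates.xml') or die "Cannot write dates.xml: $!";
print $out $doc->toString;
close($out);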
print <<EOF;
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">
<form>
<block>
<prompt>Appointment saved.<break time="2000"/></prompt>
<goto next="calmenu.vxml"/>
</block>
</form>
</vxml>
EOF
}
else
{
print <<EOF;
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">
<form>
<block>
<prompt>Sorry, there was a problem with your appointment.
Please try again.<break time="2000"/></prompt>
<goto next="entry.vxml"/>
</block>
</form>
</vxml>
EOF
}
|
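Because the same job can be done in other languages, here is a hedged Python sketch of the save step using the standard xml.etree.ElementTree module. It is not a port of the whole CGI script: the function names are hypothetical, and reading the parameters and printing the VXML response are left out.

```python
import xml.etree.ElementTree as ET

def append_meeting(diary, day, month, year, hour, minute, kind):
    """Add one meeting element, using the attribute layout of Listing 7."""
    return ET.SubElement(diary, "meeting", {
        "date": "%s/%s/%s" % (day, month, year),
        "time": "%s:%s" % (hour, minute),
        "type": kind,
    })

def save_appointment(path, day, month, year, hour, minute, kind):
    """Load the diary file, append the meeting and write the file back."""
    tree = ET.parse(path)
    append_meeting(tree.getroot(), day, month, year, hour, minute, kind)
    tree.write(path, encoding="UTF-8", xml_declaration=True)
```

Calling save_appointment("dates.xml", "1", "3", "2007", "12", "30", "doctor") would add a second meeting element to a diary file like the one in Listing 7.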
You need one more script—the script that will read the existing diary XML file and speak the existing appointments.
The diary script reads the dates.xml file and generates a VXML fragment that will read out the contents of the current diary. It is straightforward and does nothing more than you have seen before when generating VXML output. At the end of the diary list, it returns the user to the main menu VXML file. The full script is in Listing 9.
Listing 9. Outputting VXML from your diary file
#!/usr/bin/perl
use CGI qw/:standard/;
use XML::DOM;
print header(-type => 'text/xml');
my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile ("dates.xml");
my $nodes = $doc->getElementsByTagName ("meeting");
my $n = $nodes->getLength;
print <<EOF;
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">
<form>
<block><prompt>Your current diary.<break
time="2000"/></prompt></block>
EOF
for (my $i = 0; $i < $n; $i++)
{
my $node = $nodes->item ($i);
my $daten = $node->getAttributeNode ("date");
my $timen = $node->getAttributeNode ("time");
my $typen = $node->getAttributeNode ("type");
my $date = $daten->getValue;
my $time = $timen->getValue;
my $type = $typen->getValue;
print <<EOF;
<block>
<prompt>
<say-as interpret-as="date" format="dmy">$date</say-as>,
at
<say-as interpret-as="time" format="hm">$time</say-as>,
<break time="500"/>
$type
<break time="1000"/>
</prompt>
EOF
if ($i == ($n-1))
{
print '<prompt><break time="1000"/>End of diary.
Returning to main menu.<break time="2000"/></prompt><goto
next="calmenu.vxml"/>';
}
print '</block>';
}
print <<EOF;
</form>
</vxml>
EOF
|
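For comparison, the same transformation can be sketched in Python. The function name diary_to_vxml is hypothetical. Unlike Listing 9, this sketch always appends the closing block that returns to the main menu, so an empty diary still leads somewhere. That is a design choice of the sketch, not something the sample code does.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def diary_to_vxml(diary_xml):
    """Turn the diary XML text into a VXML document that reads each meeting."""
    root = ET.fromstring(diary_xml)
    out = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<vxml version="2.1">',
        '<form>',
        '<block><prompt>Your current diary.<break time="2000"/></prompt></block>',
    ]
    for meeting in root.iter("meeting"):
        # Escape the stored values so stray markup cannot break the output.
        out.append(
            '<block><prompt>'
            '<say-as interpret-as="date" format="dmy">%s</say-as>, at '
            '<say-as interpret-as="time" format="hm">%s</say-as>, '
            '<break time="500"/>%s<break time="1000"/>'
            '</prompt></block>'
            % tuple(escape(meeting.get(name, "")) for name in ("date", "time", "type"))
        )
    out.append(
        '<block><prompt>End of diary. Returning to main menu.'
        '<break time="2000"/></prompt><goto next="calmenu.vxml"/></block>'
    )
    out.extend(['</form>', '</vxml>'])
    return "\n".join(out)
```

A CGI wrapper would print a text/xml content type header followed by the returned string.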
Listing 10 shows a sample of the output.
Listing 10. A sample diary output in VXML
Content-Type: text/xml; charset=ISO-8859-1
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">
<form>
<block>
<prompt>Your current diary.<break time="2000"/></prompt>
</block>
<block>
<prompt>
<say-as interpret-as="date" format="dmy">26/3/2007</say-as>,
at
<say-as interpret-as="time" format="hm">12:30</say-as>,
<break time="500"/>
Party
<break time="1000"/>
</prompt>
</block>
<block>
<prompt>
<say-as interpret-as="date" format="dmy">1/3/2007</say-as>,
at
<say-as interpret-as="time" format="hm">12:30</say-as>,
<break time="500"/>
doctor
<break time="1000"/>
</prompt>
</block>
<block>
<prompt>
<say-as interpret-as="date" format="dmy">2/2/2007</say-as>,
at
<say-as interpret-as="time" format="hm">10:30</say-as>,
<break time="500"/>
party
<break time="1000"/>
</prompt>
<prompt>
<break time="1000"/>
End of diary. Returning to main menu.
<break time="2000"/>
</prompt>
<goto next="calmenu.vxml"/>
</block>
</form>
</vxml>
|
You can improve and expand many elements of the current script in an updated version of the application. You might provide an additional menu that lets the user select the period or duration of the output. For example, you might let the user speak the following phrases (through an additional VoiceXML menu):
- Today: Appointments for the current day. Because the output and selection are based on a script, you can determine the current day and filter the XML dynamically.
- Tomorrow: Appointments for the next day.
- Specific date.
Using the principles shown elsewhere in this article, the fundamentals of this process should be clear.
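As a starting point for that filtering, here is a small Python sketch. It assumes the menu passes the spoken word ("today" or "tomorrow") to the script as a parameter, and that every stored date is a valid d/m/y string. Both function names are hypothetical.

```python
from datetime import date, timedelta
import xml.etree.ElementTree as ET

def resolve_period(word, today=None):
    """Map the spoken period to a calendar date."""
    today = today or date.today()
    if word == "today":
        return today
    if word == "tomorrow":
        return today + timedelta(days=1)
    raise ValueError("unknown period: %s" % word)

def meetings_on(diary_xml, wanted):
    """Return the meeting elements whose date attribute matches the wanted date."""
    matches = []
    for meeting in ET.fromstring(diary_xml).iter("meeting"):
        day, month, year = (int(part) for part in meeting.get("date").split("/"))
        if date(year, month, day) == wanted:
            matches.append(meeting)
    return matches
```

The filtered list could then be turned into VXML blocks in the same way that the diary script handles the full list.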
In this article, you looked at some extended interactivity between a VoiceXML application and the scripts used to support it. The key here is the submit tag, which enables you to submit information to a script in the same way that you submit fields to any normal Web script. This exchange of information opens up a world of possibilities in terms of the interactivity between your applications, existing data and a voice-based browser or interface.
Be sure to read the next part, Part 3 in this series, where you will learn how to develop a simple blogging application that takes VoiceXML as input and saves the data into your online blog. You will also learn how to use VoiceXML with "tweets", or Twitter entries.
| Description | Name | Size | Download method |
|---|---|---|---|
| Part 2 sample code | x-voicexml-cal.zip | 4KB | HTTP |

Martin Brown has been a professional writer for over eight years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms -- Perl, Python, Java, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows, Solaris, Linux, BeOS, Mac OS/X and more -- as well as Web programming, systems management and integration. Martin is a regular contributor to ServerWatch.com, LinuxToday.com and IBM developerWorks, and a regular blogger at Computerworld, The Apple Blog and other sites, as well as a Subject Matter Expert (SME) for Microsoft. He can be contacted through his Web site at http://www.mcslp.com.