Multimodal interaction and the mobile Web, Part 3: User authentication

Secure user authentication with voice and visual interaction

Gerald McCobb (mccobb@us.ibm.com), Advisory Software Engineer, IBM
Gerald McCobb has worked for IBM for over 14 years. He currently works in the embedded voice development group putting multimodal interaction into small devices. He is also IBM's representative to the W3C Multimodal Interaction Working Group.

Summary:  User authentication is an essential feature of transactional applications, including those for the mobile Web. See how you can create a multimodal user authentication service for use by mobile device applications.

Date:  10 Jan 2006
Level:  Intermediate

In "Multimodal interaction and the mobile Web, Part 1," I introduced a typical scenario for a multimodal mobile application: a user who wants to use a cell phone to find a restaurant, view a menu, order a meal, and pay for it. Over the Internet, you would normally secure this type of transaction by having the user send a unique name and password over a secure (SSL or TLS) connection to the application. Unfortunately, user names and passwords are much less convenient for the wireless device user, being harder to enter and more frequently forgotten. It's also not a good idea to store such information in a mobile device's Web browser, because such devices are easily lost or stolen.

User authentication is an essential aspect of securing your multimodal interactions for mobile devices, and implementing it through a Web service is both easy and sensible. In this article, I'll introduce you to a user-configurable multimodal Web service that lets you securely authenticate users. Like other examples in this series, the user authentication service is written in the XHTML+Voice (X+V) multimodal markup language.

Multimodal user authentication

Why Web services?

Creating multimodal interactions as configurable Web services frees you from being concerned with the general problem of adding multimodal interaction to applications, while allowing you to develop simple solutions that enhance user experience. A multimodal user authentication service makes it easy for users to be authenticated, even when using small wireless devices to access Web applications. As a result, users are more likely to use their small devices to run Web applications, and will at last take advantage of the high-bandwidth networks being deployed by wireless carriers.

Multimodal user authentication is based on voice authentication, a biometric technology that analyzes a speaker's utterance to find his or her unique vocal characteristics. When a user first enrolls with the service, he or she speaks one or more paragraphs. The recording of these paragraphs is analyzed, and the result, which captures the speaker's unique vocal characteristics, is stored as a voice print. The voice print is later compared with a new recording made when the user needs to be authenticated. Current voice authentication technology is reliable, fast, and cost-effective, and several vendors now supply it commercially.

A multimodal authentication interface helps the user both when enrolling over the Internet with a voice authentication system and later when asked by an application for proof of identity. Enrollment is easier because the paragraphs to be spoken are presented as visual text. Unlike enrollment over the telephone, for example, the user doesn't have to remember the paragraphs, but can read them instead. Later, when the user is to be authenticated, he or she is asked to speak a random group of words or numbers, which are also presented visually. The words are randomly chosen each time the authentication page is presented so that a tape recording of the user's voice cannot be used to fool the application.
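As a rough illustration of the random challenge described above, the following Python sketch draws a fresh set of words for each authentication attempt. The word pool, the function name make_challenge, and the default of six words are assumptions made for this example; the article does not specify how the service chooses its words.

```python
import secrets

# Hypothetical word pool. The first six names come from the article's login
# page example; the rest are invented for illustration.
CITY_POOL = [
    "Las Vegas", "Detroit", "Budapest", "Miami", "London", "Bismarck",
    "Toronto", "Lisbon", "Denver", "Nairobi", "Osaka", "Prague",
]


def make_challenge(pool, count=6):
    """Return `count` distinct words from `pool` in random order."""
    if count > len(pool):
        raise ValueError("count exceeds the size of the word pool")
    # SystemRandom draws from the operating system's entropy source, so the
    # next challenge is hard to predict from earlier ones.
    return secrets.SystemRandom().sample(pool, count)


# Each call yields a different word list to display in the verification box.
challenge = make_challenge(CITY_POOL)
print(" ".join(challenge))
```

A larger pool and a longer challenge make it less likely that a previously recorded utterance happens to match the words requested.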


The authentication service

A user can enroll from either a desktop PC or a small client device. The desktop PC is recommended because it is generally in a more comfortable, safe, and quiet environment. The user first registers with the multimodal authentication service with a unique user identification (ID). After successfully registering with the service, the user enrolls by speaking one or more paragraphs recorded as audio data, for example as Pulse Code Modulation (PCM) format data.

The audio data may be either stored first on the PC and then submitted with the enrollment Web page or streamed to the service while the user is speaking into the microphone. The voice authentication service extracts from the raw audio data a voice print that contains meaningful characteristics unique to the user's voice. The voice print is then stored in a database along with the user's supplied unique ID. Figure 1 shows the steps from accessing the enrollment page of the authentication service using a secure connection to saving the analyzed audio recording in a remote voice print database.
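The register-then-enroll flow can be sketched in Python as follows. Everything here is an assumption made for illustration: the VoicePrintStore class, the in-memory dictionary standing in for the voice print database, the 16-bit little-endian PCM layout, and above all extract_features_placeholder, which computes simple per-segment energy values and is not a real speaker-recognition algorithm. A commercial voice authentication engine would supply its own feature extraction.

```python
import math
import struct


def pcm16_to_samples(pcm_bytes):
    """Decode 16-bit signed little-endian PCM into floats in [-1.0, 1.0)."""
    count = len(pcm_bytes) // 2
    values = struct.unpack("<%dh" % count, pcm_bytes[:count * 2])
    return [v / 32768.0 for v in values]


def extract_features_placeholder(samples, segments=4):
    """Stand-in for a real voice print extractor: RMS energy per segment."""
    size = max(1, len(samples) // segments)
    features = []
    for i in range(segments):
        chunk = samples[i * size:(i + 1) * size]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk)) if chunk else 0.0
        features.append(rms)
    return features


class VoicePrintStore:
    """In-memory stand-in for the remote voice print database."""

    def __init__(self):
        self._prints = {}

    def register(self, user_id):
        # The user ID must be unique within the service.
        if user_id in self._prints:
            raise ValueError("user ID already registered")
        self._prints[user_id] = None

    def enroll(self, user_id, pcm_bytes):
        # Enrollment is only allowed after successful registration.
        if user_id not in self._prints:
            raise KeyError("register before enrolling")
        samples = pcm16_to_samples(pcm_bytes)
        self._prints[user_id] = extract_features_placeholder(samples)

    def voice_print(self, user_id):
        return self._prints.get(user_id)


# Usage: enroll a synthetic 440 Hz tone in place of recorded speech.
store = VoicePrintStore()
store.register("user-1001")
tone = struct.pack(
    "<800h",
    *[int(12000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(800)]
)
store.enroll("user-1001", tone)
print(store.voice_print("user-1001"))
```

Only the derived features are kept, not the raw audio, which mirrors the article's description of storing a voice print rather than the recording itself.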


Figure 1. Creating a voice print with a PC

A user accessing a Web application to place an order (as in the restaurant example) would first be presented with the authentication service. For example, the Internet address (URI) of the service could be included in a "cookie" sent from the user's Web browser, or the user could select it from a list of well-known authentication providers. Figure 2 shows the interactions between the small client device and the Web application, and between the Web application and the authentication service.


Figure 2. User authentication with a small device

When the Web application accesses the remote authentication service, it receives a list of random words to present to the user to speak. Figure 3 shows an example user authentication Web page, login.xhtml, as presented by the Opera multimodal browser. The page asks the user to read a list of city names.


Figure 3. Login Web page with voice authentication

The X+V user login page

Listing 1 shows the X+V source of the authentication Web page. Note that the VoiceXML <record> tag performs the actual recording.


Listing 1. X+V user login form
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" 
      xmlns:vxml="http://www.w3.org/2001/vxml" 
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
<head><title>Voice Verification Login</title>

  <style type="text/css">
h4 { font-size: 18px; color: #990202; font-family: sans-serif; }
td, p { font-size: 14px; color: #990202; font-family: sans-serif; }
hr { height: 5px; color: #990202; border-style: groove; }
p.box { border: 2px solid #0077ee; margin: 1px 1px 1px 1px; 
  padding: 10px 12px 10px 12px; font-size: 16px; }
input { border: 2px groove #990202; margin: 2px 4px 2px 4px; 
  padding: 4px 6px 4px 6px; font-size: 16px; }
  </style>

  <!-- voice handler -->
  <vxml:form id="voice_record">
     <vxml:record name="recording">
       <vxml:prompt timeout="10s" xv:src="#intro"/>
       <vxml:filled>
         <vxml:assign name="document.recordForm.record.value" expr="recording"/>
       </vxml:filled>
     </vxml:record>
  </vxml:form>

</head>
<body id="mainbody" ev:event="load" ev:handler="#voice_record">
 
  <form id="recordForm" action="jsps/login.jsp" method="post" 
    enctype="multipart/form-data">
    <h4>Application Login Instructions</h4>
    <p id="intro">Please enter your account number and speak the 
    cities from left to right in the
       verification box.  Thank you.
    </p>
    <table width="40%">
      <tbody>
        <tr><td colspan="2"><hr/></td></tr>
        <tr><td>Account Number</td>
            <td><input type="text" name="accountno"/>
                <input type="file" name="record" id="record"
                  style="display:none"/>
            </td>
        </tr>
        <tr><td>Verification Box</td>
            <td><p class="box">
Las Vegas Detroit Budapest Miami London Bismarck
            </p></td>
        </tr>
        <tr><td colspan="2"><hr/></td></tr>
        <tr><td colspan="2">
              <input style="border-style: outset" type="submit" value="Login"
                name="Submit"/>
            </td>
        </tr>
      </tbody>
    </table>
  </form>
</body>
</html>

The authentication service extracts the vocal characteristics from the recording of the random words received from the user. It compares these characteristics with the voice print stored in its database under the supplied user ID. If the two sets of characteristics match, the Web application is informed that the user can proceed to the next step of the transaction. Otherwise, the user is denied access to the application.
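The match decision can be pictured with a small Python sketch. This is only an illustration under stated assumptions: it treats a voice print as a plain list of numbers, compares two of them with cosine similarity, and accepts the user above an arbitrary threshold of 0.9. The function names and the threshold are hypothetical; production voice authentication engines use their own statistical models and tuned thresholds.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no usable information
    return dot / (norm_a * norm_b)


def is_match(stored_print, candidate, threshold=0.9):
    """Grant access only when the candidate is close enough to the stored print."""
    return cosine_similarity(stored_print, candidate) >= threshold


# Usage: a nearby vector is accepted, an unrelated one is rejected.
print(is_match([1.0, 2.0, 3.0], [1.0, 2.0, 3.1]))
print(is_match([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

Raising the threshold reduces false accepts at the cost of rejecting more genuine users, so the value would need to be tuned against real enrollment data.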


In conclusion

As you've seen in this and previous articles in this series, it's relatively simple to provide multimodal interaction to a Web application as a Web service. In this case, you've learned how multimodal user authentication works and seen the underlying X+V code for the voice-driven user login page.



Download

Description                    Name            Size  Download method
Sample code from this article  wi-mobweb3.zip  3KB   HTTP

