PHP bees and audio honey: Accessible agent-based audio alerts and feedback

Use PHP agents to feed information to your audio system

This article describes a system that uses open source tools to collect, edit, and funnel information to a central database. There the information is arranged for presentation, not on the screen but through the audio system, for the benefit of users such as those with visual impairments. The system uses a number of PHP agents that operate independently to generate, edit, arrange, and announce information.

Colin Beckingham (colbec@start.ca), Writer, Researcher

Colin Beckingham is a freelance researcher, writer, and programmer who lives in eastern Ontario, Canada. Holding degrees from Queen's University, Kingston, and the University of Windsor, he has worked in a rich variety of fields, including banking, horticulture, horse racing, teaching, civil service, retail, and travel/tourism. The author of database applications and numerous newspaper, magazine, and online articles, his research interests include open source programming and voice-control applications on Linux. You can reach Colin at colbec@start.ca.



13 October 2009


Your team leader has asked you to prepare a position paper on your company's proposed acquisition of XYZ Co. Ltd. The paper will be used at a meeting at exactly 4:00 p.m. local time, where a decision will be made whether to proceed. At 3:50 p.m., you are at your desk quietly putting the finishing touches on your paper when your computer suddenly announces through your headset, "Just a moment, just a moment, there is a message from BCD feed related to XYZ Ltd." You open your feed reader, note the information, tweak your position paper, and impress the meeting attendees by having the most up-to-date information. On this occasion, your visual impairment was not a factor at all.

Frequently used acronyms

  • FIFO: First in, first out
  • HTML: Hypertext Markup Language
  • RDBMS: Relational database management system
  • RSS: Really Simple Syndication
  • SQL: Structured Query Language
  • TTS: Text-to-speech
  • XML: Extensible Markup Language

Your advantage is that you have multiple agents working for you, collecting and organizing information. Everybody else happened not to read that feed, because the news is only minutes old. And rather than add another pop-up window to the screen, the computer gained your attention through your ears, which only makes sense, as a screen is not much use to you.

Audible computer output is a useful and accessible alternative presentation mechanism, particularly for people with visual impairments. The idea of a computer voice is hardly new: movie enthusiasts will have recognized the "Just a moment" above as a reference to 2001: A Space Odyssey (1968), in which the computer announces a fault in a component of the communications array. However, the vast majority of today's computer output still goes to the screen.

Why would Arthur C. Clarke and Stanley Kubrick choose voice interaction and not a screen or printed presentation? You don't have to have a visual impairment to prefer to receive some kinds of information aurally rather than visually.

Saying, not printing

HAL's voice is soothing and friendly, a welcome change in a fast-paced world. A voice follows you around in a way a screen cannot, and often, your hearing can work more reliably than your vision. Maybe you think better when you are walking around. In some cases, a sound alert has advantages even for those without a visual impairment. Imagine you are monitoring line voltage from your power supply. A constantly refreshed line on the screen can go unnoticed when it changes. The agent in this case can play a low sound while voltage is within correct parameters and a high-pitched sound when a problem occurs. This is what bees do: Disturb them, and the pitch changes. Provided that your headphones or speakers are not switched off, it is much more likely that you will get the message through your ears.
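As a minimal sketch of such an agent, assume a nominal 120 V supply with an acceptable band of plus or minus 5 percent and two hypothetical sound files, sounds/low.wav and sounds/high.wav. The decision then reduces to a pure function whose result an agent would hand to the audio player:

```php
<?php
// Hypothetical thresholds: 120 V nominal, plus or minus 5 percent.
const VOLTAGE_MIN = 114.0;
const VOLTAGE_MAX = 126.0;

// Return the (hypothetical) sound file to play for a voltage reading:
// a low hum while the reading is in range, a high-pitched tone otherwise.
function voltage_sound(float $volts): string
{
    if ($volts >= VOLTAGE_MIN && $volts <= VOLTAGE_MAX) {
        return 'sounds/low.wav';
    }
    return 'sounds/high.wav';
}

foreach ([119.5, 108.2, 131.0] as $reading) {
    echo $reading, ' V -> ', voltage_sound($reading), "\n";
}
```

Keeping the threshold test separate from the playback makes it easy to check the rule without any audio hardware attached.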

The Festival TTS engine

Festival is a text-to-speech (TTS) engine. Given a text string, the engine tries to enunciate the words through your computer audio system. (See Resources for details.) To quickly verify that your installation of Festival is working correctly, type festival at the shell prompt ($), then, at the festival> prompt, type (SayText "Hello world").

Announcing does pose some problems, however. In a message that contains the character string "PM late," does this mean late in the afternoon or that the Prime Minister was late? A human reader can pause for thought and place it in context, but what would your PHP agent do with it? Such cases can be complex, but you can handle each one as it arises, just as bees produce honey from different kinds of nectar.

HAL, the computer in the movie, was not exactly speaking the truth on that occasion, which should alert us to the need to examine carefully how the announcements are prepared. Assuming for the moment that audio is better than print for this special purpose, first let's get an overview of how this system works.


Overview of the mashup

Imagine a relational database in the cloud that contains messages. These messages, which can be either text or file names identifying a sound, are scanned by a PHP script that selects either a TTS engine such as Festival or an audio player such as MPlayer to render the content. New messages are fed to the database by PHP scripts that examine various sources of information such as news feeds (XML), Web pages (HTML), output from bash scripts, and so on, with one specific job in mind: build a meaningful message to be placed in the queue. Figure 1 provides a diagram of this system.

Figure 1. The audio system
(Diagram: agents, devices, database storage, and servers all interacting with one another through the cloud.)

Figure 1 shows one possible arrangement. On the left are the inputs; on the right are the outputs. In the center is a cloud that implies that any part of the system can be physically isolated from the rest as long as it is logically connected. Above and below the cloud are special services such as the database storage and HTTP servers required to deliver the message coherently. The figure indicates multiple outputs—for example, where several users are receiving the same messages.
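Beyond the per-message engine code stored with each message, the article does not say how an agent decides which renderer a message needs. One simple heuristic, offered here only as an assumption, is to treat anything ending in a known audio file extension as a sound for the player and everything else as text for the TTS engine. The sketch uses the codes 0 (Festival) and 1 (MPlayer) that the announcer in this article expects; the function name and the extension list are hypothetical:

```php
<?php
// Engine codes as used in this article's queue table.
const ENGINE_FESTIVAL = 0;
const ENGINE_MPLAYER  = 1;

// Hypothetical heuristic: a message that names an audio file goes to the
// player; any other message is treated as text for the TTS engine.
function choose_engine(string $message): int
{
    $ext = strtolower(pathinfo($message, PATHINFO_EXTENSION));
    $audio = ['wav', 'ogg', 'mp3', 'flac'];
    return in_array($ext, $audio, true) ? ENGINE_MPLAYER : ENGINE_FESTIVAL;
}

echo choose_engine('sounds/alarm.wav'), "\n";
echo choose_engine('There is a news item in the feed'), "\n";
```

An agent would call such a function once, when it writes the message, so that the announcer only has to read the stored code.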

It sounds straightforward, and in fact it is—except that there are many details you must attend to at each stage of the process. Let's examine those details now.


Back-end storage

The sample back-end database contains only one table with a few simple fields. Listing 1 shows its definition.

Listing 1. Back-end MySQL data definition
CREATE TABLE IF NOT EXISTS `queue` (
  `qid` int(11) NOT NULL AUTO_INCREMENT,
  `enunc` text COLLATE latin1_general_ci NOT NULL,
  `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `sys` tinyint(1) NOT NULL DEFAULT '0',
  `engine` tinyint(4) NOT NULL DEFAULT '0',
  `procd` tinyint(4) NOT NULL DEFAULT '0',
  `pause` tinyint(4) NOT NULL DEFAULT '0',
  PRIMARY KEY (`qid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

In this code, which was exported from MySQL (make your own adjustments for other RDBMSs), the table is created if it does not already exist. The fields are:

  • qid: the unique ID for each record, assigned automatically.
  • enunc: the actual message, stored as a text field (a varchar would also work, if you prefer).
  • ts: a time stamp that defaults to the moment the entry was recorded.
  • sys: a flag, 0 or 1, indicating whether this is a high-priority system message that must be presented before all other messages, for example, to report that a key script has stopped functioning.
  • engine: a number identifying which program will announce the message, whether Festival, MPlayer, or some other sound manager.
  • procd: a flag noting whether the message has been announced. It allows you to keep a record of all messages but announce only new ones.
  • pause: a flag that puts a message on hold while some other condition exists.

You can add other fields as required, particularly additional procd fields to indicate that the message has been announced on other machines. I have elected to use the InnoDB storage engine to allow for foreign-key relationships with other tables later. In a simple situation, the MyISAM engine (the default in MySQL before version 5.5) would do as well.
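To make the field meanings concrete, here is a minimal sketch of how an agent might assemble the values for a new row before inserting it. The function name and its validation rules are assumptions for illustration; qid and ts are deliberately left out so that the database fills them from the AUTO_INCREMENT counter and the CURRENT_TIMESTAMP default:

```php
<?php
// Hypothetical helper: build the column values for a new queue row.
// qid and ts are omitted so that MySQL supplies them (AUTO_INCREMENT and
// CURRENT_TIMESTAMP); procd and pause always start at 0.
function new_queue_row(string $enunc, bool $sys = false, int $engine = 0): array
{
    if (trim($enunc) === '') {
        throw new InvalidArgumentException('A message needs some content');
    }
    if ($engine < 0 || $engine > 127) { // tinyint is a signed byte
        throw new InvalidArgumentException("Engine code $engine out of range");
    }
    return [
        'enunc'  => $enunc,
        'sys'    => $sys ? 1 : 0, // 1 = system message, announced first
        'engine' => $engine,      // 0 = Festival, 1 = MPlayer
        'procd'  => 0,            // not yet announced
        'pause'  => 0,            // not on hold
    ];
}

print_r(new_queue_row('A key script has stopped', true));
```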


Announcing

With the back end in place and ready to store messages, you need an announcer. This script looks at the records in the queue table and announces each one that is neither paused nor already announced, in FIFO order, except that system messages take priority regardless of age.
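The selection rule can be tried without a database. The sketch below applies the same filter and ordering as the SQL in Listing 2 (procd=0 and pause=0, then sys descending and qid ascending) to an in-memory array of rows; the function name and the array layout are assumptions for illustration:

```php
<?php
// Pick the next row to announce from an in-memory copy of the queue:
// skip rows already announced (procd != 0) or on hold (pause != 0),
// then prefer system messages (sys = 1) and, within each group, the oldest qid.
function next_message(array $queue): ?array
{
    $pending = array_values(array_filter(
        $queue,
        fn(array $row) => $row['procd'] == 0 && $row['pause'] == 0
    ));
    if (!$pending) {
        return null;
    }
    usort($pending, function (array $a, array $b) {
        // sys descending first, then qid ascending
        return [$b['sys'], $a['qid']] <=> [$a['sys'], $b['qid']];
    });
    return $pending[0];
}

$queue = [
    ['qid' => 1, 'enunc' => 'Old news',       'sys' => 0, 'procd' => 1, 'pause' => 0],
    ['qid' => 2, 'enunc' => 'Feed item',      'sys' => 0, 'procd' => 0, 'pause' => 0],
    ['qid' => 3, 'enunc' => 'On hold',        'sys' => 1, 'procd' => 0, 'pause' => 1],
    ['qid' => 4, 'enunc' => 'Script stopped', 'sys' => 1, 'procd' => 0, 'pause' => 0],
];
echo next_message($queue)['enunc'], "\n";
```

Here the system message with qid 4 wins even though qid 2 is older, and the paused system message with qid 3 is skipped.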

Listing 2 shows a short, simplified set of instructions in PHP to extract the data required.

Listing 2. PHP code snippet for Announcer
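<?php
// $server, $user, $password, $database and $debug are assumed to be set earlier.
// saytext() appears in Listing 3; playsound() and set_pron() are defined elsewhere.
$mysqli = new mysqli($server,$user,$password,$database);
// Fetch only the next message: system messages first, then oldest first
$sql = "select qid,enunc,engine from queue
	  where procd=0 and pause=0
	  order by sys desc,qid asc
	  limit 1";
$result = $mysqli->query($sql) or die("Error:".$mysqli->error);
if ($result->num_rows > 0) {
  $row = $result->fetch_assoc();
  echo $row['enunc']."\n";
  switch ($row['engine']) {
    case 0: // festival TTS
      saytext($row['enunc']);
      break;
    case 1: // mplayer sounds/music
      playsound($row['enunc']);
      break;
    default: // must be something else
      echo "I'm sorry, Dave, I can't do that: ".
           "you did not define an engine for ".$row['enunc']."\n";
      break;
  }
  // Mark the message as announced, even if no engine handled it,
  // so that it is not picked up again on the next run
  set_pron($row['qid']);
}
?>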

In the above code, the SQL statement asks for the records that have not been announced and are not paused, with system messages first and then the oldest first. The script takes the first of these records and acts on it in the switch statement according to which player is used; in this case, the number 0 indicates the Festival engine, and the number 1 tells MPlayer to play a sound. The message is also echoed to the screen to help in debugging. When the message has been played or enunciated, the call to set_pron() sets the procd field in the back end to non-zero so that when the script runs again, this particular message is not repeated. Because the script handles one message per run and queries the table afresh each time, a system message that arrives in the meantime is picked up on the very next run.
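The switch statement in Listing 2 can also be written as a lookup table that maps engine codes to player functions, which turns adding a third engine into a one-line change. In the sketch below, the players are stand-in closures that only record what they were asked to do, so it runs without Festival, MPlayer, or MySQL; the function name announce() and the table layout are assumptions:

```php
<?php
// Hand one queue row to the player registered for its engine code.
// Returns true if a player handled it, false if no player is registered.
function announce(array $row, array $players): bool
{
    $engine = (int) $row['engine']; // mysqli returns columns as strings
    if (!isset($players[$engine])) {
        echo "No engine defined for {$row['enunc']}\n";
        return false;
    }
    $players[$engine]($row['enunc']);
    return true;
}

// Stand-in players that log instead of producing audio.
$log = [];
$players = [
    0 => function (string $text) use (&$log) { $log[] = "say: $text"; },
    1 => function (string $file) use (&$log) { $log[] = "play: $file"; },
];

announce(['engine' => 0, 'enunc' => 'Hello Dave'], $players);
announce(['engine' => 1, 'enunc' => 'sounds/chime.wav'], $players);
announce(['engine' => 7, 'enunc' => 'mystery'], $players);
print_r($log);
```

As in Listing 2, the caller would still mark the row as announced whatever the return value, so that a row with an unknown engine is not retried forever.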

The functions that call the players to read off the messages depend on the player you use. Listing 3 shows an example for Festival.

Listing 3. PHP function calling Festival
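function saytext($phrase) {
global $debug;
    if ($debug) echo $phrase."\n";
    // Escape backslashes and double quotes for the Scheme string literal,
    // then quote the whole expression so the shell passes it through intact
    $cmd = '(SayText "'.addcslashes($phrase, '"\\').'")';
    exec('festival -b '.escapeshellarg($cmd));
}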

Note that this function calls the exec() function in PHP. You're relying on this function to stop the execution of the script while the announcement is made, which in turn allows the voice rendering to finish before attempting the next record. In addition, the phrase passed to Festival must be checked to ensure that it is syntactically acceptable to the TTS engine and that what is enunciated will make sense to the listener (more on this issue later). Also, it might be the task of the announcer to reduce the volume on one channel so that another channel will be audible.
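What counts as syntactically acceptable depends on the engine, but feed text often arrives with HTML tags, character entities, and stray control characters that a TTS engine should never see. A minimal cleaning pass might look like the following, assuming UTF-8 input; the function name tts_clean() is hypothetical, and making the words themselves sensible to a listener is a separate job:

```php
<?php
// Hypothetical pre-filter for text bound for a TTS engine (UTF-8 assumed):
// drop markup, decode entities, remove control characters, tidy whitespace.
function tts_clean(string $text): string
{
    $text = strip_tags($text);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = preg_replace('/[\x00-\x1F\x7F]+/u', ' ', $text); // control chars
    $text = preg_replace('/\s+/u', ' ', $text);               // collapse runs
    return trim($text);
}

echo tts_clean("<p>XYZ &amp; Co. shares&#8230;\n\t<b>up</b> 5%</p>"), "\n";
```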

Note: Although you can run the script manually as required, it may make more sense to run it as a cron job that repeats at a suitable interval.


Hard-working agents

With the announcer running in the background waiting eagerly for new messages, it is now necessary to put something interesting in the queue. The initial example referred to an agent scanning an RSS feed, so the code in Listing 4 follows up on that idea. RSS feeds use the XML standard, so you can use the PHP XML functions to pull out the relevant records from the feed.

Listing 4. Basic PHP code for an RSS scanning agent
<?php
    // $server, $user, $password and $database are assumed to be set earlier
    $mysqli = new mysqli($server,$user,$password,$database);
    $rss = "URI_of_feed_goes_here";
    $xmlstr = file_get_contents($rss) or die("Could not read $rss\n");
    $xml = new SimpleXMLElement($xmlstr);
    $crit = "XYZ";
    $jam = "Just a moment";
    // table has `qid`, `enunc`, `ts`, `sys`, `engine`, `procd`, `pause`;
    // qid and ts are omitted so MySQL fills in the ID and the time stamp
    $stmt = $mysqli->prepare("insert into queue (enunc,sys,engine,procd,pause)
                              values (?,0,0,0,0)") or die($mysqli->error);
    foreach ($xml->channel->item as $item) {
	$thistitle = (string) $item->title;
	// strpos() returns 0 for a match at the start, so compare with false
	if (strpos($thistitle,$crit) !== false) {
	    // $titl = speakize($item->title,'news');
	    // $detl = speakize($item->description,'news');
	    $titl = $thistitle;
	    $detl = (string) $item->description;
	    $alert = "$jam, $jam, there is a news item in the feed";
	    echo "$titl : $detl\n";
	    $stmt->bind_param('s',$alert);
	    $stmt->execute() or die($stmt->error);
	}
    }
?>

The scanning agent performs the following tasks:

  • After setting up the mysqli connection to the database, it defines which RSS feed will be examined and fetches it into a string variable.
  • This string is then read into an XML object. The variable $crit is set to XYZ, which is your criterion: any feed items about XYZ are what you are looking for.
  • The foreach statement loops through the news items looking for anything to do with XYZ in the title.
    Note: This task assumes that the reference will be in the title, but you can be more thorough by expanding the logical condition into an OR statement and searching more of the feed.
  • If an item is found, the script goes to work adding a record to the queue.
    Three alternative strings could be added to the database: the title of the feed item, the description, or a generic line that says there is a feed item that fits the criterion. The listing uses the last, which is the safest because it contains known words that Festival will be able to render. The title and description are printed to the screen for debugging purposes.
  • Finally, notice the insert statement. It names its columns, so qid and ts are left out and take their defaults (the next AUTO_INCREMENT value and the server's current time stamp). The remaining values are:
    • ? is a placeholder that is bound to the content of $alert and stored in enunc.
    • 0 for sys indicates that the message is not a priority message.
    • 0 for engine indicates that the engine is Festival.
    • 0 for procd indicates that the message has not been announced.
    • 0 for pause indicates that the message is not paused.
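The matching step can be tried offline against a small inline feed before the agent is pointed at a live one. The sketch below assumes the RSS 2.0 layout (channel, item, title) and uses a made-up feed; it compares the result of strpos() with false, because a match at the very start of a title returns position 0. The function name matching_titles() is hypothetical:

```php
<?php
// Return the titles of RSS 2.0 items whose title contains $crit.
function matching_titles(string $xmlstr, string $crit): array
{
    $xml = new SimpleXMLElement($xmlstr);
    $found = [];
    foreach ($xml->channel->item as $item) {
        $title = (string) $item->title;
        // strpos() returns 0 for a match at the start, so test against false.
        if (strpos($title, $crit) !== false) {
            $found[] = $title;
        }
    }
    return $found;
}

// A made-up feed for testing.
$feed = <<<XML
<rss version="2.0"><channel><title>BCD feed</title>
  <item><title>XYZ Ltd. announces results</title><description>...</description></item>
  <item><title>Weather update</title><description>...</description></item>
  <item><title>Analysts upgrade XYZ</title><description>...</description></item>
</channel></rss>
XML;

print_r(matching_titles($feed, 'XYZ'));
```

The first title would be missed by a test of the form strpos(...) > 0, which is why the comparison with false matters.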

Careful readers of Listing 4 will have noted the use of a function, speakize(), which is not detailed here. It is sufficient at this point to say that it turns text designed for a screen into honey for the ear, making the changes needed because the output is spoken rather than displayed.
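Because speakize() is not shown, the following is only a guess at one possible shape for it, modeled on the calls commented out in Listing 4: a set of regular-expression rules shared by every source, plus extra rules for a named context. The rules themselves are illustrative assumptions; a number followed by PM is read as a time in every context, while a bare PM becomes Prime Minister only in the 'news' context:

```php
<?php
// Hypothetical speakize(): rewrite screen text into something that sounds
// right when spoken, using replacement rules chosen by context.
function speakize(string $text, string $context = 'default'): string
{
    // Rules shared by every context, applied first.
    $common = [
        '/\b(\d{1,2})\s*p\.?m\b\.?/i' => '$1 in the afternoon', // crude: ignores evening
        '/(\d+)\s*%/'                 => '$1 percent',
    ];
    // Extra rules for particular sources.
    $byContext = [
        'news' => [
            '/\bPM\b/'     => 'Prime Minister',
            '/\bLtd\b\.?/' => 'Limited',
        ],
    ];
    $rules = $common + ($byContext[$context] ?? []);
    return preg_replace(array_keys($rules), array_values($rules), $text);
}

echo speakize('PM late for XYZ Ltd. talks', 'news'), "\n";
echo speakize('Meeting moved to 4 pm'), "\n";
```

Without the 'news' context, "PM late" passes through unchanged, which is the point: the same letters need different treatment depending on where they came from.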

At this point, the cycle is complete: an agent is feeding information to the database and an announcer is watching for something to say. All you need now is for an XYZ item to appear in the news feed; the agent picks it up and posts a message, and you get an audio alert. In a working situation, there will likely be multiple agents, each taking its turn and contributing messages as required.


Manager

Once this system is running, a means of managing the records in the back-end database would clearly be helpful. You might want to repeat the last item; mark the last 10 items as unannounced so that they are repeated on the next run; fast-forward to the most recent item; read all items from the beginning; and so on. Although these management functions are beyond the scope of this article, the structure of the back-end database permits reordering and reviewing records with the appropriate SQL operations.
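The first few of these operations are simple updates to the procd field. The sketch below performs two of them on an in-memory array so that it can run on its own; against MySQL, replaying the last 10 items could be a single statement such as UPDATE queue SET procd=0 ORDER BY qid DESC LIMIT 10. The function names are hypothetical, and fast-forward is interpreted here as skipping every waiting message except the newest:

```php
<?php
// Mark the $n most recent rows (highest qid) as not announced,
// so the announcer will read them again on its next run.
function replay_last(array $queue, int $n): array
{
    $qids = array_column($queue, 'qid');
    rsort($qids);
    $replay = array_slice($qids, 0, $n);
    foreach ($queue as &$row) {
        if (in_array($row['qid'], $replay, true)) {
            $row['procd'] = 0;
        }
    }
    unset($row);
    return $queue;
}

// Skip everything still waiting except the newest waiting row.
function fast_forward(array $queue): array
{
    $pending = array_filter($queue, fn($row) => $row['procd'] == 0);
    if (!$pending) {
        return $queue;
    }
    $newest = max(array_column($pending, 'qid'));
    foreach ($queue as &$row) {
        if ($row['procd'] == 0 && $row['qid'] !== $newest) {
            $row['procd'] = 1;
        }
    }
    unset($row);
    return $queue;
}

$queue = [
    ['qid' => 1, 'procd' => 1],
    ['qid' => 2, 'procd' => 1],
    ['qid' => 3, 'procd' => 0],
    ['qid' => 4, 'procd' => 0],
];
print_r(array_column(replay_last($queue, 3), 'procd', 'qid'));
print_r(array_column(fast_forward($queue), 'procd', 'qid'));
```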


Issues

The nature of spoken audio on computer systems is such that there are many traps to catch the unwary. Among these traps are:

  • Odd text characters and acronyms. The purpose of the speakize() function is to scan the incoming text and translate it to a readable form. As noted earlier, does "PM" mean afternoon and evening or Prime Minister? Because one specific feed is involved, you can customize the agent to treat abbreviations in a way that takes the context into account. The speakize() function is one of those functions you will keep tweaking as requirements change.
  • What if the system is busy using TTS to enunciate a long set of sentences and an urgent system sound file is popped into the queue? Because the exec function stops the execution of that particular script, the system may be unaware of the new message. One possibility is to have two announcers, one focusing on verbal Festival operations and the other on system sounds. This solution would call for multiple sound output devices (on many systems, programs such as Festival and MPlayer can direct the output to a specific device) or coordinated input to one device. Humans have no problems listening to speech and processing sounds simultaneously (think music and chords). However, two simultaneous TTS channels could be a problem.
  • What if you like to listen to heavy metal music while using the computer? Won't the incoming message be drowned out? With full control of volume levels handed to the system, the computer could automatically lower the din on one channel and allow the speech to come through as a pleasant relief.
  • Editors of feeds do not necessarily format their titles and descriptions so that they can be processed by TTS applications. The assumption is that they will be printed, and frequently these items will be derived from a printed page context, which makes both title and description unintelligible unless seen in the context of the full article. "He does it again!" Who did what? Choose your news sources carefully, and be prepared for surprises.
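The volume adjustment for the heavy-metal case can be wrapped around each announcement so that the music is always restored, even if the announcement fails. In this sketch the mixer is reached through a caller-supplied set-volume callback; on a Linux system with ALSA that callback might run a mixer command such as amixer, but the control names depend on the sound card, so that part is left as an assumption. The function name and the percentages are also illustrative:

```php
<?php
// Lower the background level, run the announcement, then restore the level.
// $setVolume is a caller-supplied function taking a percentage (assumption).
function with_ducking(callable $setVolume, int $normal, int $ducked, callable $announce): void
{
    $setVolume($ducked);
    try {
        $announce();
    } finally {
        $setVolume($normal); // restore even if the announcement throws
    }
}

// Demonstration with a stand-in mixer that just records the calls.
$calls = [];
$mixer = function (int $percent) use (&$calls) { $calls[] = "volume $percent%"; };
with_ducking($mixer, 80, 20, function () use (&$calls) { $calls[] = 'announce'; });
print_r($calls);
```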

Conclusion

This article described how PHP, in combination with other open source technologies, can form a useful mashup to send informative messages to the audio system. It is only one of a number of ways of getting a person's attention. It is particularly useful where the user has a visual impairment or is using a system or process where consulting a screen is either inconvenient or impossible. Agents can often perform reading functions more quickly and reliably than human readers, although they have much greater difficulty placing items in a context.

These PHP agents really are like little bees, each with its specific job to do, collecting information, digesting and storing it in concentrated form in the honeycomb of the database, and then allowing other agents to spin out the honey as required. But if one day your computer buzzes in your ear, "I'm sorry Dave, I'm afraid I can't do that," it is time to be very worried.

Resources

Get products and technologies

  • Festival: Learn more about the Festival Speech Synthesis System from its creator, the University of Edinburgh.
  • MPlayer: Learn more about the MPlayer movie player.
  • Code examples from PHP Builder: Your resource for PHP code examples.
  • IBM product evaluation versions: Download these versions today and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
