Produce 60-second radio theatre with XML, PHP and Festival

Create minimalist audio art using computer-generated voices

Produce and record a 60-second audio play using XML, PHP, and Festival. Provide stage directions, inject sound effects, and control dialogue flow with a cast of dynamically allocated Festival voices.


Colin Beckingham, Writer and Researcher, Freelance

Colin Beckingham is a freelance researcher, writer, and programmer who lives in eastern Ontario, Canada. Holding degrees from Queen's University, Kingston, and the University of Windsor, he has worked in a rich variety of fields including banking, horticulture, horse racing, teaching, civil service, retail, and travel and tourism. The author of database applications and numerous newspaper, magazine, and online articles, his research interests include open source programming, VoIP, and voice-control applications on Linux.

15 June 2010

What is 60-second theatre?

You have 60 seconds (approximately) to tell your story. Establish the basic situation, introduce a conflict, build the picture very quickly, and finally resolve the conflict in some way with all the ends tidied up. You can use recorded sounds for effects and, in this case, I'm using synthesized voices with the text-to-speech (TTS) engine Festival. The utterances will be flat and expressionless, and—sometimes—difficult to understand because the voices are not perfect.

Download the complete sample play as an audio .ogg file, then read on to learn how the play was generated. Shakespeare is not at all concerned about losing his preeminent status as the Bard, but in this context, it's not the play that's the thing, it's how it can be programmed. I generated this play entirely with open source and free software, and it uses the Festival TTS engine with some of the distinctive voices it provides.

Playing the play

You play the play with two files: an XML data file that contains the play itself and a producer script in PHP. The XML data contains the cast list, the title and credits, a list of files to use for effects, and each character's lines (the dialogue). The producer renders the play to an audio device according to the instructions in the XML data, which makes it easy to create a different play or edit the current one and play it with the same producer.

The basic structure of the play

The play has a number of acts, each act is divided into scenes, and each scene is a series of events such as noises, music, dialogue, and grand speeches. In a live theatre, you can see the curtain going up at the beginning of each act and down at the end, which allows the stage hands to change the scenery for the next part of the play. Acts help divide the play into sections and often indicate that time has passed or that the action has moved to a different location.

In an audio play, you don't have the dramatic visual effect when the curtain is raised. The drama has to come from a sound or what a person says.

You can provide a sound to indicate the curtain going up and down to provide a marker you can hear. In addition, you can announce the title of the play and the playwright. At the beginning of each act, someone might say, "Act One. On the steps of the Forum." At the end of the play, that person could roll the credits and provide any explanation that you feel the audience needs ("The real-life Joe Blow was given 70 years in jail . . . "). And, sometimes, you need a comment during the play: If you hear the sound of a slap and only two actors are within earshot, it's helpful to know who slapped whom.

Dramatis personae: the cast list

The cast can't be huge; otherwise, each character will have little to say in a minute or so. Festival offers a basic set of nine different voices (some male, some female, some older, some younger), so let's limit the cast to nine. Maybe the play has a role for Fred, and you want to have Fred represented by the Festival voice voice_rab_diphone (for more on Festival voices, see Resources). You can declare Fred as:

<role voice="voice_rab_diphone">Fred</role>

Here, the role is Fred, and the voice is an attribute of the XML element role. Each time Fred speaks in the play, the producer looks up which voice to use for his lines. If you decide to use a different voice for Fred, you just change the attribute in one place.

Assigning the narrator

The narrator is a special and important character. This voice announces the name of the play and the writer, injects commentary, and lists the end credits. So this voice ties much of the play together. You could declare the narrator in the cast list as:

<role voice="voice_don_diphone">Narrator</role>

Now, each time the narrator says something, the producer uses a voice other than Fred's.
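As a minimal sketch of how a producer might turn such a cast list into a lookup table, the following PHP fragment parses a small sample cast with SimpleXML and builds an array that maps each role to its voice. The wrapper element name dramatisp is the one used in the play data file later in this article; the function name load_cast and the inline sample are assumptions made for illustration.

```php
<?php
// Hypothetical helper: build a role => voice map from a cast list.
function load_cast(SimpleXMLElement $cast): array {
    $voices = [];
    foreach ($cast->role as $role) {
        // The element text is the role name; the attribute is the Festival voice.
        $voices[(string) $role] = (string) $role['voice'];
    }
    return $voices;
}

// Sample cast list, made up for this sketch.
$cast = simplexml_load_string(
    '<dramatisp>
       <role voice="voice_rab_diphone">Fred</role>
       <role voice="voice_don_diphone">Narrator</role>
     </dramatisp>'
);
$voices = load_cast($cast);
echo $voices['Fred'], "\n";
```

Casting each value to a string keeps the array free of SimpleXML objects, so the voice names can be passed around like any other text.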

Sound effect and dialogue events

The play is a series of events. As each event is presented, the context either becomes clearer or more complex. Here's an example of two successive events: a sound effect immediately followed by dialogue spoken by a character:

<event type="effect" player="mplayer">gunshot.wav</event>
<event type="dialogue" player="Bozo">Freeze, turkey</event>

The first event has a type attribute of effect, which indicates that it's a sound and is not to be rendered with the TTS engine. The player attribute says that you want Mplayer to play the sound, which is the file gunshot.wav.

The second event is dialogue spoken by the character Bozo, who says, "Freeze, turkey." There is only one TTS engine in this context so there's no attribute to specify which engine to use. The producer always uses the same one.

In this structure, events as XML elements only occur inside scenes; however, you can still have sounds and utterances outside the act and scene structure, such as the opening announcements by the narrator, end credits, and applause.
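One way to picture how the two event types are told apart is a small dry-run sketch that only describes what would happen, without starting Mplayer or Festival. The function name plan_event and the voice assigned to Bozo are assumptions for this example, not part of the producer shown later.

```php
<?php
// Hypothetical helper: decide what to do with one event, without running anything.
function plan_event(SimpleXMLElement $event, array $voices): string {
    $type = (string) $event['type'];
    $player = (string) $event['player'];
    $content = trim((string) $event);
    switch ($type) {
        case 'effect':
            // For effects, the player attribute names the audio program.
            return "play $content with $player";
        case 'dialogue':
            // For dialogue, the player attribute names a role in the cast list.
            if (!isset($voices[$player])) {
                throw new InvalidArgumentException("No voice for role $player");
            }
            return "say \"$content\" as {$voices[$player]}";
        default:
            throw new InvalidArgumentException("Invalid type $type");
    }
}

$scene = simplexml_load_string(
    '<scene>
       <event type="effect" player="mplayer">gunshot.wav</event>
       <event type="dialogue" player="Bozo">Freeze, turkey</event>
     </scene>'
);
$voices = ['Bozo' => 'voice_kal_diphone']; // assumed voice assignment
$plan = [];
foreach ($scene->event as $event) {
    $plan[] = plan_event($event, $voices);
}
echo implode("\n", $plan), "\n";
```

A dry run like this can be handy for proofreading a play before any audio is generated.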

Acts and scenes

Because the play is a number of acts and each act is divided into scenes, your basic flow structure will be something like Listing 1.

Listing 1. Acts, scenes, and associated events
<act>
  <scene>
    <event type="dialogue" player="five">...</event>
    <event type="dialogue" player="nine">...</event>
  </scene>
  <scene>
    <event type="dialogue" player="five">...</event>
    <event type="dialogue" player="two">...</event>
  </scene>
</act>

Here, you have just one act with two scenes, and each scene has two spoken lines involving the characters five, nine, and two—each of which has a unique voice defined in the cast list, as explained above. You also have instructions for a curtain up and curtain down sound with the files to be played at the start and end of the act. In my example, I have borrowed some system sounds from the K Desktop Environment (KDE).

Dynamically allocating voices

The job of getting a character to say something through the loudspeakers or headphones is given to the producer. The producer PHP code contains the function shown in Listing 2.

Listing 2. Calling voices dynamically
function deliver($phrase,$voice) {
  // run Festival in batch mode: select the voice, then speak the phrase
  exec('festival -b \'(begin ('.$voice.') 
         (SayText "'.$phrase.'"))\' >/dev/null',$out);
}

In this function, the arguments are the phrase (what is to be said) and the voice (which Festival voice will be used to render it). The exec function calls on Festival in batch mode to do two things: Set up the voice and enunciate the phrase using the specified voice. The begin instruction indicates to Festival that there are multiple things to be done.
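Because the phrase and the voice are pasted straight into a shell command, a line containing an apostrophe or a double quote would break it. The sketch below is one possible hardening, not the article's function: it builds the same kind of command string but checks the voice name and quotes the Scheme expression with escapeshellarg. The function name festival_command is hypothetical, and the sketch assumes that Festival's Scheme reader accepts backslash-escaped double quotes inside a string.

```php
<?php
// Hypothetical variant of deliver(): build the command line without running it.
function festival_command(string $phrase, string $voice): string {
    // Accept only names that look like Festival voice functions.
    if (!preg_match('/^voice_[A-Za-z0-9_]+$/', $voice)) {
        throw new InvalidArgumentException("Suspicious voice name: $voice");
    }
    // Escape backslashes, then double quotes, so the phrase stays inside
    // the Scheme string (assumes Festival honours backslash escapes).
    $text = str_replace(['\\', '"'], ['\\\\', '\\"'], $phrase);
    $scheme = "(begin ($voice) (SayText \"$text\"))";
    // escapeshellarg() quotes the whole expression for the shell.
    return 'festival -b ' . escapeshellarg($scheme) . ' >/dev/null';
}

echo festival_command('Freeze, turkey', 'voice_kal_diphone'), "\n";
```

The resulting string could be handed to exec in place of the hand-built one. Returning the command instead of running it also makes the function easy to check without a sound card.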

The complete play

Listing 3 shows a possible, complete simple play data file in XML.

Listing 3. The complete play data in XML
<?xml version="1.0" encoding='UTF-8'?>
<play>
  <dramatisp>
    <role voice="voice_don_diphone">muchi</role>
    <role voice="voice_kal_diphone">dad</role>
    <role voice="voice_rab_diphone">narra</role>
    <role voice="voice_nitech_us_awb_arctic_hts">mscot</role>
    <role voice="voice_nitech_us_bdl_arctic_hts">spare</role>
    <role voice="voice_nitech_us_clb_arctic_hts">matron</role>
    <role voice="voice_nitech_us_jmk_arctic_hts">fuzzy</role>
    <role voice="voice_nitech_us_rms_arctic_hts">uncle</role>
    <role voice="voice_nitech_us_slt_arctic_hts">filly</role>
  </dramatisp>
  <intro>
    <theatre>Sixty second theatre with XML and Festival</theatre>
    <title>Todays play - The demonstration effect</title>
  </intro>
  <act>
    <scene>
      <!-- event type="effect" player="mplayer">tmp.wav</event -->
      <event type="dialogue" player="dad">The doctor is taking a long time</event>
      <event type="dialogue" player="matron">Yes but it is worth the wait</event>
      <event type="dialogue" player="dad">Looks like you broke your arm</event>
      <event type="dialogue" player="dad">Did you have a bad fall</event>
      <event type="dialogue" player="matron">Yes one of those silly falls</event>
      <event type="dialogue" player="matron">Icy steps</event>
      <event type="dialogue" player="dad">Could happen to anybody</event>
    </scene>
    <scene>
      <!-- event type="effect" player="mplayer">tmp.wav</event -->
      <event type="dialogue" player="dad">It is really cold out there</event>
      <event type="dialogue" player="uncle">Yes the cold gives me chill blains</event>
      <event type="dialogue" player="dad">Hands or feet</event>
      <event type="dialogue" player="uncle">Both</event>
      <event type="dialogue" player="dad">That is bad luck</event>
    </scene>
    <scene>
      <!--event type="effect" player="mplayer">tmp.wav</event -->
      <event type="dialogue" player="dad">Thats a bad cough</event>
      <event type="dialogue" player="filly">Yes it hurts when I breathe</event>
      <event type="dialogue" player="dad">I am sorry to hear that</event>
      <event type="dialogue" player="filly">What is your ailment</event>
      <event type="dialogue" player="dad">Oh I am not actually sick</event>
      <event type="dialogue" player="dad">But I do not feel well unless I surround
            myself with people who are a lot worse off</event>
    </scene>
  </act>
  <end>
    <credits>Thanks to Festival, PHP, Audacity and XML</credits>
  </end>
</play>

In this XML data file, the root element is <play>. In addition to the acts and scenes in the middle of the data file, the play begins with the declaration of the roles and their voices in the element <dramatisp>; an intro section in which the narrator or announcer gives the title of the play; and an end section in which you have music, the credits, and (perhaps) applause.
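Since every dialogue event names a player that must appear in the cast list, a quick consistency check can catch typos before the producer runs. The following is a sketch under stated assumptions: the function name undeclared_players and the cut-down sample play are invented for this example, while the element and attribute names follow the data file described above.

```php
<?php
// Hypothetical check: list dialogue players that have no role in the cast list.
function undeclared_players(SimpleXMLElement $play): array {
    $declared = [];
    foreach ($play->dramatisp->role as $role) {
        $declared[(string) $role] = true;
    }
    $missing = [];
    // Look at every dialogue event, wherever it sits in the acts and scenes.
    foreach ($play->xpath('//event[@type="dialogue"]') as $event) {
        $player = (string) $event['player'];
        if (!isset($declared[$player])) {
            $missing[$player] = true;
        }
    }
    return array_keys($missing);
}

// Cut-down sample play, made up for this sketch: uncle is not in the cast.
$play = simplexml_load_string(
    '<play>
       <dramatisp>
         <role voice="voice_kal_diphone">dad</role>
       </dramatisp>
       <act><scene>
         <event type="dialogue" player="dad">Hands or feet</event>
         <event type="dialogue" player="uncle">Both</event>
       </scene></act>
     </play>'
);
print_r(undeclared_players($play));
```

An empty result means every speaking part has a voice assigned.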

The producer

Apart from a few minor points, you now have all the parts in place to play the play. The producer script iterates over the introduction, acts, scenes, and end, playing the events in order using Mplayer or Festival, as required. Listing 4 shows the entire producer script, which is programmed to be run from the command line.

Listing 4. The producer script in PHP
<?php
// sixty second theatre player
echo "60 second theater player\n";
if ($argc < 2) die("No play specified\n");
$playxml = $argv[1];
$xml = simplexml_load_file($playxml);
if ($xml === false) die("Cannot load $playxml\n");
// load players' voices
$roles = $xml->dramatisp->role;
foreach ($roles as $rolevoice) {
  $rolev["$rolevoice"] = $rolevoice['voice'];
}
$announcer = $rolev["narra"];
$timestart = time();
// now the introduction
deliver((string) $xml->intro->theatre,$announcer);
deliver((string) $xml->intro->title,$announcer);
// now the acts
$anum = 0;
$snum = 0;
foreach ($xml->act as $A) {
  $anum++;
  deliver("Act $anum",$announcer);
  foreach ($A->scene as $s) {
    $snum++;
    //deliver("Scene $snum",$announcer);
    $events = $s->event;
    foreach ($events as $e) {
      switch ((string) $e['type']) {
            case "effect":
              // hand the sound file to the audio player named in the event
              $engine = $e['player'];
              play_effect((string) $e,$engine);
              break;
            case "dialogue":
              // speak the line with the voice assigned to this role
              $plyr = $e['player'];
              // echo "Trying $e with $plyr\n";
              deliver((string) $e,$rolev["$plyr"]);
              break;
            default:
              die("Invalid type");
      }
    }
  }
  // scene numbering restarts with each act
  $snum = 0;
}
// end of the play
$timeend = time();
$length = $timeend - $timestart;
echo("Total length is $length seconds.\n");
// functions
function play_effect($effect,$engine) {
  exec("$engine $effect",$out);
}
function deliver($phrase,$voice) {
  // echo "$phrase with $voice\n";
  exec('festival -b \'(begin ('.$voice.') 
            (SayText "'.$phrase.'"))\' >/dev/null',$out);
}

In this producer file, you first load the play (as detailed in the XML data file myplay.xml) into memory and declare it to be an XML object. Next, you look for the cast of players and load them into an array with the voices they are to use. Then, you select the voice to be used for the narrator or announcer, and note the beginning time so that you can get a measure of how long the play is when it has finished running. After the title of the play is announced, you immediately launch into a loop through the acts—however many the play might have—and, while inside each act, loop through the scenes, following the event instructions they contain.

First rehearsal

To play the play to the speakers, start the producer and provide the data file:

$ php producer.php myplay.xml

You can also record the play by having the producer pipe the output into a recorded file:

$ php producer.php myplay.xml | arecord (options) myplay.wav

You can then edit this file in an audio editor such as Audacity® (see Resources for a link) or manipulate it with an audio utility such as sox.

Tidying up

Although you can listen to the recording, the output from this procedure can be improved. For example:

  • It might be helpful to have certain dialogues fade in and out or to overlap sounds with dialogue. Although you can't achieve this directly with this producer as the play proceeds, you can do it post-production with a tool such as Audacity. But the more you improve the play in this way, the less minimalist it becomes.
  • A slow computer will take a second or two to start and stop the audio engines, leaving unnecessarily long silences between events in the output. These silences are helpful during post-editing because they show very clearly the breaks between events, but shorten them in the final version to improve the pace of the production. You can achieve this with Audacity using the Truncate silence effect. Note that this indiscriminate truncation also removes any screenplay beats (deliberate silences) you have inserted, so such beats are best inserted after this process.
  • Use whatever sound effects you like and a wide variety of voices, but don't let things get too complicated in such a short time period. It is helpful to use well-known ideas such as an ambulance siren or a gunshot, because these sounds carry a lot of context with them.


You might of course use a relational database to store a play and its events instead of the XML approach. However, in this context, XML is much clearer to the reader and editor, because it's in the form of a flat file. If you are unhappy with a dialogue line spoken, you can make a quick edit, re-run the producer, and there is your new final product.

60-second theatre with flat, emotionless, synthesized voices makes this a minimalist art form. Without the interpretation imposed by trained and experienced actors, the audience has a different kind of experience. The listener is obliged to add details such as phrasing and lilt mentally. But it is possible, using XML and Festival or another TTS engine, to produce a "listenable" production.


Download: The 60-second play (well.ogg.zip, 1000KB)


