 | Level: Advanced Sam Lantinga, lead programmer, Loki Entertainment Software Lauren MacDonell, technical writer, SkilledNursing.com
01 May 2000 In this installment we look at two useful tools in the arsenal of any Linux programmer: lex and yacc. These tools have enabled us to easily build the scripting language and GUI framework that we used in Pirates Ho!, our SDL-based Linux game.
In designing Pirates Ho!, we needed an easy way to describe the interface and
dialog options to the player. For the description we needed a simple, consistent, and flexible language, so we looked for tools that would help us build our scripting language. Who wants another slice of bison?
The word "yacc" struck fear into my heart in school. It evoked images of disheveled and pale students muttering about compilers and symbol tables. So I carefully managed to avoid taking a compiler class. But while playing around with our game, I plucked up my courage and tackled yacc in the hope that it would make the scripting a little easier. In the end, yacc not only made scripting easier, it made it fun as well.
Starting with the basics
Yacc is actually fairly straightforward to use. You give it a list of rules describing your grammar, and it
parses tokens and takes actions based on what it sees. For our scripting language we wanted to start out simple, and initially just specified numbers and their logical operations: eval.y
%{
/* This first section contains C code which will be included in the output
file.
*/
#include <stdlib.h>
#include <stdio.h>
/* Since we are using C++, we need to specify the prototypes for some
internal yacc functions so that they can be found at link time.
*/
extern int yylex(void);
extern void yyerror(char *msg);
%}
/* This is a union of the different types of values that a token can
take on. In our case we'll just handle "numbers", which are of
C int type.
*/
%union {
int number;
}
/* These are untyped tokens which are recognized as part of the grammar */
%token AND OR EQUALS
/* Here we are, any NUMBER token is stored in the number member of the
union above.
*/
%token NUMBER
/* These rules all return a numeric value */
%type
expression
%type
logical_expression and or equals
%%
/* Our language consists either of a single statement or of a list of statements.
Notice the recursivity of the rule, this allows us to have any
number of statements in a statement list.
*/
statement_list: statement | statement_list statement
;
/* A statement is simply an expression. When the parser sees an expression
we print out its value for debugging purposes. Later on we'll
have more than just expressions in our statements.
*/
statement: expression
{ printf("Expression = %d\n", $1); }
;
/* An expression can be a number or a logical expression. */
expression: NUMBER
| logical_expression
;
/* We have a few different types of logical expressions */
logical_expression: and
| or
| equals
;
/* When the parser sees two expressions surrounded by parenthesis and
connected by the AND token, it will actually perform a C logical
expression and store the result into $$, which is the "value" of
this statement.
*/
and: '(' expression AND expression ')'
{ if ( $2 && $4 ) { $$ = 1; } else { $$ = 0; } }
;
or: '(' expression OR expression ')'
{ if ( $2 || $4 ) { $$ = 1; } else { $$ = 0; } }
;
equals: '(' expression EQUALS expression ')'
{ if ( $2 == $4 ) { $$ = 1; } else { $$ = 0; } }
;
%%
/* This is a sample main() function that just parses standard input
using our yacc grammar. It allows us to feed sample scripts in
and see if they are parsed correctly.
*/
int main(int argc, char *argv[])
{
yyparse();
}
/* This is an error function used by yacc, and must be defined */-
void yyerror(char *message)
{
fprintf(stderr, "%s\n", message);
}
|
 |
How do we feed the beast?
Now that we had
a simple grammar that recognizes sequences of tokens, we needed to find a way to
feed tokens to the parser. Lex is a tool that takes input, turns it into tokens,
and passes them to yacc. Here we describe the expressions that lex turns into
tokens: eval.l
%{
/* Again, this is C code that is inserted into the beginning of the output */
#include
#include "y.tab.h" /* Include the token definitions generated by yacc */
%}
/* Prevent the need for linking with -lfl */
%option noyywrap
/* This next section is a set of regular expressions that describe input
tokens that are passed back to yacc. The tokens are defined in y.tab.h,
which is generated by yacc.
*/
%%
\/\/.* /* ignore comments */
-[0-9]+|[0-9]+ { yylval.number=atoi(yytext); return NUMBER; }
[ \t\n] /* ignore whitespace */
&& { return AND; }
\|\| { return OR; }
== { return EQUALS; }
. return yytext[0];
%%
|
Now that we had our parsing source in the current directory, we needed a
Makefile to build them: Makefile
all: eval
y.tab.c: eval.y
yacc -d $<
lex.yy.c: eval.l
lex $<
eval: y.tab.o lex.yy.o
$(CC) -o $@ $^
|
By default, yacc outputs to y.tab.c and lex outputs to lex.yy.c, which is why
we used those names as our source files. The Makefile contains rules to build
source from the parser description files. When we were done, we could type "make"
to build a parser that we could run and type scripts into to check our logic.
shift/reduce conflicts
shift/reduce conflicts occur when yacc must choose between either continuing to parse a set
of tokens, or resolving it into a rule. For example, if you create a
grammar that consists of:
expression: NUMBER | plus
;
plus: expression '+' expression
;
|
when the yacc parser sees a number, it doesn't know whether to
resolve it into an expression immediately, or to wait for "X + Y". In
this case, yacc warns of a shift/reduce conflict, and by default waits
for the full "X + Y" expression. You can see exactly what is going on
when using GNU bison by giving it the '-v' command line option, which
creates a *.output file containing the rules and conflicts that have arisen.
We can easily resolve the ambiguity by
requiring parentheses around the plus expression:
expression: NUMBER | plus
;
plus: '(' expression '+' expression ')'
;
|
|
Using lex and yacc with C++
There are a few
caveats when using lex and yacc with C++. Lex and yacc output to C files, so for
C++ we used the GNU equivalents, flex and bison. These tools allow you to
specify the names of the output files. We also added general rules to the
Makefile so GNU Make automatically built our C++ source files from the lex
and yacc sources. This required us to rename the lex and yacc sources to
"lex_eval.l" and "yacc_eval.y" respectively, so that Make generated
different C++ source files for each of them. We also needed to change the file
lex source includes for the yacc token definitions. The header file that bison
outputs is the name of the output file with an .h suffix, or in our case
"yacc_eval.cpp.h". Here is our new Makefile: Makefile
all: eval
%.cpp: %.y
bison -d -o $@ $<
%.cpp: %.l
flex -o$@ $<
yacc_eval.o: yacc_eval.cpp
lex_eval.o: lex_eval.cpp
eval: yacc_eval.o lex_eval.o
$(CXX) -o $@ $^
|
Parsing from strings
The default lex code
reads its input from standard input, but for our game we wanted to be able to
parse strings in memory. This is easily done with flex by redefining the macro
YY_INPUT
at the top of the lex source file:
extern int eval_getinput(char *buf, int maxlen);
#undef YY_INPUT
#define YY_INPUT(buf, retval, maxlen) (retval = eval_getinput(buf, maxlen))
|
We wrote the actual code
for eval_getinput() in a separate file and made it very flexible, so it could take input from a file
pointer or a string in memory. To use the actual code, we first set up the global data source
variable and then called the yacc function yyparse(), which calls our input
function and parses it.
Using multiple parsers
In our game we
wanted to have separate parsers for the scripting language and the GUI
descriptions, since they used different grammar rules. It is possible to do
this, but we had to play a couple of tricks with flex and bison. First, we needed
to change the prefix for the parser from "yy" to something unique in order to avoid name
clashes. We did this by using command line options to flex and bison,
-P for flex, and -p
for bison. Next, we had to change
the place in our code where we used the "yy" prefix to use our chosen prefix.
This included the references to
yylval
in our lex source, and the
definition of
yyerror()
since we placed it in a separate file for
the final game. The final Makefile looks like this: Makefile
all: eval
YY_PREFIX = eval_
%.cpp: %.y
bison -p$(YY_PREFIX) -d -o $@ $<
%.cpp: %.l
flex -P$(YY_PREFIX) -o$@ $<
yacc_eval.o: yacc_eval.cpp
lex_eval.o: lex_eval.cpp
eval: yacc_eval.o lex_eval.o
$(CXX) -o $@ $^
|
Our scripting language
We started with the code shown above (available for download in the Resources) and went
on to add support for functions, variables, and simple flow control, until we ended up with a fairly complete interpreted language for our game. Here is a sample of one possible script: example.txt
function whitewash
{
if ( $1 == "Blackbeard" ) {
print("Pouring whitewash on Blackbeard!")
if ( $rum >= 3 ) {
print("Blackbeard doesn't care anymore...")
mood = "happy"
} else {
print($1, "says Grr....")
mood = "angry"
print("Have some more rum?")
++rum
}
}
}
pirate = "Blackbeard"
rum = 0
mood = "angry"
print($pirate, "is walking by...")
while ( $mood == "angry" ) {
whitewash($pirate)
}
return "there was much rejoicing"
|
Building a GUI with yacc
We built our GUI as a set of widgets that inherit their attributes from base classes. This maps very nicely to the way yacc parses its input. We defined a set of rules that correspond to the attributes of the base classes, and then defined the rules for the widgets as specific to each one, which we also did for the base classes' rules. When the parser matches a rule for a widget attribute, we can safely cast the widget pointer to the appropriate class and set the desired attribute. Here is a simple example for the button widget: yacc_gui.y
%{
#include <stdlib.h>
#include <stdio.h>
#include "widget.h"
#include "widget_button.h"
#define PARSE_DEBUG(X) (printf X)
#define MAX_WIDGET_DEPTH 32
static int widget_depth = -1;
static Widget *widget_stack[MAX_WIDGET_DEPTH];
static Widget *widget;
static void StartWidget(Widget *the_widget)
{
widget_stack[widget_depth++] = widget = the_widget;
}
static void FinishWidget(void)
{
Widget *child;
--widget_depth;
if ( widget_depth >= 0 ) {
child = widget;
widget = widget_stack[widget_depth];
widget->AddChild(child);
}
}
%}
[tokens and types skipped for brevity]
%%
widget: button
{ FinishWidget();
PARSE_DEBUG(("Completed widget\n")); }
;
widget_attribute:
widget_area
;
/* Widget area: x, y, width, height */
widget_area:
AREA '{' number ',' number ',' number ',' number '}'
{ widget->SetArea($3, $5, $7, $9);
PARSE_DEBUG(("Area: %dx%d at (%d,%d)\n", $7, $9, $3, $5)); }
;
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
/* The button widget */
button:
button_tag '{' button_attributes '}'
{ PARSE_DEBUG(("Completed button\n")); }
;
button_tag:
BUTTON name
{ StartWidget(new WidgetButton($2));
PARSE_DEBUG(("Starting a button: %s\n", $2));
free($2); }
;
/* The button widget attributes */
button_attributes:
button_attribute
| button_attributes button_attribute
;
button_attribute:
widget
| widget_attribute
| button_normal_image
| button_hover_image
;
button_normal_image:
IMAGE file
{ ((WidgetButton *)widget)->LoadNormalImage($2);
PARSE_DEBUG(("Button normal image: %s\n", $2));
free($2); }
;
button_hover_image:
HOVERIMAGE file
{ ((WidgetButton *)widget)->LoadHoverImage($2);
PARSE_DEBUG(("Button hover image: %s\n", $2));
free($2); }
;
|
 |
An example GUI
Here is our main menu to serve as an example of the
kind of GUI you can build with this technique: main_menu.gui
background "main_menu" {
image "main_menu"
button "new_game" {
area { 32, 80, 370, 64 }
image "main_menu-new"
hover_image "main_menu-new_hi"
#onclick [ new_gui("new_game") ]
onclick [ new_gui("character_screen") ]
}
button "load_game" {
area { 32, 152, 370, 64 }
image "main_menu-load"
hover_image "main_menu-load_hi"
onclick [ new_gui("load_game") ]
}
button "save_game" {
area { 32, 224, 370, 64 }
image "main_menu-save"
hover_image "main_menu-save_hi"
onclick [ new_gui("save_game") ]
}
button "preferences" {
area { 32, 296, 370, 64 }
image "main_menu-prefs"
hover_image "main_menu-prefs_hi"
onclick [ new_gui("preferences") ]
}
button "quit_game" {
area { 32, 472, 370, 64 }
image "main_menu-quit"
hover_image "main_menu-quit_hi"
onclick [ quit_game() ]
}
}
|
In this screen description, the widgets and
attributes are parsed by the GUI parser, and the button callbacks are
interpreted by the script parser. new_gui() and quit_game() are internal
functions exported to the script mechanism. Main menu

Conclusion
Lex and yacc were
invaluable tools in the design of our scripting and GUI design languages. They
seemed daunting at first, but after playing with them a little, they became
comfortable and easy to use. Join us next month when we start to put it all
together and give you a glimpse into the world of Pirates Ho!
Resources
About the authors  | |  | Sam Lantinga is the author of the Simple DirectMedia Layer library, and is currently employed as lead programmer at Loki Entertainment Software, a company dedicated to bringing best-selling games to Linux. His involvement with Linux and games began in 1995 with various DOOM! tool ports and the port of the Macintosh game Maelstrom to Linux. |
 | |  | Lauren MacDonell is a technical writer with SkilledNursing.com and co-developer of "Pirates Ho!". When she isn't working, writing, or belly dancing, she takes care of her tropical fish. |
Rate this page
|  |