Skip to main content

skip to main content

developerWorks  >  Linux  >

Using SDL, Part 4: lex and yacc

Building parsers for scripts and GUI design

developerWorks
Document options

Document options requiring JavaScript are not displayed


Rate this page

Help us improve this content


Level: Advanced

Sam Lantinga, lead programmer, Loki Entertainment Software
Lauren MacDonell, technical writer, SkilledNursing.com

01 May 2000

In this installment we look at two useful tools in the arsenal of any Linux programmer: lex and yacc. These tools have enabled us to easily build the scripting language and GUI framework that we used in Pirates Ho!, our SDL-based Linux game.

In designing Pirates Ho!, we needed an easy way to describe the interface and dialog options to the player. For the description we needed a simple, consistent, and flexible language, so we looked for tools that would help us build our scripting language.

Who wants another slice of bison?

The word "yacc" struck fear into my heart in school. It evoked images of disheveled and pale students muttering about compilers and symbol tables. So I carefully managed to avoid taking a compiler class. But while playing around with our game, I plucked up my courage and tackled yacc in the hope that it would make the scripting a little easier. In the end, yacc not only made scripting easier, it made it fun as well.



Back to top


Starting with the basics

Yacc is actually fairly straightforward to use. You give it a list of rules describing your grammar, and it parses tokens and takes actions based on what it sees. For our scripting language we wanted to start out simple, and initially just specified numbers and their logical operations:


eval.y

%{
/* This first section contains C code which will be included in the output
   file.
*/
#include <stdlib.h>
#include <stdio.h>
/* Since we are using C++, we need to specify the prototypes for some 
   internal yacc functions so that they can be found at link time.
*/
extern int yylex(void);
extern void yyerror(char *msg);
%}

/* This is a union of the different types of values that a token can
   take on.  In our case we'll just handle "numbers", which are of
   C int type.
*/
%union {
	int number;
}

/* These are untyped tokens which are recognized as part of the grammar */
%token AND OR EQUALS

/* Here we are, any NUMBER token is stored in the number member of the
   union above.
*/
%token  NUMBER

/* These rules all return a numeric value */
%type 
 expression
%type 
 logical_expression and or equals
%%

/* Our language consists either of a single statement or of a list of statements.
   Notice the recursivity of the rule, this allows us to have any
   number of statements in a statement list.
*/
statement_list: statement | statement_list statement
	;
	
/* A statement is simply an expression.  When the parser sees an expression
   we print out its value for debugging purposes.  Later on we'll
   have more than just expressions in our statements.
*/
statement: expression
	{ printf("Expression = %d\n", $1); }
	;
	
/* An expression can be a number or a logical expression. */
expression: NUMBER
	|   logical_expression
	;
	
/* We have a few different types of logical expressions */
logical_expression: and
	|           or
	|           equals
	;
	
/* When the parser sees two expressions surrounded by parenthesis and
   connected by the AND token, it will actually perform a C logical
   expression and store the result into $$, which is the "value" of
   this statement.
*/
and: '(' expression AND expression ')'
	{ if ( $2 && $4 ) { $$ = 1; } else { $$ = 0; } }
	;
	
or: '(' expression OR expression ')'
	{ if ( $2 || $4 ) { $$ = 1; } else { $$ = 0; } }
	;
	
equals: '(' expression EQUALS expression ')'
	{ if ( $2 == $4 ) { $$ = 1; } else { $$ = 0; } }
	;
	
%%

/* This is a sample main() function that just parses standard input
   using our yacc grammar.  It allows us to feed sample scripts in
   and see if they are parsed correctly.
*/
int main(int argc, char *argv[])
{
	yyparse();
	
}
/* This is an error function used by yacc, and must be defined */-
void yyerror(char *message)
{
	fprintf(stderr, "%s\n", message);
}



Back to top


How do we feed the beast?

Now that we had a simple grammar that recognizes sequences of tokens, we needed to find a way to feed tokens to the parser. Lex is a tool that takes input, turns it into tokens, and passes them to yacc. Here we describe the expressions that lex turns into tokens:


eval.l

%{
/* Again, this is C code that is inserted into the beginning of the output */
#include 

#include "y.tab.h"  /* Include the token definitions generated by yacc */
%}

/* Prevent the need for linking with -lfl */
%option noyywrap

/* This next section is a set of regular expressions that describe input
   tokens that are passed back to yacc.  The tokens are defined in y.tab.h,
   which is generated by yacc.
 */
%%
\/\/.*		/* ignore comments */
-[0-9]+|[0-9]+	{ yylval.number=atoi(yytext); return NUMBER; }
[ \t\n]		/* ignore whitespace */
&&		{ return AND; }
\|\|		{ return OR; }
==		{ return EQUALS; }
.		return yytext[0];
%%

Now that we had our parsing source in the current directory, we needed a Makefile to build them:


Makefile

all: eval 

y.tab.c: eval.y
	yacc -d $<
	
lex.yy.c: eval.l
	lex $<
	
eval: y.tab.o lex.yy.o
	$(CC) -o $@ $^

By default, yacc outputs to y.tab.c and lex outputs to lex.yy.c, which is why we used those names as our source files. The Makefile contains rules to build source from the parser description files. When we were done, we could type "make" to build a parser that we could run and type scripts into to check our logic.

shift/reduce conflicts
shift/reduce conflicts occur when yacc must choose between either continuing to parse a set of tokens, or resolving it into a rule. For example, if you create a grammar that consists of:
expression: NUMBER | plus
	;
plus: expression '+' expression
	;

when the yacc parser sees a number, it doesn't know whether to resolve it into an expression immediately, or to wait for "X + Y". In this case, yacc warns of a shift/reduce conflict, and by default waits for the full "X + Y" expression. You can see exactly what is going on when using GNU bison by giving it the '-v' command line option, which creates a *.output file containing the rules and conflicts that have arisen.

We can easily resolve the ambiguity by requiring parentheses around the plus expression:

expression: NUMBER | plus
	;
plus: '(' expression '+' expression ')'
	;



Back to top


Using lex and yacc with C++

There are a few caveats when using lex and yacc with C++. Lex and yacc output to C files, so for C++ we used the GNU equivalents, flex and bison. These tools allow you to specify the names of the output files. We also added general rules to the Makefile so GNU Make automatically built our C++ source files from the lex and yacc sources. This required us to rename the lex and yacc sources to "lex_eval.l" and "yacc_eval.y" respectively, so that Make generated different C++ source files for each of them. We also needed to change the file lex source includes for the yacc token definitions. The header file that bison outputs is the name of the output file with an .h suffix, or in our case "yacc_eval.cpp.h". Here is our new Makefile:


Makefile

all: eval 

%.cpp: %.y
	bison -d -o $@ $<
	
%.cpp: %.l
	flex -o$@ $<
	
yacc_eval.o: yacc_eval.cpp
lex_eval.o: lex_eval.cpp

eval: yacc_eval.o lex_eval.o
	$(CXX) -o $@ $^



Back to top


Parsing from strings

The default lex code reads its input from standard input, but for our game we wanted to be able to parse strings in memory. This is easily done with flex by redefining the macro YY_INPUT at the top of the lex source file:

extern int eval_getinput(char *buf, int maxlen);
#undef YY_INPUT
#define YY_INPUT(buf, retval, maxlen)	(retval = eval_getinput(buf, maxlen))

We wrote the actual code for eval_getinput() in a separate file and made it very flexible, so it could take input from a file pointer or a string in memory. To use the actual code, we first set up the global data source variable and then called the yacc function yyparse(), which calls our input function and parses it.



Back to top


Using multiple parsers

In our game we wanted to have separate parsers for the scripting language and the GUI descriptions, since they used different grammar rules. It is possible to do this, but we had to play a couple of tricks with flex and bison. First, we needed to change the prefix for the parser from "yy" to something unique in order to avoid name clashes. We did this by using command line options to flex and bison, -P for flex, and -p for bison. Next, we had to change the place in our code where we used the "yy" prefix to use our chosen prefix. This included the references to yylval in our lex source, and the definition of yyerror() since we placed it in a separate file for the final game. The final Makefile looks like this:


Makefile

all: eval 

YY_PREFIX = eval_

%.cpp: %.y
	bison -p$(YY_PREFIX) -d -o $@ $<
	
%.cpp: %.l
	flex -P$(YY_PREFIX) -o$@ $<
	
yacc_eval.o: yacc_eval.cpp
lex_eval.o: lex_eval.cpp

eval: yacc_eval.o lex_eval.o
	$(CXX) -o $@ $^



Back to top


Our scripting language

We started with the code shown above (available for download in the Resources) and went on to add support for functions, variables, and simple flow control, until we ended up with a fairly complete interpreted language for our game. Here is a sample of one possible script:


example.txt

function whitewash
{
        if ( $1 == "Blackbeard" ) {
                print("Pouring whitewash on Blackbeard!")
                if ( $rum >= 3 ) {
                        print("Blackbeard doesn't care anymore...")
                        mood = "happy"
                } else {
                        print($1, "says Grr....")
                        mood = "angry"
                        print("Have some more rum?")
                        ++rum
                }
        }
}

pirate = "Blackbeard"
rum = 0
mood = "angry"
print($pirate, "is walking by...")
while ( $mood == "angry" ) {
        whitewash($pirate)
}
return "there was much rejoicing"



Back to top


Building a GUI with yacc

We built our GUI as a set of widgets that inherit their attributes from base classes. This maps very nicely to the way yacc parses its input. We defined a set of rules that correspond to the attributes of the base classes, and then defined the rules for the widgets as specific to each one, which we also did for the base classes' rules. When the parser matches a rule for a widget attribute, we can safely cast the widget pointer to the appropriate class and set the desired attribute. Here is a simple example for the button widget:


yacc_gui.y

%{
#include <stdlib.h>
#include <stdio.h>

#include "widget.h"
#include "widget_button.h"

#define PARSE_DEBUG(X)	(printf X)

#define MAX_WIDGET_DEPTH 32

static int widget_depth = -1;
static Widget *widget_stack[MAX_WIDGET_DEPTH];
static Widget *widget;

static void StartWidget(Widget *the_widget)
{
	widget_stack[widget_depth++] = widget = the_widget;
}
static void FinishWidget(void)
{
	Widget *child;
	
	--widget_depth;
	if ( widget_depth >= 0 ) {
		child = widget;
		widget = widget_stack[widget_depth];
		widget->AddChild(child);
	}
}

%}

[tokens and types skipped for brevity]

%%

widget: button
	{ FinishWidget();
	  PARSE_DEBUG(("Completed widget\n")); }
	;
	
widget_attribute:
	widget_area
	;
	
/* Widget area: x, y, width, height */
widget_area:
	AREA '{' number ',' number ',' number ',' number '}'
	{ widget->SetArea($3, $5, $7, $9);
	  PARSE_DEBUG(("Area: %dx%d at (%d,%d)\n", $7, $9, $3, $5)); }
	;
	
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
/* The button widget */

button:
	button_tag '{' button_attributes '}'
	{ PARSE_DEBUG(("Completed button\n")); }
	;
	
button_tag:
	BUTTON name
	{ StartWidget(new WidgetButton($2));
	  PARSE_DEBUG(("Starting a button: %s\n", $2));
	  free($2); }
	;
	
/* The button widget attributes */
button_attributes:
	button_attribute
|	button_attributes button_attribute
	;
	
button_attribute:
	widget
|	widget_attribute
|	button_normal_image
|	button_hover_image
	;
	
button_normal_image:
	IMAGE file
	{ ((WidgetButton *)widget)->LoadNormalImage($2);
	  PARSE_DEBUG(("Button normal image: %s\n", $2));
	  free($2); }
	;
	
button_hover_image:
	HOVERIMAGE file
	{ ((WidgetButton *)widget)->LoadHoverImage($2);
	  PARSE_DEBUG(("Button hover image: %s\n", $2));
	  free($2); }
	;



Back to top


An example GUI

Here is our main menu to serve as an example of the kind of GUI you can build with this technique:


main_menu.gui

background "main_menu" {
	image "main_menu"
	button "new_game" {
		area { 32, 80, 370, 64 }
		image "main_menu-new"
		hover_image "main_menu-new_hi"
		#onclick [ new_gui("new_game") ]
		onclick [ new_gui("character_screen") ]
	}
	button "load_game" {
		area { 32, 152, 370, 64 }
		image "main_menu-load"
		hover_image "main_menu-load_hi"
		onclick [ new_gui("load_game") ]
	}
	button "save_game" {
		area { 32, 224, 370, 64 }
		image "main_menu-save"
		hover_image "main_menu-save_hi"
		onclick [ new_gui("save_game") ]
	}
	button "preferences" {
		area { 32, 296, 370, 64 }
		image "main_menu-prefs"
		hover_image "main_menu-prefs_hi"
		onclick [ new_gui("preferences") ]
	}
	button "quit_game" {
		area { 32, 472, 370, 64 }
		image "main_menu-quit"
		hover_image "main_menu-quit_hi"
		onclick [ quit_game() ]
	}
}

In this screen description, the widgets and attributes are parsed by the GUI parser, and the button callbacks are interpreted by the script parser. new_gui() and quit_game() are internal functions exported to the script mechanism.


Main menu
main menu screenshot


Back to top


Conclusion

Lex and yacc were invaluable tools in the design of our scripting and GUI design languages. They seemed daunting at first, but after playing with them a little, they became comfortable and easy to use. Join us next month when we start to put it all together and give you a glimpse into the world of Pirates Ho!



Resources



About the authors

Sam Lantinga is the author of the Simple DirectMedia Layer library, and is currently employed as lead programmer at Loki Entertainment Software, a company dedicated to bringing best-selling games to Linux. His involvement with Linux and games began in 1995 with various DOOM! tool ports and the port of the Macintosh game Maelstrom to Linux.


Lauren MacDonell is a technical writer with SkilledNursing.com and co-developer of "Pirates Ho!". When she isn't working, writing, or belly dancing, she takes care of her tropical fish.




Rate this page


Please take a moment to complete this form to help us better serve you.



 


 


Not
useful
Extremely
useful
 


Share this....

digg Digg this story del.icio.us del.icio.us Slashdot Slashdot it!



Back to top