The yacc grammar file

To use the yacc command to generate a parser, provide it with a grammar file that describes the input data stream and what the parser is to do with the data.

The grammar file includes rules describing the input structure, code to be invoked when these rules are recognized, and a subroutine to do the basic input.

The yacc command uses the information in the grammar file to generate a parser that controls the input process. This parser calls an input subroutine (the lexical analyzer) to pick up the basic items (called tokens) from the input stream. A token is a symbol or name that tells the parser which pattern the input subroutine has found. The parser organizes these tokens according to the structure rules in the grammar file, which are called grammar rules. A nonterminal symbol is the name of a structure that a grammar rule defines and that the parser recognizes. When the parser recognizes one of these rules, it executes the user code supplied for that rule. This user code is called an action. Actions can return values and can use the values returned by other actions.

Use the C programming language to write the action code and other subroutines. The yacc command uses many of the C language syntax conventions for the grammar file.

main and yyerror subroutines

You must provide the main and yyerror subroutines for the parser. To ease the initial effort of using the yacc command, the yacc library contains simple versions of the main and yyerror subroutines. Include these subroutines by using the -ly argument to the ld command (or to the cc command). Written in ISO C, the main library program is equivalent to the following:
#include <locale.h>

int yyparse(void);             /* parser generated by the yacc command */

int main(void)
{
     setlocale(LC_ALL, "");    /* use the locale named by the environment */
     yyparse();
     return 0;
}
Written in ISO C, the yyerror library program is equivalent to the following:
#include <stdio.h>

int yyerror(const char *s)
{
        fprintf(stderr, "%s\n", s);    /* print the message on standard error */
        return 0;
}

The argument to the yyerror subroutine is a string containing an error message, usually the string "syntax error".

Because these library versions are minimal, you will usually want to replace them with subroutines that do more. For example, keep track of the input line number and print it along with the message when a syntax error is detected. You may also want to use the value in the external integer variable yychar, which contains the look-ahead token number at the time the error was detected.
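
As a rough sketch of such a replacement, the following yyerror prints a line number and the value of yychar along with the message. The line_number variable and the format_error helper are hypothetical names that are not part of yacc, and the sketch assumes that the lexical analyzer increments line_number at each newline character. The yychar variable is defined here only so that the block compiles alone; in a real program the generated parser normally defines it, and this definition would become an extern declaration.

```c
#include <stdio.h>

/* Hypothetical line counter; the lexical analyzer is assumed to
   increment it each time it reads a newline character. */
int line_number = 1;

/* Look-ahead token number. Normally defined by the generated parser;
   in a real program, replace this line with: extern int yychar; */
int yychar = -1;

/* Hypothetical helper: builds the message text so that it can be
   printed, logged, or checked separately. */
int format_error(char *buf, size_t size, const char *s)
{
    return snprintf(buf, size, "line %d: %s (look-ahead token %d)",
                    line_number, s, yychar);
}

/* Replacement for the library yyerror subroutine. */
int yyerror(const char *s)
{
    char buf[256];

    format_error(buf, sizeof buf, s);
    fprintf(stderr, "%s\n", buf);
    return 0;
}
```

Keeping the message formatting in a separate helper is a design choice of this sketch, not a yacc requirement; it makes the text easy to reuse if errors are also written to a log.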

yylex subroutine

The input subroutine (yylex) that you supply must be able to do the following:

  • Read the input stream.
  • Recognize basic patterns in the input stream.
  • Pass each pattern to the parser, along with a token that identifies the kind of pattern.

For example, suppose that the input subroutine separates an input stream into the tokens WORD, NUMBER, and PUNCTUATION, and that it receives the following input:

I have 9 turkeys.

The program could choose to pass the following strings and tokens to the parser:

String     Token
I          WORD
have       WORD
9          NUMBER
turkeys    WORD
.          PUNCTUATION
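
A hand-written input subroutine along the following lines could produce the strings and tokens in the table above. This is a minimal sketch under several assumptions: the token numbers are placeholders for the definitions that yacc would normally supply, the input_cursor and token_text names are hypothetical, and the input is scanned from a string in memory rather than read from a file. A real lexical analyzer would usually hand the matched string to the parser through the yylval variable rather than through a buffer of its own.

```c
#include <ctype.h>
#include <stddef.h>

/* Placeholder token numbers; in a real build these definitions
   would come from the file that yacc generates. */
#define WORD        257
#define NUMBER      258
#define PUNCTUATION 259

/* Hypothetical names, not part of yacc: the text still to be scanned
   and the string matched by the most recent call to yylex. */
const char *input_cursor = "";
char token_text[64];

/* Returns the next token number, or 0 at the end of the input,
   which the parser treats as the end marker. */
int yylex(void)
{
    size_t n = 0;
    int kind;

    /* Skip white space between patterns. */
    while (isspace((unsigned char)*input_cursor))
        input_cursor++;

    if (*input_cursor == '\0')
        return 0;

    if (isalpha((unsigned char)*input_cursor)) {
        kind = WORD;
        while (isalpha((unsigned char)*input_cursor)
               && n < sizeof token_text - 1)
            token_text[n++] = *input_cursor++;
    } else if (isdigit((unsigned char)*input_cursor)) {
        kind = NUMBER;
        while (isdigit((unsigned char)*input_cursor)
               && n < sizeof token_text - 1)
            token_text[n++] = *input_cursor++;
    } else {
        /* Any other single character is treated as punctuation. */
        kind = PUNCTUATION;
        token_text[n++] = *input_cursor++;
    }
    token_text[n] = '\0';
    return kind;
}
```

With input_cursor pointing at the sentence shown earlier, successive calls return WORD, WORD, NUMBER, WORD, PUNCTUATION, and then 0.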

The parser must contain definitions for the tokens passed to it by the input subroutine. When you run the yacc command with the -d option, it writes a list of the tokens to a file called y.tab.h. This list is a set of #define statements that allow the lexical analyzer (yylex) to use the same token numbers as the parser.
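
For illustration, a y.tab.h file generated for a grammar that declares the WORD, NUMBER, and PUNCTUATION tokens might look like the first part of the sketch below. The numeric values are assumptions; yacc chooses the actual numbers, conventionally above 256 so that they do not collide with single-character codes. The token_name helper is a hypothetical addition that shows a source file using the shared definitions.

```c
/* --- y.tab.h (sketch of yacc -d output; values are illustrative) --- */
#define WORD        257
#define NUMBER      258
#define PUNCTUATION 259

/* --- lexical analyzer source, which would #include "y.tab.h" --- */

/* Hypothetical debugging helper: maps a token number to its name. */
const char *token_name(int token)
{
    switch (token) {
    case WORD:        return "WORD";
    case NUMBER:      return "NUMBER";
    case PUNCTUATION: return "PUNCTUATION";
    default:          return "(other)";
    }
}
```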

Note: To avoid conflict with the parser, do not give your own subroutines names that begin with the letters yy, other than the subroutines that the parser requires (such as yylex and yyerror).

You can use the lex command to generate the input subroutine, or you can write the routine in the C language.