Using the yacc grammar file

A yacc grammar file consists of the following sections:

  • Declarations
  • Rules
  • Programs

Two adjacent percent signs (%%) separate the sections of the grammar file. To make the file easier to read, put each %% on a line by itself. A complete grammar file looks like the following:
declarations
%%
rules
%%
programs
The declarations section can be empty. If you omit the programs section, you can also omit the second %%. The smallest yacc grammar file is therefore:
%%
rules

The yacc command ignores blanks, tabs, and new line characters in the grammar file. Therefore, use these characters to make the grammar file easier to read. Do not, however, use blanks, tabs or new line characters in names or reserved symbols.
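
The three-section layout can be sketched as a small but complete grammar file. This is only an illustration: the DIGIT token, the line and digits nonterminals, and the hand-written yylex, yyerror, and main routines are assumptions made for this example, and the %{ ... %} block is the usual way to pass C code through the declarations section.

```yacc
%{
/* Declarations section: C code copied into the generated parser. */
#include <stdio.h>
#include <ctype.h>

int yylex(void);
void yyerror(const char *msg);
%}

%token DIGIT

%%

/* Rules section: a line is one or more digits followed by a new line. */
line
        :       digits '\n'
                        { printf("ok\n"); }
        ;

digits
        :       DIGIT
        |       digits DIGIT
        ;

%%

/* Programs section: hypothetical support routines for this sketch. */
int yylex(void)
{
    int c = getchar();

    if (c == EOF)
        return 0;           /* 0 tells the parser that input has ended */
    if (isdigit(c))
        return DIGIT;
    return c;               /* any other character is its own token */
}

void yyerror(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
}

int main(void)
{
    return yyparse();
}
```

The blank lines and indentation are there only for readability; the yacc command skips them between names and symbols.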

Using comments

To explain what the program is doing, put comments in the grammar file. You can put comments anywhere in the grammar file that you can put a name. However, to make the file easier to read, put the comments on lines by themselves at the beginning of functional blocks of rules. A comment in a yacc grammar file looks the same as a comment in a C language program. The comment is enclosed between /* (slash, asterisk) and */ (asterisk, slash). For example:

/* This is a comment on a line by itself. */
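
As a brief sketch of comment placement within rules, the fragment below puts one comment at the start of a block of rules and another after a rule body. The NUMBER token and the list nonterminal are hypothetical names chosen for the example, and the fragment contains no programs section, so a lexical analyzer would have to be supplied separately.

```yacc
%token NUMBER

%%

/* A list is one or more numbers. */
list
        :       NUMBER
        |       list NUMBER     /* a comment can also follow a rule body */
        ;
```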

Using literal strings

A literal string is one or more characters enclosed in '' (single quotes). As in the C language, the \ (backslash) is an escape character within literals, and all the C language escape codes are recognized. Thus, the yacc command accepts the symbols in the following table:

Symbol       Definition
'\a'         Alert
'\b'         Backspace
'\f'         Form-feed
'\n'         New line
'\r'         Return
'\t'         Tab
'\v'         Vertical tab
'\''         Single quote (')
'\"'         Double quote (")
'\?'         Question mark (?)
'\\'         Backslash (\)
'\Digits'    The character whose encoding is represented by the one-, two-, or three-digit octal integer specified by the Digits string.
'\xDigits'   The character whose encoding is represented by the sequence of hexadecimal characters specified by the Digits string.

Do not use the null character ('\0' or 0) in grammar rules. Its ASCII code is zero, which is also the value that the yylex subroutine returns to signal the end of input, so the parser would interpret a null character as the end of input.
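
The following fragment is a minimal sketch of literals with escape codes used directly in rules. The NAME token and the record, fields, and field nonterminals are hypothetical, and the sketch assumes a lexical analyzer that returns each tab, new line, and single quote character as its own token.

```yacc
%token NAME

%%

/* A record is a list of tab-separated fields ended by a new line. */
record
        :       fields '\n'
        ;

fields
        :       field
        |       fields '\t' field
        ;

/* A field is a name, optionally enclosed in single quotes. */
field
        :       NAME
        |       '\'' NAME '\''
        ;
```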

Formatting the grammar file

To help make the yacc grammar file more readable, use the following guidelines:
  • Use uppercase letters for token names, and use lowercase letters for nonterminal symbol names.
  • Put grammar rules and actions on separate lines to allow changing either one without changing the other.
  • Put all rules with the same left side together. Enter the left side once, and use the vertical bar to begin the rest of the rules for that left side.
  • For each set of rules with the same left side, enter the semicolon once on a line by itself following the last rule for that left side. You can then add new rules easily.
  • Indent rule bodies by two tab stops and action bodies by three tab stops.
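
Applied together, the guidelines above might produce a layout like the following sketch. The NUMBER token and the expr and term nonterminals are hypothetical, and each tab stop is written here as eight spaces.

```yacc
%token NUMBER

%%

/* Token names are uppercase; nonterminal names are lowercase. */
expr
        :       expr '+' term
                        { $$ = $1 + $3; }
        |       expr '-' term
                        { $$ = $1 - $3; }
        |       term
        ;

term
        :       NUMBER
        ;
```

Because each action sits on its own line and the semicolon closes the group on a line by itself, a new alternative for expr can be added by inserting lines before the semicolon without touching the existing ones.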

Errors in the grammar file

The yacc command cannot produce a parser for all sets of grammar specifications. If the grammar rules contradict themselves or require matching techniques that are different from what the yacc command provides, the yacc command will not produce a parser. In most cases, the yacc command provides messages to indicate the errors. To correct these errors, redesign the rules in the grammar file, or provide a lexical analyzer (input program to the parser) to recognize the patterns that the yacc command cannot.
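
As a hedged illustration of redesigning rules, the sketch below starts from an ambiguous rule of the kind that yacc implementations commonly report as a shift/reduce conflict and rewrites it so that only one grouping is possible. The NUMBER token and the expr nonterminal are hypothetical, and the exact message text depends on the implementation.

```yacc
%token NUMBER

%%

/*
 * Ambiguous form, shown only in this comment:
 *
 *      expr : expr '-' expr | NUMBER ;
 *
 * The input 1 - 2 - 3 could be grouped as (1 - 2) - 3 or as 1 - (2 - 3).
 * The redesigned rule below is left-recursive, so only the first
 * grouping is possible.
 */
expr
        :       expr '-' NUMBER
        |       NUMBER
        ;
```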