yacc rules

The rules section of the grammar file contains one or more grammar rules. Each rule describes a structure and gives it a name.

A grammar rule has the following form:
A : BODY;

where A is a nonterminal name, and BODY is a sequence of 0 or more names, literals, and semantic actions that can optionally be followed by precedence rules. Only the names and literals are required to form the grammar. Semantic actions and precedence rules are optional. The colon and the semicolon are required yacc punctuation.

Semantic actions let you specify code to be run each time that a rule is recognized in the input. An action can be an arbitrary C statement and, as such, can perform input or output, call subprograms, or alter external variables. Actions can also refer to the actions of the parser; for example, shift and reduce.

Precedence rules are defined by the %prec keyword and change the precedence level associated with a particular grammar rule. The reserved symbol %prec can appear immediately after the body of the grammar rule and can be followed by a token name or a literal. The construct causes the precedence of the grammar rule to become that of the token name or literal.

Repeating nonterminal names

If several grammar rules have the same nonterminal name, use the | (pipe symbol) to avoid rewriting the left side. In addition, use the ; (semicolon) only at the end of all rules joined by pipe symbols. For example, the following grammar rules:
A  :  B  C  D  ;
A  :  E  F  ;
A  :  G  ;
can be given to the yacc command by using the pipe symbol as follows:
A  :  B  C  D
   |  E  F
   |  G
   ;

Using recursion in a grammar file

Recursion is the process of defining something in terms of itself. In language definitions, recursive rules normally take the following form:
rule    :        EndCase
        |        rule EndCase
        ;

Therefore, the simplest case of rule is a single EndCase, but rule can also consist of more than one occurrence of EndCase. The entry in the second line that uses rule in the definition of rule is the recursion. The parser cycles through the input, reducing each additional EndCase into rule, until the input is used up.

When using recursion in a rule, always put the reference to the name of the rule as the leftmost entry in the body of the rule (left recursion), as it is in the preceding example. If the reference to the name of the rule occurs later in the body (right recursion), such as in the following example, the parser must hold every EndCase on its stack before it can reduce any of them, so it may run out of internal stack space and stop.
rule    :       EndCase
        |       EndCase rule
        ;
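
The difference in stack use between the two forms above can be illustrated with a small simulation. The following C sketch is not part of yacc and does not call any yacc interface. The function names max_depth_left and max_depth_right are hypothetical, and the sketch counts only grammar symbols on the stack, assuming a parser that reduces as soon as a rule body is complete.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Peak number of grammar symbols on the parse stack for n EndCase
 * tokens with the left-recursive form:
 *     rule : EndCase | rule EndCase ;
 * (Hypothetical helper, for illustration only.)
 */
size_t max_depth_left(size_t n)
{
    size_t depth = 0, peak = 0;

    assert(n > 0);              /* rule needs at least one EndCase */
    for (size_t i = 0; i < n; i++) {
        depth++;                /* shift EndCase */
        if (depth > peak)
            peak = depth;
        if (depth == 2)
            depth = 1;          /* reduce "rule EndCase" to rule */
        /* At depth 1, reducing EndCase to rule swaps one symbol for one. */
    }
    return peak;
}

/*
 * The same measurement for the right-recursive form:
 *     rule : EndCase | EndCase rule ;
 * (Hypothetical helper, for illustration only.)
 */
size_t max_depth_right(size_t n)
{
    size_t depth = 0, peak = 0;

    assert(n > 0);
    /* Nothing can be reduced until the last EndCase arrives,
     * so every token is shifted first. */
    for (size_t i = 0; i < n; i++) {
        depth++;
        if (depth > peak)
            peak = depth;
    }
    /* The last EndCase becomes rule (depth unchanged); then each
     * "EndCase rule" pair on top of the stack is reduced to rule. */
    while (depth > 1)
        depth--;
    return peak;
}
```

In this model, the left-recursive form never holds more than two symbols regardless of input length, while the right-recursive form holds one symbol for each EndCase in the input.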

The following example defines the lines rule as one or more occurrences of line, where each line is a string followed by a newline character (\n):

lines   :        line
        |        lines line
        ;

line    :        string '\n'
        ;

Empty string

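To indicate a nonterminal symbol that matches the empty string, leave the body of the rule, or one of its alternatives, empty, so that nothing appears between the : (colon) and the next | (pipe symbol) or ; (semicolon). For example, to define a symbol empty that matches either the empty string or x, use a rule similar to the following rule:
empty   :  | x ;
or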
empty   :
        | x
        ;

End-of-input marker

When the lexical analyzer reaches the end of the input stream, it sends an end-of-input marker to the parser. This marker is a special token called endmarker, which has a token value of 0. When the parser receives an end-of-input marker, it checks to see that it has assigned all input to defined grammar rules and that the processed input forms a complete unit (as defined in the yacc grammar file). If the input is a complete unit, the parser stops. If the input is not a complete unit, the parser signals an error and stops.

The lexical analyzer must send the end-of-input marker at the appropriate time, such as the end of a file, or the end of a record.
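
As an illustration of sending the end-of-input marker, the following C sketch shows a hand-written lexical analyzer for the string and newline tokens used in the earlier lines example. It is a minimal sketch under stated assumptions: the token code STRING, the cursor input_pos, and the helper lex_set_input are hypothetical names, the input is assumed to be a NUL-terminated buffer rather than a file, and a real program would take its token codes from the header that yacc generates.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder token code; a real parser uses the value yacc generates. */
enum { STRING = 257 };

/* Hypothetical cursor into the text being scanned. */
static const char *input_pos;

/* Hypothetical helper: point the lexical analyzer at a new buffer. */
void lex_set_input(const char *text)
{
    assert(text != NULL);
    input_pos = text;
}

/* Return the next token, or 0 (the end-of-input marker) when done. */
int yylex(void)
{
    assert(input_pos != NULL);

    if (*input_pos == '\0')
        return 0;               /* end of input: send the endmarker */

    if (*input_pos == '\n') {
        input_pos++;
        return '\n';            /* a literal token is its character code */
    }

    /* Consume everything up to the next newline as one string token. */
    while (*input_pos != '\0' && *input_pos != '\n')
        input_pos++;
    return STRING;
}
```

Because the cursor does not advance past the terminating NUL character, every call after the end of the buffer continues to return 0.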