yacc grammar file declarations

The declarations section of the yacc grammar file contains the following:

  • Declarations for any variables or constants used in other parts of the grammar file
  • #include statements to use other files as part of this file (used for library header files)
  • Statements that define processing conditions for the generated parser

You can also keep semantic information for the tokens that are currently on the parse stack in a user-defined C language union, provided that the members of the union are associated with the names used in the grammar file.

A declaration for a variable or constant uses the following syntax of the C programming language:
TypeSpecifier Declarator ;

TypeSpecifier is a data type keyword and Declarator is the name of the variable or constant. Names can be any length and consist of letters, dots, underscores, and digits. A name cannot begin with a digit. Uppercase and lowercase letters are distinct.
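The naming rules above can be expressed as a small check. The following is a minimal sketch in C, not part of yacc itself: the function name is_valid_name is hypothetical, and rejecting an empty name is an assumption, because the text does not mention that case.

```c
#include <ctype.h>

/*
 * Return 1 if name follows the rules stated above: letters, dots,
 * underscores, and digits only, and no digit in the first position.
 * Rejecting the empty string is an assumption of this sketch.
 */
int is_valid_name(const char *name)
{
    const unsigned char *s = (const unsigned char *)name;

    if (*s == '\0' || isdigit(*s))
        return 0;

    for (; *s != '\0'; s++) {
        if (!isalnum(*s) && *s != '.' && *s != '_')
            return 0;
    }
    return 1;
}
```

Under these rules, expr_list and stmt.1 are accepted, and 2nd and a-b are rejected. Because uppercase and lowercase letters are distinct, Expr and expr are two different names.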

Terminal (or token) names can be declared using the %token declaration, and nonterminal names can be declared using the %type declaration. The %type declaration is not required for nonterminal names. Nonterminal names are defined automatically if they appear on the left side of at least one rule. Without declaring a name in the declarations section, you can use that name only as a nonterminal symbol. The #include statements are identical to C language syntax and perform the same function.

The yacc program has a set of keywords that define processing conditions for the generated parser. Each keyword begins with a % (percent sign), which is followed by a token or nonterminal name. These keywords are as follows:

Keyword Description
%left Identifies tokens that are left-associative with other tokens.

%nonassoc Identifies tokens that are not associative with other tokens.

%right Identifies tokens that are right-associative with other tokens.

%start Identifies a nonterminal name for the start symbol.

%token Identifies the token names that the yacc command accepts. Declare all token names in the declarations section.

%type Identifies the type of nonterminals. Type-checking is performed when this construct is present.

%union Identifies the yacc value stack as the union of the various types of values desired. By default, the values returned are integers. The effect of this construct is to provide the declaration of YYSTYPE directly from the input.

%{
Code
%}
Copies the specified Code into the code file. This construct can be used to add C language declarations and definitions to the declarations section.
Note: The %{ (percent sign, left brace) and %} (percent sign, right brace) symbols must appear on lines by themselves.

The %token, %left, %right, and %nonassoc keywords optionally support the name of a C union member (as defined by %union) called a <Tag> (literal angle brackets surrounding a union member name). The %type keyword requires a <Tag>. The use of <Tag> specifies that the tokens named on the line are to be of the same C type as the union member referenced by <Tag>. For example, the following declaration declares the Name parameter to be a token:
%token [<Tag>] Name [Number] [Name [Number]]...

If <Tag> is present, the C type for all tokens on this line is declared to be the type referenced by <Tag>. If a positive integer, Number, follows the Name parameter, that value is assigned to the token.
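To make the relationship between %union, <Tag>, and the C code more concrete, the following sketch shows roughly what the C side looks like for a small set of declarations. It is an illustration, not actual yacc output: the token names NUMBER and IDENT, the union members ival and sval, the helper functions, and the specific token numbers are all hypothetical.

```c
/*
 * Hypothetical grammar-file declarations that this sketch mirrors:
 *
 *   %union {
 *       int   ival;
 *       char *sval;
 *   }
 *   %token <ival> NUMBER
 *   %token <sval> IDENT
 */
#include <string.h>

/* Roughly what %union provides: the type of each value-stack entry. */
typedef union {
    int   ival;   /* used by tokens tagged <ival> */
    char *sval;   /* used by tokens tagged <sval> */
} YYSTYPE;

/* Token numbers (assumed values; no Number was given on the %token lines). */
enum { NUMBER = 257, IDENT = 258 };

/* The lexical analyzer hands a token's value to the parser in yylval. */
YYSTYPE yylval;

/* Hypothetical helpers standing in for parts of a lexical analyzer. */
int make_number(int n)
{
    yylval.ival = n;      /* member chosen to match the <ival> tag */
    return NUMBER;
}

int make_ident(char *s)
{
    yylval.sval = s;      /* member chosen to match the <sval> tag */
    return IDENT;
}
```

The point of the tag is that the code on each side agrees on which union member holds a given token's value, so NUMBER is always read through ival and IDENT through sval.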

All of the tokens on the same line have the same precedence level and associativity. The lines appear in the file in order of increasing precedence or binding strength. For example, the following describes the precedence and associativity of the four arithmetic operators:
%left '+' '-'
%left '*' '/'

The + (plus sign) and - (minus sign) are left associative and have lower precedence than * (asterisk) and / (slash), which are also left associative.
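The effect of these two %left lines on the value of an expression can be seen with a small hand-written evaluator. This is only a sketch of the meaning of the declarations: it uses precedence climbing, which is not how the parser generated by yacc works internally, and the names prec, expr, and evaluate are hypothetical. It assumes well-formed input made of nonnegative integers and the four operators, with no spaces and no division by zero.

```c
#include <ctype.h>

/*
 * Precedence level of each operator, mirroring the order of the
 * %left lines: a later line means a higher level. 0 = not an operator.
 */
static int prec(char op)
{
    switch (op) {
    case '+': case '-': return 1;   /* %left '+' '-' */
    case '*': case '/': return 2;   /* %left '*' '/' */
    default:            return 0;
    }
}

static const char *p;   /* cursor into the expression text */

/* Read one nonnegative integer at the cursor. */
static int number(void)
{
    int n = 0;
    while (isdigit((unsigned char)*p))
        n = n * 10 + (*p++ - '0');
    return n;
}

/* Evaluate operators whose level is at least min_prec (min_prec >= 1). */
static int expr(int min_prec)
{
    int lhs = number();

    while (prec(*p) >= min_prec) {
        char op = *p++;
        /*
         * Left associativity: the right operand may contain only
         * operators of strictly higher precedence, so an operator of
         * the same level is applied to the result accumulated so far.
         */
        int rhs = expr(prec(op) + 1);

        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        }
    }
    return lhs;
}

int evaluate(const char *text)
{
    p = text;
    return expr(1);
}
```

With this sketch, evaluate("2+3*4") groups as 2+(3*4) because * is on the later line, and evaluate("8-3-2") groups as (8-3)-2 because - is left associative.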

Defining global variables

To define variables to be used by some or all actions, as well as by the lexical analyzer, enclose the declarations for those variables between %{ (percent sign, left brace) and %} (percent sign, right brace) symbols. Variables declared between these symbols are global variables. For example, to make the var variable available to all parts of the complete program, use the following entry in the declarations section of the grammar file:

%{
int var = 0;
%}

Start symbol

The parser recognizes a special symbol called the start symbol. The start symbol is the name of the rule in the rules section of the grammar file that describes the most general structure of the language to be parsed. Because it is the most general structure, recognizing it is the goal of the parse: the input is accepted when all of it has been reduced to the start symbol. Declare the start symbol in the declarations section using the %start keyword. If you do not declare the name of the start symbol, the parser uses the name of the first grammar rule in the grammar file.

For example, when parsing a C language function, the most general structure for the parser to recognize is as follows:
main()
{
        code_segment
}

The start symbol points to the rule that describes this structure. All remaining rules in the file describe ways to identify lower-level structures within the function.

Token numbers

Token numbers are nonnegative integers that represent the names of tokens. If the lexical analyzer passes the token number to the parser, instead of the actual token name, both programs must agree on the numbers assigned to the tokens.

You can assign numbers to the tokens used in the yacc grammar file. If you do not assign numbers to the tokens, the yacc command assigns numbers using the following rules:
  • A literal character is assigned the numerical value of the character in the ASCII character set.
  • Other names are assigned token numbers starting at 257.
    Note: Do not assign a token number of 0. This number is assigned to the endmarker token. You cannot redefine it.

To assign a number to a token (including literals) in the declarations section of the grammar file, put a positive integer (not 0) immediately following the token name in the %token line. This integer is the token number of the name or literal. Each token number must be unique. All lexical analyzers used with the yacc command must return a 0 or a negative value for a token when they reach the end of their input.
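The numbering rules above can be seen from the lexical analyzer's side. The following is a minimal hand-written sketch, not code produced by yacc or lex: the token name NUMBER, its value 257, the set_input helper, and the use of a plain int for yylval are assumptions made for the illustration.

```c
#include <ctype.h>

/*
 * Hypothetical token name. Names that are not literal characters are
 * numbered from 257 upward unless the %token line assigns a number.
 */
#define NUMBER 257

int yylval;                      /* value of the most recent NUMBER token */
static const char *input = "";   /* text being scanned */

/* Hypothetical helper: choose the text that yylex scans. */
void set_input(const char *text)
{
    input = text;
}

int yylex(void)
{
    /* Skip blanks and tabs between tokens. */
    while (*input == ' ' || *input == '\t')
        input++;

    /* End of input: return 0, the endmarker token number. */
    if (*input == '\0')
        return 0;

    /* A run of digits is returned as the named token NUMBER. */
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;
    }

    /* Any other character is a literal: its token number is its code. */
    return (unsigned char)*input++;
}
```

After set_input("12 + 3"), successive calls to yylex return NUMBER (with yylval set to 12), then the character code of '+', then NUMBER (with yylval set to 3), and then 0 for every call after the end of the input. Named tokens start above the range of single character codes so that they cannot collide with literals.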