yacc grammar file declarations
The declarations section of the yacc grammar file contains the following:
- Declarations for any variables or constants used in other parts of the grammar file
- #include statements to use other files as part of this file (used for library header files)
- Statements that define processing conditions for the generated parser
You can keep semantic information associated with the tokens that are currently on the parse stack in a user-defined C language union, if the members of the union are associated with the various names in the grammar file.
A declaration for a variable or constant uses C language syntax:
TypeSpecifier Declarator ;
TypeSpecifier is a data type keyword and Declarator is the name of the variable or constant. Names can be any length and consist of letters, dots, underscores, and digits. A name cannot begin with a digit. Uppercase and lowercase letters are distinct.
Terminal (or token) names can be declared using the %token declaration, and nonterminal names can be declared using the %type declaration. The %type declaration is not required for nonterminal names. Nonterminal names are defined automatically if they appear on the left side of at least one rule. Without declaring a name in the declarations section, you can use that name only as a nonterminal symbol. The #include statements are identical to C language syntax and perform the same function.
The yacc program has a set of keywords that define processing conditions for the generated parser. Each of the keywords begins with a % (percent sign), which is followed by a token or nonterminal name. These keywords are as follows:
Keyword | Description |
---|---|
%left | Identifies tokens that are left-associative with other tokens. |
%nonassoc | Identifies tokens that are not associative with other tokens. |
%right | Identifies tokens that are right-associative with other tokens. |
%start | Identifies a nonterminal name for the start symbol. |
%token | Identifies the token names that the yacc command accepts. Declare all token names in the declarations section. |
%type | Identifies the type of nonterminals. Type-checking is performed when this construct is present. |
%union | Identifies the yacc value stack as the union of the various types of values desired. By default, the values returned are integers. The effect of this construct is to provide the declaration of YYSTYPE directly from the input. |
%{ Code %} | Copies the specified Code into the code file. This construct can be used to add C language declarations and definitions to the declarations section. Note: The %{ (percent sign, left bracket) and %} (percent sign, right bracket) symbols must appear on lines by themselves. |
The %token declaration has the following syntax:
%token [<Tag>] Name [Number] [Name [Number]]...
If <Tag> is present, the C type for all tokens on this line is declared to be the type referenced by <Tag>. If a positive integer, Number, follows the Name parameter, that value is assigned to the token.
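As an illustration of how <Tag> ties tokens to union members, suppose a grammar file contained `%union { int ival; const char *sval; }` and `%token <ival> NUMBER`. The member names, the token name, and the helper function `set_number_value` are hypothetical and do not come from this document. The value type shared by the parser and the lexical analyzer would then look roughly like the following C sketch. In a real build, yacc produces the YYSTYPE declaration and defines yylval itself; both are written out by hand here only so that the fragment compiles on its own.

```c
#include <stdlib.h>

/* Hand-written stand-in for the type yacc would derive from the
   hypothetical declaration  %union { int ival; const char *sval; }  */
typedef union {
    int ival;           /* value carried by a hypothetical NUMBER token */
    const char *sval;   /* value carried by a hypothetical NAME token   */
} YYSTYPE;

/* Normally defined by the generated parser; defined here so that the
   sketch is self-contained. */
YYSTYPE yylval;

/* Hypothetical helper: store the value of a digit string the way a
   lexical analyzer would before returning a token tagged <ival>. */
void set_number_value(const char *text)
{
    yylval.ival = atoi(text);
}
```

Because NUMBER is tagged with <ival>, an action that refers to the value of a NUMBER token would read the ival member of the union rather than a plain integer.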
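All tokens on the same line have the same precedence level and associativity, and later lines have higher precedence. For example, the following declarations describe the precedence and associativity of the four arithmetic operators:
%left '+' '-'
%left '*' '/'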
The + (plus sign) and - (minus sign) are left associative and have lower precedence than * (asterisk) and / (slash), which are also left associative.
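To make the effect of these two declarations concrete, the following C sketch evaluates an expression with the same grouping. It is not how a yacc-generated parser works internally (yacc uses the declarations to resolve conflicts in its parsing tables); it is a small hand-written precedence-climbing evaluator, and the names `level`, `number`, `expression`, and `evaluate` are hypothetical. It assumes well-formed input made of unsigned integers and the four operators, with no whitespace and no division by zero.

```c
#include <ctype.h>

static const char *cursor;   /* current position in the input string */

/* Precedence level of a binary operator, or 0 if c is not one.
   '+' and '-' share the lower level; '*' and '/' share the higher. */
static int level(char c)
{
    if (c == '+' || c == '-') return 1;
    if (c == '*' || c == '/') return 2;
    return 0;
}

/* Read an unsigned integer literal at the cursor. */
static int number(void)
{
    int value = 0;
    while (isdigit((unsigned char)*cursor))
        value = value * 10 + (*cursor++ - '0');
    return value;
}

/* Evaluate operators whose level is at least min_level (min_level >= 1). */
static int expression(int min_level)
{
    int left = number();
    while (level(*cursor) >= min_level) {
        char op = *cursor++;
        /* Left associativity: the right operand may contain only
           operators of a strictly higher level than op. */
        int right = expression(level(op) + 1);
        switch (op) {
        case '+': left += right; break;
        case '-': left -= right; break;
        case '*': left *= right; break;
        case '/': left /= right; break;   /* assumes right != 0 */
        }
    }
    return left;
}

/* Evaluate a whole expression string. */
int evaluate(const char *text)
{
    cursor = text;
    return expression(1);
}
```

Under these assumptions, `evaluate("8-3-2")` groups as (8-3)-2 and yields 3, and `evaluate("2+3*4")` groups as 2+(3*4) and yields 14.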
Defining global variables
To define variables to be used by some or all actions, as well as by the lexical analyzer, enclose the declarations for those variables between %{ (percent sign, left bracket) and %} (percent sign, right bracket) symbols. Variables declared this way are called global variables. For example, to make the var variable available to all parts of the complete program, use the following entry in the declarations section of the grammar file:
%{
int var = 0;
%}
Start conditions
The parser recognizes a special symbol called the start symbol. The start symbol is the name of the rule in the rules section of the grammar file that describes the most general structure of the language to be parsed. Because it is the most general structure, it is the goal of the parse: the parser accepts the input once it has reduced the entire input stream to this symbol. Declare the start symbol in the declarations section using the %start keyword. If you do not declare the name of the start symbol, the parser uses the name of the first grammar rule in the grammar file.
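For example, when parsing a C language function, the most general structure for the parser to recognize is as follows:
main()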
{
code_segment
}
The start symbol points to the rule that describes this structure. All remaining rules in the file describe ways to identify lower-level structures within the function.
Token numbers
Token numbers are nonnegative integers that represent the names of tokens. If the lexical analyzer passes the token number to the parser, instead of the actual token name, both programs must agree on the numbers assigned to the tokens.
If you do not assign numbers to the tokens, the yacc command assigns them using the following rules:
- A literal character is assigned the numerical value of the character in the ASCII character set.
- Other names are assigned token numbers starting at 257.
Note: Do not assign a token number of 0. This number is assigned to the endmarker token. You cannot redefine it.
To assign a number to a token (including literals) in the declarations section of the grammar file, put a positive integer (not 0) immediately following the token name in the %token line. This integer is the token number of the name or literal. Each token number must be unique. All lexical analyzers used with the yacc command must return a 0 or a negative value for a token when they reach the end of their input.
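The agreement between the lexical analyzer and the parser can be sketched in C as follows. This is a minimal hand-written lexical analyzer, assuming a grammar that declares `%token NUMBER 257` and `%token NAME 258`; those token names, the `set_input` helper, and the input handling are hypothetical and chosen only for illustration. The yylval variable is declared as int here, matching the default value type when no %union is used.

```c
#include <ctype.h>

/* Hypothetical token numbers. They must match the numbers the grammar
   file assigns, here assumed to be  %token NUMBER 257  and
   %token NAME 258. */
enum { NUMBER = 257, NAME = 258 };

static const char *input = "";   /* text that remains to be scanned */
int yylval;                      /* value of the most recent NUMBER */

/* Hypothetical helper: point the analyzer at a new input string. */
void set_input(const char *text)
{
    input = text;
}

int yylex(void)
{
    /* Skip blanks and tabs between tokens. */
    while (*input == ' ' || *input == '\t')
        input++;

    if (*input == '\0')
        return 0;                        /* endmarker: end of input */

    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;
    }

    if (isalpha((unsigned char)*input) || *input == '_') {
        while (isalnum((unsigned char)*input) || *input == '_')
            input++;
        return NAME;
    }

    /* Any other character is a literal token whose number is its
       character code. */
    return (unsigned char)*input++;
}
```

For the input `x + 42`, successive calls to yylex return 258 (NAME), the character code of '+', 257 (NUMBER) with yylval set to 42, and then 0 for the endmarker.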