Generating a lexical analyzer with the lex command
The lex command helps write a C language program that can receive and translate character-stream input into program actions.
To use the lex command, you must supply or write a specification file that contains:
- Extended regular expressions: character patterns that the generated lexical analyzer recognizes.
- Action statements: C language program fragments that define how the generated lexical analyzer reacts to the extended regular expressions it recognizes.
For information about the format and logic allowed in this file, see the lex command in Commands Reference, Volume 3.
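The lex command stores the C language program that it generates in the lex.yy.c file. If that program only needs to recognize simple input, you can compile the lex.yy.c file with the following command to produce an executable lexical analyzer:
cc lex.yy.c -ll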
However, if the lexical analyzer must recognize more complex syntax, you can create a parser program to use with the output file to ensure proper handling of any input.
You can move a lex.yy.c output file to another system if it has a C compiler that supports the lex library functions.
The compiled lexical analyzer performs the following functions:
- Reads an input stream of characters.
- Copies any input that matches no rule in the specification file to the output stream unchanged.
- Breaks the input stream into smaller strings that match the extended regular expressions in the lex specification file.
- Executes an action for each extended regular expression that it recognizes. These actions are C language program fragments in the lex specification file. Each action fragment can call actions or subroutines outside of itself.
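The functions listed above can be sketched in plain C. The following is only an illustration of the behavior, not the code that the lex command generates: it assumes the POSIX regex.h interface is available, and the rule table, the on_number and on_word actions, and the analyze function are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* An action receives the matched text and the output stream. */
typedef void (*action_fn)(const char *text, size_t len, FILE *out);

struct rule {
    const char *pattern;   /* extended regular expression, anchored with ^ */
    action_fn action;      /* C fragment to run when the pattern matches */
};

/* Hypothetical actions: tag each match so that the effect is visible. */
static void on_number(const char *text, size_t len, FILE *out)
{
    fprintf(out, "<NUM:%.*s>", (int)len, text);
}

static void on_word(const char *text, size_t len, FILE *out)
{
    fprintf(out, "<WORD:%.*s>", (int)len, text);
}

/* Hypothetical rule table standing in for a specification file. */
static const struct rule RULES[] = {
    { "^[0-9]+", on_number },
    { "^[A-Za-z]+", on_word },
};
enum { RULE_COUNT = sizeof RULES / sizeof RULES[0] };

/* Returns 0 on success, or -1 if a pattern fails to compile. */
int analyze(const char *input, FILE *out)
{
    regex_t compiled[RULE_COUNT];
    size_t i;

    assert(input != NULL && out != NULL);
    for (i = 0; i < RULE_COUNT; i++) {
        if (regcomp(&compiled[i], RULES[i].pattern, REG_EXTENDED) != 0) {
            while (i > 0)
                regfree(&compiled[--i]);
            return -1;
        }
    }

    while (*input != '\0') {
        size_t best_len = 0, best_rule = 0;

        /* Find the rule with the longest match at the current position. */
        for (i = 0; i < RULE_COUNT; i++) {
            regmatch_t m;
            if (regexec(&compiled[i], input, 1, &m, 0) == 0 &&
                m.rm_so == 0 && (size_t)m.rm_eo > best_len) {
                best_len = (size_t)m.rm_eo;
                best_rule = i;
            }
        }

        if (best_len > 0) {
            RULES[best_rule].action(input, best_len, out);
            input += best_len;
        } else {
            fputc(*input, out);   /* no rule matched: copy the character */
            input++;
        }
    }

    for (i = 0; i < RULE_COUNT; i++)
        regfree(&compiled[i]);
    return 0;
}
```

Calling analyze("ab 12", stdout) in this sketch runs the word action on ab, copies the space unchanged because no rule matches it, and runs the number action on 12.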
The lexical analyzer generated by the lex command uses an analysis method called a deterministic finite-state automaton. This method defines a finite number of states that the lexical analyzer can be in, along with the rules that determine how it moves from one state to another.
The automaton allows the generated lexical analyzer to look ahead more than one or two characters in an input stream. For example, suppose you define two rules in the lex specification file: one looks for the string ab and the other looks for the string abcdefg. If the lexical analyzer receives an input string of abcdefh, it reads characters to the end of the input string before determining that it does not match the string abcdefg. The lexical analyzer then returns to the rule that looks for the string ab, decides that it matches part of the input, and begins trying to find another match using the remaining input cdefh.
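The look-ahead behavior in this example can be imitated with a short C sketch. It does not build the transition tables that the lex command produces. It only reads ahead while any rule could still match and remembers the last position at which a rule matched completely. The RULES array and the longest_match function are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule set from the example: two literal strings. */
static const char *const RULES[] = { "ab", "abcdefg" };
enum { RULE_COUNT = sizeof RULES / sizeof RULES[0] };

/*
 * Read ahead in `input` while at least one rule could still match.
 * Return the length of the longest complete match found, or 0 if no
 * rule matches. On a match, *rule_index identifies the matching rule.
 */
size_t longest_match(const char *input, size_t *rule_index)
{
    size_t best_len = 0;   /* end of the last complete match seen */
    size_t pos = 0;        /* number of characters read so far */
    int alive = 1;         /* nonzero while some rule is still viable */
    size_t i;

    assert(input != NULL && rule_index != NULL);
    while (alive && input[pos] != '\0') {
        alive = 0;
        pos++;
        for (i = 0; i < RULE_COUNT; i++) {
            size_t len = strlen(RULES[i]);
            if (pos <= len && strncmp(input, RULES[i], pos) == 0) {
                alive = 1;             /* rule i may still match */
                if (pos == len) {      /* rule i matches completely here */
                    best_len = pos;
                    *rule_index = i;
                }
            }
        }
    }
    return best_len;       /* the caller resumes at input + best_len */
}
```

For the input abcdefh, this sketch reads all seven characters before the abcdefg rule fails, then falls back to the two-character match of ab, so scanning resumes at cdefh.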
Compiling the lexical analyzer
To compile a lex program, complete the following steps:
- Use the lex command to convert the specification file into a C language program. The resulting program is in the lex.yy.c file.
- Use the cc command with the -ll flag to compile and link the program with a library of lex subroutines. The resulting executable program is in the a.out file.
For example, if the lex specification file is named lextest, enter the following commands:
lex lextest
cc lex.yy.c -ll