Write text parsers with yacc and lex

Martin Brown (mc@mcslp.com), Freelance Writer, Consultant
Martin Brown has been a professional writer for over eight years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms -- Perl, Python, Java, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, shell script, Windows, Solaris, Linux, BeOS, Mac OS X and more -- as well as Web programming, systems management, and integration. Martin is a regular contributor to ServerWatch.com, LinuxToday.com, and IBM developerWorks, and a regular blogger at Computerworld, The Apple Blog and other sites, as well as a Subject Matter Expert (SME) for Microsoft. He can be contacted through his Web site at http://www.mcslp.com.

Summary:  Examine the process of building a parser with the lex/flex and yacc/bison tools, first by building a simple calculator and then by applying the same principles to text parsing. Parsing text -- that is, understanding and extracting the key parts of the text -- is an important part of many applications. Within UNIX®, many elements of the operating system rely on parsing text, from the shell you use to interact with the system, through common tools and commands like awk or Perl, to the C compiler you use to build software and applications. You can use parsers in your UNIX applications (and others), either to build simple configuration parsers or even to build the ultimate: your own programming language.

Date:  31 May 2006
Level:  Intermediate

Before you start

UNIX® programmers often find that they need to understand text and other structures that have a flexible but standardized format. By using the lex and yacc tools, you can build a parsing engine that processes text according to specific rules. You can then incorporate it into your applications for everything from configuration parsing right up to building your own programming language. By the end of this tutorial, you'll understand how to define lexical elements, write yacc rules, and use the rule mechanism to build and define a range of different parsing engines and applications.

About this tutorial

There are many ways to understand and extract text in UNIX. You can use grep, awk, Perl, and other solutions. But sometimes you want to understand and extract data in a structured but unrestricted format. This is where the UNIX lex and yacc tools are useful. Many of those tools, including awk, Perl, and the shell, along with many other programming languages, themselves rely on parsers built with lex or yacc to understand text and translate it into the information, or data structures, that they need.

Lex is a lexical analysis tool: it generates a scanner that identifies specific text strings (tokens) in source text in a structured way. Yacc is a parser generator: from a grammar, it produces a parser that reads the sequence of tokens and turns it into a structured format for processing.
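The division of labor between the two tools can be sketched in plain C. This is not lex or yacc output and not the calculator built later in this tutorial; it is a minimal hand-written illustration, assuming a tiny arithmetic grammar. All of the names here (next_token, parse_expr, evaluate, and the TOK_ constants) are hypothetical and chosen only for this sketch. The next_token function plays the role of the scanner that lex would generate, and the parse_ functions play the role of the grammar rules that yacc would turn into a parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Token kinds: the scanner reduces raw characters to these. */
typedef enum {
    TOK_NUMBER, TOK_PLUS, TOK_MINUS, TOK_STAR, TOK_SLASH,
    TOK_LPAREN, TOK_RPAREN, TOK_END, TOK_ERROR
} TokenType;

typedef struct {
    TokenType type;
    double value;          /* meaningful only for TOK_NUMBER */
} Token;

typedef struct {
    const char *pos;       /* next unread character */
    Token current;         /* one token of lookahead */
    int failed;            /* set when the input breaks the grammar */
} Parser;

/* Lexical analysis (lex's job): skip whitespace, classify the next token. */
static Token next_token(const char **p)
{
    Token t = { TOK_ERROR, 0.0 };
    while (isspace((unsigned char)**p))
        (*p)++;
    char c = **p;
    if (c == '\0') {
        t.type = TOK_END;
        return t;
    }
    if (isdigit((unsigned char)c)) {
        char *end;
        t.value = strtod(*p, &end);   /* consumes at least one digit */
        t.type = TOK_NUMBER;
        *p = end;
        return t;
    }
    (*p)++;
    switch (c) {
    case '+': t.type = TOK_PLUS;   break;
    case '-': t.type = TOK_MINUS;  break;
    case '*': t.type = TOK_STAR;   break;
    case '/': t.type = TOK_SLASH;  break;
    case '(': t.type = TOK_LPAREN; break;
    case ')': t.type = TOK_RPAREN; break;
    default:  break;                  /* unknown character: TOK_ERROR */
    }
    return t;
}

static void advance(Parser *ps) { ps->current = next_token(&ps->pos); }

static double parse_expr(Parser *ps);

/* factor: NUMBER | '-' factor | '(' expr ')' */
static double parse_factor(Parser *ps)
{
    if (ps->current.type == TOK_NUMBER) {
        double v = ps->current.value;
        advance(ps);
        return v;
    }
    if (ps->current.type == TOK_MINUS) {
        advance(ps);
        return -parse_factor(ps);
    }
    if (ps->current.type == TOK_LPAREN) {
        advance(ps);
        double v = parse_expr(ps);
        if (ps->current.type != TOK_RPAREN) { ps->failed = 1; return 0.0; }
        advance(ps);
        return v;
    }
    ps->failed = 1;
    return 0.0;
}

/* term: factor (('*' | '/') factor)* */
static double parse_term(Parser *ps)
{
    double v = parse_factor(ps);
    while (ps->current.type == TOK_STAR || ps->current.type == TOK_SLASH) {
        TokenType op = ps->current.type;
        advance(ps);
        double rhs = parse_factor(ps);
        if (op == TOK_STAR) v *= rhs;
        else if (rhs != 0.0) v /= rhs;
        else ps->failed = 1;          /* division by zero is rejected */
    }
    return v;
}

/* expr: term (('+' | '-') term)* */
static double parse_expr(Parser *ps)
{
    double v = parse_term(ps);
    while (ps->current.type == TOK_PLUS || ps->current.type == TOK_MINUS) {
        TokenType op = ps->current.type;
        advance(ps);
        double rhs = parse_term(ps);
        v = (op == TOK_PLUS) ? v + rhs : v - rhs;
    }
    return v;
}

/* Returns 1 and stores the value on success, 0 on any syntax error. */
int evaluate(const char *text, double *result)
{
    assert(text != NULL && result != NULL);
    Parser ps = { text, { TOK_END, 0.0 }, 0 };
    advance(&ps);
    double v = parse_expr(&ps);
    if (ps.failed || ps.current.type != TOK_END)
        return 0;
    *result = v;
    return 1;
}
```

Calling evaluate with the string "2 + 3 * 4" stores 14 in the result, because the term rule binds multiplication more tightly than the expr rule binds addition. With lex and yacc, the same structure would be written as token patterns and grammar rules, and the tools would generate the equivalent C code.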

In this tutorial, you'll examine how to use lex and yacc, first to build a calculator. Using the calculator as an example, you'll further examine the output and information generated by the lex and yacc system and study how to use it to parse other types of information.


Prerequisites

To use the examples in this tutorial, you will need to have access to the following tools:

  • Lex: This tool is a standard component on most UNIX operating systems. The GNU flex tool provides the same functionality.
  • Yacc: This tool is standard on most UNIX operating systems. The GNU bison tool provides the same functionality.
  • C compiler: Any standard C compiler, including GNU CC (gcc), will be fine.
  • Make tool: This tool is required to use the sample Makefile to simplify building.

GNU tools can be downloaded from the GNU Web site, or your local GNU mirror.
