Previously, we determined which problems are best solved with a code-generating program, including:
- Programs that need to pre-generate data tables
- Programs that have a lot of boilerplate code that cannot be abstracted into functions
- Programs using techniques that are overly verbose in the language you are writing them in
We then looked at several metaprogramming systems and examples of their use, including:
- Generic textual-substitution systems
- Domain-specific program and function generators
We then examined a specific instance of table-building and wrote a code-generating program to build static tables in C. Finally, we introduced Scheme and saw how it is able to tackle the issues we faced in the C language using constructs that are part of the Scheme language itself.
This article gives you details on how Scheme macros are programmed and how they can make your large-scale programming tasks significantly easier.
Although syntax-case macros are not part of the R5RS Scheme standard (R6RS later adopted them), they are the most widely used macro type that allows both hygienic and non-hygienic forms, and they are very closely related to the standard syntax-rules macros.
syntax-case macros follow the form in Listing 1:
Listing 1. The general form of syntax-case macros
(define-syntax macro-name
  (lambda (x)
    (syntax-case x (other keywords go here if any)
      (
       ;;First Pattern
       (macro-name macro-arg1 macro-arg2)
       ;;Expansion of macro (one or multiple forms)
       ;;(syntax is a reserved word)
       (syntax (expansion of macro goes here))
      )
      (
       ;;Second Pattern -- a 1-argument version
       (macro-name macro-arg1)
       ;;Expansion of macro
       (syntax (expansion of macro goes here))
      )
)))
This form defines macro-name to be a keyword used for transformation. The function defined with lambda is used by the macro transformer to convert the expression x into its expansion.
syntax-case takes the expression x as its first argument. The second argument is a list of keywords that are to be taken literally within the syntax patterns. The other identifiers used in the patterns are used as template variables.
syntax-case then takes a sequence of pattern/transformer combinations. It proceeds through them in order, trying to match the input form to each pattern. When a pattern matches, it produces the associated expansion.
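Here is a minimal, concrete instance of the form in Listing 1, with a two-argument pattern and a one-argument version. The name inc! is our own invention for illustration. The sketch assumes a Scheme that provides syntax-case, such as an R6RS implementation. syntax-case tries the two-argument clause first and falls through to the one-argument clause only when that pattern does not match.

```scheme
;; inc! increments a variable, either by an explicit amount
;; or by 1. syntax-case tries the clauses top to bottom.
(define-syntax inc!
  (lambda (x)
    (syntax-case x ()
      ;; First pattern: (inc! var amount)
      ((inc! var amount)
       (syntax (set! var (+ var amount))))
      ;; Second pattern: a 1-argument version, (inc! var)
      ((inc! var)
       (syntax (set! var (+ var 1)))))))

(define counter 0)
(inc! counter)      ; matches only the second clause
(inc! counter 10)   ; matches the first clause
(display counter)   ; prints 11
(newline)
```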
Let's look at a simple example. Say we wanted to write a more verbose version of Scheme's if statement. For instance, if we want to find the greater of two variables and return it, the code would look like this:
(if (> a b) a b)
To a non-Scheme programmer, nothing in the text indicates which is the "then" branch and which is the "else" branch. To help with this, you can create your own custom if statement that adds the "then" and "else" keywords. It would look like this:
(my-if (> a b) then a else b)
Listing 2 demonstrates the macro to perform this operation:
Listing 2. Macro to define an extended if statement
;;define my-if as a macro
(define-syntax my-if
  (lambda (x)
    ;;establish that "then" and "else" are keywords
    (syntax-case x (then else)
      (
       ;;pattern to match
       (my-if condition then yes-result else no-result)
       ;;transformer
       (syntax (if condition yes-result no-result))
      )
)))
When this macro is expanded, it matches the my-if expression against the pattern like this (in other words, it matches a macro invocation to a macro definition pattern):
(my-if (> a b)   then a          else b)
 |     |         |    |          |    |
 |     |         |    |          |    |
 v     v         v    v          v    v
(my-if condition then yes-result else no-result)
Therefore, in the transformer expression, every occurrence of condition is replaced by (> a b). It doesn't matter that (> a b) is itself a list: it is a single element of the containing list, so it is treated as a unit in the pattern. The resulting syntax expression simply rearranges these parts into a new expression.
This transformation happens before execution, during what is known as macro-expansion time. On many compiler-based Scheme implementations, macro-expansion time occurs during compile time. This means that each macro use is expanded only once, at compile time or when the program is loaded, and never has to be re-evaluated. Therefore, our my-if statement has no runtime overhead whatsoever: by the time the program runs, it has already been converted to a simple if.
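For comparison, here is a sketch of the same macro written with syntax-rules, the standard macro form that syntax-case is closely related to. syntax-rules builds the transformer lambda and the syntax wrapping for you. That is sufficient here because my-if needs no non-hygienic behavior. The helper name greater is our own.

```scheme
;; my-if written with syntax-rules: "then" and "else" are literals,
;; so they must appear exactly where the pattern places them.
(define-syntax my-if
  (syntax-rules (then else)
    ((_ condition then yes-result else no-result)
     (if condition yes-result no-result))))

;; After expansion, the body of greater is an ordinary if.
(define (greater a b)
  (my-if (> a b) then a else b))

(display (greater 3 7))  ; prints 7
(newline)
```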
In the next example, we are going to write the famous swap! macro. This simple macro swaps the values of two identifiers. Listing 3 gives an example of how the macro will be used.
Listing 3. Using the swap! macro to exchange identifier values
(define a 1)
(define b 2)
(swap! a b)
(display "a is now ")(display a)(newline)
(display "b is now ")(display b)(newline)
This simple macro (Listing 4) implements the swap by introducing a new temporary variable:
Listing 4. Defining our swap! macro
;;Define a new macro
(define-syntax swap!
  (lambda (x)
    ;;we don't have any keywords this time
    (syntax-case x ()
      (
       (swap! a b)
       (syntax
        (let ((c a))
          (set! a b)
          (set! b c)))
      )
)))
This introduces a new variable called c. But what if one of the arguments to be swapped is called c? syntax-case solves this problem by replacing c with a unique, unused variable name when the macro expands. Therefore, the syntax transformer takes care of this all on its own.
syntax-case does not replace let, however. This is because let is a globally defined identifier: it refers to the binding visible where the macro was defined.
The idea of replacing introduced variable names with non-conflicting names is called hygiene; the resulting macros are called hygienic macros. Hygienic macros can be safely used anywhere without fear of stomping on existing variable names. For a wide variety of metaprogramming tasks, this feature makes macros more predictable and easier to work with.
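The effect of hygiene is easy to check. The sketch below reuses the swap! macro from Listing 4 and calls it on a caller variable that is itself named c. The procedure name swap-test is made up for the demonstration. If the macro's temporary c were not renamed, the let would shadow the caller's c and the swap would produce the wrong result.

```scheme
;; The swap! macro from Listing 4.
(define-syntax swap!
  (lambda (x)
    (syntax-case x ()
      ((swap! a b)
       (syntax (let ((c a))
                 (set! a b)
                 (set! b c)))))))

;; The caller's variable is also named c. Because the c introduced
;; by the macro is renamed during expansion, the two do not collide.
(define (swap-test)
  (let ((c 1) (d 2))
    (swap! c d)
    (list c d)))

(display (swap-test))  ; prints (2 1)
(newline)
```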
While hygienic macros make introducing variable names within macros safe, there are cases in which you will want your macros to be non-hygienic. For example, let's say that you wanted to create a macro that introduced a variable into a scope that could be used by the person calling the macro. This would be a non-hygienic macro because the macro is polluting the namespace of the user's code. However, there are many times when this ability is useful.
As a simple example, let's say that we wanted to write a macro that introduced the definitions of several math constants for use within the macro (yes, this could be better accomplished by other means, but I'm using it as a simple example). Let's say we wanted to define pi and e using a macro invocation like Listing 5:
Listing 5. Invocation of a math constant macro
(with-math-defines (* pi e))
If we tried to set this up like the previous macros, it would fail:
Listing 6. Math constant macro that doesn't work
(define-syntax with-math-defines
  (lambda (x)
    (syntax-case x ()
      (
       (with-math-defines expression)
       (syntax (let ((pi 3.14)
                     (e 2.71828))
                 expression))
      )
)))
This formulation won't work. The reason is that, as mentioned earlier, Scheme will rename pi and e so they don't conflict with other names in enclosing or nested scopes. Therefore, they will get new names, and the code (* pi e) will reference undefined variables. We need a way to introduce literal symbols that can be used by the developer invoking the macro.
To introduce code into a macro that won't be modified by Scheme's automatic hygiene, the code must be converted from a list of symbols into a syntax object. That syntax object can then be assigned to a pattern variable and inserted in the transformed expression. To make this happen, we will use with-syntax, which is essentially a "let" statement for macros. It has the same basic form, but it is used for assigning syntax objects to template variables.
To create a new template variable, you need to be able to translate symbols and expressions back and forth between list representation (the way syntax is written) and the more abstract syntax object representation. The following functions do these conversions:
- datum->syntax-object converts a list to the more abstract syntax object representation.
  - The first parameter to this function is usually (syntax k), which is a little magic formula that helps the syntax converter get the context correct.
  - The second parameter is the expression that needs to be converted into a syntax object.
  - The result is a syntax object that can be assigned to a template variable using with-syntax.
- syntax-object->datum is the reverse process of datum->syntax-object. It takes a syntax object and converts it into an expression that can be manipulated using normal Scheme list-processing functions.
- syntax takes a transformation expression consisting of template variables and constant expressions and returns the resulting syntax object.
For this example, to get the literal values in a template variable, you would use syntax and syntax-object->datum in combination. You could then manipulate the expression and use datum->syntax-object to get it back into a syntax object, which can be assigned to a template variable using with-syntax. Then, in the final transformation expression, the new template variable can be used like any other.
In effect, you are converting the Scheme syntax to a list you can manipulate, manipulating that list, and then converting it back into a Scheme syntax expression for output.
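Here is a small sketch of that round trip. It assumes an R6RS-style syntax-case system, where the conversion functions are named syntax->datum and datum->syntax rather than syntax-object->datum and datum->syntax-object. The macro backwards is a made-up example. It converts its arguments to a plain list, reverses that list with the ordinary reverse procedure, and converts the result back into code. The context is taken from the macro keyword k matched in the input form, so the rebuilt code is treated as if the caller had written it.

```scheme
;; (backwards arg ... proc) expands to (proc arg ...) with the
;; arguments reversed, e.g. (backwards 2 1 -) => (- 1 2).
(define-syntax backwards
  (lambda (x)
    (syntax-case x ()
      ((k form ...)
       ;; 1. syntax -> list: get the forms as ordinary data
       (let* ((forms (syntax->datum (syntax (form ...))))
              ;; 2. manipulate the list with normal list functions
              (reversed (reverse forms)))
         ;; 3. list -> syntax, borrowing the caller's context from k
         (datum->syntax (syntax k) reversed))))))

(display (backwards 2 1 -))         ; prints -1
(newline)
(display (backwards "b" "a" list))  ; prints (a b)
(newline)
```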
Listing 7 shows the macro definition to define math symbols using these functions:
Listing 7. Math constant macro that works
(define-syntax with-math-defines
  (lambda (x)
    (syntax-case x ()
      (
       ;;Pattern
       (with-math-defines expression)
       ;;with-syntax defines new pattern variables
       (with-syntax
        (
         (expr ;;the new pattern variable
          ;;convert expression into a syntax object
          (datum->syntax-object
           ;;syntax domain magic
           (syntax k)
           ;;expression to convert
           `(let ((pi 3.14)
                  (e 2.72))
              ;;Insert the code for the "expression" template
              ;;variable here.
              ,(syntax-object->datum (syntax expression))))))
        ;;Use the newly-created "expr" pattern
        ;;variable as the resulting expression
        (syntax expr))
      )
)))
If you are not familiar with Scheme, the backquote, called a quasiquote, is similar to the quote operator except that it allows non-quoted data to be included if it is preceded by a comma (called the unquote operator). This lets us splice the expression into our bit of boilerplate code, and then the whole shebang is converted back into a syntax object as the final transformation.
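To see what the quasiquote in Listing 7 produces, it can help to apply the same kind of template to plain data, outside any macro. The variable names below are illustrative. The sketch also shows the related unquote-splicing operator (,@), which inserts the elements of a list rather than the list itself.

```scheme
;; A piece of code represented as ordinary list data.
(define body '(* pi e))

;; Unquote (,) inserts the value of body as a single element.
(define with-unquote
  `(let ((pi 3.14) (e 2.72)) ,body))

;; Unquote-splicing (,@) splices the elements of a list in place.
(define with-splice
  `(list 1 ,@'(2 3) 4))

(write with-unquote)  ; prints (let ((pi 3.14) (e 2.72)) (* pi e))
(newline)
(write with-splice)   ; prints (list 1 2 3 4)
(newline)
```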
Since we explicitly spliced the new variables into the existing syntax object, there is no chance for them to be renamed. Also note that the expression (syntax k) in datum->syntax-object is necessary, even though k itself is just an arbitrary identifier. It invokes a little bit of "magic" within the syntax processor: datum->syntax-object gives the new syntax object the same context as that identifier, so it knows what context the expression should be processed in. In this article's examples, it is always written as (syntax k).
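Many current syntax-case implementations, such as those following R6RS, spell these functions datum->syntax and syntax->datum. Their usual idiom takes the context from the macro keyword matched in the input form (k below), instead of from a free identifier. The following is our own variant of Listing 7 under those assumptions. It converts only the two names pi and e, so the rest of the caller's expression keeps its original context and can still refer to the caller's local variables. The procedure circle-area is illustrative.

```scheme
(define-syntax with-math-defines
  (lambda (x)
    (syntax-case x ()
      ((k expression)
       ;; Build identifiers named pi and e that carry the caller's
       ;; context (taken from the keyword k), so references to pi
       ;; and e inside expression will see these bindings.
       (with-syntax ((pi (datum->syntax (syntax k) 'pi))
                     (e  (datum->syntax (syntax k) 'e)))
         (syntax (let ((pi 3.14159) (e 2.71828))
                   expression)))))))

(display (with-math-defines (* pi e)))  ; roughly 8.5397
(newline)

;; The caller's own local variable r remains visible.
(define (circle-area r)
  (with-math-defines (* pi r r)))
(display (circle-area 2))  ; roughly 12.566
(newline)
```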
The problem with non-hygienic macros is that the introduced variables can overwrite and be overwritten by other variables in the code. This makes mixing non-hygienic macros especially dangerous since the macros will not be aware of what variables the other macros are using and they may stomp on each other's variables. Therefore, non-hygienic macros should only be used when there is no other way to accomplish the same effect using normal functions or hygienic macros and in such a case the macro's symbol introductions should be carefully documented.
A lot of the code written in large applications is boilerplate code, which is tedious to write. Worse, if a bug is discovered in the boilerplate code, it is very difficult to find every instance where the boilerplate is used and rewrite it. This means that boilerplate code is one of the few places where non-hygienic macros are useful.
A large part of boilerplate code is simply setting up variables that are going to be used within your function. Therefore, boilerplate macros should introduce a large set of common bindings, and perhaps perform other housekeeping tasks as well.
Let's say that we are building a CGI application consisting of many independent CGI scripts. In most CGI applications, much of the state is stored in a database, but only a session ID is passed to each script via a cookie.
However, on nearly every page we need to know the other standard information (such as the username, group number, the current job being worked on, and whatever other information is pertinent). In addition, we need to redirect the user if they do not have an appropriate cookie. Listing 8 demonstrates some code that could be standard boilerplate (hypothetical Web server functions will be prefixed with webserver:).
Listing 8. Boilerplate code for Web application
(define (handle-cgi-request req)
  (let ((session-id (webserver:cookie req "sessionid")))
    (if (not (webserver:valid-session-id session-id))
        (webserver:redirect-to-login-page)
        ;;let* so that later bindings can use username
        (let* ((username (webserver:username-for-session session-id))
               (group (webserver:group-for-user username))
               (current-job (webserver:current-job-for-user username)))
          ;;Code for processing goes here
          ))))
While some of that can be handled by a procedure, the bindings certainly cannot. However, we can turn most of it into a macro. The macro can be implemented like this:
Listing 9. Macro of the boilerplate code
(define-syntax cgi-boilerplate
  (lambda (x)
    (syntax-case x ()
      (
       (cgi-boilerplate expr)
       (datum->syntax-object
        (syntax k)
        `(let ((session-id (webserver:cookie req "sessionid")))
           (if (not (webserver:valid-session-id session-id))
               (webserver:redirect-to-login-page)
               ;;let* so that later bindings can use username
               (let* ((username (webserver:username-for-session session-id))
                      (group (webserver:group-for-user username))
                      (current-job (webserver:current-job-for-user username)))
                 ,(syntax-object->datum (syntax expr))))))
      )
)))
We can now create new forms based on our boilerplate code by doing the following:
(define (handle-cgi-request req)
  (cgi-boilerplate
   (begin
     ;;Do whatever I want here
     )))
In addition, since we are not defining our variables explicitly, adding new variable definitions to our boilerplate won't affect its calling conventions, so new features can be added without having to create a whole new function.
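To try the boilerplate technique without a Web server, here is a runnable sketch that rests on several assumptions. The request is an association list. session-id-of, username-for-session, group-for-user, with-session-bindings, and handle-request are hypothetical names standing in for the webserver: functions. The sketch uses the R6RS name datum->syntax. Unlike Listing 9, it takes req as an explicit argument, and it breaks hygiene only for the two names it means to introduce, username and group. Their context is taken from the macro keyword k in the input form.

```scheme
;; Hypothetical stand-ins for the Web server functions; a real
;; server would look these up in its session database.
(define (session-id-of req) (cdr (assq 'sessionid req)))
(define (username-for-session id) (if (equal? id "abc123") "alice" #f))
(define (group-for-user user) (if (equal? user "alice") 7 #f))

;; (with-session-bindings req body ...) binds username and group for
;; use in body, or returns 'redirect-to-login for an unknown session.
(define-syntax with-session-bindings
  (lambda (x)
    (syntax-case x ()
      ((k req body ...)
       ;; Deliberately captured names, in the caller's context.
       (with-syntax ((username (datum->syntax (syntax k) 'username))
                     (group    (datum->syntax (syntax k) 'group)))
         (syntax
          ;; user is introduced hygienically and stays private.
          (let ((user (username-for-session (session-id-of req))))
            (if (not user)
                'redirect-to-login
                (let* ((username user)
                       (group (group-for-user username)))
                  body ...)))))))))

(define (handle-request req)
  (with-session-bindings req
    (list username group)))

(display (handle-request '((sessionid . "abc123"))))  ; prints (alice 7)
(newline)
(display (handle-request '((sessionid . "zzz"))))     ; prints redirect-to-login
(newline)
```

Because the temporary user is introduced hygienically, a caller's own variable named user is not disturbed. Only username and group become visible to the body.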
In any large project, there are inevitably templates to follow which cannot be reduced to functions, usually because of the bindings being created. Using boilerplate macros can make maintenance of such templated code much easier.
Likewise, other standard macros can be created that make use of variables defined in the boilerplate. Using macros like this significantly reduces typing because you do not have to keep writing and rewriting variable bindings, derivations, and parameter passing. This also reduces the potential for errors in such code.
Realize though that boilerplate macros are not a panacea. There are many significant problems that can occur, including:
- Accidentally overwriting bindings by introducing a variable name that was previously defined in a macro.
- Difficulty tracing problems because the inputs and the outputs of the macros are implicit, not explicit.
These can be largely avoided by doing a few things in conjunction with your boilerplate macros:
- Have a naming convention that clearly labels macros as such and indicates when a variable came from boilerplate code. This could be done by appending -m to macros and -b to variables defined within a boilerplate.
- Carefully document all boilerplate macros, especially the introduced variable bindings and all changes between versions.
- Only use boilerplate macros when the savings in repetitiveness clearly outweigh the negatives of implicit functionality.
In programming, many times what is really needed is a small domain-specific language. There are many examples of domain-specific languages in use today:
- Configuration files
- Web markup languages such as HTML
- Job control languages
These languages are not necessarily Turing-complete (a system is Turing-complete if it has computational power equivalent to a universal Turing machine; in other words, the system and the universal Turing machine can emulate each other). The commonality between them is that they all have a lot of implicit assumptions and implicit state that would have to be dealt with explicitly in a general-purpose programming language. Scheme allows you to have the best of both worlds by being able to define macros which operate as specialized domain-specific languages.
For the first example, let's consider a security configuration file that details different security domains. There will be several different security domains, each of which has different access controls and restrictions.
Many systems already have declarative security. Specifically, J2EE has some of the declarative security features we are going to look at, such as:
Listing 10. Declarative security features in J2EE
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Test Resource</web-resource-name>
    <description>This is an example Resource</description>
    <url-pattern>/Test</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>USERS</role-name>
  </auth-constraint>
</security-constraint>
In this code, we are limiting access to a certain URL based on a given user's role and telling which authentication mechanism to use for someone who is not logged in. This can be done in a similar way with a macro in Scheme. We could define a macro that would allow us to do something like this (a declarative security macro):
(resource "Test Resource"
          "This is an example resource"
          "/Test"
          (auth-constraints (role "USERS")))
Listing 11 shows what the macro definition for the previous macro invocation might look like (all functions prefixed with webserver: are hypothetical functions provided by the Web server):
Listing 11. Writing the declarative security macro
;;This macro creates expressions which check the validity
;;of the authentication credentials in the variable "credentials"
;;and reports and redirects unauthorized access.
(define-syntax auth-constraints
  (lambda (x)
    (syntax-case x (auth-constraints time role)
      (
       ;;This causes the constraints to be processed one at a
       ;;time within a (begin) clause. At least two constraints
       ;;are required here, so that a single constraint falls
       ;;through to the cases below.
       (auth-constraints constraint1 constraint2 constraint3 ...)
       (syntax (begin (auth-constraints constraint1)
                      (auth-constraints constraint2 constraint3 ...)))
      )
      (
       ;;This gives the expansion for the role checking mechanism
       ;;(note that "credentials" is defined in the "resource" macro below)
       (auth-constraints (role rolename ...))
       (syntax (if (not (webserver:is-in-role-list credentials (list rolename ...)))
                   (webserver:report-unauthorized)
                   #f))
      )
      (
       ;;Allows a time-based checking
       (auth-constraints (time beginning ending))
       (syntax (let ((now (webserver:getunixtime)))
                 (if (or (< now beginning) (> now ending))
                     (webserver:report-unauthorized)
                     #f)))
      )
      (
       ;;Unknown case -- assume it is code or is transformed by
       ;;another macro
       (auth-constraints unknown)
       (syntax unknown)
      )
)))

;;Each resource definition expands to a function to check
;;credentials. It piggy-backs onto the macros defined above,
;;which make up the body of the credential-checking function.
;;This sets up the "credentials" parameter which is used in the
;;expressions above
(define-syntax resource
  (lambda (x)
    (syntax-case x ()
      (
       (resource name description url security-features)
       (with-syntax
        (
         ;;This builds the function to check security information
         (security-function
          (datum->syntax-object
           (syntax k)
           `(lambda (credentials)
              ,(syntax-object->datum (syntax security-features))))))
        (syntax (webserver:add-security-function
                 name description url security-function)))
      )
)))
These macros require a little bit of explanation. First of all, a new construct is introduced: the ellipsis (...). This notation essentially means "repeating as before." It can be used both in the macro pattern and in the expansion.
The resource macro basically builds a function for processing security credentials and then passes that function as an argument to webserver:add-security-function. The function it defines takes a single argument, credentials, which will be used by the auth-constraints macro.
The auth-constraints macro is a little more complicated. It can take one of two forms: a single constraint to process or a list of constraints to process. The first section of the macro breaks the list-of-constraints case down into multiple single-constraint cases. The ... is used to indicate possible continuation of similar forms. We are taking advantage of the fact that after a macro expansion occurs, the result is macro-expanded again, continuing until no more expansions take place. If you follow the iterated expansions of auth-constraints, you will see that it does indeed expand into a list of individual auth-constraints macros, which are then processed individually using the remaining macro forms.
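The same expand-then-re-expand strategy can be seen in a small self-contained macro. my-and below is our own simplified version of Scheme's built-in and, shown only to illustrate the technique. Its recursive clause requires at least two expressions (e1 e2 e3 ...). Each expansion step therefore shrinks the form, and a single expression is handled by its own clause rather than by the recursive one.

```scheme
;; my-and: each expansion peels off one expression and hands the
;; rest back to my-and, until one expression (or none) is left.
(define-syntax my-and
  (lambda (x)
    (syntax-case x ()
      ((my-and) (syntax #t))
      ((my-and e) (syntax e))
      ((my-and e1 e2 e3 ...)
       (syntax (if e1 (my-and e2 e3 ...) #f))))))

;; (my-and a b c) => (if a (my-and b c) #f)
;;                => (if a (if b (my-and c) #f) #f)
;;                => (if a (if b c #f) #f)
(display (my-and 1 2 3))   ; prints 3
(newline)
(display (my-and 1 #f 3))  ; prints #f
(newline)
```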
auth-constraints contains two extra features that aren't being used in the example. The first is a time-based authorization mechanism, and the second is the ability to be further expanded by other macros and code. The time-based authorization mechanism is merely an example of how multiple types of constraints can be added to this mechanism; the expansion option will be used in a later example.
These macros will expand our security declarations into what's shown in Listing 12:
Listing 12. Expansion of Scheme declarative security
(webserver:add-security-function
 "Test Resource"
 "This is an example resource"
 "/Test"
 (lambda (credentials)
   (if (not (webserver:is-in-role-list credentials (list "USERS")))
       (webserver:report-unauthorized)
       #f)))
This leads to the obvious questions:
- Why did we bother to implement this as a macro?
- What was wrong with the XML declaration used by Java?
There are two issues that make a macro set preferable to data languages like the XML declarative security file:
- The declarative information is transformed into an imperative form at compile-time, rather than each time it is used at runtime, resulting in faster code.
- More importantly, if the declarative language is not expressive enough for your needs, you can include imperative statements within your file as well, using the full expressive power of the programming language.
While the first feature is useful, the second feature is what makes it worthwhile. Since the macro expands to regular code anyway, you can always switch back to imperative programming if the declarative language doesn't suit your needs. In fact, if the transformation is well documented, you can even mix declarative and imperative statements within your configuration.
Let's say, for example, that you wanted to check the domain that the user was coming from against an external list of rogue IP addresses. Here is how we could do it using a mixture of declarative and imperative security features:
(resource "Test Resource"
          "This is an example resource"
          "/Test"
          (auth-constraints
           (role "USERS")
           (if (rogue-ip-list:contains (webserver:ip-address credentials))
               (webserver:report-unauthorized)
               #f)))
This allows the ultimate in flexibility for programming. You can program declaratively, using domain-specific sub-languages, but still revert to your full-featured programming language if the sub-language does not meet your needs fully.
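The declarative/imperative mix can be tried without a Web server. The sketch below is a simplified, hypothetical stand-in for Listing 11, not the Web server's API. constraints-ok?, resource-checker, has-any-role?, and rogue-ips are our own names. Credentials are assumed to be an association list, and the checker returns #t or #f instead of calling webserver:report-unauthorized. It uses syntax-rules, and the caller names the credentials parameter explicitly. That design choice avoids breaking hygiene just to share the name credentials between macros.

```scheme
;; Hypothetical helper: #t if any required role is among the
;; user's roles.
(define (has-any-role? user-roles required)
  (let loop ((rs required))
    (cond ((null? rs) #f)
          ((member (car rs) user-roles) #t)
          (else (loop (cdr rs))))))

;; Expands each constraint into a boolean test; all must pass.
;; (role name ...) is declarative; anything else is ordinary code.
(define-syntax constraints-ok?
  (syntax-rules (role)
    ((_ creds) #t)
    ((_ creds (role name ...) more ...)
     (and (has-any-role? (cdr (assq 'roles creds)) (list name ...))
          (constraints-ok? creds more ...)))
    ((_ creds code more ...)
     (and code (constraints-ok? creds more ...)))))

;; Builds a checker procedure; the caller names the parameter.
(define-syntax resource-checker
  (syntax-rules ()
    ((_ (creds) constraint ...)
     (lambda (creds) (constraints-ok? creds constraint ...)))))

(define rogue-ips '("10.0.0.66"))  ; hypothetical blocklist

(define check
  (resource-checker (credentials)
    (role "USERS")                                          ; declarative
    (not (member (cdr (assq 'ip credentials)) rogue-ips)))) ; imperative

(display (check '((roles "USERS") (ip . "192.168.1.5"))))  ; prints #t
(newline)
(display (check '((roles "USERS") (ip . "10.0.0.66"))))    ; prints #f
(newline)
```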
Metaprogramming has many uses in large-scale computer programming. In this article, I've touched on the tools needed to do metaprogramming in Scheme and provided several metaprogramming examples. Metaprogramming techniques were applied to several application areas:
- Making the syntax nicer
- Automating boilerplate generation
- Writing declarative sub-programs
In Scheme, you can use the macro facility to define nearly any sort of domain-specific language you want. The tools are there. It's just a matter of deciding which features are implemented more easily and more clearly using macro expansions versus regular code.
Jonathan Bartlett is the author of the book Programming from the Ground Up, an introduction to programming using Linux assembly language. He is the lead developer at New Medio, developing Web, video, kiosk, and desktop applications for clients.