You're the engineering team lead on a new PHP project. The requirements are fulfilled, and the preliminary data model is approximately 150 tables. Now, it's time to think about the schedule. You estimate a day to write each database access class and unit test it, so 150 tables at one table per day is 150 days. Assuming about a 20-day work month, will you have the almost eight months required just to write the database access layer?
Probably not, so you need a way to reduce the time required. You might use a persistence framework, but even if such a framework cut the time in half you would still need four months. You might write a generic library, but such libraries can be complex to use, error prone, and difficult to debug.
Or you might write a sample class or two, then build the rest with a code generator. Worst-case scenario, it could take up to a month to write the generator itself. But after that, the generation of all the code would take only minutes. In addition, any bugs that occur when the SQL schema doesn't match the database access code are all but eliminated because the generator builds both the SQL schema and the access code.
Can such a plan work? It can, and this article shows you how to do it and explains how your application will be easier to maintain, more robust, and enjoyable to write. Part 1 of this two-part series lays down the basics of code generation and walks through building the portion of the generator that creates SQL for a given model. Part 2 will drill deeper into XSL and show you how to finish the generator by walking through the code that creates the PHP.
Before you begin, you'll need a generator that takes an abstract model of the database and builds both an SQL schema and PHP database access classes. Figure 1 illustrates this model and shows the potential for generating other database access code for other technologies, such as Java™ code.
Figure 1. Basic information flow for the generator
The dashed boxes in Figure 1 (SQL model and Database access model) indicate temporary models created from the abstract model. The dotted section (Java) indicates what you could potentially generate.
The next question is which technology to use. You can write the generator using Java code, Perl, Python, or Ruby. Given the simplicity of the task, you might get by with a template engine such as Velocity or XSLT. A template engine like XSLT 2.0 is a good place to start because it has a lot of power on its own and you can embed it within Java code to extend its reach even farther.
XSLT 2.0 offers expanded language capabilities to developers seeking a template engine for code generation. This article exploits these features in the context of building a robust generator for both SQL and PHP that works off an abstract table definition. In XSLT 1.x, only single-level transformations were possible; but with XSLT 2.0, you can build intermediate models of the code to be generated. This functionality makes understanding and maintaining the generator easier, and also makes it possible to repurpose the generator for other language targets.
As an XML translation technology, XSLT can translate XML into either XML or text. (I use both of these facilities in this code generator project.) XSLT 2.0 is a vast improvement over XSLT 1.x, and in this article I use three new features available in the second revision of the standard:
- Functions: You can now define custom functions. This is a big improvement over the first version of XSLT, which used a cumbersome template syntax for this purpose.
- Temporary trees: In the first version of XSLT, you had to operate on the input XML tree. XSLT 2.0 can build temporary trees in memory that you can use to drive other templates.
- Result documents: Now, a single template can generate multiple output files. I use this feature to generate the SQL file and the individual PHP files.
Now, I'll show you how to build the generator.
Start with the input to the generator. Listing 1 provides an example of a table definition for a simple book database.
Listing 1. The input abstract table definition
<?xml version="1.0" encoding="UTF-8"?>
<tables>
<table name="Author">
<field name="first" type="text"/>
<field name="last" type="text"/>
</table>
<table name="Publisher">
<field name="name" type="text"/>
<field name="last" type="text"/>
</table>
<table name="Book">
<field name="name" type="text"/>
<field name="author" type="id"/>
<field name="publisher" type="id"/>
</table>
</tables>
|
This simple XML script defines three database tables with their fields: Author, Publisher, and Book. The input for your full PHP application will be much larger and probably more complex, but this code serves as a reasonable starting set.
You can build a generator that turns this XML into SQL or PHP in a single pass, but I don't recommend doing so. The templates become complex and hard to maintain. Today's best practice for code generation is to have multiple levels of models, each with an increasing level of target specificity. This practice reduces the complexity level of any one transformation, and is similar to using multiple tiers in a Web server to reduce overall complexity and encourage reuse.
Referring back to Figure 1, you see how the abstract model generates one model for the SQL and another for the PHP through the Database access model. Those models, stored in temporary trees, are then used to build the code. Figure 2 tracks the Author table through the transformations in the generator.
Figure 2. The evolution of a single SQL table
The top box in Figure 2 shows the original abstract table model. The middle box is the SQL model built from that abstract model. The bottom box shows the resulting SQL output. As you can see, the SQL model specifies point-by-point what should be in the final SQL file. The code template's job is simply to translate the XML into SQL, which makes it possible to generate code for a variety of databases from a single model. Now, take a look at the XSL.
Listing 2 shows the first part of the main XSL stylesheet for the generator project.
Listing 2. The beginning of the main generator XSL sheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:gen="http://www.codegeneration.net/" version="2.0">
<!-- Output specifications -->
<xsl:output method="text"/>
<xsl:output method="xml" name="debug-xml" indent="yes"/>
<xsl:output method="text" name="sql" indent="yes"/>
<!-- SQL Templates -->
<xsl:include href="gen2-sql.xsl" />
<xsl:include href="gen2-sql-model.xsl" />
<xsl:include href="gen2-queries.xsl" />
<!-- Database Access Templates -->
<xsl:include href="gen2-dba.xsl" />
<xsl:include href="gen2-php.xsl" />
<!-- The generator main entry point -->
<xsl:template match="/">
<!-- Create the SQL model -->
<xsl:text>Building SQL model
</xsl:text>
<xsl:variable name="sql-model">
<xsl:call-template name="gen-sql-model">
<xsl:with-param name="model" select="."/>
</xsl:call-template>
</xsl:variable>
<!-- Dump it out for debugging -->
<xsl:text>Dumping SQL model
</xsl:text>
<xsl:result-document href="db/gen-tables.xml" format="debug-xml">
<xsl:copy-of select="$sql-model"/>
</xsl:result-document>
<!-- Generate the SQL from the SQL model -->
<xsl:text>Generating SQL
</xsl:text>
<xsl:result-document href="db/gen-tables.sql" format="sql" >
<xsl:apply-templates mode="sql" select="$sql-model/sql" />
<xsl:result-document>
|
The important parts of this script are where I create the sql-model variable, which holds the SQL model, and the result-document tag, which creates the SQL file using the contents of the sql-model variable. These tags show two valuable additions to XSLT 2.0: the ability to create temporary trees with the xsl:variable tag and the ability to create multiple output files with the xsl:result-document tag.
Now, I'll show you the template that generates the SQL model from the abstract table model. Listing 3 shows the gen-sql-model template.
Listing 3. The SQL model generator
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:gen="http://www.codegeneration.net/" version="2.0">
<!-- Builds the SQL model from the original data model -->
<xsl:template name="gen-sql-model">
<xsl:param name="model"/>
<sql>
<xsl:for-each select="$model/tables/table">
<create name="{lower-case(@name)}"
primary-key="{concat(lower-case(@name),'_id')}">
<field name="{concat(lower-case(@name),'_id')}"
type="{gen:model-type-to-sql('integer')}"/>
<xsl:for-each select="field">
<field name="{lower-case(@name)}"
type="{gen:model-type-to-sql(@type)}"/>
</xsl:for-each>
</create>
</xsl:for-each>
</sql>
</xsl:template>
</xsl:stylesheet>
|
This template is fairly simple: It takes the original model and builds some new tags for the SQL model.
Note the call to the gen:model-type-to-sql function. This call isn't part of the
standard XSLT function set -- it's a custom function that I define in Listing 4.
Listing 4. The model-to-SQL-type transform function
<xsl:function name="gen:model-type-to-sql">
<xsl:param name="type"/>
<xsl:choose>
<xsl:when test="$type eq 'text'">TEXT NOT NULL</xsl:when>
<xsl:when test="$type eq 'id'">INTEGER NOT NULL</xsl:when>
<xsl:when test="$type eq 'integer'">INTEGER NOT NULL</xsl:when>
</xsl:choose>
</xsl:function>
|
This ability to build new XPath functions is new with XSLT 2.0. Previously, you had to use the
xsl:call-template syntax, which can become cumbersome as the number of
parameters grows. Having templates act as functions also pollutes the template namespace.
The final step in generating the SQL is to format your SQL model as SQL. You do this using the two templates shown in Listing 5.
Listing 5. SQL generation templates
<!-- Template for SQL create tags -->
<xsl:template match="create" mode="sql">
DROP TABLE IF EXISTS <xsl:value-of select="@name" />;
CREATE <xsl:value-of select="@name" /> (
<xsl:apply-templates mode="sql" select="field" />
PRIMARY KEY ( <xsl:value-of select="@primary-key" /> )
);
</xsl:template>
<!-- Template for SQL field tags -->
<xsl:template match="field" mode="sql">
<xsl:value-of select="concat(@name,' ',@type,',')" /><xsl:text>
</xsl:text>
</xsl:template>
|
That's really all it takes. I have one template that responds to the create tag in
the SQL model. This template then uses the apply-templates tag to handle its
child field tags, which are formatted by the second template.
Refer back to Figure 2. Notice how the structure of the XML looks like the structure of the SQL. A
create tag includes several field tags. As with SQL, a
create command includes several field parameters. Structuring the
model the same way as the code can greatly simplify the code templates.
In this article, I have introduced you to some of the new features available in XSLT 2.0 -- in particular, the expanded language capabilities and the ability to build intermediate models of code to be generated -- and illustrated how you can use them to build a robust generator for SQL that works off an abstract table definition.
In Part 2, I will show you how to generate the PHP portion of the code that will provide database access for the Web server. I will also introduce a few more new features in XSLT 2.0.
- Check out Code Generation Network, your source for code generation information where you'll fin articles, interviews, book reviews, and e-mail lists for discussing code generation.
- Read Code Generation in Action by Jack D. Herrington, which covers generating code for a wide variety of targets not limited to database access.
- Download Saxon, an excellent XSLT processor that supports version 2 of the standard and was used in this article.
- Explore Generative Programming, the seminal work on generative programming. This field includes code generation as well as other automatic programming techniques.
- Want to know more about XSLT 2.0 and XPath 2.0? Read XSLT 2.0 Programmer's Reference by Michael Kay, the bible on this new standard. XPath 2.0 Programmer's Reference is Michael Kay's companion book on XPath 2.0.
- Find hundreds more XML resources on the
developerWorks XML zone.
- Learn how you can become an IBM Certified Developer in XML and related technologies.
An engineer with with more than 20 years of experience, Jack Herrington is currently Editor-in-Chief of the Code Generation Network. He is the author of Code Generation in Action. You can contact him at jack_d_herrington@codegeneration.net.
Comments (Undergoing maintenance)





