Purely XML-based tools implement 85-90% or more of an application. Programming and scripting languages -- whether Tcl, Perl, Python, or something else -- remain important for some aspects of building applications, more so for B2B apps, but XSLT and XML Schemas are increasingly used to implement the high-level logic: business logic, configuration data, and so on. This is where Tcl used to be applied, but it is moving down the logic chain. At least, that's the practice of Steve Ball and Zveno Pty. Ltd. of Canberra, Australia (see Resources). With this article, developerWorks kicks off a series on Tool Command Language (Tcl) and Web services that focuses on the status of Tcl as a language for processing XML (see Resources). This article is not a guide to learning the Tcl programming language. Please look to the Resources section for links to other articles that explain the basics of Tcl.
Part of the excitement of Web services comes from the diversity of problems they solve. The benefits of Web services are available to developers working in all sorts of environments, and the Tcl high-level language offers particular advantages. This first installment of the series establishes an XML-based foundation. It answers the following questions: What Tcl tools are available for working with XML? Is Tcl a fit language for XML processing?
Dr. John Ousterhout invented Tcl in the late '80s. It was developed originally as an extension language for electronic design automation (EDA) tools (see Resources). Tcl almost immediately escaped from this niche and established its value in graphical user interface (GUI) construction, process management, and test automation. By the mid-'90s, it was well-established as a high-level, general-purpose development language, while continuing to be available in its original role as an embeddable "scripting" language.
By the time Ousterhout spun off his own Tcl-focused company, then known as Scriptics Corporation, from Sun Microsystems at the beginning of 1998, XML was already a focus of Tcl attention. One immediately apparent reason had to do with its Unicode capabilities. Ousterhout and the other developers internal and external to Scriptics reworked Tcl to make its 8.1 release one of the first languages to handle Unicode (the standard XML encoding) properly and conveniently. For Steve Ball, the "most significant advantage" of Tcl compared to other programming languages remains internationalization: "Tcl rocks when it comes to Unicode and handling/manipulating character encodings."
Scriptics maintained its emphasis on XML throughout its brief history. In fact, when acquired in mid-2000 by Interwoven, Inc. (see Resources), the favorable price Ousterhout and other Scriptics investors received was based on his company's XML expertise, rather than any value imputed directly to Tcl. Ousterhout is now a scientist at Interwoven.
Also prominent in the development of Tcl is Marshall Rose (see Resources). Rose is the well-known author of over sixty of the Internet Engineering Task Force's (IETF) RFC standards, several books, and reference implementations of networking software having to do with electronic mail, network management, and directories (see Resources). More recently, the company he founded, Invisible Worlds (see Resources), also funded Zveno's initial development of TclXML. TclXML is the XML parser for Tcl, comparable to SAX or JAXP.
More precisely, TclXML is a collection of "tools for processing and manipulating XML documents using Tcl" (see Resources). TclXML includes two parsers:
- TclExpat, based on James Clark's (see Resources) high-performance Expat XML parser; and
- a "native" TclXML parser, coded entirely in Tcl.
Like many other Web services tools, TclXML is open source, governed by the BSD-style license traditional in the Tcl community.
Since version 2.0theta last year, TclXML has supported XML Namespaces in a convenient abstraction away from XML syntax details. Tcl has its own namespace constructs, so it has been natural for Ball to attach XML Namespaces to Tcl namespaces. As he explains,
Then any reference to an element within that namespace is translated to an evaluation of a Tcl procedure with the same local-name in the Tcl Namespace.
Ball uses this technique in Zwik, his company's internal content management system.
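The mapping Ball describes can be pictured with a few lines of core Tcl. What follows is only a minimal sketch of the idea, not TclXML's actual interface: the namespace URI, the `nsMap` table, the `::report` namespace, and the `dispatchElement` procedure are all hypothetical names chosen for illustration.

```tcl
# Hypothetical table that attaches an XML Namespace URI to a Tcl namespace.
# None of these names come from TclXML; they are assumptions for this sketch.
array set nsMap {
    http://example.com/ns/report ::report
}

namespace eval ::report {
    # One Tcl procedure per element local-name in the XML Namespace.
    proc title {content} {
        return "<h1>$content</h1>"
    }
    proc para {content} {
        return "<p>$content</p>"
    }
}

# Translate a reference to an element (namespace URI plus local-name) into
# an evaluation of the Tcl procedure of the same name in the mapped namespace.
proc dispatchElement {uri localName content} {
    global nsMap
    if {![info exists nsMap($uri)]} {
        error "no Tcl namespace registered for $uri"
    }
    set cmd "$nsMap($uri)::${localName}"
    if {[llength [info commands $cmd]] == 0} {
        error "no handler for element $localName in $uri"
    }
    return [$cmd $content]
}

puts [dispatchElement http://example.com/ns/report title "Quarterly results"]
```

In a real application the calls to a procedure like `dispatchElement` would presumably come from parser callbacks rather than from hand-written code; that wiring is left out here.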
Layered over TclXML is TclDOM, a Tcl language binding to the Document Object Model (DOM). The natural visual representation of DOM marries well with the GUI capabilities of the Tk graphical extension to Tcl. TclDOM includes two Tk-coded megawidgets, DOMText and TreeDOM, that respect DOM events and expose XML Namespaces handily.
Standard Tcl is simply procedural, although a dozen different object-oriented extensions or supersets of Tcl are more or less in active use. According to the World Wide Web Consortium (W3C [see Resources]), "DOM API is heavily slanted towards O-O languages." And Advanced Rotorcraft Technology, Inc., Software Engineer Joe English reminds us that "it's to Steve's (Ball's) great credit that he was able to come up with an interpretation of this API that actually works well with Tcl."
Ball also introduced XPath support in TclDOM in the winter of 2001. XPath integrates XML content in a Web addressing scheme. The benefits to Tcl programmers include:
- XPath-based navigation through DOM trees;
- XPath-expressed construction of DOM trees; and
- a TclXML package called cgi2dom, which presents Web forms as DOM trees.
Zveno uses cgi2dom in production systems to render form submissions into XML documents. XSLT applications process the form templates and their results, with tiny Tcl evaluations connecting the inputs and outputs through a Web server.
TclXML users have applied the project in several different ways. Ousterhout's company concentrated on XML as a common language in business-to-business developments that often worked from legacy data. Zveno's clients, many of them in government or government suppliers, generally call on it for help publishing and managing content. Ball is enthusiastic about a "functional" approach that uses XSL and other XML tools to express the bulk of a transformation and "glues" the endpoints to such externals as a filesystem with Tcl. He explains it this way:
XSL can't (and isn't designed to) do everything. So what kinds of problems cannot be solved using XSLT? Usually where an interface needs to be built to an external system; the filesystem is a good example. I used to simply use a Tcl script to generate HTML, but the HTML index page must be styled differently for ourselves and our clients. Now, we use a very small Tcl script to generate a file called manifest.xml. This file simply lists the files and subdirectories in a filesystem directory in XML format. We then use an XSLT stylesheet to create the index page. Our clients just slot in a different stylesheet. The manifest file becomes useful for other purposes as well, like creating sitemaps, overviews, [and so on] of the database content.
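A "very small Tcl script" of the kind Ball describes might look roughly like the following. This is a sketch under stated assumptions rather than Zveno's actual script: the element and attribute names (`manifest`, `file`, `directory`, `name`, `size`) and the procedure names are invented for illustration, and only core Tcl commands are used.

```tcl
# Escape the characters that are significant inside XML attribute values.
proc xmlEscape {text} {
    return [string map [list & "&amp;" < "&lt;" > "&gt;" \" "&quot;"] $text]
}

# Build an XML listing of the files and subdirectories in one directory.
# The vocabulary used here is an assumption, not a published format.
proc buildManifest {dir} {
    set lines [list {<?xml version="1.0" encoding="UTF-8"?>}]
    lappend lines "<manifest dir=\"[xmlEscape $dir]\">"
    foreach path [lsort [glob -nocomplain -directory $dir *]] {
        # Do not list a manifest left over from an earlier run.
        if {[file tail $path] eq "manifest.xml"} {
            continue
        }
        set name [xmlEscape [file tail $path]]
        if {[file isdirectory $path]} {
            lappend lines "  <directory name=\"$name\"/>"
        } else {
            lappend lines "  <file name=\"$name\" size=\"[file size $path]\"/>"
        }
    }
    lappend lines "</manifest>"
    return [join $lines \n]
}

# Write the listing to manifest.xml inside the directory it describes.
proc writeManifest {dir} {
    set out [open [file join $dir manifest.xml] w]
    fconfigure $out -encoding utf-8
    puts $out [buildManifest $dir]
    close $out
}

# Example call (the path is a placeholder):
#   writeManifest /path/to/site
```

The Tcl side stays this small because it only touches the filesystem; everything about how the index page looks is left to whichever XSLT stylesheet reads the manifest.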
Most TclXML contributors and consumers appear to be experienced engineers who recognize that several languages, including Java, Python, and Tcl, have roughly comparable algorithmic capabilities. They favor Tcl for the traditional strengths Ball summarizes as "simplicity, easy integration, and breadth of applicability." Its qualities include
- top-flight encoding management, including Unicode;
- succinct expression of high-level "piping" of data from one domain to another;
- integrated graphics in Tk; and
- expressive, consistent retrieval of legacy data both through database interfaces and Expect. Expect is a unique Tcl extension which automates interaction with difficult interfaces; think of it as a general-purpose "screen-scraper" for legacy programs (see Resources).
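As a small illustration of the first item in the list above, the following sketch shows how core Tcl separates characters from bytes when converting between encodings. It assumes nothing beyond the built-in `encoding` and `string` commands; the sample text is arbitrary.

```tcl
# "\u00e9" is e-acute: one Unicode character.
set text "caf\u00e9"
puts [string length $text]        ;# 4 characters

# Convert to UTF-8: the result holds one element per encoded byte.
set utf8 [encoding convertto utf-8 $text]
puts [string length $utf8]        ;# 5 bytes, since e-acute takes two in UTF-8

# Convert to ISO 8859-1 (Latin-1): every character here fits in one byte.
set latin1 [encoding convertto iso8859-1 $text]
puts [string length $latin1]      ;# 4 bytes

# Round trip: decoding the UTF-8 bytes restores the original string.
puts [string equal $text [encoding convertfrom utf-8 $utf8]]   ;# 1
```

The same conversions apply to channels through `fconfigure -encoding`, which is how a script would typically read legacy data in one encoding and write XML in another.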
At the other end of the seniority range is third-year Newcastle University student Sam Aaron with his dissertation project on XML-encoded autonomous agents (see Resources). He selected Tcl as his implementation language despite the solid year of background he had in Java and its popularity in agent research. Tcl's simplicity and the robustness of TclXML contributed to his completion of half of his planned agent authoring tool in the first week of his exposure to the language. For him, "It's like a breath of fresh air after C++ and Java!"
A more polished showcase for Tcl advantages is e4graph. e4graph is Jacob Levy's general-purpose library for reliable, strikingly efficient, and portable persistent storage for graph-like data. e4graph's XML binding gives an instant-on, highly compact, remarkably portable format for persistent XML documents (see Resources). Ball hopes at some point to connect a TclDOM-e4graph interface to his company's "waX Me Lyrical" Tcl-coded XML editor for "instantly-opened pre-parsed documents" (see Resources).
Ball has it right: "XML is the future." Implementation language for procedural operations is secondary. Within that framework, though, Tcl is a fine vehicle for many applications. Tcl boasts such highlights as e4graph and several mission-critical systems already successfully moved to production.
Tcl certainly doesn't have the scale of talent that appears to be currently funded for such languages as Java, Visual Basic, and Python. Several good TclXML enhancements are dormant, waiting for a volunteer with time or a sponsor with money. Progress in specific areas continues, though. Ball and convention staffers expect good attendance at the four tutorials he'll present at this summer's Open Source Convention (see Resources). Future installments in this series will examine Tcl capabilities and applications with SOAP, WSDL, and UDDI.
- Zveno Pty Ltd. is Steve Ball's Australian development and training house. It specializes in XML-based applications.
- Tcl is a popular high-level language for a variety of development applications.
- Find out about John Ousterhout, the creator of Tcl.
- Interwoven, Inc. is a leading provider in the area of XML-based content management, and John Ousterhout's employer.
- Invisible Worlds is developing protocols designed to succeed HTTP. Its founder and CTO is Marshall Rose.
- Internet Engineering Task Force (IETF): the IETF issues many of the fundamental Internet standards.
- Visit TclXML home at SourceForge.
- James Clark's Home Page: Clark has authored several very widely-used reference pieces of software having to do with SGML and XML processes.
- W3C: the World Wide Web Consortium is, along with the IETF, the principal publisher of Internet standards.
- Visit the home page of Joe English, the principal author of the C-coded version of TclDOM.
- Pick up free software at e4graph home. e4graph represents and stores graphs.
- MetaKit home page: MetaKit is the persistence library e4graph leverages.
- The O'Reilly Open Source Convention is offering an XML track and several XML tutorials this summer.
- "waX Me Lyrical" (also WAX), the XML editor home page. Zveno originally wrote WAX.
Cameron is a full-time developer for independent consultancy Phaseit, Inc., based on the Texas Gulf Coast, just outside Houston. He frequently writes on Tcl and other "scripting" languages, including Python, Perl, ksh, and others less well known. In addition, he writes on GUI toolkits, data management, and testing methods. His own development projects most often are highly-reliable networked Web and industrial applications. He persists in regarding XML as a variant of SGML. He can be reached at Cameron@Lairds.com.