Level: Intermediate
David Mertz (mertz@gnosis.cx), Developer, Gnosis Software, Inc.
04 Feb 2004
It's easy to get lost in the world of "little languages" -- quite a few have been written to scratch some itch of a company, individual, or project. Rexx is one of these languages, with a long history of use on IBM operating systems, and good current implementations for Linux and other Free Software operating systems. Rexx occupies a useful ecological niche between the relative crudeness of shell scripting and the cumbersome formality of full systems languages. Many Linux programmers and systems administrators would benefit from adding a Rexx implementation to their collection of go-to tools.
About Rexx
The Rexx programming language was first created in 1979, as a very
high level scripting language that had a particularly strong facility
for text processing tasks. Since Rexx's inception, IBM has included
versions of Rexx with most of its operating systems -- all the way from
its mainframes, to its mid-level systems, to end user OS's like OS/2
and PC-DOS. Other OS makers, such as Commodore with AmigaOS, have also
integrated Rexx as an always-available system scripting language. A number of ISVs,
moreover, have created Rexx environments for many platforms. Somewhat
late in the game, ANSI officially adopted a standard for Rexx in 1996.
Nowadays (especially on Linux or BSD-derived OS's), most of those
older implementations of Rexx are primarily interesting as historical
footnotes. However, two currently maintained implementations of Rexx
remain available across a wide range of platforms, including Linux,
Mac OS X, and Windows: Regina and NetRexx. Regina is a native executable,
available as Free Software source code, or pre-compiled for a large
number of platforms -- install it pretty much as you would any other
programming language interpreter. NetRexx is an interesting hybrid: the
language is a derivative of plain Rexx, and, much like Jython or Jacl,
NetRexx compiles Rexx-like source code into Java bytecode and
(optionally) runs the resulting .class file within a JVM.
In capabilities and programming level, Rexx can be compared most
closely to bash plus the GNU text utilities (throwing in grep and
sed for good measure); or maybe to awk or Perl. Certainly Rexx has
more of a quick-and-dirty feel to it than do, e.g., Python, Ruby, or
Java. The verbosity -- or rather, conciseness -- of Rexx is similar to
that of Perl, Python, Ruby or Tcl. And Rexx is certainly
Turing-complete, supports modules and structured programming, and has
libraries for tasks such as GUIs, network programming, and
database access. But its most natural target is the automation of
system scripting and text processing tasks. As with shell scripting,
Rexx allows very natural and transparent control of
applications; but compared to bash (or tcsh, ksh, etc.), Rexx
contains a much richer collection of built-in control structures and
(text processing) functions.
Stylistically, the IBM/mainframe roots of Rexx show in its
case-insensitive commands; and to a lesser degree in the relative sparsity
of punctuation it uses (preferring keywords to symbols). I tend to find
that these qualities aid readability; but this is mostly a matter of
individual taste.
A start at streams and stacks
As a simple conceit, let me present a number of versions of a very
simple utility that lists files and numbers them. One feature that
Rexx has in common with shell scripting is that it has a relatively
impoverished collection of functions for working with the underlying
operating system -- mostly limited to the ability to open, read, and
modify files. For most anything else, you rely on external utilities
to perform the job at hand. The utility numbered-1.rexx simply
processes STDIN:
Listing 1. numbered-1.rexx
#!/usr/bin/rexx
DO i=1 UNTIL lines()==0
    PARSE LINEIN line
    IF line\="" THEN
        SAY i || ") " || line
END
The ubiquitous instruction PARSE can read from various sources. In
this case, it puts the next line of STDIN into the variable line.
We also check if a line is blank, and skip showing and numbering it
if so. For example, combined with ls we can get:
Listing 2. Piping command to numbered-1
$ ls | ./numbered-1.rexx
1) ls-1.rexx
2) ls-2.rexx
3) ls-3.rexx
4) ls-4.rexx
5) ls-5.rexx
6) ls-6.rexx
7) numbered-1.rexx
8) numbered-2.rexx
You can just as easily pipe any other command in.
A concept at the core of Rexx is juggling multiple stacks or streams. In
bash-like fashion, anything in Rexx that is not recognized as an
internal instruction or function is assumed to be an external utility.
There is no special function or syntax for calling an external
command. Taking advantage of the Regina utility rxqueue, which puts
output onto the Rexx stack, we can write a "numbered ls" utility as:
Listing 3. ls-1.rexx
#!/usr/bin/rexx
"ls | rxqueue"
DO i=1 WHILE queued() \= 0
    PARSE PULL line
    SAY i || ") " || line
END
Some instructions in Rexx may explicitly specify a stack to operate
on; but other instructions operate within an environment which you
configure with the ADDRESS instruction. STDIN, STDOUT, STDERR,
files, and in-memory data stacks are all handled in a uniform and
elegant fashion. Above we used the external rxqueue utility, but we
can similarly redirect output of utilities right within Rexx. For
example:
Listing 4. ls-2.rexx
#!/usr/bin/rexx
ADDRESS SYSTEM ls WITH OUTPUT FIFO '' ERROR NORMAL
DO i=1 WHILE queued() \= 0
    PARSE PULL line; SAY i || ") " || line
END
It might appear that the ADDRESS instruction is grabbing the output of
just the ls utility; but it is actually changing the general
execution environment for later external calls. These examples were run on a
case-retentive/case-insensitive filesystem; on many systems, you will
have to quote "ls" to maintain its case. This behaves
identically:
Listing 5. ls-5.rexx
#!/usr/bin/rexx
ADDRESS SYSTEM WITH OUTPUT FIFO '' ERROR NORMAL
ls
DO i=1 WHILE queued()\=0; PARSE PULL ln; SAY i||") "||ln; END
Any subsequent external commands, if they are run in the default
SYSTEM environment, will direct their output to the default
FIFO (first-in-first-out). You could also output to a LIFO instead
(either named or default) -- the difference being that a FIFO adds to
the "bottom" of the stack, and a LIFO to the "top." The instructions
PUSH and QUEUE correspond to LIFO and FIFO operations on the
stack, respectively. The instructions PULL and PARSE PULL take a string
off the top of the stack.
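As a minimal sketch of the difference between the two, using nothing but the default stack (the script name and strings are purely illustrative, not one of the article's tested listings):

```rexx
/* stack-order.rexx: illustrative only */
QUEUE "first queued"     /* FIFO: added at the bottom of the stack */
QUEUE "second queued"    /* FIFO: added below the previous line */
PUSH "pushed last"       /* LIFO: added at the top of the stack */
DO WHILE queued() > 0
    PARSE PULL item      /* always takes from the top */
    SAY item
END
```

The pushed line comes out first, followed by the two queued lines in the order they were queued.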
Another useful stack to look at is that of the command-line arguments
to a Rexx script. For example, we might want to execute an arbitrary
command in our numbering utility, not always ls:
Listing 6. numbered-2.rexx
#!/usr/bin/rexx
PARSE ARG cmd
ADDRESS SYSTEM cmd WITH OUTPUT FIFO ''
DO i=1 WHILE queued()\=0; PARSE PULL ln; SAY i||") "||ln; END
The script in action:
Listing 7. Passing command to numbered-2
$ ./numbered-2.rexx ps -a -x
1) PID TT STAT TIME COMMAND
2) 1 ?? Ss 0:00.00 /sbin/init
3) 2 ?? Ss 0:00.19 /sbin/mach_init
4) 51 ?? Ss 0:01.95 kextd
[...]
PARSE PULL can also be used to pull lines from user input: when the
stack is empty, it reads from standard input. Following the
example of the execution of the argument cmd, you could write a
shell or interactive environment in Rexx (perhaps running either
external utilities or built-in commands, much like bash).
Stem variables and associative arrays
In Rexx -- somewhat like in Tcl -- to a large extent everything is a
string. Having stacks and streams composed of lines gives you a
simple list or array of strings. But mostly, strings simply act like
other datatypes as needed. For example, a string that contains a
suitable representation of a number (digits, decimal, an exponent "e",
etc.) can be used in arithmetic operations. For processing reports,
log files, and the like, this is exactly the behavior you want.
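To sketch that behavior, the following hypothetical fragment sums whatever words in a line happen to look like numbers. The sample line and variable names are invented for illustration:

```rexx
/* sum-numbers.rexx: illustrative only */
line = "alpha 12 beta 3.5"
total = 0
DO k = 1 TO words(line)
    w = word(line, k)
    /* datatype(w, 'N') is 1 when the string is a valid number */
    IF datatype(w, 'N') THEN total = total + w
END
SAY total    /* --> 15.5 */
```

No conversion step is needed: the strings "12" and "3.5" take part in the addition directly.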
Rexx, however, does have one additional standard datatype: associative
arrays. In Rexx they go under the name "stem variables," but the
concept is very similar to that of dictionaries in many other languages.
The syntax for stem variables will be oddly familiar to users of OOP
languages like Java, Python, or Ruby: a dot separates "objects" and
their "attributes." This is not really object-orientation, but the
syntax does (accidentally) highlight the degree to which an object
resembles a particularly robust dictionary; there are OOP extensions
to Rexx out there, but this article will not address them.
Not every string is a valid Rexx symbol -- which restricts the keys in the
dictionary -- but Rexx is pretty liberal about its symbol names,
compared to most languages. For example:
Listing 8. Using stem variables in Rexx
$ cat stems.rexx
#!/usr/bin/rexx
foo.X_!1.bar = 1
foo.X_!1.23 = 2
foo.fop.fip = 3
foo.fop = 4
SAY foo.X_!1.bar # foo.X_!1.23 # foo.fop.fip # foo.fop # foo.fop.NOPE
$ ./stems.rexx
1 # 2 # 3 # 4 # FOO.FOP.NOPE
A couple of features stand out in this example. We set a value for both a
stem and its compound (e.g. foo.fop and foo.fop.fip). Also notice
that the undefined symbol foo.fop.nope simply stands for its own
spelling, absent an assignment to the contrary. This lets us skip
quotes in most situations. Case of names is normalized to upper case
in most Rexx contexts.
One useful trick is to set a value for the dotted stem, which then
acts as a default value for compound names based on the stem. For the
next example, we also make use of the ability of ADDRESS to use the
sequentially numbered symbols of a compound name as an output
target:
Listing 9. ls-3.rexx
#!/usr/bin/rexx
ls. = UNDEF
ADDRESS SYSTEM ls WITH OUTPUT STEM ls.
DO i=1
    IF ls.i == UNDEF THEN LEAVE
    SAY i || ") " || ls.i
END
As soon as the loop gets to some compound variable name that was not
populated by the output of the external ls utility, we detect the
default value of "UNDEF" and leave the loop (if the output might
contain that string, a false collision could occur, however).
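The default-value trick is just as handy for in-memory bookkeeping. Here is a minimal sketch of a word counter that uses a stem as a dictionary; the text, the stem name count., and the script name are all invented for illustration:

```rexx
/* wordcount.rexx: illustrative only */
count. = 0                          /* default for every compound name */
text = "the cat and the hat and the bat"
DO k = 1 TO words(text)
    w = translate(word(text, k))    /* upper-case the key */
    count.w = count.w + 1
END
SAY count.the count.and count.cat count.dog    /* --> 3 2 1 0 */
```

The keys are upper-cased because an unassigned symbol such as the in count.the stands for its upper-case spelling; without translate() the counts would be stored under lower-case keys that count.the does not reach.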
Rexx also has an error-handling system that lets you SIGNAL
conditions and handle them appropriately. Instead of checking for a
default compound value, you can also catch the access to an undefined
variable. E.g.:
Listing 10. ls-6.rexx
#!/usr/bin/rexx
ADDRESS SYSTEM ls WITH OUTPUT STEM ls.
SIGNAL ON NOVALUE NAME quit
DO i=1
    SAY i || ") " || ls.i
END
quit:
Just to round our ls variants out, here is one more that uses a file
for its I/O:
Listing 11. ls-4.rexx
#!/usr/bin/rexx
ADDRESS SYSTEM ls WITH OUTPUT STREAM files
DO i=1
    line = linein(files)
    IF line = "" THEN LEAVE
    SAY i || ") " || line
END
rm files
Since the output stream is a regular file, it is probably good to
remove it at the end.
Text processing functions
The brief examples above will give readers a bit of the feel of Rexx
as a programming language. You can also, of course, define your own
procedures and functions -- in separate module files, if you wish -- and
call them either with the CALL instruction or using parenthesized
arguments, as with some of the examples in this article that use
standard functions.
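As a small sketch of both calling styles, here is a hypothetical function named shout (not a Rexx built-in) defined as an internal routine in the same file:

```rexx
/* shout.rexx: illustrative only */
SAY shout("hello")      /* function call --> HELLO! */
CALL shout "hello"      /* CALL leaves the returned value in RESULT */
SAY result              /* --> HELLO! */
EXIT

shout: PROCEDURE        /* PROCEDURE gives the routine its own variables */
    PARSE ARG text
    RETURN translate(text) || "!"
```

With CALL, the value handed back by RETURN lands in the special variable RESULT rather than being used in an expression.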
Perhaps the greatest strength in Rexx as a text processing language is
its useful collection of built-in string manipulation functions.
Somewhere over half of all the standard Rexx functions are for working
with strings, with a chunk of others thrown in for quite readable
manipulation of bit vectors. Moreover, even bit vectors are often
manipulated (or read in) as strings of ones and zeros:
Listing 12. bits.rexx
#!/usr/bin/rexx
SAY b2c('01100001') b2c('01100010')          /* --> a b */
SAY bitor(b2c('01100001'), b2c('01100010'))  /* --> c */
SAY bitor('a','b')                           /* --> c */
EXIT
/* Function in ARexx, but not ANSI Rexx */
b2c: PROCEDURE
    ARG bits
    return x2c(b2x(bits))
A nice feature of Rexx's text handling functions is the naturalness of
treating lines as being composed of whitespace-separated words. For
textual reports and log files, easily ignoring extraneous whitespace
is quite useful -- awk does something similar, but Python's
string.split() quickly gets more "busy" in describing the same
operations. In fact, "arrays" in Rexx just amount to
whitespace-separated strings. The PULL instruction will pull out
variables from a general template pattern for a line, which at a
minimum allows word division:
Listing 13. pushpull.rexx
#!/usr/bin/rexx
PUSH "a b c d e f"
PULL x y " C " z /* pull x and y before the C, remainder into z */
SAY x # y # z /* --> A # B # D E F */
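The same templates work on ordinary variables through PARSE VAR, which, unlike plain PULL, does not upper-case its input. A minimal sketch, with a made-up log line and variable names chosen only for illustration:

```rexx
/* parse-var.rexx: illustrative only; the log line is invented */
entry = "2004-02-04 10:15:32 ERROR disk full on /dev/hda1"
PARSE VAR entry logdate logtime level message
SAY level                                  /* --> ERROR */
SAY message                                /* --> disk full on /dev/hda1 */
PARSE VAR logdate year "-" month "-" day   /* split on literal patterns */
SAY day || "/" || month || "/" || year     /* --> 04/02/2004 */
```

The last variable in a template receives the remainder of the string, which is why message holds several words.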
Further dividing strings that may or may not have been pulled with a
template is elegant. Functions like wordpos(), word(),
wordindex(), words(), and subword() let you refer to the "words"
in a string as if they made up a list, e.g.:
Listing 14. Working with words in a string
seuss = "The cat in the hat came back"
thehat = wordpos('the hat', seuss)
SAY "'came'" is wordlength(seuss, thehat+2) letters long
/* --> 'came' IS 4 LETTERS LONG */
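A few more of the word functions, sketched against the same sentence (this fragment is illustrative and not one of the article's tested listings):

```rexx
/* words.rexx: illustrative only */
seuss = "The cat in the hat came back"
SAY words(seuss)            /* number of words       --> 7 */
SAY word(seuss, 2)          /* the second word       --> cat */
SAY subword(seuss, 4, 2)    /* two words from word 4 --> the hat */
SAY wordindex(seuss, 2)     /* character position of word 2 --> 5 */
```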
Of course, you also get a rich collection of character-oriented
functions as well. It is equally easy to work with character
positions using functions like reverse(), right(), justify(),
center(), pos(), or substr() (and others).
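A quick sketch of several of these, using arbitrary sample strings:

```rexx
/* chars.rexx: illustrative only */
SAY reverse("Rexx")                       /* --> xxeR */
SAY right("42", 6, "0")                   /* right-align, pad with zeros --> 000042 */
SAY center("Rexx", 10, "*")               /* --> ***Rexx*** */
SAY pos("in", "The cat in the hat")       /* --> 9 */
SAY substr("The cat in the hat", 5, 3)    /* --> cat */
```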
Another batch of the built-in functions let you work with dates and
numbers in a flexible, report-oriented, way. That is, numbers can be
read and written in a variety of formats, with arbitrary
(configurable) precision. Dates, similarly, can be read, written and
converted among many formats with standard function calls (e.g.
day-of-week, days-in-century, European versus US dates, etc.). The
flexibility with dates and numbers is probably less often necessary in
writing system scripts and processing log files than it is in working
with semi-structured output reports from database applications. But
when you need it, it is much more robust to have well-tested built-in
functions than to write your own ad-hoc converters and formatters.
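As a rough sketch of what this looks like in practice, assuming an ANSI-level interpreter such as Regina, where date() accepts an input date and its format as second and third arguments (the sample values are arbitrary):

```rexx
/* formats.rexx: illustrative only */
SAY format(3.14159, , 2)          /* two decimal places --> 3.14 */
NUMERIC DIGITS 20                 /* raise arithmetic precision */
SAY 1 / 3                         /* --> 0.33333333333333333333 */
SAY date('W', '20040204', 'S')    /* day of week --> Wednesday */
SAY date('E', '20040204', 'S')    /* European    --> 04/02/04 */
SAY date('U', '20040204', 'S')    /* US          --> 02/04/04 */
```

Here 'S' names the sorted (yyyymmdd) format of the input date; older or non-ANSI interpreters may only support the one-argument form of date().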
Wrap up
Coming more out of an IBM "big-iron" environment than from Unix
systems, Rexx is little-known to many Linux programmers and systems
administrators. But there remains an important Linux niche where
Rexx is a better scripting solution than either the "too-light" bash
or ksh shells, or the "too-heavy" interpreted programming languages
like Python, Perl, Ruby, Tcl, or maybe Scheme. For quick and easily
readable scripts that perform text manipulation on the inputs and
outputs of external processes, Rexx is hard to beat, and not hard to
learn or install.
Resources
- Regina is the LGPL Rexx implementation used in writing and testing the
  examples in this article, and for most people will be the best choice
  for an environment to install. Regina is available on an extremely
  wide range of platforms, and is fully ANSI compliant (with a few
  extensions added). The site also contains links to a number of useful
  Rexx libraries for working with application areas like Tk, Curses,
  Sockets, SQL, and others.
- NetRexx is an IBM project for compiling Rexx code for a Java Virtual
  Machine. While it is certainly possible to use NetRexx for
  client/workstation scripting needs, the main focus of NetRexx is to
  allow more rapid development of server-side Java applications (JSP and
  related technologies).
- The Rexx Language Association is a general advocacy group for the
  Rexx programming language. Their site contains miscellaneous links to
  useful libraries and other resources, including details on the
  Rexx ANSI Standard.
- IBM Hursley also maintains a nice list of Rexx resources. You can find
  tutorials, references, and links to a variety of modern libraries.
  The Hursley site also has a page on the history of Rexx.
- IBM offers an as-is version of Object REXX for Linux as well as for
  other platforms. An object-oriented programming language suited for
  beginners as well as experienced OO programmers, Object REXX is upward
  compatible with previous versions of classic REXX and provides
  programming interfaces to existing applications, such as DB2, C, and
  C++ applications.
- The IBM REXX Family runs on host systems, such as VM/ESA, VSE/ESA, and
  MVS/ESA, and workstation environments, such as AIX, OS/2, Linux, and
  Windows.
- There is even a Mod_Rexx package for the Apache Web server.
About the author
David Mertz is owner and chief consultant for Gnosis Software, Inc. whose corporate slogan is "We Know Stuff!" (and we do). His fondness for IBM dates back embarrassingly many decades. You can reach David at mertz@gnosis.cx; you can investigate all aspects of his life at his personal Web page. Suggestions and recommendations on past or future articles are welcome. Check out his book, Text Processing in Python.