Level: Introductory
Peter Seebach (developerworks@seebs.plethora.net), Freelance author, Plethora.net
08 Mar 2006
Technology professionals have loosely used the term "UNIX" since the first person had to explain the difference between the Berkeley and AT&T flavors, so it's not surprising to find as many UNIX® standards as there are versions of the operating system. Peter Seebach wades through the wellspring of UNIX standards and sorts them out for you, concluding that the rumors of the death of UNIX are (as usual) greatly exaggerated.
One of the nice things about UNIX development is your variety of
choices. A given computer might easily run two or three varieties of UNIX.
A PowerPC® Mac will typically run OS X (mostly a BSD under the hood, even
though it uses a Mach kernel), NetBSD, or Linux® without complaint. The IBM
POWER™-based big iron has Linux and AIX® ports, and some varieties will also
run BSD.
If you want to give some thought to portability, then, you can fairly easily get a lot of exposure to a variety of systems with only a small amount
of hardware. Deciding which standards to follow might seem daunting. A
good starting place is the system man pages, which will often indicate
which standards they comply with, and even which features are
extensions. Don't have the hardware budget to fill your basement with old iron? You can run a lot of systems on emulated hardware (such as VMware) for testing or man page reference.
And yet it moves
In a fit of irony, some people have complained that the diversity of UNIX
standards makes UNIX a less stable target than a more monolithic
environment with a single vendor, for instance, Microsoft® Windows®. This simply does
not match the reality of software development. In the past 20 years,
developers for "the same" desktop platform ("whatever Microsoft ships")
have been told that the API to target is (in this order):
- DOS
- Win16
- OS/2
- Win32
- WinNT
- WinXP
- and most recently .NET.
Of course, that list is from last year, and now the "stable" target that
you should be developing for, if you have an eye for the future, is Vista.
It hasn't been quite as bad in the Macintosh world, where the number of
major API changes has been limited: classic single-tasking Mac OS, classic
multitasking Mac OS (System 7), Carbon (System 8/9 and preview of OS X), and
Cocoa (OS X), but even there, the cost of migration has been significant.
At least OS X finally offers a stable UNIX API for the back-end part of
programs, allowing developers to ignore the API creep except in GUI code.
By contrast, twenty-year-old UNIX utilities still compile and
run. A new desktop computing API will come and everyone will have
to rewrite for it, but mountains will erode away before read() and write() stop
working. This is the reason that all the hassle of formal UNIX standards
has had so little effect on practical UNIX software development; the core
API is simple, clean, and well-designed, and there is no need to change it
significantly.
In fact, the stability of this API might be one of the reasons that many Mac
users aren't very worried about a proposed shift from one processor architecture
to another. UNIX users have been switching hardware platforms since the
1970s; it's no big deal. Users switching from MIPS to POWER, or from PA-RISC
to Itanium, or from Itanium to EM64T, are just as comforted by the knowledge
that the OS will keep running.
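The read() and write() calls mentioned above are a fair illustration of that stability. The following is a minimal sketch, not taken from any particular utility, of the copy loop at the heart of tools like cat. The function name copy_fd and the 4096-byte buffer are assumptions made for this example; the calls themselves are the classic UNIX ones and should compile unchanged on essentially any UNIX-like system.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from file descriptor `in` to file descriptor `out`.
 * Returns the number of bytes copied, or -1 on error (errno is set). */
long copy_fd(int in, int out)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return total;               /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }

        /* write() may accept fewer bytes than it was offered, so loop
         * until the whole chunk has been written. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;           /* retry the same write */
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Calling copy_fd(0, 1) copies standard input to standard output. Nothing in the function depends on which vendor's UNIX it runs on, which is the point the section is making.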
What is UNIX, anyway?
Technical professionals use the term UNIX fairly loosely;
for instance, some people have quipped that Plan 9 (see Resources) is more Unix than UNIX
is. (The difference in capitalization has been taken by some to
distinguish between the design philosophy and the trademark; the trademark
is all capitals, but the design philosophy is just a proper name.)
Since the first person had to explain that a program ran only on
Berkeley Unix, or only on AT&T UNIX, or only on some other operating
system, there have been debates about what exactly is or is not part of
UNIX. The Berkeley/System III (and later System V) splits led to fierce
competition in the UNIX market. Early attempts at standardization of the
Berkeley/System V camps ironically led to even more splits! Formal UNIX
standardization has been elaborate, baroque, and inefficient, against a
backdrop of code which has moved from one system to another with porting
efforts ranging from "minimal" to "none." This frequently surprises people
who are used to fighting tooth and nail with vendors to get enough standards
compliance to have any hope of compiling an application.
Even when there's divergence, UNIX systems typically provide solid API
documentation. You might not like migrating from sockets to streams, or vice
versa, but at least the documentation for both is complete. UNIX systems
are not big on hidden or undocumented APIs.
Just as there are many varieties of UNIX, there are many UNIX
standards:
- Probably the oldest standard that people still refer to is AT&T's
1985 System V Interface Definition (SVID). This standard shows up, for
instance, in man pages describing the standards compliance of functions
that have been in the C library "forever."
- Meanwhile, X/Open (now the Open Group) was developing "portability
guides" with names like XPG2, XPG3, and so on. XPG1 was actually released
in 1985. The XPG guides are largely subsumed into newer specs, but once
again, are still referred to sometimes in documentation.
- The IEEE's POSIX standard first appeared in 1988 and was revised in 1990,
with updates in 1992 and 1993 and a second edition in 1996. It's still a
viable standard, although it has suffered from poor accessibility. POSIX
specs have names like 1003.x; for instance, 1003.1 and 1003.2, which refer
to different parts of the standard, or 1003.1-1988 and 1003.1-1990, which
refer to two versions of the standard.
- The fairly ominous sounding "Spec 1170" (also known as "UNIX 98" or "Single Unix
Specification") is probably the most complete specification; it is produced by the Open Group, and is effectively a descendant of the XPG series. In
practice, this is "the" UNIX standard these days, although it's a little
large; this has had an impact on conformance testing.
- The Linux Standards Base is not strictly a UNIX standard, but it's a
standardization effort relevant to a very large number of developers working
with code designed to run "on UNIX."
In practice, very little conflict exists between UNIX specifications,
and the differences tend to lie in corner cases no one uses anyway, so
people tend to just write clean UNIX code and not worry about it. While
many people have anxiety about the Keystone Kop feel of UNIX
standardization, there hasn't been a practical problem with UNIX
software portability in a long time.
What are we specifying, anyway?
You can look at OS specifications in two very different ways:
one is from the point of view of a developer trying to port an
application, and the other
is from the point of view of the user trying to interact with the
system.
UNIX conveniently blurs this distinction. The primary user interface
is also one of the primary development environments; therefore, UNIX
specifications often cover not only the C language API, but also the shell
environment and many of the core utilities shell programmers rely on. This
is a far cry from the total lack of native automation many systems offer,
and the close integration of automation tools with the way users normally
work turns out to be an advantage.
However, these distinctions remain: it's not unheard of for the level
of conformance of the C programming environment and the shell environment
to be very different. The shell environment, especially system
administration, is where you find the most variance. C implementations
everywhere will provide printf(), but whether
system startup involves /etc/inittab, /etc/rc, or Something Else is
potluck. Other systems have generally not had this issue to deal with; there's
only one Mac OS vendor, one Amiga vendor, and so on, so any given operating
system will be self-consistent. (Of course, there is no portability at all between those totally different operating systems.)
Some specifications have gone further, specifying significant aspects
of file system hierarchy and layout. This helps system administrators and
software developers, but it can occasionally codify a questionable
decision that really needs reviewing. In some cases, existing practice
in a field reflects a decision a college student at Berkeley made at 3 AM.
The primary danger here is that this could be a barrier to future improvements.
SVID: System V Interface Definition
The System V Interface Definition nominally describes the interface to
System V UNIX; SVID corresponds to System V Release 2 (SVR2). It is not
clear that even System V ever completely conformed to the SVID
specification --
it's a fairly large specification and has become less important as more
recent specifications have been adopted. You can still see references to SVID in
the STANDARDS section of many man pages, but in practice it's not
significantly used anymore.
SVID was released in 1985 and was still being referenced in the
early '00s. Of the later editions, SVID2 corresponds to
SVR3, and SVID3 corresponds to SVR4. These are the versions most likely to
show up in man pages describing conformance.
POSIX
The POSIX name (coined by Richard Stallman as a vast improvement over
the originally proposed name, "IEEE-IX," which strikes me as the sort of
sound a car alarm makes) refers to a family of related
standards that covers everything from a C programming API to a shell
environment. The POSIX spec is predominantly associated with UNIX
environments, but some non-UNIX systems have adopted all or part of the
POSIX API.
The POSIX specification reflects a generalization of UNIX terminology and API;
in some cases, the POSIX specification of a feature is much looser than
the behavior everyone expects from UNIX systems.
POSIX does not imply UNIX! It is an attempt to generalize the
traditional UNIX API to match a variety of operating system designs. So,
while POSIX implies a hierarchical file system, you might well be able to
manage a reasonably conformant implementation on a system with no real
top-level hierarchy. (Cygwin, for instance, maps Windows drive
letters to /cygdrive/c, /cygdrive/d, and so on.) However, people who claim
to be programming "for POSIX" frequently only test on UNIX. As a
side-effect, huge quantities of code allegedly developed "for POSIX
systems" run only on the more UNIX-like POSIX systems since they make many
assumptions not grounded in the specification. Programs that interact closely with
the network stack, refer to the format of the password file, and otherwise
assume they're in a traditional UNIX environment may not be as portable as
the authors think. For a particularly vivid example, try porting a Linux
program that relies heavily on the /proc file system to any other UNIX!
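To make the /proc point concrete, here is a hedged sketch that finds the current working directory in two ways. The function names cwd_portable and cwd_from_proc are invented for this illustration. The first uses getcwd(), which POSIX specifies; the second follows the Linux-specific /proc/self/cwd symbolic link and simply fails on systems that have no such file.

```c
#define _POSIX_C_SOURCE 200809L   /* ask the headers for POSIX.1-2008 interfaces */
#include <stddef.h>
#include <unistd.h>

/* Portable: getcwd() is part of POSIX. Returns 0 on success, -1 on error. */
int cwd_portable(char *buf, size_t size)
{
    return getcwd(buf, size) != NULL ? 0 : -1;
}

/* Linux-specific: reads the /proc/self/cwd symbolic link.
 * Returns 0 on success, or -1 where /proc does not provide this file. */
int cwd_from_proc(char *buf, size_t size)
{
    if (size == 0)
        return -1;

    ssize_t n = readlink("/proc/self/cwd", buf, size - 1);
    if (n < 0)
        return -1;

    buf[n] = '\0';                /* readlink() does not terminate the string */
    return 0;
}
```

Code written against cwd_portable moves between systems without changes. Code written against cwd_from_proc needs a fallback everywhere except Linux, and that is the kind of assumption that quietly turns a "POSIX" program into a Linux program.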
POSIX provides a C programming API and a shell environment
specification. The shell environment defines not only the shell
programming language, but also the core behaviors of a broad range of
utilities. Most UNIX-like systems (such as Linux, OS X, or the BSDs, or
even real, live certified UNIX from the big iron vendors) offer a number of
extensions, both in the behaviors of these utilities and in actual
additional utilities.
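A program can ask which revision of POSIX.1 its environment claims to support. The sketch below assumes a system that provides <unistd.h>. The helper names are hypothetical, but _POSIX_VERSION and sysconf(_SC_VERSION) are the standard interfaces, and the values in the table are the ones the successive revisions assign to _POSIX_VERSION.

```c
#define _POSIX_C_SOURCE 200809L   /* ask the headers for POSIX.1-2008 interfaces */
#include <unistd.h>

/* Map a _POSIX_VERSION-style value to the revision it denotes. */
const char *posix_revision_name(long version)
{
    switch (version) {
    case 198808L: return "IEEE Std 1003.1-1988";
    case 199009L: return "IEEE Std 1003.1-1990";
    case 199309L: return "IEEE Std 1003.1b-1993";
    case 199506L: return "IEEE Std 1003.1-1996";
    case 200112L: return "IEEE Std 1003.1-2001";
    case 200809L: return "IEEE Std 1003.1-2008";
    default:      return "unknown or not POSIX";
    }
}

/* Revision the headers claimed when this file was compiled,
 * or -1 if the headers make no claim at all. */
long posix_compile_time_version(void)
{
#ifdef _POSIX_VERSION
    return _POSIX_VERSION;
#else
    return -1;
#endif
}

/* Revision the running system reports, or -1 if it cannot say. */
long posix_run_time_version(void)
{
    return sysconf(_SC_VERSION);
}
```

The two answers can differ, because a binary built against one set of headers may later run on a newer system. Neither value says anything about vendor extensions, which is why a successful check here does not guarantee that a given program will port cleanly.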
Availability as a weakness
One of the weaknesses of POSIX is the availability of the standard
itself. You can get it from IEEE, but you have to buy it and it
costs real money. If you're a commercial OS vendor, it's no big deal.
If you're a programmer, it could be a comparatively large obstacle.
A single programmer who wants a copy of the POSIX specification would have to
pay US$974 for it. That gets a one-year subscription; you are not licensed
to continue referring to the standard thereafter. By comparison, you can obtain the C
standard from ISO for a fraction of that price and once
you've bought it, it's yours. If you're willing to buy from NCITS rather
than ISO, you can get the ISO C standard for US$18.
It's hard to estimate the exact impact of the cost of getting access
to POSIX, but there does seem to be some. Since compliance with other
standards often implies POSIX compliance, some developers simply pick a
different standard and assume everything will work. Speaking as a
developer targeting UNIX, though, I can say that I hate it; I would much
rather see IEEE's pricing model adapt to reflect the needs of the
computing industry.
The pricing and licensing issues can result in situations where,
although POSIX compliance is the formal term of a specification, neither
party finds it convenient to refer to the POSIX specification itself, preferring to
refer to related or similar specifications.
In a fit of irony, the Single Unix Spec, which incorporates POSIX, is
freely available.
X/Open portability guides and the Single UNIX Specification
While POSIX tried to specify the least common denominator, X/Open
(which later merged with the Open Software Foundation to form The Open Group) tried to specify the new directions
that were already being taken. Their first portability guide, XPG1,
covered sockets as early as 1985. By XPG4, in 1992, they had the X11 API
and System V curses support and full compliance with the C
standard.
The X/Open work ended up being accepted as a baseline that was
vendor-neutral enough for most people to work with -- it led to the
development of the Single UNIX Specification and The Open Group acquiring the
UNIX trademark.
Today, the Single UNIX Specification incorporates the various tangled
specifications as a single unified one that everyone can, in theory,
comply with and develop for. Furthermore, you can browse it online, or
buy a CD of the entire thing for US$243. The CD is not a temporary license, but a convenience copy of data you can browse online. You might wish the IEEE
were as interested in disseminating POSIX, but instead, you have
to go the long way around.
Unified but unused
The downside to a single unified standard is its size. Trying
to completely test compliance with the standard is an expensive
proposition; even the larger vendors have a hard time completing the
testing fast enough to do anyone any good. You might find it academically
interesting to know that, in fact, a software version which shipped a year
ago was compliant with a specification, but if it's not the version shipping today,
it might not matter. Some vendors have simply started calling systems
"UNIX" without any formal testing; in practice, if you announce that a
modern release of NetBSD "isn't UNIX," people are just going to laugh at
you.
Of course, this doesn't mean that people aren't using the standard. It
just means that conformance testing isn't as prevalent as it used to be.
In reality, everyone conforms about as well to the Single UNIX Specification
(and to other standards) as they comply with more rigorously tested
standards. In the words of a developer when asked about the technical
challenges of porting a major application to Linux from Solaris: "We typed
make." (This story is somewhat apocryphal and has
been attributed to both Informix and Oracle.)
Developers are concerned that the broadness of some of these standards
bloats the system; however, improved storage capacity and processing
capabilities seem to be keeping up handily. The big concern isn't with
the implementation, but the exhaustive and complete conformance testing.
Nonetheless, some systems provide support for stripping out some of
the additional features nominally required by the unified standards to
allow smaller or faster runtime systems. An obvious leader in this is
eCos, but Linux and BSD systems are quite amenable to having substantial
functionality stripped out. Embedded Linux systems can be very slim compared
to a desktop installation.
Linux
Linux is, in practice, a UNIX standard. Only it's not a standard, and
the documentation sometimes lags implementation. Nonetheless, Linux cast
its own vote, or more often votes, in many of the classic UNIX design
debates. From the perspective of a developer who's seen many Unix-like
systems, Linux is probably mostly sort of similar to System V. The heavy
focus on GNU utilities gives a sort of surreal combination of Berkeley and
System V features, but if you have to guess whether Linux does something
the Berkeley way or the System V way, go with System V. This is
especially true of system startup; nearly all Linux systems use the System
V /etc/inittab and /etc/rc.d structure, or something very close to it.
Linux offers an unusually rich variety of extensions, and some
programs run only on Linux and not on any other UNIX-like system.
This is a frustrating state for software developers, as new programmers
who cut their teeth on Linux sometimes need a few days to get their sea
legs in a non-Linux environment. The unfortunate tendency not to distinguish
between portable features and local extensions has made life harder for a
lot of programmers.
Extensions: A toss-up of functionality
A common misunderstanding is the assumption that if a system complies
with the POSIX specification, programs which work on that system will work
on other POSIX systems.
Like the C standard it draws on, POSIX allows a number of extensions.
For instance, only five lowercase letters are not options
to Berkeley ls. (They're "ejvyz" if you're
wondering.) GNU utilities typically take a broad variety of additional
option flags, and new utilities, such as progress(1) ("feed input
to a command, displaying a progress bar"), are added to systems all the
time. Programs depending on these extensions might not be fully portable
from one POSIX system to another.
Furthermore, some features which might seem fairly fundamental might be
only partially functional. (A classic C example is the system() function -- it might be defined to indicate
that a command processor is not available and do absolutely nothing else.)
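The system() case can be checked at run time. ISO C specifies that system(NULL) returns nonzero only when a command processor is available, so a cautious program can test before relying on it. This is a minimal sketch; the helper names and the -2 "no command processor" return value are conventions invented for this example.

```c
#include <stdlib.h>

/* Returns 1 if system() has a command processor to hand commands to,
 * 0 if it does not. */
int have_command_processor(void)
{
    return system(NULL) != 0;
}

/* Runs `cmd` through the command processor if one exists.
 * Returns -2 (a value chosen for this sketch) when none is available,
 * otherwise whatever system() itself returns. */
int run_if_possible(const char *cmd)
{
    if (!have_command_processor())
        return -2;
    return system(cmd);
}
```

On a typical UNIX system have_command_processor() returns 1 because /bin/sh is present. On a conforming but minimal C environment it may return 0, and every later call to system() is then a no-op as far as the program can tell.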
Fundamental features, such as BSD sockets, were not included in POSIX
to start; sockets are specified in P1003.1g. The specification in the
standard was not quite compatible with some existing implementations,
though. The POSIX specification has not always matched up well with
real-world practices -- some vendors have politely just ignored the POSIX
specifications.
The GNU utilities most Linux systems use have the quirk that they do
not run as POSIX utilities by default. Many depend on an environment
variable, POSIXLY_CORRECT, to force POSIX
compliance; the default behaviors are subtly different. (In many cases,
the non-POSIX behavior is more convenient to many users; for instance, the
selection of 1024-byte blocks instead of 512-byte blocks as a default is
obviously correct, no matter what the historical practice was.)
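The block-size difference is easy to show in code. The sketch below imitates the documented convention (512-byte units when POSIXLY_CORRECT is set, 1024-byte units otherwise). It is not the GNU source, and all three function names are hypothetical.

```c
#include <stdlib.h>

/* Pick a reporting block size from the value of POSIXLY_CORRECT:
 * 512 bytes when the variable is set (even to an empty string),
 * 1024 bytes otherwise. Pass in the result of getenv("POSIXLY_CORRECT"). */
unsigned long block_size_for(const char *posixly_correct)
{
    return posixly_correct != NULL ? 512UL : 1024UL;
}

/* Block size implied by the current environment. */
unsigned long default_block_size(void)
{
    return block_size_for(getenv("POSIXLY_CORRECT"));
}

/* Number of blocks needed to hold `bytes`, rounding up. */
unsigned long blocks_for(unsigned long bytes, unsigned long block_size)
{
    return (bytes + block_size - 1) / block_size;
}
```

A 4096-byte file is reported as 8 blocks in POSIX mode and 4 blocks otherwise. A script that parses such numbers without knowing which mode is in effect will be off by a factor of two, which is one way the "subtly different" defaults show up in practice.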
It's not UNIX, but it's Unix
One of the interesting side-effects of standardization is the
existence of systems which are clearly not based on UNIX but which comply
with some substantial portion of the various UNIX standards, especially
POSIX. (At one point, Microsoft advertised Windows NT as providing
compliance with POSIX, though some critics argued that the compliance
wasn't useful for real applications.)
Some systems provide near-compliance with POSIX while omitting whole
subsystems. For instance, eCos (see Resources)
provides much of the POSIX API, but being a single-process environment, it
provides no real support for the process management API and no shell
environment. For many programs this is good enough and the complete API
isn't necessary. Even partial compliance can substantially contribute to
porting efforts.
A lot of systems provide at least a partial UNIX-like API in part
because so much software has been written for it, but also in part because
the core of the UNIX API has done a particularly good job of providing the
essential abstractions most applications need to interact with files
and do their work.
Rough consensus and running code
"We reject kings, presidents and voting. We believe in rough consensus and
running code." -- Dave Clark
The IETF's famous slogan fits the practical reality
of UNIX standardization surprisingly well.
No matter how confusing it might be to sort through the range of
committees and organizations pitching UNIX standards, the underlying meat
of standardization is there -- programs for one UNIX system will generally
run on another. In fact, it's been there for a fairly long time. With
rare exceptions, porting hassles between UNIX systems are long forgotten.
If you're a developer, and you want a stable platform, UNIX remains the
only real option; of course, that "option" is actually a broad range of
options, from NetBSD to Linux to Solaris, but given consistent code
portability, this can only be an advantage.
UNIX has a variety of different vendors producing products which are
compatible with each other, rather than a single vendor which produces
products which aren't -- an innovative strategy; perhaps competitors will
finally adopt it, and if not, there's always next time!
Resources
Learn
- It can't be a complete history
of UNIX without "The
nitty-gritty on the C committee" (developerWorks, November 2004),
another Standards and specs column that shows you what goes into
the making of an ISO standard.
- As always, Wikipedia is a good
place to start looking for definitions of POSIX and SVID.
- Get a history of UNIX
standardization from someone who should know -- Eric Raymond.
- The Free Standards Group supports
ongoing work on free (and perhaps faster) standardization for UNIX-like
systems.
- Despite persistent rumors and prognostications to the contrary, Unix is
most certainly not dead yet.
- Plan 9 from Bell Labs is
arguably the current state of the art in Unix design philosophy. It's not
UNIX, but it's definitely Unix.
- eCos is an open source,
royalty-free, RTOS intended for embedded applications that, while it
offers a single-process environment, still uses the familiar POSIX API.
- "Unix utilities, Part 1," "Part 2," "Part 3," and "Part 4,"
(developerWorks, May-June 2001) introduces a fascinating component
architecture that gets back to the basics -- UNIX.
- Visit the developerWorks Power
Architecture zone for all-things Power Architecture technology.
- Visit the developerWorks Linux
zone to expand your Linux skills.
About the author
Peter Seebach is not POSIX compliant, but runs most UNIX applications. He types "make" a lot.