Standards and specs: Not by UNIX alone

A maze of twisty little standards, all alike

Technology professionals have loosely used the term "UNIX" since the first person had to explain the difference between the Berkeley and AT&T flavors, so it's not surprising to find as many UNIX® standards as there are versions of the operating system. Peter Seebach wades through the wellspring of UNIX standards and sorts them out for you, concluding that the rumors of the death of UNIX are (as usual) greatly exaggerated.

One of the nice things about UNIX development is your variety of choices. A given computer might easily run two or three varieties of UNIX. A PowerPC® Mac will typically run OS X (mostly a BSD under the hood, even though it uses a Mach kernel), NetBSD, or Linux® without complaint. The IBM POWER™-based big iron has Linux and AIX® ports, and some varieties will also run BSD.

If you want to give some thought to portability, then, you can fairly easily get exposure to a variety of systems with only a small amount of hardware. Deciding which standards to follow might seem daunting. A good starting place is the system man pages, which often indicate which standards a feature complies with, and even which features are extensions. Don't have the hardware budget to fill your basement with old iron? You can run a lot of systems on emulated or virtualized hardware (with a product such as VMware) for testing or man page reference.

And yet it moves

In a fit of irony, some people have complained that the diversity of UNIX standards makes UNIX a less stable target than a more monolithic environment with a single vendor, for instance, Microsoft® Windows®. This simply does not match the reality of software development. In the past 20 years, developers for "the same" desktop platform ("whatever Microsoft ships") have been told that the API to target is (in this order):

  • DOS
  • Win16
  • OS/2
  • Win32
  • WinNT
  • WinXP
  • and most recently .NET.

Of course, that list is from last year, and now the "stable" target that you should be developing for, if you have an eye for the future, is Vista.

It hasn't been quite as bad in the Macintosh world, where the number of major API changes has been limited: classic single-tasking Mac OS, classic multitasking Mac OS (System 7), Carbon (System 8/9 and preview of OS X), and Cocoa (OS X), but even there, the cost of migration has been significant. At least OS X finally offers a stable UNIX API for the back-end part of programs, allowing developers to ignore the API creep except in GUI code.

By contrast, twenty-year-old UNIX utilities still compile and run. A new desktop computing API will come and everyone will have to rewrite for it, but mountains will erode away before read() and write() stop working. This is the reason that all the hassle of formal UNIX standards has had so little effect on practical UNIX software development; the core API is simple, clean, and well-designed, and there is no need to change it significantly.

In fact, the stability of this API might be one of the reasons that many Mac users aren't very worried about a proposed shift from one processor architecture to another. UNIX users have been switching hardware platforms since the 1970s; it's no big deal. Users switching from MIPS to POWER, or from PA-RISC to Itanium, or from Itanium to EM64T, are just as comforted by the knowledge that the OS will keep running.

What is UNIX, anyway?

Technical professionals use the term UNIX fairly loosely; for instance, some people have quipped that Plan 9 (see Resources) is more Unix than UNIX is. (The difference in capitalization has been taken by some to distinguish between the design philosophy and the trademark; the trademark is all capitals, but the design philosophy is just a proper name.)

Since the first person had to explain that a program ran only on Berkeley Unix, or only on AT&T UNIX, or only on some other operating system, there have been debates about what exactly is or is not part of UNIX. The Berkeley/System III (and later System V) splits led to fierce competition in the UNIX market. Early attempts at standardization of the Berkeley/System V camps ironically led to even more splits! Formal UNIX standardization has been elaborate, baroque, and inefficient, against a backdrop of code which has moved from one system to another with porting efforts ranging from "minimal" to "none." This frequently surprises people who are used to fighting tooth and nail with vendors to get enough standards compliance to have any hope of compiling an application.

Even when there's divergence, UNIX systems typically provide solid API documentation. You might not like migrating from sockets to streams, or vice versa, but at least the documentation for both is complete. UNIX systems are not big on hidden or undocumented APIs.

Just as there are many varieties of UNIX, there are many UNIX standards:

  • Probably the oldest standard that people still refer to is AT&T's 1985 System V Interface Definition (SVID). This standard shows up, for instance, in man pages describing the standards compliance of functions that have been in the C library "forever."
  • Meanwhile, X/Open (now the Open Group) was developing "portability guides" with names like XPG2, XPG3, and so on. XPG1 was actually released in 1985. The XPG guides have largely been subsumed into newer specs but, once again, are still sometimes referred to in documentation.
  • The IEEE's POSIX standard first appeared in 1988 and was revised in 1990, with updates in 1992 and 1993 and a second edition in 1996. It's still a viable standard, although it has suffered from poor accessibility. POSIX specs have names like 1003.x; for instance, 1003.1 and 1003.2, which refer to different parts of the standard, or 1003.1-1988 and 1003.1-1990, which refer to two versions of the standard.
  • The fairly ominous sounding "Spec 1170" (also known as "UNIX 98" or "Single Unix Specification") is probably the most complete specification; it is produced by the Open Group, and is effectively a descendant of the XPG series. In practice, this is "the" UNIX standard these days, although it's a little large; this has had an impact on conformance testing.
  • The Linux Standard Base is not strictly a UNIX standard, but it's a standardization effort relevant to a very large number of developers working with code designed to run "on UNIX."

In practice, very little conflict exists between UNIX specifications, and the distinctions are often out at the boundaries no one uses anyway, so people tend to just write clean UNIX code and not worry about it. While many people have anxiety about the Keystone Kops feel of UNIX standardization, there hasn't been a practical problem with UNIX software portability in a long time.

What are we specifying, anyway?

You can look at OS specifications in two very different ways: one is from the point of view of a developer trying to port an application, and the other is from the point of view of the user trying to interact with the system.

UNIX conveniently blurs this distinction. The primary user interface is also one of the primary development environments; therefore, UNIX specifications often cover not only the C language API, but also the shell environment and many of the core utilities shell programmers rely on. This is a far cry from the total lack of native automation many systems offer, and the close integration of automation tools with the way users normally work turns out to be an advantage.

However, these distinctions remain: it's not unheard of for the level of conformance of the C programming environment and the shell environment to be very different. The shell environment, especially system administration, is where you find the most variance. C implementations everywhere will provide printf(), but whether system startup involves /etc/inittab, /etc/rc, or Something Else is pot luck. Other systems have generally not had this issue to deal with; there's only one Mac OS vendor, one Amiga vendor, and so on, so any given operating system will be self-consistent. (Of course, vendors of totally different operating systems have no portability between them.)

Some specifications have gone further, specifying significant aspects of file system hierarchy and layout. This helps system administrators and software developers, but it can occasionally codify a questionable decision that really needs reviewing. In some cases, existing practice in a field reflects a decision a college student at Berkeley made at 3 AM. The primary danger here is that this could be a barrier to future improvements.

SVID: System V Interface Definition

The System V Interface Definition nominally describes the interface to System V UNIX; SVID corresponds to System V Release 2 (SVR2). It is not clear that even System V ever completely conformed to the SVID specification -- it's a fairly large specification and has become less important as more recent specifications have been adopted. You can still see references to SVID in the STANDARDS section of many man pages, but in practice it's not significantly used anymore.

SVID was released in 1985 and still received references in the early '00s. In the additional releases, SVID2 corresponds to SVR3, and SVID3 corresponds to SVR4. These are the versions most likely to show up in man pages describing conformance.

POSIX

The POSIX name (coined by Richard Stallman as a vast improvement over the originally proposed name, "IEEE-IX," which strikes me as the sort of sound a car alarm makes) refers to a family of related standards that covers everything from a C programming API to a shell environment. The POSIX spec is predominantly associated with UNIX environments, but some non-UNIX systems have adopted all or part of the POSIX API.

The POSIX specification reflects a generalization of UNIX terminology and API; in some cases, the POSIX specification of a feature is much looser than the behavior everyone expects from UNIX systems.

POSIX does not imply UNIX! It is an attempt to generalize the traditional UNIX API to match a variety of operating system designs. So, while POSIX implies a hierarchical file system, you might well be able to manage a reasonably conformant implementation on a system with no real top-level hierarchy. (For instance, Cygwin's mapping of Windows drive letters to /cygdrive/c, /cygdrive/d, and so on.) However, people who claim to be programming "for POSIX" frequently only test on UNIX. As a side-effect, huge quantities of code allegedly developed "for POSIX systems" run only on the more UNIX-like POSIX systems since they make many assumptions not grounded in the specification. Programs that interact closely with the network stack, refer to the format of the password file, and otherwise assume they're in a traditional UNIX environment may not be as portable as the authors think. For a particularly vivid example, try porting a Linux program that relies heavily on the /proc file system to any other UNIX!

POSIX provides a C programming API and a shell environment specification. The shell environment defines not only the shell programming language, but also the core behaviors of a broad range of utilities. Most UNIX-like systems (such as Linux, OS X, or the BSDs, or even real, live certified UNIX from the big iron vendors) offer a number of extensions, both in the behaviors of these utilities and in actual additional utilities.

Availability as a weakness

One of the weaknesses of POSIX is the availability of the standard itself. You can get it from IEEE, but you have to buy it and it costs real money. If you're a commercial OS vendor, it's no big deal. If you're a programmer, it could be a comparatively large obstacle.

A single programmer who wants a copy of the POSIX specification would have to pay US$974 for it. That gets a one-year subscription; you are not licensed to continue referring to the standard thereafter. By comparison, you can obtain the C standard from ISO for a fraction of that price and once you've bought it, it's yours. If you're willing to buy from NCITS rather than ISO, you can get the ISO C standard for US$18.

It's hard to estimate the exact impact of the cost of getting access to POSIX, but there does seem to be some. Since compliance with other standards often implies POSIX compliance, some developers simply pick a different standard and assume everything will work. Speaking as a developer targeting UNIX, though, I can say that I hate it; I would much rather see IEEE's pricing model adapt to reflect the needs of the computing industry.

The pricing and licensing issues can result in situations where, although POSIX compliance is the formal term of a specification, neither party finds it convenient to refer to the POSIX specification itself, preferring to refer to related or similar specifications.

In a fit of irony, the Single UNIX Specification, which incorporates POSIX, is freely available.

X/Open portability guides and the Single UNIX Specification

While POSIX tried to specify the least common denominator, X/Open (later renamed to The Open Group) tried to specify the new directions that were already being taken. Their first portability guide, XPG1, covered sockets as early as 1985. By XPG4, in 1992, they had the X11 API and System V curses support and full compliance with the C standard.

The X/Open work ended up being accepted as a baseline that was vendor-neutral enough for most people to work with -- it led to the development of the Single UNIX Specification and to The Open Group acquiring the UNIX trademark.

Today, the Single UNIX Specification incorporates the various tangled specifications as a single unified one that everyone can, in theory, comply with and develop for. Furthermore, you can browse it online, or buy a CD of the entire thing for US$243. The CD is not a temporary license, but a convenience copy of data you can browse online. You might wish POSIX were as interested in disseminating its standard, but instead, you have to go the long way around.

Unified but unused

The downside to a single unified standard is its size. Trying to completely test compliance with the standard is an expensive proposition; even the larger vendors have a hard time completing the testing fast enough to do anyone any good. You might find it academically interesting to know that, in fact, a software version which shipped a year ago was compliant with a specification, but if it's not the version shipping today, it might not matter. Some vendors have simply started calling systems "UNIX" without any formal testing; in practice, if you announce that a modern release of NetBSD "isn't UNIX," people are just going to laugh at you.

Of course, this doesn't mean that people aren't using the standard. It just means that conformance testing isn't as prevalent as it used to be. In reality, everyone conforms about as well to the Single UNIX Specification (and to other standards) as they comply with more rigorously tested standards. In the words of a developer when asked about the technical challenges of porting a major application to Linux from Solaris: "We typed make." (This story is somewhat apocryphal and has been attributed to both Informix and Oracle.)

Developers are concerned that the breadth of some of these standards bloats the system; however, improved storage capacity and processing capabilities seem to be keeping up handily. The big concern isn't the implementation, but exhaustive and complete conformance testing.

Nonetheless, some systems provide support for stripping out some of the additional features nominally required by the unified standards to allow smaller or faster runtime systems. An obvious leader in this is eCos, but Linux and BSD systems are quite amenable to having substantial functionality stripped out. Embedded Linux systems can be very slim compared to a desktop installation.

Linux

Linux is, in practice, a UNIX standard. Only it's not a standard, and the documentation sometimes lags implementation. Nonetheless, Linux cast its own vote, or more often votes, in many of the classic UNIX design debates. From the perspective of a developer who's seen many Unix-like systems, Linux is probably mostly sort of similar to System V. The heavy focus on GNU utilities gives a sort of surreal combination of Berkeley and System V features, but if you have to guess whether Linux does something the Berkeley way or the System V way, go with System V. This is especially true of system startup; nearly all Linux systems use the System V /etc/inittab and /etc/rc.d structure, or something very close to it.

Linux offers an unusually rich variety of extensions, and some programs run on Linux and on no other UNIX-like system. This is frustrating for software developers: programmers who cut their teeth on Linux sometimes need a few days to get their sea legs in a non-Linux environment. The unfortunate tendency not to distinguish between portable features and local extensions has made life harder for a lot of programmers.

Extensions: A toss-up of functionality

A common misunderstanding is the assumption that if a system complies with the POSIX specification, programs which work on that system will work on other POSIX systems.

Like the C standard it draws on, POSIX allows a number of extensions. For instance, only five lowercase letters are not options to Berkeley ls. (They're "ejvyz" if you're wondering.) GNU utilities typically take a broad variety of additional option flags, and new utilities, such as progress(1) ("feed input to a command, displaying a progress bar"), are added to systems all the time. Programs depending on these extensions might not be fully portable from one POSIX system to another.

Furthermore, some features which might seem fairly fundamental might be only partially functional. (A classic C example is the system() function -- it might be defined to indicate that a command processor is not available and do absolutely nothing else.)

Some fundamental features, such as BSD sockets, were not included in POSIX at first; sockets are specified in P1003.1g. The specification in the standard was not quite compatible with some existing implementations, though. The POSIX specification has not always matched up well with real-world practices -- some vendors have politely just ignored the POSIX specifications.

The GNU utilities most Linux systems use have the quirk that they do not run as POSIX utilities by default. Many depend on an environment variable, POSIXLY_CORRECT, to force POSIX compliance; the default behaviors are subtly different. (In many cases, the non-POSIX behavior is more convenient to many users; for instance, the selection of 1024-byte blocks instead of 512-byte blocks as a default is obviously correct, no matter what the historical practice was.)


It's not UNIX, but it's Unix

One of the interesting side-effects of standardization is the existence of systems which are clearly not based on UNIX but which comply with some substantial portion of the various UNIX standards, especially POSIX. (At one point, Microsoft advertised Windows NT as providing compliance with POSIX, though some critics argued that the compliance wasn't useful for real applications.)

Some systems provide near-compliance with POSIX while omitting whole subsystems. For instance, eCos (see Resources) provides much of the POSIX API, but being a single-process environment, it provides no real support for the process management API and no shell environment. For many programs this is good enough and the complete API isn't necessary. Even partial compliance can substantially contribute to porting efforts.

A lot of systems provide at least a partial UNIX-like API in part because so much software has been written for it, but also in part because the core of the UNIX API has done a particularly good job of providing the essential abstractions most applications need to interact with files and do their work.

Rough consensus and running code

"We reject kings, presidents and voting. We believe in rough consensus and running code." -- Dave Clark

The IETF's famous slogan fits the practical reality of UNIX standardization surprisingly well.

No matter how confusing it might be to sort through the range of committees and organizations pitching UNIX standards, the underlying meat of standardization is there -- programs for one UNIX system will generally run on another. In fact, it's been there for a fairly long time. With rare exceptions, porting hassles between UNIX systems are long forgotten. If you're a developer, and you want a stable platform, UNIX remains the only real option; of course, that "option" is actually a broad range of options, from NetBSD to Linux to Solaris, but given consistent code portability, this can only be an advantage.

UNIX has a variety of different vendors producing products which are compatible with each other, rather than a single vendor which produces products which aren't -- an innovative strategy; perhaps competitors will finally adopt it, and if not, there's always next time!

Resources

Learn

Get products and technologies

Discuss

publish-date=03082006