Standards and specs
The Interchange File Format (IFF)
Simple, portable, and extensible data storage
The Interchange File Format (IFF) standard is widely regarded as long dead, and indeed, no one uses it anymore, except that nearly everyone uses it sometimes. Many believe the IFF standard is an Amiga graphics standard, and certainly, there have been a great many graphics files saved in the IFF format. However, IFF is not just a graphics format. It has been used for graphics, audio, text, saved games, and more. Electronic Arts actually developed the standard, back when it was a software company and not just a video game company.
The IFF standard was introduced in 1985 and has a number of characteristics which date it. The most obvious is the lack of support for files exceeding 4GB in size.
The IFF format has long been abandoned. It's not that the Web site for it is down; it predates Web sites and never had one. There is no central registry to speak of anymore. However, the IFF format is still in use. The AIFF audio file format is simply one particular instance of an IFF file. The Quetzal save format shared by Z-machine interpreters is an IFF format. The file format survives because it does a good job of solving a number of the central and recurring problems which lead people to want a "file format" in the first place.
While the IFF format might not be actively maintained, or in very widespread use, it offers a lot of insight into generic file format standards and issues that might come up in dealing with them.
A brief overview
The easiest way to understand what the IFF standard aimed to accomplish is to look at a brief overview of it. Most generically, an IFF file consists of an IFF chunk. An IFF chunk is a four-byte type, followed by a 32-bit length, followed by that many bytes of data. The type is generally selected to be four ASCII characters in a row, which in some way hint at the type of data encoded. How is the data encoded? That depends on the type. The outermost chunk of a file must be of one of three types: 'FORM,' 'LIST,' or 'CAT .' (There is a space in the CAT name.) The FORM type is the simple one: its data is a four-byte form type (such as AIFF or ILBM) followed by the chunks that make up that form. The LIST and CAT types group multiple FORMs, or further LISTs and CATs. Every chunk provides its size, so a program can skip over a chunk it doesn't know how to handle.
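To make the skip-what-you-don't-know idea concrete, here is a minimal Python sketch of a reader that pulls the chunks it wants out of a FORM and seeks past everything else. The function name read_form, the form type DEMO, and the chunk type XYZZ are made up for illustration; the sketch also assumes a seekable stream and does no error recovery.

```python
import io
import struct

def read_form(stream, wanted):
    """Return (form_type, {chunk_id: data}) for the wanted chunks of a FORM."""
    type_id, size = struct.unpack(">4sI", stream.read(8))  # big-endian header
    if type_id != b"FORM":
        raise ValueError("not a FORM")
    end = stream.tell() + size
    form_type = stream.read(4)
    found = {}
    while stream.tell() + 8 <= end:
        chunk_id, chunk_size = struct.unpack(">4sI", stream.read(8))
        if chunk_id in wanted:
            found[chunk_id] = stream.read(chunk_size)
        else:
            stream.seek(chunk_size, io.SEEK_CUR)  # unknown type: skip it
        if chunk_size % 2:
            stream.seek(1, io.SEEK_CUR)  # step over the pad byte after odd sizes
    return form_type, found

# A hand-built file: one unknown chunk (odd length, so padded) and one NAME chunk.
body = (b"DEMO"
        + b"XYZZ" + struct.pack(">I", 3) + b"???\x00"
        + b"NAME" + struct.pack(">I", 2) + b"hi")
iff = b"FORM" + struct.pack(">I", len(body)) + body

form_type, found = read_form(io.BytesIO(iff), {b"NAME"})
print(form_type, found)  # b'DEMO' {b'NAME': b'hi'}
```

The reader never needs to know what XYZZ means; the size field alone is enough to get past it.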
A number of common IFF types are widely enough known to be supported in modern software. The two most obvious are AIFF and ILBM, which are a sound format and a graphics format (InterLeaved BitMap), respectively. The JFIF (JPEG) image format has a similar header, too, and the PNG specification loosely imitates some of the concepts. (One major change is that PNG files are designed to be easy to write serially. IFF writers generally rely on the ability to seek back and update data.)
The IFF format is a binary file format. It is intended to be easy to read, but not human readable with the naked eye. The decision to encode sizes at the head of chunks simplifies processing when reading, but can make it harder to write files. Many programs which write IFF files depend on seeking back in the file to "fill in" the size once it's known, which can be awkward if you want to use IFF files in a pipeline. Note that no special marker is given for the end of a chunk; after all, you already know what size it is.
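The seek-back habit described above might look something like the following Python sketch. The name write_form and the sample chunk types are hypothetical, and the approach only works on a seekable stream, which is exactly why it is awkward in a pipeline.

```python
import io
import struct

def write_form(out, form_type, chunks):
    """Write a FORM to a seekable binary stream, back-patching its size."""
    out.write(b"FORM")
    size_pos = out.tell()
    out.write(struct.pack(">I", 0))  # placeholder; real size not known yet
    body_start = out.tell()
    out.write(form_type)
    for type_id, data in chunks:  # chunks may be a generator of unknown length
        out.write(type_id + struct.pack(">I", len(data)) + data)
        if len(data) % 2:
            out.write(b"\x00")  # pad byte, not counted in the chunk's own size
    body_end = out.tell()
    out.seek(size_pos)
    out.write(struct.pack(">I", body_end - body_start))  # fill in the size
    out.seek(body_end)

stream = io.BytesIO()
write_form(stream, b"DEMO", [(b"NAME", b"abc"), (b"XYZZ", b"hi")])
result = stream.getvalue()
print(len(result), struct.unpack(">I", result[4:8])[0])  # 34 26
```

A writer that cannot seek has to build the whole body in memory first, so that the size is known before the header goes out.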
A couple of quirks reflect the 68000 architecture on which the IFF format was first developed. One is the 32-bit, big-endian numbers. The other is that chunks are always padded to an even number of bytes. The chunk length is not padded, but if your chunk length is three, you write a fourth byte after your real contents, so the reader doesn't have to worry about two-byte alignment issues. This provides a substantial performance advantage on many architectures, even when unaligned accesses are possible.
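Both quirks fit in a few lines of Python. This is only a sketch, and encode_chunk is a name invented here, not part of any IFF library.

```python
import struct

def encode_chunk(type_id, data):
    """Encode one chunk: 4-byte type, big-endian 32-bit size, data, optional pad."""
    header = type_id + struct.pack(">I", len(data))  # size excludes the pad byte
    pad = b"\x00" if len(data) % 2 else b""
    return header + data + pad

chunk = encode_chunk(b"NAME", b"abc")
print(chunk)  # b'NAME\x00\x00\x00\x03abc\x00'
```

The size field still says three; the extra zero byte exists only so that the next chunk header starts on an even offset.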
Some formats are at least similar to IFF. For instance, the RIFF file format is nearly identical to IFF, only it uses little-endian integers. For the arguable convenience of slightly faster interpretation of a four-byte file size on x86 hardware, compatibility was discarded -- in retrospect, maybe not a good choice. It's one more standard to keep track of, and one more format for software to interpret.
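The practical cost of that choice is easy to show. In this small Python illustration, the same four size bytes are decoded under each convention:

```python
import struct

raw = bytes([0x00, 0x00, 0x01, 0x02])  # four size bytes as they sit in a file

iff_size = struct.unpack(">I", raw)[0]   # IFF: big-endian
riff_size = struct.unpack("<I", raw)[0]  # RIFF: little-endian

print(iff_size, riff_size)  # 258 33619968
```

A reader that guesses wrong gets a wildly different chunk size and loses its place in the file, so the two formats need separate code paths despite sharing almost everything else.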
The IFF specification's allowance for new chunk types is already suspiciously similar to XML, but the real kicker is that, since chunks can be nested, the IFF spec provides for defining your own chunk type, which defines its own contents. Since the whole chunk is skipped by programs unfamiliar with it, there's no namespace clash here. On the other hand, doing this too much violates the point of the spec. This is supposed to be the Interchange File Format, not the Only I Know What This Is File Format. This is a problem XML tools can face too, however. It is intrinsic to allowing later definition of new data types.
Another thing particularly similar to IFF files is the Macintosh resource fork. The resource file format has a different basic structure: you can have many resources of each type, but each must have a unique numeric ID. By contrast, with IFF, you can have as many as you have space for, but you have to come up with your own convention for assigning tags or names to them. In fact, this might be an excellent use of the NAME chunk type. The AIFF FORM can have a NAME chunk attached to it, so you could scan a file for FORM chunks containing an AIFF chunk with a given name. The big feature this implementation would lack is the ability to randomly access specific resources and edit them without rewriting the whole file. You could write an IFF library to handle this -- it's just not a standard feature. Embedding additional tags in data is a popular feature. JPEG files can contain thumbnails and metadata (such as camera orientation or resolution).
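As a rough sketch of that name-lookup convention, the Python below scans the body of a CAT for a FORM of a given type whose NAME chunk matches. The helper names are invented for this example, and the sample FORMs carry only a NAME chunk, so they are not valid AIFF sounds; they exist purely to exercise the scan.

```python
import struct

def chunk(type_id, data):
    """Encode one chunk, padding odd-length data to an even boundary."""
    pad = b"\x00" if len(data) % 2 else b""
    return type_id + struct.pack(">I", len(data)) + data + pad

def iter_chunks(buf):
    """Yield (type_id, data) for each chunk laid end to end in buf."""
    pos = 0
    while pos + 8 <= len(buf):
        type_id, size = struct.unpack_from(">4sI", buf, pos)
        yield type_id, buf[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)

def find_form_by_name(cat_body, form_type, name):
    """Return the body of the first matching FORM whose NAME equals name."""
    for type_id, data in iter_chunks(cat_body[4:]):  # skip the CAT contents type
        if type_id != b"FORM" or data[:4] != form_type:
            continue
        for inner_id, inner in iter_chunks(data[4:]):
            if inner_id == b"NAME" and inner == name:
                return data
    return None

forms = (chunk(b"FORM", b"AIFF" + chunk(b"NAME", b"beep"))
         + chunk(b"FORM", b"AIFF" + chunk(b"NAME", b"boing")))
cat_body = b"AIFF" + forms

hit = find_form_by_name(cat_body, b"AIFF", b"boing")
miss = find_form_by_name(cat_body, b"AIFF", b"quack")
print(hit is not None, miss)  # True None
```

Note that the lookup is a linear scan from the start of the file, which is the random-access limitation mentioned above.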
Most file formats that provide headers have some conceptual similarity to IFF. What they typically lack is the in-file documentation of which parts are headers and what types of data a file contains. The entire idea of a "file format" is one which arose gradually. Early programs simply wrote their data in whatever order was convenient, then loaded it in the same order. The development of file formats intended to be shared took time. Wikipedia has some interesting material on the history of file formats (see Related topics).
The obvious problem with a very generic file format is the possibility for subtle variances in handling of boundary conditions. This problem is hardly unique to generic file formats. Microsoft® Word used to invert black and white in some non-empty set of well-formed TIFF files. (According to Microsoft tech support, to whom I spoke, Word did not officially support the Version 6 TIFF spec, but some unspecified previous version. This was in 1995, a full three years after the completely free and unencumbered reference implementation was released.) In principle, the generic format has the advantage that developers will be sharing a common file format library for reading and writing the file, and doing their own processing only on specific chunks. In practice, of course, not all developers are so careful.
Some particular instances of IFF files have unusual qualities. For instance, the Amiga's insanely strange Hold-And-Modify video modes are supported directly by the IFF file format. Pictures stored for this representation present an unusual challenge to software intending to display the picture on any other hardware. Even when the overall file format is somewhat standardized or open, a particular file's data might not be easy to use on another platform.
In fact, although you might expect endianness problems, the IFF format simply stated its expectations up front (sizes are big-endian), and I've never heard of any platform-specific problems in IFF code written with even a casual eye towards spec compliance. The weakness is mostly just that a graphics program which reads "IFF files" can't do much with an IFF sound file. Supporting the format doesn't imply supporting the data. This is much like the issues XML readers face. Being able to parse XML doesn't guarantee that you can read the data in a given XML file.
In some cases, data representations reflect hardware assumptions. For instance, the ILBM format reflects Amiga graphics hardware, and Amiga audio files often reflect Amiga audio hardware. (PC audio hardware was frequently different.)
This highlights one of the benefits of standardization: while very few people are using video hardware with a HAM mode, or have access to an Amiga audio chip, video and audio data for that platform are available in an exhaustively documented standard format, with multiple implementations floating around. By contrast, even a current and actively maintained format might be virtually impossible to use reliably. IFF "FTXT" (formatted text) files might not have had all the features of a modern word processor, but the definition is public, and programs supporting the format could generally communicate with each other. Meanwhile, I can't reliably exchange Microsoft Word documents with other users of Microsoft Word.
One of the most disappointing lessons learned from IFF is simply that a pretty good standard can easily be ignored in favor of a huge variety of proprietary formats. A very large number of formats out there have carefully attempted to recreate just the basic functionality of IFF. In practice, this means that graphics programs are stuck supporting dozens of subtly incompatible formats, each of which has its own innovative set of quirks.
If you were going to do an IFF standard today, the obvious things to fix would be the size limitation (32-bit sizes are no longer large enough for files) and the flat and fairly small namespace for chunk types. The PCI spec's paired 16-bit values for vendor and product, while providing no more data bits than the 32-bit chunk types, are in many ways more useful due to improved namespace management. On the other hand, four bytes here and there is small potatoes today. It might make sense to use four bytes for a 'vendor' and another four for a 'product.' The similarity between this notion and the Macintosh type/creator codes is probably not a pure coincidence. A broader range of universal types could be provided. In fact, such a standard could be a pure extension to traditional IFF, which thoughtfully reserved the names CAT1-CAT9, FOR1-FOR9, and LIS1-LIS9 for future versions.
The IFF file format's basic design goals have survived. The same issues that were important in 1985 seem to be important today. 99% of the files many users manipulate on a regular basis could be happily stored in an IFF file of some sort, and the code used to read and write such files would be comparatively trivial. Well-tested IFF libraries are out there.
The problems IFF left to developers (such as how to tell people what your new chunk type is) have not changed much under newer standards, such as XML. XML does offer some improvements over IFF in extensibility, and especially in human-readability. On the other hand, compare the size of even a simple XML parser to a very complete and robust IFF parser; it's not all wine and roses.
In the end, everything else aside, it comes down to this: my life as a computer user was immeasurably easier on a platform where nearly everything used IFF as, well, an interchange file format. The partial solutions of XML haven't really solved anything. Programs that are storing exactly the same sort of data in XML files still do it in entirely innovative and incompatible ways. Providing for common data types was a good idea. The IFF format provided many of the features much lauded in XML -- for instance, an easy way to skip over data you don't know what to do with.
What's really changed isn't the technical issues; it's the culture. For whatever reason, XML seems to be used primarily as a checkmark. Is our standard open? You betcha, we're using XML. Never mind that it would be easier and cheaper to get our data storage routines through industrial espionage than to figure out what on earth we're storing in it; it's still an "open standard." Unfortunately, our cultural unwillingness to store data in accessible formats is beyond the scope of a standards committee.
Related topics
- Wikipedia has some information on IFF, too: Wikipedia is often a Good Place to Start with research on the Web.
- Except for when it comes to the history of file formats: if you are researching those, we suggest the Home Page for Saugus, Massachusetts which hosts among other things a document on Filename Extensions List, which attributes the origin of the traditional three-character file extension convention to CP/M (and for more on CP/M, see Eli Freedman's Brief Outline of CP/M and CP/M Commands).
- Here's more on why file formats shouldn't be proprietary: from FSF founder Richard Stallman.
- The IBM Semiconductor solutions technical library hosts a wealth of information -- from specifications and user manuals to product briefs and errata and much more.
- Stay up-to-date with all the Power Architecture-related news: subscribe to the Power Architecture Community Newsletter.