Linux and the storage ecosystem
An open platform for flexible storage
Linux is many things, and its power lies in its ability to flexibly support vastly different usage models. But one of Linux's most important strengths is serving as the workhorse of the storage domain. Thinking about Linux and storage commonly conjures an image of direct-attached disks or the latest file system, but there's much more to storage and Linux than meets the eye. The elements of the Linux storage stack are not only stable but also cutting-edge.
This article explores the various storage technologies that keep Linux at the center of the storage universe. Let's start at the bottom—namely, storage architectures—and then work up the stack to features, file systems, and futures (see Figure 1).
Figure 1. Storage stack for exploration in this article
How the storage attaches to the platform is key to the overall storage architecture. Three general architectures cover the vast majority of models:
- Direct-attached storage (DAS)
- Storage area networks (SAN)
- Network-attached storage (NAS)
Of course, Linux supports all three and has evolved with the changes that are occurring with these models.
Figure 2 illustrates the models, with a focus on the location of the file system and storage. The DAS model covers the direct attachment of storage to the platform and represents the vast majority of storage use. The SAN separates the storage from the platform and makes it accessible over one of a number of block storage protocols. Finally, NAS provides a similar architecture to the SAN but operates at the file level.
Figure 2. Major storage architectures
Direct-attached storage
Linux supports a large variety of DAS interfaces, including older standards such as parallel Advanced Technology Attachment (ATA, also known as Integrated Drive Electronics [IDE]), parallel SCSI, and Fibre Channel, as well as newer storage interfaces such as serial attached SCSI (SAS), serial ATA (SATA), and external SATA (eSATA). You'll also find advanced storage technologies such as USB 3.0 (through the Extensible Host Controller Interface [xHCI]) and FireWire (IEEE 1394).
Storage area network
The SAN provides consolidation of block-level storage so that it can be shared among a number of servers. The storage appears local to the servers, where the endpoint storage device may implement additional services for the client devices (such as backup and replication).
Protocols and interfaces for SANs are wide and varied. You can find the typical SAN protocols in Linux, such as Fibre Channel as well as its extension over IP networks, the Internet Fibre Channel Protocol (iFCP). Newer protocols, such as SAS, Fibre Channel over Ethernet (FCoE), and Internet SCSI (iSCSI), are also present, as are more domain-specific protocols like iSCSI Extensions for RDMA (iSER), where RDMA stands for remote direct memory access, and the SCSI RDMA Protocol (SRP), which carries SCSI over RDMA transports such as InfiniBand.
Ethernet's emergence as a storage transport is fully realized in Linux, which illustrates the power and flexibility of these approaches. Linux also fully supports 10-gigabit Ethernet (10GbE), permitting construction of high-performance SANs. You can also find protocols like ATA over Ethernet (ATAoE), which carries the ATA protocol directly over ubiquitous Ethernet.
Network-attached storage
Last but not least is NAS. NAS is a consolidation of storage over a network for access by heterogeneous clients at the file level. Two of the most popular protocols, which are fully supported in Linux, are Network File System (NFS) and Server Message Block/Common Internet File System (SMB/CIFS).
Although the original SMB implementation was proprietary, it was reverse-engineered to be supported in Linux. The later SMB revisions were openly documented to allow simpler development in Linux.
Linux has continued to evolve with the various enhancements and extensions made to NFS. As of version 4, NFS is a stateful protocol, and version 4.1 adds optimizations for data and metadata separation as well as parallel data access (pNFS). You can read more about the evolution of NFS using the links in Related topics. As with Ethernet-based SANs, 10GbE support in Linux enables high-performance NAS repositories.
Other storage architectures
Not all storage architectures fit cleanly in the DAS, SAN, and NAS buckets. Because Linux is open, it is easy to develop new technologies within it, which is why you can find the newest bleeding-edge storage technologies in Linux.
One interesting storage architecture, which is not new but worthwhile to mention, is the object storage architecture. Object storage architectures split a file from its metadata and store them independently (on their respective data and metadata servers). This split provides certain advantages, such as minimizing the metadata bottleneck (because interactions with this server are only required to locate and open a file). Performance can also be enhanced by striping the data over multiple data servers for parallel access. Object storage is implemented in a variety of ways within Linux, including support for the Object Storage Device (OSD) specification as well as within the Linux clUSTER (Lustre) and Extended Object File System (exofs).
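The striping idea above can be sketched in a few lines. This is only an illustration that assumes a simple round-robin layout with a fixed stripe size; the function name `locate_stripe` and its parameters are hypothetical and do not describe how Lustre or exofs actually place data.

```python
def locate_stripe(offset, stripe_size, num_servers):
    """Map a byte offset in a file to (data server index, offset in that server's object).

    Assumes a plain round-robin layout: stripe 0 goes to server 0,
    stripe 1 to server 1, and so on, wrapping around.
    """
    if offset < 0 or stripe_size <= 0 or num_servers <= 0:
        raise ValueError("offset must be >= 0; stripe_size and num_servers must be > 0")
    stripe_index = offset // stripe_size
    server = stripe_index % num_servers
    # Every full round over all servers adds one stripe to each server's object.
    object_offset = (stripe_index // num_servers) * stripe_size + offset % stripe_size
    return server, object_offset
```

Because consecutive stripes land on different servers, a client that reads a large range can issue requests to all data servers at once, and the metadata server is only consulted to learn the layout.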
A similar technology, content-addressable storage (CAS), uses a hash of the data as its name and address. This technology, also known as fixed-content storage (FCS), is advantageous because it makes duplicate data easy to identify: identical data always produces the same hash, and if the hash is strong enough that collisions are negligible, matching hashes permit simple de-duplication. The Venti architecture supports this approach and is available on Linux (in addition to the Plan 9 operating system from Bell Labs).
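A minimal in-memory sketch shows why de-duplication falls out of content addressing. The `ContentStore` class is hypothetical, and SHA-256 is chosen here only as an example of a strong hash; it is not a description of Venti's implementation.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the address of a block is the hash of its bytes."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical data hashes to the same address, so it is stored only once.
        self._blocks.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blocks[address]

    def __len__(self):
        return len(self._blocks)
```

Writing the same block twice returns the same address both times and leaves a single stored copy, which is the de-duplication property described above.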
Storage services: logical volume management
Storage virtualization was once a feature unique to high-end storage systems, but it is now a standard feature of Linux. One of the most important services available in Linux is the Logical Volume Manager (LVM). The LVM is a thin layer that sits above physical storage available in the underlying storage architecture (with accompanying user-space tools) and abstracts it to one or more logical volumes that are simpler to manage. For example, while a physical disk cannot be resized, a logical volume can be resized to add or remove space from it.
With the ability to abstract physical devices into logical devices, LVM creates a number of other storage capabilities, such as read-only and read-write snapshots of volumes, data striping across volumes for performance (redundant array of independent disks [RAID]-0), data mirroring across volumes (RAID-1), and migration of volumes (even while online) between physical devices.
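To make the abstraction concrete, here is a toy model of how a logical volume can grow and shrink by borrowing fixed-size extents from a pool of physical devices. The `VolumeGroup` class and its methods are hypothetical and greatly simplified; they are a sketch of the idea, not of the LVM code or its on-disk metadata.

```python
class VolumeGroup:
    """Pool of fixed-size extents drawn from several physical volumes."""

    def __init__(self, physical_volumes):
        # physical_volumes maps a device name to the number of extents it provides.
        self.free = [(pv, index)
                     for pv, count in physical_volumes.items()
                     for index in range(count)]
        self.volumes = {}  # logical volume name -> ordered list of (pv, extent index)

    def extend(self, name, extents):
        """Grow (or create) a logical volume by the given number of extents."""
        if extents > len(self.free):
            raise ValueError("not enough free extents")
        mapping = self.volumes.setdefault(name, [])
        for _ in range(extents):
            mapping.append(self.free.pop(0))

    def reduce(self, name, extents):
        """Shrink a logical volume, returning its last extents to the pool."""
        mapping = self.volumes[name]
        if extents > len(mapping):
            raise ValueError("volume is smaller than the requested reduction")
        for _ in range(extents):
            self.free.append(mapping.pop())
```

A logical volume is just an ordered list of extents, so it can span devices and be resized without the consumer seeing anything but a contiguous block range.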
For data protection beyond mirroring, Linux includes md (which stands for multiple devices), which provides a rich set of RAID functionality. This driver implements software RAID, supporting RAID-4 (striped data with a dedicated parity disk), RAID-5 (striped data with distributed parity), RAID-6 (striped data with dual distributed parity), and RAID-10 (striped and mirrored data).
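The single-parity levels (RAID-4 and RAID-5) rest on one property of XOR: the parity of a stripe, XORed with the surviving blocks, yields the missing block. The sketch below illustrates only that property; the helper name `xor_blocks` is hypothetical, and this is not how the md driver is structured.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Parity for a stripe of three data blocks.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(stripe)

# If the second block is lost, XOR the survivors with the parity to rebuild it.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
```

RAID-6 needs a second, independent code in addition to XOR so that it can survive two failures, which is beyond this sketch.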
The LVM relies on another storage component called the Device-mapper, which provides (among other features) the ability to multipath. For example, in a SAN environment, there are commonly multiple storage interfaces into the SAN fabric. Multipathing is a feature that protects against the failure of a given path, ensuring that storage remains available as long as a path exists to communicate with the endpoint.
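The failover behavior can be pictured with a short user-space sketch. Here each path is modeled as a hypothetical callable that either returns the requested block or raises `OSError`; the real Device-mapper multipath target works inside the kernel and also supports policies such as load balancing, which this illustration ignores.

```python
def read_with_failover(paths, block):
    """Try each path in order and return the first successful read."""
    last_error = None
    for path in paths:
        try:
            return path(block)
        except OSError as error:
            # This path failed; remember why and fall through to the next one.
            last_error = error
    raise OSError("all paths to the storage endpoint failed") from last_error
```

As long as one callable in `paths` succeeds, the caller never sees the failure of the others, which mirrors the availability guarantee described above.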
In the past few years, two relatively simple features have been added to the storage stack that address the evolution of the storage ecosystem:
- Data integrity
- Support for solid-state disks (SSDs)
The first change addresses the use of commodity drives in enterprise storage settings. Although enterprise-class drives (such as SAS drives) are reliable, SATA drives are built with different requirements and with cost as a major factor. For this reason, SATA drives can suffer from a problem known as silent data corruption, where errors can be introduced and not detected when the data is read from the disk. To solve this problem and support SATA drives in enterprise settings, data integrity codes are added to blocks on the disk (where the disk uses 520-byte sectors instead of the traditional 512-byte blocks). In addition, the drive itself can validate the data being written, so that its integrity code matches the data. In this way, errors can be caught as they're written to the disk, instead of detecting the error later when nothing can be done about it.
This mechanism is called the data integrity field (DIF), as shown in Figure 3. It is an 8-byte trailer that includes a cyclic redundancy check (CRC) over the block of data, a reference tag (typically a portion of the logical block address [LBA]), and an application tag that the application defines. The reference tag is useful for catching mis-writes of data to an incorrect block, whereas the application tag can be used to catch other errors in the software stack. For example, if a PDF document is written, the application tag could be set to a value indicating a special PDF tag. When the PDF is read, each block's application tag can be inspected to ensure that all specify the PDF tag. DIF is supported within Linux as of kernel version 2.6.27.
Figure 3. DIF structure for a 512-byte sector
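The trailer can be sketched in a few lines. The field widths and ordering used here (a 2-byte guard CRC, a 2-byte application tag, and a 4-byte reference tag, all big-endian) and the CRC polynomial 0x8BB7 reflect the T10 definition as commonly documented; treat them as assumptions to check against the specification. The function names are hypothetical, and the reference tag is assumed to be the low 32 bits of the LBA.

```python
import struct


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_dif(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Return the 8-byte trailer for one 512-byte sector."""
    if len(sector) != 512:
        raise ValueError("expected a 512-byte sector")
    guard = crc16_t10dif(sector)
    ref_tag = lba & 0xFFFFFFFF  # assumption: low 32 bits of the LBA
    return struct.pack(">HHI", guard, app_tag, ref_tag)


def verify_dif(sector: bytes, dif: bytes, lba: int) -> bool:
    """Check the guard CRC against the data and the reference tag against the LBA."""
    guard, _app_tag, ref_tag = struct.unpack(">HHI", dif)
    return guard == crc16_t10dif(sector) and ref_tag == (lba & 0xFFFFFFFF)
```

Verification fails in two distinct ways: a corrupted sector no longer matches the guard CRC, and a sector that was written to the wrong block no longer matches the reference tag. Appending the 8-byte trailer to a 512-byte sector gives the 520-byte sector size mentioned earlier.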
Growing support for SSDs
The introduction of SSDs is changing the storage ecosystem in a number of ways. These disks remove some of the large latencies found in spinning disks and therefore provide a way to maintain data flow to and from the CPU. But SSDs are different from hard disk drives (HDDs) in that they are consumable. The storage within an SSD can be written a finite number of times (depending on the technology); therefore, it's important to be as efficient as possible when writing data. To make matters worse, the SSD must internally shift data to minimize the introduction of errors in a process called garbage collection or wear-leveling. This process results in writes to the consumable storage and should therefore be minimized.
Another difference between SSDs and traditional storage is that an HDD doesn't care whether data on disk is still valid. If the file system invalidates the data, the data can remain on disk without any downside. That is not true of SSDs: blocks the drive still believes are in use continue to be moved around by the wear-leveling process. For this reason, Linux now supports the ability of the file system to communicate discarded blocks to the SSD (as of kernel version 2.6.29). This ability allows the SSD to remove these blocks from wear-leveling processes and helps to increase the endurance of the drive.
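On current kernels, one way to see whether a block device advertises discard support is to read its queue attributes in sysfs. The sketch below assumes the `discard_max_bytes` attribute under `/sys/block/<device>/queue/`, where a value of 0 means discards are not supported; the function name and the `sysfs_root` parameter (added so the check can be pointed at a test directory) are hypothetical.

```python
from pathlib import Path


def supports_discard(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the block device reports a nonzero discard_max_bytes."""
    attribute = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    try:
        return int(attribute.read_text().strip()) > 0
    except (OSError, ValueError):
        # Missing device, missing attribute, or unreadable contents.
        return False
```

For example, `supports_discard("sda")` inspects `/sys/block/sda/queue/discard_max_bytes`. Whether a file system actually issues discards is a separate matter that depends on its configuration.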
What truly sets Linux apart from other operating systems is its vast library of file systems. In Linux, you can find traditional client file systems like the third extended file system (ext3) and the fourth extended file system (ext4), but you'll also find the state of the art in distributed file systems, cluster file systems, and parallel file systems. You can find new, cutting-edge file systems based around new ideas and addressing new problems in the storage domain, as well.
In terms of cutting-edge file systems today, Linux supports both ZFS and Btrfs (the B-tree file system, often pronounced "Butter FS"). These two file systems compete with one another and share the distinction of copy-on-write semantics (blocks are never overwritten in place). In addition, both file systems support data de-duplication, internal data protection (RAID-like protection), data and metadata checksums, and other storage features (like snapshots).
Linux is home to many distributed file systems, as well. One example is Lustre, which is a massively parallel distributed file system that supports tens of thousands of nodes and scales to petabytes of storage capacity. Ceph provides similar functionality and was merged into the mainline Linux kernel in version 2.6.34. Other examples in Linux include GlusterFS and the General Parallel File System (GPFS).
You can find specialized file systems in Linux, as well, including log-structured file systems like the New Implementation of a Log-structured File System (NiLFS(2)) and object-based file systems like exofs. Because Linux finds itself in many use models, you'll also find file systems for resource-constrained uses (such as embedded systems) as well as for low-latency applications such as high-performance computing (HPC). File systems in the embedded area include Yet Another Flash File System version 2 (YAFFS2), the Journaling Flash File System version 2 (JFFS2), and the Unsorted Block Image File System (UBIFS). File systems in the HPC space include Parallel NFS (pNFS), Lustre, and GPFS.
Linux storage ahead
Linux is and will continue to be the target for file systems and general storage research because of its openness and large community of developers.
One of the latest changes in storage is the use of remote services for cost-efficient storage of archive data. Known today as cloud storage, numerous vendors provide efficient and transparent access to remote, centralized storage with varying service level agreements (covering capabilities like protection and bandwidth). Two examples include Ubuntu One and Dropbox. Another service, called SpiderOak, can be used to back up your local user directories to the cloud for a small fee.
What other features might be on the horizon for Linux? Candidates include support for large sector sizes (moving beyond 512-byte sectors), thin provisioning (where advertised storage exceeds the physical capacity, avoiding reserved but unused space), storage de-duplication (to maximize usable capacity), and an even more efficient storage stack to exploit the speed and efficiency of devices like SSDs. Whatever is coming in storage ecosystem evolution, Linux will be there first.
Related topics
- Learn more about the differences in storage architectures in Demystifying Storage Networking: DAS, SAN, NAS, NAS Gateway, Fibre Channel, and iSCSI from IBM Storage Networking.
- NFS continues to evolve with Linux and now supports a parallel form of NFS called pNFS for higher scaling and performance. You can learn more about the evolution of NFS in Linux in Tim's article Network file systems and Linux (developerWorks, November 2010).
- This article referenced the LVM as a means of virtualizing storage in a platform. You can learn more about LVM2 (the second revision of LVM) at the LVM2 Resource Page. LVM2 uses the Device-mapper, which you can find at the Device-mapper Resource Page. This page also introduces you to the user-space tools for multipath operation.
- The T10 DIF provides end-to-end protection within a storage system. You can learn more about the DIF trailer and its justification within the original T10 proposal, T10/03-224r0.
- You can learn more about some of the file systems mentioned in this article in Tim's other articles:
- Run ZFS on Linux (developerWorks, January 2011)
- Next-generation Linux file systems: NiLFS(2) and exofs (developerWorks, October 2009)
- Ceph: A Linux petabyte-scale distributed file system (developerWorks, May 2010)
- Anatomy of ext4 (developerWorks, February 2009)
- Linux kernel advances (developerWorks, March 2009)
- Anatomy of the Linux virtual file system switch (developerWorks, August 2009)
- Anatomy of Linux flash file systems (developerWorks, May 2008)
- The distributed replicated block device (DRBD) provides a simple way to protect a volume across a pair of servers. It works with a single physical volume and replicates it across a traditional TCP/IP network to a peer host. You can learn more about DRBD in High availability with the Distributed Replicated Block Device (developerWorks, August 2010).
- CAS systems provide an efficient means of storing data given the simplicity of de-duplication. One interesting application of CAS is in the context of virtualization environments, where duplicate data is common. Read more in Experiences with Content Addressable Storage and Virtual Disks (Anthony Liguori and Eric Van Hensbergen).
- Linux provides support for the Enhanced Host Controller Interface (EHCI) as well as the newer xHCI (for high-speed USB 3.0 devices).
- Linux supports the wide variety of disk attachment standards, such as SCSI, IDE/Enhanced IDE (EIDE)/ATA, SAS, SATA, Fibre Channel, and others.
- This article mentioned a number of file systems, including the GPFS, which is a high-performance clustered file system, and Lustre, a massively parallel distributed file system. When it comes to file systems, Linux is the operating system of choice.
- SSDs replace the mechanical internals of an HDD with solid-state memory such as NAND flash. SSDs offer superior performance with lower power requirements than traditional spinning disks. Today, you'll find SSDs in enterprises and in consumer devices.
- In the developerWorks Linux zone, find hundreds of how-to articles and tutorials, as well as downloads, discussion forums, and a wealth of other resources for Linux developers and administrators.