Snapshot is a common industry term denoting the ability to record the state of a storage device at a given moment and preserve that snapshot as a guide for restoring the storage device in the event that it fails. A snapshot primarily creates a point-in-time copy of the data. Typically, the snapshot copy is created instantly and made available for use by other applications such as data protection, data analysis and reporting, and data replication. The original copy of the data remains available to applications without interruption, while the snapshot copy is used to perform other functions on the data.
Snapshots provide an excellent means of data protection. The trend toward snapshot technology stems from the benefits snapshots deliver in addressing many of the issues businesses face. Snapshots enable better application availability, faster recovery, easier backup management of large volumes of data, reduced exposure to data loss, virtual elimination of backup windows, and lower total cost of ownership (TCO).
Vendors adopt different implementation approaches to create snapshots, each with its own benefits and drawbacks. It is therefore important to understand snapshot implementations in order to build effective data protection solutions and to identify which functions are most critical for your organization when selecting a snapshot vendor.
This section describes commonly used methodologies for creating snapshots.
A snapshot of a storage volume is created using space pre-designated for the snapshot. When the snapshot is first created, only the metadata describing where the original data is stored is copied; no physical copy of the data is made. Therefore, creation of the snapshot is almost instantaneous. The snapshot then tracks the changing blocks on the original volume as writes to the original volume are performed. The original data being written to is copied into the designated storage pool set aside for the snapshot before the original data is overwritten, hence the name "copy-on-write".
Before a write is allowed to a block, copy-on-write moves the original data block to the snapshot storage. This keeps the snapshot data consistent with the exact time the snapshot was taken. Read requests to the snapshot volume for unchanged data blocks are redirected to the original volume, while read requests for data blocks that have been changed are directed to the "copied" blocks in the snapshot. The snapshot contains the metadata describing the data blocks that have changed since it was first created. Note that original data blocks are copied into the snapshot storage only once, when the first write request is received.
The following diagram illustrates a snapshot operation that creates a logical copy of the data using copy-on-write method.
Figure 1. Copy-on-write illustration
A copy-on-write snapshot can impact performance on the original volume while it exists, because write requests to the original volume must wait while original data is "copied out" to the snapshot. Read requests to the snapshot are satisfied from the original volume if the data being read has not changed. However, this method is highly space-efficient, because the storage required for a snapshot holds only the data that is changing. Additionally, the snapshot requires a valid original copy of the data.
IBM FlashCopy® (NOCOPY), AIX® JFS2 snapshot, IBM TotalStorage® SAN File System snapshot, IBM General Parallel File System snapshot, Linux® Logical Volume Manager, and IBM Tivoli Storage Manager Logical Volume Snapshot Agent (LVSA) are all based on copy-on-write.
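The copy-on-write mechanics described above can be sketched in a few lines of Python. This is an illustrative model only, not any vendor's implementation: a volume is modeled as a list of blocks, and the snapshot holds metadata plus only those original blocks that have been overwritten since it was taken.

```python
class CowSnapshot:
    def __init__(self, volume_blocks):
        self.volume_blocks = volume_blocks  # shared reference to the live volume
        self.saved = {}                     # block index -> copied-out original data

    def read(self, idx):
        # Unchanged blocks are redirected to the original volume; changed
        # blocks come from the snapshot's copied-out store.
        return self.saved.get(idx, self.volume_blocks[idx])


class Volume:
    def __init__(self, blocks):
        self.blocks = blocks
        self.snapshots = []

    def snapshot(self):
        # Instant: only metadata is set up, no data is copied yet.
        snap = CowSnapshot(self.blocks)
        self.snapshots.append(snap)
        return snap

    def write(self, idx, data):
        # Copy-on-write: before overwriting, copy the original block into
        # every snapshot that has not yet saved it (copied only once).
        for snap in self.snapshots:
            if idx not in snap.saved:
                snap.saved[idx] = self.blocks[idx]
        self.blocks[idx] = data


vol = Volume(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")      # "b" is copied out to the snapshot first
# vol.blocks[1] is now "B"; snap.read(1) still returns "b";
# snap.read(0) is served from the unchanged original volume.
```

Note how the extra write (the copy-out) happens only on the first update to each block; a second `vol.write(1, ...)` would not touch the snapshot again.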
This method is quite similar to copy-on-write, but without the double-write penalty, and it offers space- and performance-efficient snapshots.
New writes to the original volume are redirected to another location set aside for the snapshot. The advantage of redirecting the write is that only one write takes place, whereas with copy-on-write two writes occur (one to copy the original data into the snapshot storage space, the other to write the changed data).
However, with redirect-on-write the original copy contains the point-in-time data, that is, the snapshot, while the changed data resides in the snapshot storage. When a snapshot is deleted, the data from the snapshot storage must be reconciled back into the original volume. Furthermore, as multiple snapshots are created, access to the original data, tracking of the data in the snapshots and the original volume, and reconciliation upon snapshot deletion all become more complicated. The snapshot relies on the original copy of the data, and the original data set can quickly become fragmented.
IBM N series and the NetApp Filer snapshot implementation is based on redirect-on-write.
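A minimal redirect-on-write sketch (again illustrative, not NetApp's actual on-disk layout) makes the single-write behavior and the deletion-time reconciliation concrete. Once a snapshot exists, new writes land in a separate redirect area, the untouched original blocks are the point-in-time copy, and deleting the snapshot merges the redirected blocks back.

```python
class RowVolume:
    def __init__(self, blocks):
        self.original = blocks        # becomes the point-in-time copy once a snapshot exists
        self.redirected = {}          # block index -> new data (snapshot storage area)
        self.snapshot_active = False

    def snapshot(self):
        # Instant: simply start redirecting subsequent writes.
        self.snapshot_active = True

    def write(self, idx, data):
        if self.snapshot_active:
            self.redirected[idx] = data   # one write, no copy-out of the old block
        else:
            self.original[idx] = data

    def read_live(self, idx):
        # Live view: redirected block if one exists, else the original.
        return self.redirected.get(idx, self.original[idx])

    def read_snapshot(self, idx):
        # The original volume itself holds the snapshot data.
        return self.original[idx]

    def delete_snapshot(self):
        # Reconciliation: merge changed blocks back into the original volume.
        for idx, data in self.redirected.items():
            self.original[idx] = data
        self.redirected.clear()
        self.snapshot_active = False
```

Contrasting `write` here with the copy-on-write version shows why the write penalty disappears, and why deletion is where redirect-on-write pays its cost instead.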
Split mirror creates a physical clone of the storage entity for which the snapshot is being created, such as a file system, volume, or LUN, onto another entity of the same kind and the exact same size. The entire contents of the original volume are copied onto a separate volume. Clone copies are highly available, since they are exact duplicates of the original volume residing on separate storage. However, because of the data copy, such snapshots cannot be created instantaneously. Alternatively, a clone can be made available instantly by "splitting" a pre-existing mirror of the volume into two, with the side effect that the original volume has one fewer synchronized mirror. This method requires as much storage space as the original data for each snapshot, and it carries the performance overhead of writing synchronously to the mirror copy.
EMC Symmetrix and AIX Logical Volume Manager support split mirror. Additionally, any RAID system supporting multiple mirrors can be used to create a clone by splitting off a mirror.
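The split-mirror trade-offs can be seen in a small sketch (a toy model, not any array's firmware): every write goes synchronously to both copies, which is the ongoing overhead, and the "split" itself is instant because the mirror is already a full copy.

```python
class MirroredVolume:
    def __init__(self, blocks):
        self.primary = list(blocks)
        self.mirror = list(blocks)    # kept synchronized with the primary

    def write(self, idx, data):
        self.primary[idx] = data
        self.mirror[idx] = data       # synchronous mirror write: the steady-state cost

    def split_mirror(self):
        # Instant: the existing mirror becomes a frozen, independent clone.
        clone = self.mirror
        # Re-create a fresh mirror so the primary is protected again
        # (full resynchronization cost in a real system).
        self.mirror = list(self.primary)
        return clone


vol = MirroredVolume(["a", "b"])
vol.write(0, "A")
clone = vol.split_mirror()
vol.write(1, "B")
# clone stays ["A", "b"]: a full, self-contained point-in-time copy that
# survives even if the primary's physical media fails.
```

Because the clone shares no blocks with the primary, it protects against physical media failures, unlike copy-on-write or redirect-on-write snapshots.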
This solution uses log files to track writes to the original volume; each write request is logged, much like in a relational database. When data needs to be restored or rolled back, the transactions in the log files are run in reverse.
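A sketch of this log-based approach (illustrative only) is an undo journal: each write records the block's prior contents, and a rollback replays the journal in reverse, exactly as the text describes.

```python
class LoggedVolume:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.log = []                 # (block index, old value) undo records

    def write(self, idx, data):
        # Log the prior contents before overwriting, like a database undo log.
        self.log.append((idx, self.blocks[idx]))
        self.blocks[idx] = data

    def rollback(self, n_writes):
        # Restore by running the last n logged transactions in reverse order.
        for _ in range(n_writes):
            idx, old = self.log.pop()
            self.blocks[idx] = old
```

The per-write logging is the "high write overhead" this method is charged with in the comparison table below; in exchange, any number of recent writes can be undone.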
Some vendors offer an implementation in which a full copy of the snapshot data is created using copy-on-write plus a background process that copies data from the original location to the snapshot storage space. This approach, used by IBM FlashCopy and EMC TimeFinder/Clone, combines the benefits of the copy-on-write and split mirror methods. It uses copy-on-write to create an instant snapshot and then optionally starts a background process that performs a block-level copy of the data from the original volume (source volume) to the snapshot storage (target volume), creating an additional mirror of the original volume.
When a FlashCopy operation is initiated, a FlashCopy relationship is created between the source volume and target volume. This type of snapshot is called a COPY type of FlashCopy operation.
Incremental FlashCopy tracks the changes made to the source and target volumes after the FlashCopy relationship is established. This makes it possible to refresh a LUN or volume to the source's or target's point-in-time content using only the changed data. The refresh can occur in either direction, offering improved flexibility and faster FlashCopy completion times.
The incremental FlashCopy option can be used to create frequent, faster backups and restores without the penalty of copying the entire contents of the volume.
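The COPY-type behavior can be sketched as copy-on-write plus a per-block bitmap that a background pass drains (a simplified model, not the actual FlashCopy microcode; the same bitmap idea underlies incremental change tracking). Once every bit is set, the target is a full clone and no longer depends on the source.

```python
class FlashCopyRelationship:
    def __init__(self, source):
        self.source = source
        self.target = [None] * len(source)   # snapshot storage / eventual clone
        self.copied = [False] * len(source)  # per-block "already copied" bitmap

    def write_source(self, idx, data):
        # Copy-on-write: copy out the original block on its first overwrite.
        if not self.copied[idx]:
            self.target[idx] = self.source[idx]
            self.copied[idx] = True
        self.source[idx] = data

    def background_copy(self):
        # Background pass: copy every block not yet copied. Afterwards the
        # target is a complete point-in-time mirror of the original volume.
        for idx, done in enumerate(self.copied):
            if not done:
                self.target[idx] = self.source[idx]
                self.copied[idx] = True
```

Until `background_copy` finishes, reads of uncopied target blocks would have to be redirected to the source, which is why the table below says the original copy is required "only until the background copy is complete."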
Continuous data protection (CDP), also called continuous backup, refers to backing up data whenever a change is made, by automatically capturing the changes to a separate storage location. CDP effectively creates an electronic journal of complete storage snapshots.
CDP differs from the other snapshot implementation methods described in this section because it creates one snapshot for every instant in time at which a data modification occurs, as opposed to the single point-in-time copy created by the other methods.
CDP-based solutions can provide fine restore granularity, from individual objects such as files at any point in time to crash-consistent images of application data, for example database files and mailboxes.
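The any-point-in-time property can be sketched as a redo journal keyed by sequence number (a toy model, not a specific CDP product): every write is journaled, and restoring to a chosen moment replays the journal up to that point.

```python
class CdpVolume:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.journal = []             # (sequence number, block index, new value)
        self.seq = 0

    def write(self, idx, data):
        # Every change is captured in the journal on a separate "device".
        self.seq += 1
        self.journal.append((self.seq, idx, data))
        self.blocks[idx] = data

    def restore_to(self, initial_blocks, target_seq):
        # Rebuild the volume as of any past instant by replaying the journal.
        blocks = list(initial_blocks)
        for seq, idx, data in self.journal:
            if seq > target_seq:
                break
            blocks[idx] = data
        return blocks
```

Unlike a scheduled snapshot, every `write` is a restore point, which is also why each write costs a corresponding journal write in the comparison table below.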
A storage stack comprises the many hardware and software components that present physical storage media to the applications running on a host operating system. The diagram below shows commonly used storage stack layers.
Aside from the different snapshot implementation methods, snapshot solutions can be implemented at many layers in the storage stack. Broadly, snapshots can be created in software-based layers or in hardware-based layers. These are also categorized as controller-based snapshots (storage device or hardware driven) and host-based snapshots (file system or volume manager).
Controller-based snapshots are managed by storage subsystem hardware vendors and are integrated into disk arrays. These snapshots are done at LUN level (block level) and are independent of the operating system and file systems.
Host-based snapshots are implemented between the device driver and file system levels. The snapshot can be performed by file systems, volume managers, or third-party software. Host-based snapshots have no dependency on the underlying storage hardware, but they do depend on the file system and volume manager software. Also, these snapshots operate on the logical view of the data, as opposed to the physical layout of the data used by controller-based snapshots.
Figure 2. Storage stack and snapshot
Below are some vendors and products with snapshot solutions at different storage stack layers.
- Storage subsystems: IBM TotalStorage Disk Systems, EMC Symmetrix, NetApp NAS
- Virtualization: IBM TotalStorage SAN Volume Controller
- Volume managers: Veritas Volume Manager, Linux LVM, IBM Tivoli Storage Manager LVSA, Microsoft® Windows® 2003 VSS System provider
- File systems: AIX JFS2, IBM TotalStorage SAN File System, IBM General Parallel File System, IBM N series, NetApp filers, and Veritas File System
The storage stack layer in which a snapshot is implemented has implications for data protection solutions. The following key observations should be noted.
- Physical storage (provided by storage subsystems) and volume managers, which facilitate use of physical storage, are two essential components in any meaningful storage implementation. These layers are always present.
- Use of a file system is optional; some applications, such as database applications, may use the logical volume directly. Such data cannot be managed by snapshot technologies at the file system layer.
- The application layer in the stack may not necessarily provide a snapshot solution, but rather backup mechanisms tied to the next storage stack layer it interfaces with, that is, file systems or volume managers. This includes quiescing the I/O to allow for a consistent data view.
- Each layer ensures data consistency at its own level, so the buffers in the layers above it must be flushed before a snapshot is created.
- File systems and volume manager-based snapshots are typically easy to use and provide better recovery granularity than the hardware-based snapshots.
- Hardware-based snapshots provide protection against hardware failures and better performance. Many implementations offer data consistency groups to ensure consistency across more than one storage unit, such as LUN.
The table below provides a quick look at the various aspects of each of the snapshot implementations described above.
Table 1. Snapshot Implementations Overview at a Glance
| Aspect | Copy-on-write | Redirect-on-write | Split mirror | Log structure file architecture | Copy-on-write with background copy (IBM FlashCopy) | IBM incremental FlashCopy | Continuous data protection |
|---|---|---|---|---|---|---|---|
| Snapshot requires original copy of data | Yes: the unchanged data is accessed from the original copy | Yes: the unchanged data is accessed from the original copy | No: the mirror contains a full copy of the data | Yes: the unchanged data is accessed from the original copy | Only until the background copy is complete | Only until the background copy is complete | No: most implementations include a replica of the original copy |
| Space-efficient | Yes: in most cases space is required only for changed data; exceptions such as IBM FlashCopy exist, so check with the vendor | Yes: in most cases space is required only for changed data; check with the vendor | No: requires the same amount of space as the original data | Yes: space is required only for the changed data | No: requires the same amount of space as the original data | No: requires the same amount of space as the original data | Yes: space required depends on the amount and frequency of data changes when multiple point-in-time copies must be kept |
| I/O and CPU performance overhead on the system with the original copy of the data | High for software-based snapshots; none for hardware-based snapshots (performed by the storage hardware) | High for software-based snapshots; none for hardware-based snapshots (impact is on the storage hardware) | Low after the mirror is split; high before the split, to keep the mirror synchronized | High: overhead is incurred in logging the writes | Low: performed by the storage hardware | Low: performed by the storage hardware | Implementation-specific: check with the vendor |
| Write overhead on the original copy of the data | High: the first write to a data block results in an additional write | None: writes are directed to new blocks | None: the write overhead is incurred before the split | High: writes must be logged | High: the first write to a data block results in an additional write | High: the first write to a data block results in an additional write | High: each write results in a corresponding write to the snapshot storage space |
| Protection against logical data errors | Yes: changes can be rolled back or synched back into the original copy | Yes: changes can be rolled back or synched back into the original copy | Yes: data from the mirror must be copied back; typically slower, since changes are not tracked | Yes: the changes can be rolled back | Yes: another FlashCopy can be created in the reverse direction | Yes: another FlashCopy can be created in the reverse direction; typically faster, since only the changed blocks are copied | Yes: changes can be synched back into the original copy |
| Protection against physical media failures of the original data | None: a valid original copy must exist | None: a valid original copy must exist | Yes: the split mirror is a full clone | None: a valid original copy must exist | Full protection after the background copy is complete | Full protection after the background copy is complete | Implementation-specific: check with the vendor |
- More about IBM Tivoli Storage Manager offerings
- Product Manual IBM Tivoli Storage Manager for Copy Services: Microsoft Exchange Volume Shadow Copy Services (VSS) Integration Module
- Product Manual IBM Tivoli Storage Manager for Advanced Copy Services: DB2 UDB Integration Module 5.3.3
- Product Manual Data Protection for FlashCopy Devices for mySAP Installation and User's Guide for Oracle
- Product Manual Data Protection for FlashCopy Devices for mySAP Installation and User's Guide for DB2UDB
- Product Manual Data Protection for Enterprise Storage Server Databases (Oracle) Installation and User's Guide
- Product Manual IBM Tivoli Continuous Data Protection for Files.
- Product Manual IBM Tivoli Storage Manager for Windows Backup-Archive Clients Installation and User's Guide
- Product Manual IBM Tivoli Storage Manager for UNIX and Linux Backup-Archive Clients Installation and User's Guide
Neeta Garimella has been a member of the TSM development team for over five years. She was one of the key architects of IBM Tivoli Storage Manager for Advanced Copy Services and Copy Services Modules. Prior to joining the TSM team, she was the lead developer for Tivoli Workload Scheduler. Before joining IBM, she worked at BEA Systems as a consultant where she helped customers build and deploy solutions using BEA products. She started her career with Tata Consultancy Services in India where she worked on a variety of customer projects both at the system and application level. Her special interests include Snapshot and Continuous Data Protection technology.