Understanding and exploiting snapshot technology for data protection, Part 1: Snapshot technology overview

Helps you make informed decisions about implementing snapshots

Snapshot technology is increasingly used for data protection and for other tasks such as data mining and data cloning. Most leading storage hardware and software vendors provide snapshot support, and advanced data protection solutions such as IBM® Tivoli® Storage Manager are built on snapshot technology. Using snapshots for data protection offers critical business value: zero-impact backup with minimal or no application downtime, frequent backups (for example, hourly) to reduce recovery time, efficient backup of large volumes of data, reduced exposure to data loss, and instant recovery from a snapshot. However, you must weigh the options carefully before selecting a solution that fits your needs and environment.

The goal of this series is to provide an overview of snapshot technology and the snapshot-based data protection solutions offered by IBM Tivoli Storage Manager. This information enables you to make informed decisions about exploiting snapshot capabilities in the most effective way in your environment.

This series is intended for IT architects, managers, and developers looking for ways to exploit snapshot capabilities to improve the quality of the data protection services offered by their IT organization.

This part of the series provides an overview of snapshot technology. Refer to the entire series.

Neeta Garimella (neeta@us.ibm.com), TSM Client Developer, Tivoli - Software Group, IBM

Neeta Garimella has been a member of the TSM development team for over five years. She was one of the key architects of IBM Tivoli Storage Manager for Advanced Copy Services and Copy Services Modules. Prior to joining the TSM team, she was the lead developer for Tivoli Workload Scheduler. Before joining IBM, she worked at BEA Systems as a consultant where she helped customers build and deploy solutions using BEA products. She started her career with Tata Consultancy Services in India where she worked on a variety of customer projects both at the system and application level. Her special interests include Snapshot and Continuous Data Protection technology.



26 April 2006

What is a snapshot

Snapshot is a common industry term denoting the ability to record the state of a storage device at any given moment and preserve that snapshot as a guide for restoring the storage device in the event that it fails. A snapshot primarily creates a point-in-time copy of the data. Typically, a snapshot copy is created instantly and made available for use by other applications such as data protection, data analysis and reporting, and data replication applications. The original copy of the data continues to be available to the applications without interruption, while the snapshot copy is used to perform other functions on the data.

Snapshots provide an excellent means of data protection. The trend towards using snapshot technology comes from the benefits that snapshots deliver in addressing many of the issues that businesses face. Snapshots enable better application availability, faster recovery, easier backup management of large volumes of data, reduced exposure to data loss, virtual elimination of backup windows, and lower total cost of ownership (TCO).


How snapshots are implemented

Vendors take different approaches to creating snapshots, each with its own benefits and drawbacks. Understanding these implementations helps you build effective data protection solutions, identify which functions are most critical for your organization, and select a snapshot vendor accordingly.

This section describes commonly used methodologies for creating the snapshot.

Copy-on-write

A snapshot of a storage volume is created using the pre-designated space for the snapshot. When the snapshot is first created, only the meta-data about where original data is stored is copied. No physical copy of the data is done at the time the snapshot is created. Therefore, the creation of the snapshot is almost instantaneous. The snapshot copy then tracks the changing blocks on the original volume as writes to the original volume are performed. The original data that is being written to is copied into the designated storage pool that is set aside for the snapshot before original data is overwritten, hence the name "copy-on-write".

Before a write is allowed to a block, copy-on-write moves the original data block to the snapshot storage. This keeps the snapshot data consistent with the exact time the snapshot was taken. Read requests to the snapshot volume for unchanged data blocks are redirected to the original volume, while read requests for data blocks that have been changed are directed to the "copied" blocks in the snapshot. The snapshot contains the meta-data that describes the data blocks that have changed since the snapshot was first created. Note that each original data block is copied into the snapshot storage only once, when the first write request for that block is received.
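The read and write paths described above can be sketched in a few lines of Python. This is a minimal, illustrative model only, assuming a volume is a list of blocks; the class name `CowSnapshot` and its methods are hypothetical and do not correspond to any vendor's interface.

```python
class CowSnapshot:
    """Toy copy-on-write snapshot over a list of blocks (illustrative only)."""

    def __init__(self, volume):
        self.volume = volume   # the live original volume
        self.saved = {}        # block index -> original data, kept in snapshot storage

    def write(self, index, data):
        """Write to the original volume, copying out the old block first."""
        if index not in self.saved:        # only the first write to a block copies
            self.saved[index] = self.volume[index]
        self.volume[index] = data

    def read_snapshot(self, index):
        """Read the point-in-time view of a block."""
        if index in self.saved:            # changed since the snapshot was taken
            return self.saved[index]
        return self.volume[index]          # unchanged: read from the original volume


volume = ["a0", "b0", "c0"]
snap = CowSnapshot(volume)   # creation copies no data, it only sets up meta-data
snap.write(1, "b1")
snap.write(1, "b2")          # second write to the same block: no further copy

snapshot_view = [snap.read_snapshot(i) for i in range(len(volume))]
```

After the two writes, the snapshot storage holds a single preserved block, while the snapshot view still shows the volume as it was when the snapshot was created.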

The following diagram illustrates a snapshot operation that creates a logical copy of the data using copy-on-write method.

Figure 1. Copy-on-write illustration

A copy-on-write snapshot might impact performance on the original volume while it exists, because write requests to the original volume must wait while the original data is being "copied out" to the snapshot. Read requests to the snapshot are satisfied from the original volume if the data being read hasn’t changed. However, this method is highly space efficient, because the snapshot needs only enough storage to hold the data that is changing. Additionally, the snapshot depends on a valid original copy of the data.

IBM FlashCopy® (NOCOPY), AIX® JFS2 snapshot, IBM TotalStorage® SAN File System snapshot, IBM General Parallel File System snapshot, Linux® Logical Volume Manager, and IBM Tivoli Storage Manager Logical Volume Snapshot Agent (LVSA) are all based on copy-on-write.

Redirect-on-write

This method is quite similar to copy-on-write but avoids the double-write penalty, and it offers snapshots that are efficient in both storage space and performance.

New writes to the original volume are redirected to another location set aside for snapshot. The advantage of redirecting the write is that only one write takes place, whereas with copy-on-write, two writes occur (one to copy original data onto the storage space, the other to copy changed data).

However, with redirect-on-write, the original copy contains the point-in-time data (that is, the snapshot), and the changed data resides on the snapshot storage. When a snapshot is deleted, the data from the snapshot storage must be reconciled back into the original volume. Furthermore, as multiple snapshots are created, access to the original data, tracking of the data in the snapshots and the original volume, and reconciliation upon snapshot deletion become more complicated. The snapshot relies on the original copy of the data, and the original data set can quickly become fragmented.

The IBM N series and NetApp Filer snapshot implementations are based on redirect-on-write.
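As a rough illustration of the behavior described above, the following Python sketch models a single redirect-on-write snapshot. It assumes a volume is a list of blocks and handles only one snapshot; the class `RowSnapshot` and its methods are hypothetical names chosen for this example.

```python
class RowSnapshot:
    """Toy redirect-on-write snapshot over a list of blocks (illustrative only)."""

    def __init__(self, original):
        self.original = original   # frozen at snapshot time: this is the snapshot
        self.redirected = {}       # block index -> data written after the snapshot

    def write(self, index, data):
        """A single write: new data goes to snapshot storage, the original is untouched."""
        self.redirected[index] = data

    def read_live(self, index):
        """Current view: redirected block if one exists, otherwise the original."""
        return self.redirected.get(index, self.original[index])

    def read_snapshot(self, index):
        """Point-in-time view: always the original block."""
        return self.original[index]

    def delete(self):
        """Deleting the snapshot reconciles redirected data into the original volume."""
        for index, data in self.redirected.items():
            self.original[index] = data
        self.redirected.clear()


original = ["a0", "b0", "c0"]
row = RowSnapshot(original)
row.write(2, "c1")

live_view = [row.read_live(i) for i in range(3)]
snapshot_view = [row.read_snapshot(i) for i in range(3)]
row.delete()   # reconciliation step
```

The sketch shows where the cost moves: writes are cheap, but deleting the snapshot has to touch every redirected block.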

Split mirror

Split mirror creates a physical clone of the storage entity, such as the file-system, volume, or LUN for which the snapshot is being created, onto another entity of the same kind and the exact same size. The entire contents of the original volume are copied onto a separate volume. Clone copies are highly available, since they are exact duplicates of the original volume that reside on separate storage space. However, due to the data copy, such snapshots cannot be created instantaneously. Alternatively, a clone can be made available instantaneously by "splitting" a pre-existing mirror of the volume into two, with the side effect that the original volume has one fewer synchronized mirror. This snapshot method requires as much storage space as the original data for each snapshot. This method also has the performance overhead of writing synchronously to the mirror copy.

EMC Symmetrix and AIX Logical Volume Manager support split mirror. Additionally, any RAID system supporting multiple mirrors can be used to create a clone by splitting a mirror.
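The mirror-splitting variant can be pictured with a small Python model. This is a sketch under simplifying assumptions (a volume is a list of blocks, and every write reaches all attached mirrors synchronously); `MirroredVolume` is a hypothetical name, not a real volume manager interface.

```python
class MirroredVolume:
    """Toy mirrored volume whose mirrors can be split off as clones (illustrative only)."""

    def __init__(self, blocks, mirrors=2):
        # Each mirror is a full copy, so space use is mirrors x volume size.
        self.mirrors = [list(blocks) for _ in range(mirrors)]

    def write(self, index, data):
        """Synchronous write: every attached mirror is updated."""
        for mirror in self.mirrors:
            mirror[index] = data

    def read(self, index):
        return self.mirrors[0][index]

    def split(self):
        """Detach one mirror and return it as an independent point-in-time clone."""
        if len(self.mirrors) < 2:
            raise RuntimeError("no mirror left to split")
        return self.mirrors.pop()


vol = MirroredVolume(["a0", "b0"])
vol.write(0, "a1")
clone = vol.split()      # instantaneous: the data was already copied
vol.write(1, "b1")       # later writes no longer reach the clone
```

After the split, the clone keeps the content from the moment of the split and the volume continues with one fewer mirror.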

Log structure file architecture

This solution uses log files to track the writes to the original volume. Each write request to the original volume is logged, much as a relational database logs its transactions. When data needs to be restored or rolled back, the transactions in the log files are run in reverse.
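A minimal Python sketch of this idea follows. It assumes that each log entry records the old value of the block so that the write can be undone; that detail, along with the names `LoggedVolume` and `rollback`, is an assumption made for illustration and is not taken from any particular product.

```python
class LoggedVolume:
    """Toy volume that logs every write so changes can be undone (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.log = []   # entries of (block index, old data, new data), in write order

    def write(self, index, data):
        """Log the write, including the old value, then apply it."""
        self.log.append((index, self.blocks[index], data))
        self.blocks[index] = data

    def rollback(self, to_position):
        """Run log entries in reverse until only the first to_position writes remain."""
        while len(self.log) > to_position:
            index, old, _new = self.log.pop()
            self.blocks[index] = old


vol = LoggedVolume(["a0", "b0"])
vol.write(0, "a1")
mark = len(vol.log)      # remember this point in time
vol.write(0, "a2")
vol.write(1, "b1")
vol.rollback(mark)       # undo the last two writes, newest first
```

Every write pays the cost of a log append, which is the logging overhead this method carries.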

Copy-on-write with background copy (IBM FlashCopy)

Some vendors offer an implementation where a full copy of the snapshot data is created using copy-on-write and a background process that copies data from original location to snapshot storage space. This approach combines the benefits of copy-on-write and split mirror methods as done by IBM FlashCopy and EMC TimeFinder/Clone. It uses copy-on-write to create an instant snapshot and then optionally starts a background copy process to perform block-level copy of the data from the original volume (source volume) to the snapshot storage (target volume) in order to create an additional mirror of the original volume.

When a FlashCopy operation is initiated, a FlashCopy relationship is created between the source volume and target volume. This type of snapshot is called a COPY type of FlashCopy operation.
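The combination of copy-on-write and a background copy can be sketched as follows. This Python model is a simplification for illustration, assuming volumes are lists of blocks; the class and method names are hypothetical and are not the FlashCopy interface.

```python
class BackgroundCopySnapshot:
    """Toy copy-on-write snapshot with a background copy (illustrative only)."""

    def __init__(self, source):
        self.source = source
        self.target = [None] * len(source)    # target volume, same size as the source
        self.copied = [False] * len(source)   # which blocks already exist on the target

    def _copy_block(self, index):
        if not self.copied[index]:
            self.target[index] = self.source[index]
            self.copied[index] = True

    def write_source(self, index, data):
        """Copy-on-write: preserve the block on the target before overwriting it."""
        self._copy_block(index)
        self.source[index] = data

    def read_target(self, index):
        """Point-in-time read: the target if copied, otherwise the unchanged source."""
        return self.target[index] if self.copied[index] else self.source[index]

    def background_copy_step(self):
        """Copy one not-yet-copied block; return False when nothing is left."""
        for index, done in enumerate(self.copied):
            if not done:
                self._copy_block(index)
                return True
        return False

    def is_independent(self):
        """True once the target no longer needs the source."""
        return all(self.copied)


source = ["a0", "b0", "c0"]
fc = BackgroundCopySnapshot(source)
fc.write_source(0, "a1")                            # snapshot is usable immediately
view_before = [fc.read_target(i) for i in range(3)]
while fc.background_copy_step():                    # background process fills the rest
    pass
```

Until the loop finishes, the target relies on the source for unchanged blocks; afterwards it is a full copy.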

IBM incremental FlashCopy

Incremental FlashCopy tracks changes made to the source and target volumes when the FlashCopy relationships are established. This makes it possible to refresh a LUN or volume to the source's or target's point-in-time content using only the changed data. The refresh can occur in either direction, and it offers improved flexibility and faster FlashCopy completion times.

This incremental FlashCopy option can be used to create frequent backups, and to restore from them, faster and more efficiently, without the penalty of having to copy the entire content of the volume.
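The benefit of change tracking can be shown with a short Python sketch. It assumes the initial full copy has already completed and uses a simple set of changed block indexes as the tracking structure; the names below are hypothetical and the model does not describe how FlashCopy is implemented internally.

```python
class IncrementalCopy:
    """Toy source/target pair with changed-block tracking (illustrative only)."""

    def __init__(self, source):
        self.source = source
        self.target = list(source)   # assumption: the initial full copy is complete
        self.changed = set()         # blocks modified on either side since the last refresh

    def write_source(self, index, data):
        self.source[index] = data
        self.changed.add(index)

    def write_target(self, index, data):
        self.target[index] = data
        self.changed.add(index)

    def refresh_target(self):
        """Bring the target up to the source's content; return the blocks copied."""
        copied = len(self.changed)
        for index in self.changed:
            self.target[index] = self.source[index]
        self.changed.clear()
        return copied

    def refresh_source(self):
        """Reverse direction: restore the source from the target."""
        copied = len(self.changed)
        for index in self.changed:
            self.source[index] = self.target[index]
        self.changed.clear()
        return copied


source = [f"blk{i}" for i in range(8)]
inc = IncrementalCopy(source)
inc.write_source(3, "new3")
inc.write_source(7, "new7")
inc.write_source(3, "newer3")            # same block again: still one tracked change
copied_forward = inc.refresh_target()    # backup refresh copies 2 of 8 blocks
inc.write_source(5, "corrupt")
copied_reverse = inc.refresh_source()    # restore copies only the 1 damaged block
```

In both directions the work is proportional to the number of changed blocks rather than to the size of the volume.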

Continuous data protection

Continuous data protection (CDP), also called continuous backup, backs up data whenever a change is made to it by automatically capturing each change to a separate storage location. CDP effectively creates an electronic journal of complete storage snapshots.

CDP is different from the other snapshot implementation methods described in this section because it creates one snapshot for every instant in time at which data modification occurs, as opposed to the single point-in-time copy of the data created by other methods.

CDP-based solutions can provide fine restore granularities, ranging from individual objects, such as files, at any point in time to crash-consistent images of application data, for example database files and mailboxes.
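The journal idea behind CDP can be sketched in Python as follows. The model assumes a base image plus a time-ordered journal of writes, with plain integers standing in for timestamps; `CdpJournal` and `image_at` are hypothetical names used only for this example.

```python
class CdpJournal:
    """Toy continuous data protection journal (illustrative only)."""

    def __init__(self, base):
        self.base = list(base)   # image of the data when protection started
        self.journal = []        # (timestamp, block index, data), assumed in time order

    def write(self, timestamp, index, data):
        """Capture every change in the journal as it happens."""
        self.journal.append((timestamp, index, data))

    def image_at(self, timestamp):
        """Rebuild the data as it looked at the given point in time."""
        image = list(self.base)
        for ts, index, data in self.journal:
            if ts > timestamp:
                break            # later changes are not part of this image
            image[index] = data
        return image


cdp = CdpJournal(["a0", "b0"])
cdp.write(10, 0, "a1")
cdp.write(20, 1, "b1")
cdp.write(30, 0, "a2")

before_any_change = cdp.image_at(5)
after_second_write = cdp.image_at(20)
latest = cdp.image_at(99)
```

Because every write is journaled, any moment between writes can serve as a restore point, at the cost of one extra write to the journal per change.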


Snapshot and storage stack

A storage stack consists of many hardware and software components that present physical storage media to the applications that run on a host operating system. The diagram below shows commonly used storage stack layers.

Aside from the different snapshot implementation methods, snapshot solutions can be implemented in many layers of the storage stack. Broadly, snapshots can be created in software-based layers or in hardware-based layers. These are also categorized as controller-based (storage device or hardware driven) snapshots and host-based (file-system or volume manager) snapshots.

Controller-based snapshots are managed by storage subsystem hardware vendors and are integrated into disk arrays. These snapshots are done at LUN level (block level) and are independent of the operating system and file systems.

Host-based snapshots are implemented between the device driver and file-system levels. Snapshots can be performed by file systems, volume managers, or third-party software. Host-based snapshots have no dependency on the underlying storage hardware but depend on the file-system and volume manager software. Also, these snapshots operate on the logical view of the data, as opposed to the physical layout of the data, which is what controller-based snapshots use.

Figure 2. Storage stack and snapshot

Below are some vendors and products with snapshot solutions at different storage stack layers.

  • Storage subsystems: IBM TotalStorage Disk Systems, EMC Symmetrix, NetApp NAS
  • Virtualization: IBM TotalStorage SAN Volume Controller
  • Volume managers: Veritas Volume Manager, Linux LVM, IBM Tivoli Storage Manager LVSA, Microsoft® Windows® 2003 VSS System provider
  • File systems: AIX JFS2, IBM TotalStorage SAN File System, IBM General Parallel File System, IBM N series, NetApp filers, and Veritas File System

Key observations regarding storage stack:

The storage stack layer in which snapshot is implemented has implications on the data protection solutions. Following are key observations that must be noted.

  • Physical storage (provided by storage subsystems) and volume managers, which facilitate use of physical storage, are two essential components in any meaningful storage implementation. These layers are always present.
  • Use of a file system is optional. Some applications, for example database applications, may choose to use the logical volume directly, and their data cannot be managed by snapshot technologies at the file system layer.
  • The application layer in the stack may not necessarily provide a snapshot solution, but rather back-up mechanisms tied to the next storage stack layer it interfaces with, that is, file systems or volume managers. This includes quiescing the I/O to allow for a consistent data view.
  • Each layer ensures data consistency at its level, hence the buffers in the layers above it need to be flushed out before creating a snapshot.
  • File systems and volume manager-based snapshots are typically easy to use and provide better recovery granularity than the hardware-based snapshots.
  • Hardware-based snapshots provide protection against hardware failures and better performance. Many implementations offer data consistency groups to ensure consistency across more than one storage unit, such as LUN.
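The observation about flushing buffers before a snapshot can be illustrated with a short Python sketch. It uses the standard `flush` and `os.fsync` calls to push data down through the application and operating system buffers, and it uses a plain file copy as a stand-in for a real snapshot operation; the helper `consistent_snapshot` and the file names are hypothetical, and quiescing of concurrent I/O is not shown.

```python
import os
import shutil
import tempfile


def consistent_snapshot(f, take_snapshot):
    """Flush the layers above the storage, then take the snapshot."""
    f.flush()               # push the application-level buffer to the operating system
    os.fsync(f.fileno())    # ask the operating system to push its buffers to storage
    return take_snapshot()


workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "data.db")
snap_path = os.path.join(workdir, "data.snap")

with open(data_path, "w") as f:
    f.write("committed record\n")
    # Stand-in for a real snapshot: copy the file once the buffers are flushed.
    consistent_snapshot(f, lambda: shutil.copyfile(data_path, snap_path))

with open(snap_path) as s:
    snapshot_contents = s.read()

shutil.rmtree(workdir)      # clean up the temporary files
```

Without the flush, the record could still be sitting in a buffer above the layer that takes the snapshot and would be missing from the point-in-time copy.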

Snapshot implementations at a glance

The table below provides a quick look at the various aspects of each of the snapshot implementations described above.

Table 1. Snapshot Implementations Overview at a Glance
Snapshot requires original copy of data:

  • Copy-on-write: Yes; the unchanged data is accessed from the original copy
  • Redirect-on-write: Yes; the unchanged data is accessed from the original copy
  • Split mirror: No; the mirror contains a full copy of the data
  • Log structure file architecture: Yes; the unchanged data is accessed from the original copy
  • Copy-on-write with background copy (IBM FlashCopy): Only until background copy is complete
  • IBM incremental FlashCopy: Only until background copy is complete
  • Continuous data protection: No; most implementations include a replica of the original copy

Space-efficient:

  • Copy-on-write: Yes; in most cases space is required only for changed data, but exceptions such as IBM FlashCopy exist. Check with the vendor
  • Redirect-on-write: Yes; in most cases space is required only for changed data. Check with the vendor
  • Split mirror: No; requires same amount of space as original data
  • Log structure file architecture: Yes; space is required for the changed data
  • Copy-on-write with background copy (IBM FlashCopy): No; requires same amount of space as original data
  • IBM incremental FlashCopy: No; requires same amount of space as original data
  • Continuous data protection: Yes; space required depends on the amount and frequency of changes to data when multiple point-in-time copies need to be kept

I/O and CPU performance overhead on the system with original copy of the data:

  • Copy-on-write: High for software-based snapshots; none for hardware-based snapshots (performed by the storage hardware)
  • Redirect-on-write: High for software-based snapshots; none for hardware-based snapshots (impact on the storage hardware)
  • Split mirror: Low after the mirror is split; high prior to the split to keep the mirror synchronized
  • Log structure file architecture: High; overhead incurred in logging the writes
  • Copy-on-write with background copy (IBM FlashCopy): Low; performed by the storage hardware
  • IBM incremental FlashCopy: Low; performed by the storage hardware
  • Continuous data protection: Implementation-specific. Check with the vendor

Write overhead on the original copy of the data:

  • Copy-on-write: High; first write to data block results in additional write
  • Redirect-on-write: None; writes are directed to new blocks
  • Split mirror: None; write overhead is incurred before the split
  • Log structure file architecture: High; writes must be logged
  • Copy-on-write with background copy (IBM FlashCopy): High; first write to data block results in additional write
  • IBM incremental FlashCopy: High; first write to data block results in additional write
  • Continuous data protection: High; each write results in a corresponding write to the storage space

Protection against logical data errors:

  • Copy-on-write: Yes; changes can be rolled back or synched back into the original copy
  • Redirect-on-write: Yes; changes can be rolled back or synched back into the original copy
  • Split mirror: Yes; data from the mirror must be copied. Typically slower since changes are not tracked
  • Log structure file architecture: Yes; the changes can be rolled back
  • Copy-on-write with background copy (IBM FlashCopy): Yes; another FlashCopy can be created in the reverse direction
  • IBM incremental FlashCopy: Yes; another FlashCopy can be created in the reverse direction. Typically faster, since only the changed blocks are copied
  • Continuous data protection: Yes; changes can be synched back into the original copy

Protection against physical media failures of the original data:

  • Copy-on-write: None; valid original copy must exist
  • Redirect-on-write: None; valid original copy must exist
  • Split mirror: Yes; the split mirror is a full clone
  • Log structure file architecture: None; valid original copy must exist
  • Copy-on-write with background copy (IBM FlashCopy): Full protection after background copy is complete
  • IBM incremental FlashCopy: Full protection after background copy is complete
  • Continuous data protection: Implementation-specific. Check with the vendor

