
Autonomic data protection, your file system life saver

Real-time to-disk transparent file backup and recovery

Chris Stakutis (chris.stakutis@us.ibm.com), CTO Emerging Storage Software, IBM/Tivoli
Chris Stakutis is a renowned data storage industry inventor, technologist, and author with over 20 years of industry experience. He holds over 6 U.S. patents (with 8 more filed) covering various data and networking inventions. Currently managing cutting-edge data storage research and development at IBM, he was previously the founder and CTO of SANergy (high-speed data sharing), which was sold to IBM in 2000. Mr. Stakutis is often published in industry journals and speaks at industry events. Mr. Stakutis graduated from Worcester Polytechnic Institute in an accelerated three-year program and then went on to obtain an MBA from Babson College at a leisurely 10-year pace. He has held key engineering and product management roles in various high technology companies, including Mercury Computer Systems, Precision Robots, MIT Lincoln Laboratories, and many startups.

Summary:  Automatically and transparently back up laptops and workstations by exploiting disks and networks. It is estimated that nearly 70% of corporate digital assets are now stored on laptops and other non-server endpoints; typically, only 10% of those systems are backed up in most corporations (even fewer in households). The new world of fast, inexpensive disks and broadband offers great hope for a new style of data protection: transparent, real-time protection to multiple targets.

Date:  09 Feb 2005
Level:  Introductory


What is autonomic data protection?

Data protection is autonomic if it:

  • Replicates files the moment they change, automatically and transparently
  • Locks files down against any alteration, including changes made by viruses
  • Unwinds file changes (restores to an earlier time)
  • Prioritizes and optimizes transfer (protection) operations
  • Operates in real time; protection and restoration must be online activities, not subject to arcane, time-consuming procedures

The autonomic computing world is focused on reducing human interaction and allowing software and machinery to manage more operations intelligently. The world of data protection has been sorely missing such a level of automation and instantaneous operation. Finally, there is some evidence of change, made possible by pervasive networks and inexpensive storage targets.


Why are we underprotected?

Too many people have fallen into the trap of not backing up their primary machines. Statistics show that less than 10% of users back up their laptops. The typical reasons given:

  • Important material should be on a corporate-managed file server
  • Not being connected at midnight (when the backup program would like to run)
  • Backups over the network or dial-up never complete in time
  • Periodically burning a CD with what is believed to be the important stuff

End-user backup packages can be painful to run. Most of them are CD-focused and want you to feed 20-100 CDs to back up your new 200-GB drive; this is simply not practical. The world needs a better solution that is more continuous (autonomic) and more transparent.
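To make the CD arithmetic concrete, the short Python sketch below estimates how many discs a backup needs. The 700-MB disc capacity and the decimal units (1 GB = 1,000 MB) are assumptions chosen for illustration, and the sketch ignores any file system or packaging overhead a real backup package would add.

```python
import math

CD_CAPACITY_MB = 700  # assumption: a standard 80-minute CD-R, in decimal megabytes


def cds_needed(data_gb: float, cd_capacity_mb: int = CD_CAPACITY_MB) -> int:
    """Return how many whole CDs are needed to hold data_gb gigabytes.

    Uses decimal units (1 GB = 1,000 MB) and ignores file system and
    packaging overhead, so the result is a lower bound.
    """
    if data_gb <= 0:
        return 0
    return math.ceil(data_gb * 1000 / cd_capacity_mb)


for gb in (14, 70, 200):
    print(f"{gb:>3} GB -> {cds_needed(gb)} CDs")
```

Under these assumptions, 20 to 100 CDs hold only about 14 to 70 GB; a completely full 200-GB drive would need 286 discs.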


The changing world, end-point data

Years ago, your instructions as a corporate end user were to save your important files to a managed file server (which presumably is backed up regularly). This typical practice was suitable in the age when most people used workstations that were permanent fixtures in the offices and had continuous access to network mounts.


The world is different today. Many people use laptops and lug them around everywhere. Sometimes you're on a network, sometimes not. And even when you are on a network, it might not be one that can reach the corporate backbone. As a result, all of your important files are now carried with you, and quite likely never stored elsewhere.

In addition, the business value of what you do during a day is now worth three times as much as it was in the late 1980s. Over the last two decades, efficiency has skyrocketed, which means that any hiccup in operation can have a magnified impact on a job. As a comparison, consider your own banking. Years ago, if your bank decided to close at 4 p.m. instead of 5 p.m., the impact was minimal. Today, if your bank were not online and available through the Internet, the cost of banking with it would be extraordinarily high compared to other banks. These new efficiencies have squeezed out waste and thus increased the importance of hiccup-free operations.

Today, the loss of a single user-end-point file can cost a corporation thousands of dollars in lost productivity, lost opportunity (if it was a sales presentation, for example), and increased employee irritation. Users have an unspoken expectation that all work done today is secured and magically protected, but that's not the case.


The new approach to backup

The world of data protection is changing. Tape (offline sequential media) is disappearing from the landscape. Years ago, if you wanted material restored, your administrator would likely have to call the offsite archive provider, retrieve a tape, mount it, and somehow search for your material. This was easily a lengthy task, and they found your material less than 30% of the time. By contrast, in the new world, backed-up data is stored online and in its native format (that is, as files identical to the source files).

Keeping backed up files online and in their native format greatly speeds up the recovery process (as well as the backup process). Furthermore, the backed up data can be managed by any means that is comfortable to the user because the material is simply stored as files. Backup files can be searched, indexed, and reorganized to meet your changing needs.
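Because the backup copies are ordinary files, everyday file APIs are enough to search them. The Python sketch below lists backed-up presentations changed since a given date; the mount point and the file pattern are hypothetical, and any file-search tool would serve equally well.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterator


def find_backups(root: Path, pattern: str, since: datetime) -> Iterator[Path]:
    """Yield backed-up files under root that match pattern and changed at or after since.

    This works only because the backups are plain files in their native
    format; no proprietary catalog or tape index is consulted.
    """
    cutoff = since.timestamp()
    for path in root.rglob(pattern):
        if path.is_file() and path.stat().st_mtime >= cutoff:
            yield path


if __name__ == "__main__":
    # Hypothetical mount point for the backup file server; adjust to your setup.
    backup_root = Path("/mnt/backup/my-laptop")
    if backup_root.is_dir():
        for hit in find_backups(backup_root, "*.ppt", datetime(2005, 1, 1)):
            print(hit)
```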


What is VitalFile?

VitalFile, a combination of replication and traditional backup, brings significant additional value. There are two different types of files on your system: files that you personally create (such as a source file or a Word document), which are highly important, and everything else. VitalFile immediately replicates your changed high-importance files so that you have versions to step back through. Typically, it stores the versioned copies in a parallel tree on your computer, which allows for backup and recovery even on an airplane. Additionally, VitalFile queues the files for transfer to a network file server or even a Tivoli Storage Manager (TSM) server. When the network is available, the files transfer; when it is unavailable, they wait.
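The article does not describe how VitalFile is implemented internally, so the following Python sketch only illustrates the idea under stated assumptions: when a high-importance file changes, copy it into a parallel version tree with a timestamp suffix, mark the copy read-only (a simple stand-in for stronger lock-down, since a read-only bit alone does not stop malware), and queue it for the network server, sending queued copies only while the server is reachable. Change detection is out of scope; the caller is assumed to invoke `version_copy` after each save, and the names `version_copy`, `TransferQueue`, and `is_reachable` are hypothetical.

```python
import shutil
import stat
import time
from collections import deque
from pathlib import Path


def version_copy(source: Path, source_root: Path, version_root: Path) -> Path:
    """Copy `source` into a parallel tree under `version_root` as a new read-only version.

    For example, <source_root>/tests/smoke.c becomes
    <version_root>/tests/smoke.c.<nanosecond timestamp>.
    """
    relative = source.relative_to(source_root)
    folder = version_root / relative.parent
    folder.mkdir(parents=True, exist_ok=True)
    stamp = time.time_ns()
    target = folder / f"{relative.name}.{stamp}"
    while target.exists():  # guard against a coarse clock repeating a timestamp
        stamp += 1
        target = folder / f"{relative.name}.{stamp}"
    shutil.copy2(source, target)  # copies contents plus metadata such as timestamps
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # mark the version read-only
    return target


class TransferQueue:
    """Hypothetical FIFO of local versions waiting for the network file server."""

    def __init__(self, version_root: Path, remote_root: Path) -> None:
        self.version_root = version_root
        self.remote_root = remote_root
        self.pending = deque()  # Path objects, oldest first

    def enqueue(self, version: Path) -> None:
        self.pending.append(version)

    def drain(self, is_reachable) -> int:
        """Send queued versions while the server is reachable; return how many were sent."""
        sent = 0
        while self.pending and is_reachable():
            version = self.pending[0]
            dest = self.remote_root / version.relative_to(self.version_root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(version, dest)
            self.pending.popleft()  # dequeue only after the copy succeeded
            sent += 1
        return sent


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        work, versions, server = base / "work", base / "versions", base / "server"
        (work / "tests").mkdir(parents=True)
        script = work / "tests" / "smoke.c"
        script.write_text("int main(void) { return 0; }\n")

        queue = TransferQueue(versions, server)
        queue.enqueue(version_copy(script, work, versions))  # edit saved on an airplane
        print("sent while offline:", queue.drain(lambda: False))  # sent while offline: 0
        print("sent once online:", queue.drain(lambda: True))  # sent once online: 1
```

Dequeuing only after a successful copy means an interrupted transfer is simply retried on the next `drain` call.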

Get VitalFile today

Try it for yourself. Start down the road of autonomic data protection and download VitalFile today from the alphaWorks Web site. You can learn more with the VitalFile Overview Presentation.

In addition to the high-importance files, everything else on your computer is still important. Often there are project files (for example, for Microsoft™ Visual C++) or JPEGs of your child's birthday party that you might not change continuously but still want protected. For these, VitalFile runs a more traditional scheduled backup (for example, nightly or even hourly) that captures all other material without ever scanning your disk, because it uses a change journal to discover changed files rapidly. These changed files are sent to the file server (or queued if it is unavailable) and then versioned at the file server, so you can go back to earlier instances of those files too if needed.
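The benefit of a change journal is that discovery cost depends on how many files changed, not on how large the disk is. The Python sketch below models a journal as a list of hypothetical `JournalRecord` entries with increasing sequence numbers; each scheduled run reads only the records past the last checkpoint and removes duplicate paths. NTFS, for example, maintains such a log (the USN change journal), but the record format here is invented for illustration and is neither VitalFile's nor the operating system's.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class JournalRecord:
    """Hypothetical change-journal entry: increasing sequence number plus changed path."""
    seq: int
    path: str


def changed_since(journal: Iterable[JournalRecord], checkpoint: int) -> Tuple[List[str], int]:
    """Return the distinct paths changed after `checkpoint`, and the new checkpoint.

    Only journal records are read; the disk itself is never walked.
    """
    changed = {}  # dict preserves first-seen order and drops repeats
    new_checkpoint = checkpoint
    for record in journal:
        if record.seq <= checkpoint:
            continue  # already handled by an earlier scheduled run
        changed[record.path] = None
        new_checkpoint = max(new_checkpoint, record.seq)
    return list(changed), new_checkpoint


journal = [
    JournalRecord(1, "C:/photos/party.jpg"),
    JournalRecord(2, "C:/proj/app.vcproj"),
    JournalRecord(3, "C:/photos/party.jpg"),
]
paths, checkpoint = changed_since(journal, checkpoint=1)
print(paths, checkpoint)  # ['C:/proj/app.vcproj', 'C:/photos/party.jpg'] 3
```

Saving the returned checkpoint between runs lets the next nightly or hourly run pick up exactly where this one stopped; the resulting paths would then be sent to the file server or queued.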

A day in the life...

The typical software developer today spends an inordinate amount of time using a laptop in a surprisingly varied set of locations. Often the day starts at 5:00 a.m. with some e-mail and perhaps a check of overnight test runs. After dashing off a few code changes conceived while sleeping, the developer heads off to a morning doctor's appointment. In the waiting room, the developer productively creates some test scripts and reviews the latest JavaBeans™. At the office, the developer rejoins a high-speed network. Finally, after a post-dinner mad programming session, the developer turns in for the night. Let's examine the various types of protection the developer transparently enjoyed throughout the day:

  • While at home, the computer makes real-time copies of the highly important C test script files. These files are stored twice: once locally on the disk as distinct versions, and once on a network-visible file server.
  • While at the doctor's office, additional changes are again versioned to the local disk and queued for transfer to the now unavailable network server.
  • While at the office, the queued files transfer to the network server.
  • Late at night, all the other somewhat less-important files are captured, versioned, and sent to the remote file server, allowing for a time-based restore if ever needed.

At no point was the developer at risk of losing material created during the day. The autonomic, real-time, versioned nature of this type of protection ensures higher productivity and comfort.

One major difference between VitalFile and traditional backup is its single-endpoint design. VitalFile runs only on client systems and needs no component on the target file server. It performs all of the necessary management from the client, including creating and automatically managing the versioned instances at the target. VitalFile is suitable for anything from a single home user up to a sizable company with hundreds of users.
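To show what managing versioned instances at the target can look like with nothing but ordinary file operations, here is a hypothetical Python sketch that keeps only the newest few versions of each file in a shared target folder. It assumes versions are stored as `<original name>.<integer timestamp>`; that naming scheme, the function name `prune_versions`, and the retention count are illustrative assumptions, not VitalFile's documented behavior.

```python
from collections import defaultdict
from pathlib import Path
from typing import List


def prune_versions(target_dir: Path, keep: int = 5) -> List[Path]:
    """Delete all but the `keep` newest versions of each file under target_dir.

    Assumes versions are named '<original name>.<integer timestamp>'.
    Everything happens through ordinary file operations from the client,
    so the file server needs no backup-specific software.
    """
    groups = defaultdict(list)
    for path in target_dir.rglob("*"):
        base, sep, stamp = path.name.rpartition(".")
        if path.is_file() and sep and stamp.isdigit():
            groups[path.parent / base].append((int(stamp), path))

    removed = []
    for versions in groups.values():
        versions.sort(reverse=True)  # newest timestamp first
        for _, old in versions[keep:]:
            old.chmod(0o600)  # clear a read-only flag so deletion also works on Windows
            old.unlink()
            removed.append(old)
    return removed


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        share = Path(tmp)
        for stamp in (100, 200, 300):
            (share / f"plan.doc.{stamp}").write_text(str(stamp))
        gone = prune_versions(share, keep=2)
        print([p.name for p in gone])  # ['plan.doc.100']
```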


The hybrid approach

Two different data-protection methodologies have come together to form a comprehensive yet rapid and modern approach to protecting your digital assets.


Figure 1. The new view

Replication has historically been used for failover and disaster-recovery applications. The goal was to keep an additional site (or disk) identical to a primary one. For better or worse, this also meant replicating mistakes and viruses, as well as needlessly replicating unimportant material in real time. Traditional backup, by contrast, focused on time-based captures of material stored on less expensive external media. Keeping track of all the various media elements (tapes, CDs, and so on) and reassembling them to a specified point in time involved great complexity.

This new approach of blending replication with traditional backup captures the best of both worlds: it provides high-speed captures (and restores), avoids unnecessary data movement (and thus unnecessary runtime impact), and still retains the ability to restore to a point in time. Furthermore, the system guards selected files with real-time protection and tolerates network destinations that are only intermittently available.
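Once versions carry capture times, point-in-time restore reduces to a simple lookup: pick the newest version captured at or before the requested moment. The Python sketch below does this with a binary search; the `(capture time, stored location)` pairs, the function name `version_as_of`, and the file names are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

Version = Tuple[datetime, str]  # (capture time, location of the stored copy)


def version_as_of(versions: List[Version], when: datetime) -> Optional[str]:
    """Return the stored copy that was current at `when`, or None if none existed yet.

    `versions` must be sorted by capture time, oldest first.
    """
    times = [captured for captured, _ in versions]
    index = bisect_right(times, when)  # number of versions captured at or before `when`
    return versions[index - 1][1] if index else None


history = [
    (datetime(2005, 2, 7, 9, 0), "budget.xls.v1"),
    (datetime(2005, 2, 8, 14, 30), "budget.xls.v2"),
    (datetime(2005, 2, 9, 11, 15), "budget.xls.v3"),
]
print(version_as_of(history, datetime(2005, 2, 9, 8, 0)))  # budget.xls.v2
print(version_as_of(history, datetime(2005, 2, 1)))  # None
```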

A typical small office/home office (SOHO) configuration might look like this:


Figure 2. SOHO example

A variety of workstations and laptops can each run VitalFile and can be configured to use a common section of network storage, typically a network attached storage (NAS) device or a file server. The single-endpoint approach of VitalFile lets the target end points be any sort of file-serving solution, including closed architectures. All of the management of the target is performed by the clients, but there is far less management needed as the approach leverages the very nature of file servers: to serve files.

The configuration in Figure 2 can easily scale to hundreds of systems. If necessary, larger file servers can be built, extra disks added, or both. The backup file server itself can also be replicated offsite to another file server. Broadband connections today are extremely affordable and quite fast, and such a configuration typically has 20 hours to catch up before the next load of new material needs to move across.
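The 20-hour catch-up window translates into a data budget with simple arithmetic: bits per second times seconds, divided by 8 bits per byte. The Python sketch below assumes a steady 1-Mbit/s uplink purely as an example and ignores protocol overhead and competing traffic, so real throughput would be lower.

```python
def gb_per_window(uplink_mbit_s: float, hours: float = 20.0) -> float:
    """Decimal gigabytes an uplink can move in `hours`, assuming steady full use."""
    bits = uplink_mbit_s * 1_000_000 * hours * 3600  # bits per second x seconds
    return bits / 8 / 1_000_000_000  # bits -> bytes -> decimal GB


print(f"{gb_per_window(1.0):.1f} GB")  # 9.0 GB for an assumed 1-Mbit/s uplink
```

Under that assumption, roughly 9 GB of changed material could be copied offsite within the window.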


Using and configuring VitalFile

The first step is to download the VitalFile software (see Resources). The software installs quickly after you click through a few simple screens. VitalFile automatically loads the necessary drivers and agents and then takes you to a configuration screen where you specify how your autonomic protection operates.

Configuring a VitalFile system is very straightforward. As mentioned previously, there are two types of protection: real-time protection for high-importance files and scheduled, lower-priority protection for everything else. The configuration for the real-time feature is shown in Figure 3.


Figure 3. Real-time feature

The lower-priority scheduled protection is shown in Figure 4.


Figure 4. Scheduled protection
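The two configuration screens boil down to one decision per file: it either matches the real-time (high-importance) rules or falls into scheduled protection. The Python dataclass below is a hypothetical model of such a policy; the class name `ProtectionPolicy`, the patterns, and the 24-hour interval are example values, not VitalFile's defaults.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from pathlib import PurePath
from typing import List


@dataclass
class ProtectionPolicy:
    """Hypothetical model of the two protection types: real-time and scheduled."""

    # Example high-importance patterns; files matching none of them are scheduled.
    realtime_patterns: List[str] = field(
        default_factory=lambda: ["*.c", "*.java", "*.doc", "*.ppt"]
    )
    scheduled_every_hours: int = 24  # e.g. nightly; hourly would be 1

    def classify(self, path: str) -> str:
        """Return 'realtime' or 'scheduled' for a file path."""
        name = PurePath(path).name
        if any(fnmatch(name, pattern) for pattern in self.realtime_patterns):
            return "realtime"
        return "scheduled"  # everything else is still protected, just less often


policy = ProtectionPolicy()
for example in ("src/test_login.c", "photos/party.jpg", "proj/app.vcproj"):
    print(example, "->", policy.classify(example))
```

Keeping the real-time list short limits real-time copying to the files you personally create, which is the split the article describes.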


Summary

Times have changed. We each now lug around a tremendous amount of disk storage wherever we go. Laptops can suffer hard drive failures or be lost or stolen. To make matters worse, most of us have fallen into a habit of ad hoc file protection: we make copies of particular projects or source-file trees when we believe we have hit a development milestone or plateau. In fact, we are somewhat stingy and do not make backups or archives until we are satisfied with some level of utility, which further lengthens the time between our discrete protection intervals.

Running a traditional backup application that insists on connectivity to the corporate systems is not practical: either you're not connected at the right time, or it moves so much data that you abort the backup and are left with no protection. A home-user backup application is too focused on CDs to be practical in today's world of nearly terabyte-sized disks.

A hybrid approach that combines real-time, replication-like protection with traditional versioning and off-machine copies holds some promise. The key is simplifying the overall task and machinery involved; otherwise, the solution would be too complex for the vast majority of us holding that 70% of unprotected corporate data. The VitalFile approach, which is single-endpoint, real-time (for the highest-importance files), tolerant of network conditions, and time-based, gives a wide range of users superb protection.


Resources

  • Start down the road of autonomic data protection and download VitalFile today from the alphaWorks Web site.

  • Get involved in the developerWorks community by participating in developerWorks blogs.

  • Browse for books on these and other technical topics.


