• 2 replies
  • Latest Post - ‏2013-12-06T09:40:09Z by Christian_Svensson
136 Posts

snapshots and incremental backups

‏2013-11-28T13:32:41Z |


I am looking for a viable way to get incremental "backups" from GPFS snapshots without using TSM (or comparable software). The background is that, in a planned system, updated files should simply be copied somewhere else (outside GPFS) periodically, and that copying has to be controlled somehow.

I am thinking of this approach: take snapshots at regular points in time, say T_1, T_2, .... Then, to run a backup at time T_i, inspect which files actually have data blocks in the snapshot from T_(i-1) (meaning they have been changed since T_(i-1)), and read those files from snapshot T_i. I think that might be quicker than an inode scan if the change rate is low and there are many inodes. Is there an easy way to identify the files that do have data in the snapshot (which is a prerequisite for this approach to work at all)?
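As a rough illustration of the T_(i-1) / T_i idea, here is a minimal Python sketch. It does not inspect snapshot data blocks and does not use any GPFS API. It simply walks the newer snapshot's directory tree and compares size and modification time against the older one, so it is a full tree walk, not a shortcut around one. The function names and the idea of passing the two snapshot directories as plain paths are assumptions made for this example.

```python
import os
import shutil


def changed_since(prev_snap, curr_snap):
    """Yield paths (relative to curr_snap) that are new or differ from prev_snap."""
    for dirpath, _dirnames, filenames in os.walk(curr_snap):
        for name in filenames:
            curr_path = os.path.join(dirpath, name)
            rel = os.path.relpath(curr_path, curr_snap)
            curr_st = os.lstat(curr_path)
            try:
                prev_st = os.lstat(os.path.join(prev_snap, rel))
            except (FileNotFoundError, NotADirectoryError):
                # Not present in the older snapshot: treat as new.
                yield rel
                continue
            # Heuristic only: size or mtime differs between the snapshots.
            if (curr_st.st_mtime_ns, curr_st.st_size) != (
                prev_st.st_mtime_ns,
                prev_st.st_size,
            ):
                yield rel


def copy_changed(prev_snap, curr_snap, dest):
    """Copy new or changed files from curr_snap into dest; return their relative paths."""
    copied = []
    for rel in changed_since(prev_snap, curr_snap):
        target = os.path.join(dest, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(curr_snap, rel), target, follow_symlinks=False)
        copied.append(rel)
    return copied
```

Reading from the snapshot at T_i rather than from the active file system gives a consistent source while the copy runs. Deleted and renamed files are not handled here and would need separate treatment.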

As an alternative, I suppose the API for inode scans (as in ts_inode) should be considered. A question with regard to this: objects in a snapshot have the same inode number as their sources. How would the inode scan distinguish them (or does it just ignore all snapshot objects)?





  • yuri
    237 Posts

    Re: snapshots and incremental backups


    Identifying all changed files is hard.  You can't really avoid an inode scan in some form.  The GPFS inode scan API allows one to examine snapshot content, but it's not trivial to use.  This is one of those tasks that are straightforward at a high level, but quickly become rather messy when one gets down to the details.  You can take a look at the mmbackup code, for example.  It uses the policy scan infrastructure to identify the set of files to back up, and passes that set to TSM.  Skipping the TSM part may be close to what you're looking for.

    The inode number of a given file is the same in the active file system and snapshots.


  • Christian_Svensson
    23 Posts

    Re: snapshots and incremental backups


    What you can do is generate a candidate list of all files that have been created or modified in the last X hours/days and then do a full backup of those files.
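A candidate list of this kind could be built roughly as follows. This is a minimal sketch using a plain directory walk and modification times, not a GPFS policy scan; the function name `recent_files` and its parameters are made up for this example.

```python
import os
import time


def recent_files(root, max_age_seconds, now=None):
    """Return sorted paths under root whose mtime falls within the last max_age_seconds."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are judged by their own timestamps.
            if os.lstat(path).st_mtime >= cutoff:
                candidates.append(path)
    return sorted(candidates)
```

For example, `recent_files("/path/to/fs", 24 * 3600)` would list files modified in the last day (the path is a placeholder). Selection by modification time can miss files that arrive with preserved older timestamps, which is one reason an occasional full backup is still worth doing.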

    I don't know what backup software you are using, but if you are using EMC, Commvault, or any similar software that creates full plus incremental backups, you risk losing data if you don't do a full backup of the entire GPFS cluster once in a while.

    Christian Svensson