TSM_man
3 Posts

Pinned topic GPFS Backup questions (mmbackup)

‏2010-07-01T22:15:03Z |
Hello,
I would appreciate it if someone could answer the following questions.
1. What is the recommended strategy for backing up a GPFS file system with mmbackup in a TSM environment? (For example: weekly full and daily incremental, or an initial full followed by incrementals forever.)
2. Which TSM management class attributes are required and honored by mmbackup? Can we use multiple management classes for different retentions with include-exclude?
3. How do we schedule a GPFS backup using mmbackup? (http://publib.boulder.ibm.com/infocenter/tsminfo/v6/index.jsp?topic=/com.ibm.itsm.client.doc/t_bac_gpfssched.html ) This TSM 6.1 documentation mentions OBJECTS="gpfs_script", but I could not find details of gpfs_script anywhere.
4. How do we protect mmbackupconfig information? Does mmbackupconfig integrate with TSM, or do we need to run a manual TSM BA incremental command to back up the mmbackupconfig output?
5. How does the TSM expiration process expire backed-up GPFS data? Do we need any additional command such as dsmc expire?
6. Do we still need to use the GPFS mmbackup command with TSM 6.2, or has anything changed with the TSM 6.2 BA client? I understand that with previous versions of TSM, mmbackup is the only efficient way to back up a GPFS file system with millions of files.
7. Are there any best practices for backing up GPFS using mmbackup and/or the TSM BA client (Linux client)?

Thank you all !

TSM_man
Updated on 2010-07-02T16:58:12Z by TSM_man
  • sberman
    78 Posts

    Re: GPFS Backup questions (mmbackup)

    ‏2010-07-02T13:36:57Z  
    TSM_man,
    These are excellent questions. I'll try to help with some guidelines and specific answers:
    1) There is no single recommended backup strategy for GPFS, because every cluster has its own pattern of data access and change over time. Some clusters ingest data and never change or delete it; others change, delete, or rename files frequently. The best backup strategy is the one that minimizes your exposure to data loss given the kinds of changes that happen in your file system(s) over time. If a full backup is feasible in a reasonable amount of time, you should run it as frequently as you can tolerate. Depending on how much disk and tape you have behind your TSM server, you can also decide how many backups are retained inside TSM and for how long. A regular schedule that combines incremental and full backups is recommended. When mmbackup -t incremental is run, files that have been deleted are flagged for expiry in TSM (more on incremental backup in answer 5).

    2) Simple include and exclude pattern directives stored on the TSM server are extracted by mmbackup and used to guide the policy-based scan of the file system. Mmbackup runs "dsmc query inclexcl" to derive the list of directives. It is not obvious to me how to use include and exclude rules to set retention limits, but if you wanted to change the include and exclude behavior between mmbackup runs you could certainly do so.

    3) Use mmbackup as the top-level command to drive a single backup operation. You can use any scheduling system you wish to invoke mmbackup with the appropriate options. One choice would be cron(1). Or you can wrap mmbackup in your own script to do other things in addition to the backup itself.

    4) Mmbackupconfig is new in GPFS 3.3. This tool captures important information about a file system that is not typically included in the data that mmbackup saves. As a single-purpose program, mmbackupconfig should be used alongside other backup utilities such as mmbackup as part of a cluster-wide data protection plan. For example, you could run mmbackupconfig, deposit the resulting configuration file in the root of the GPFS file system itself, and then run mmbackup, which would back up that configuration file along with everything else. This could be part of the script mentioned in #3.

    5) Mmbackup -t incremental will run "dsmc expire" on all files that were backed up in the last successful backup but are no longer present in the file system. This expiration does not necessarily delete the file from the TSM server; rather, it marks the file as expired. Your TSM server configuration includes expiration criteria that dictate how long a file remains on the server in the "expired" state. You can list the expired files with dsmc q backup -inactive "pathname".

    6) Mmbackup is a fundamentally different means of directing a TSM backup than using the TSM BA client directly. The TSM BA client is capable of walking through the file system and determining which files to back up, but backing up GPFS in this manner can be very time consuming. In contrast, mmbackup uses mmapplypolicy, and through it the parallel policy engine inside GPFS, to enumerate the inodes of the file system much more efficiently. Policy-based inode enumeration can run in parallel on multiple nodes, and in multiple threads on every node used. It therefore scales out to large file systems much more efficiently than the traditional directory walk done by the TSM BA client. After enumerating the file system's inodes with policy, mmbackup directs TSM to back up the list of candidate files by issuing a series of dsmc selective -filelist=<file list> commands. These can also be run on multiple nodes, if desired, by using the -N <node spec> option to mmbackup.

    7) Mostly answered in #1 above, but in short: use good judgment to schedule regular runs of "mmbackup -t full" and, more frequently, "mmbackup -t incremental", and use multiple nodes to permit parallel processing of the file system through mmapplypolicy.
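
As a rough illustration of answers 3, 4, and 7 together, the wrapper below saves the file system configuration with mmbackupconfig and then calls mmbackup with a backup type chosen by weekday. It is only a sketch under stated assumptions: the device name, mount point, node list, output file name, and the "full on Sunday" rule are all hypothetical, and the script only prints the commands unless DRY_RUN=0 is set. The mmbackup and mmbackupconfig argument order should be checked against the man pages for your GPFS release.

```shell
#!/bin/sh
# Hypothetical wrapper around mmbackup. Every name and path below is an
# assumption for illustration, not something taken from the thread.
FS_DEVICE="${FS_DEVICE:-gpfs0}"               # assumed GPFS device name
FS_MOUNT="${FS_MOUNT:-/gpfs/gpfs0}"           # assumed mount point
BACKUP_NODES="${BACKUP_NODES:-node1,node2}"   # assumed node list for -N
DRY_RUN="${DRY_RUN:-1}"                       # 1 = print commands only

# Pick the backup type from the ISO weekday (1 = Monday ... 7 = Sunday).
# Assumed schedule: full on Sunday, incremental on every other day.
backup_type() {
    if [ "$1" -eq 7 ]; then
        echo full
    else
        echo incremental
    fi
}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

main() {
    btype=$(backup_type "$(date +%u)")
    # Answer 4: write the configuration file into the file system itself
    # so that the following mmbackup run backs it up as well.
    run mmbackupconfig "$FS_DEVICE" -o "$FS_MOUNT/mmbackupconfig.out"
    # Answers 6 and 7: spread the backup over several nodes with -N.
    run mmbackup "$FS_DEVICE" -t "$btype" -N "$BACKUP_NODES"
}

main
```

Saved under a hypothetical path such as /usr/local/sbin/gpfs_backup.sh, the script could be driven by a crontab entry like `30 1 * * * DRY_RUN=0 /usr/local/sbin/gpfs_backup.sh`, which is one way of using cron(1) as suggested in answer 3.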

    Lastly, snapshots can also be used to enhance backups. A snapshot (mmcrsnapshot) captures a point-in-time version of the GPFS file system within the file system itself. This can be an efficient means of keeping even a one-day-old copy of the file system online and accessible to users. Thus, if you delete a file that was there yesterday, you can simply access yesterday's snapshot at the path set up by the admin. This can be orchestrated by a cron(1) job as well.
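
A cron-driven snapshot job along these lines might look like the sketch below. The device name, the "daily-N" naming scheme, and the choice to keep seven snapshots are assumptions made for illustration; mmcrsnapshot and mmdelsnapshot are called in the Device SnapshotName form, which should be verified for your GPFS release. As written, the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical daily snapshot rotation, one snapshot per weekday.
FS_DEVICE="${FS_DEVICE:-gpfs0}"   # assumed GPFS device name
DRY_RUN="${DRY_RUN:-1}"           # 1 = print commands only

# Snapshot name for an ISO weekday number (1..7), e.g. "daily-5".
snapshot_name() {
    echo "daily-$1"
}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

name=$(snapshot_name "$(date +%u)")
# Remove last week's snapshot of the same name; ignore the failure on
# the first run, when no such snapshot exists yet.
run mmdelsnapshot "$FS_DEVICE" "$name" || true
# Create today's point-in-time snapshot.
run mmcrsnapshot "$FS_DEVICE" "$name"
```

Reusing the weekday as the snapshot name keeps the rotation free of date arithmetic, at the cost of a fixed seven-day window.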

    Mmbackup should eventually be able to back up the file system from a snapshot, as it did in GPFS 3.2, but as of the current GPFS 3.3 release this function is not fully implemented.

    -Steve
  • TSM_man
    3 Posts

    Re: GPFS Backup questions (mmbackup)

    ‏2010-07-02T16:08:11Z  
    Steve,
    This is awesome. Thank you very much for taking the time to answer every single question in detail.
    Just one follow-up question:
    Is there any GPFS requirement to run regular full backups after the initial full? Can we apply the same TSM forever-incremental approach here?
    That is, run the initial full using mmbackup -t full, and once it has completed, schedule daily incrementals using mmbackup -t incremental. Any thoughts on running incremental backups forever? Do you see any drawbacks? I would like to avoid running full backups because of the large amount of data (100 TB).
    Thank you again Steve !

    TSM_man
  • sberman
    78 Posts

    Re: GPFS Backup questions (mmbackup)

    ‏2010-07-02T16:40:00Z  
    Just as with your earlier question 1, it really depends on how the GPFS file system is used. If your cluster ingests mostly unchanging files, and the vast majority of file system activity is adding new files, then one full backup followed by incrementals forever would be efficient. Even so, I would think that a good backup strategy would include a full backup at major intervals, such as once a year or before any hardware changes. If your GPFS usage includes a lot of renames (file moves) or directory deletions, then the difficulty at restore time is that you would have to restore a lot of incrementals to recover the changes in the file structure. Mmbackup does not keep a long-term history of the changes between the current file system image and the last full backup. Rather, it keeps a single history file that is updated at each new backup.

    So I still think an occasional full backup is appropriate.
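
One way to combine daily incrementals with an occasional full is sketched below: the script records when the last full backup ran and requests a new full only once that record is older than a chosen age. The stamp file path, the 365-day threshold, and the device name are assumptions for illustration, and the script relies on `date +%s`, which is available on Linux but is not required by POSIX. It prints the command instead of running it unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical "incremental forever, full once a year" driver.
FS_DEVICE="${FS_DEVICE:-gpfs0}"                         # assumed device name
STAMP_FILE="${STAMP_FILE:-/var/tmp/mmbackup.lastfull}"  # assumed stamp path
MAX_FULL_AGE_DAYS="${MAX_FULL_AGE_DAYS:-365}"           # assumed interval
DRY_RUN="${DRY_RUN:-1}"                                 # 1 = print only

# choose_type NOW LAST_FULL MAX_DAYS
# NOW and LAST_FULL are epoch seconds; LAST_FULL may be empty.
# Prints "full" if no full backup is recorded or the last one is at
# least MAX_DAYS old, otherwise "incremental".
choose_type() {
    now=$1
    last=$2
    max_days=$3
    if [ -z "$last" ] || [ $(( (now - last) / 86400 )) -ge "$max_days" ]; then
        echo full
    else
        echo incremental
    fi
}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

now=$(date +%s)
last=""
if [ -f "$STAMP_FILE" ]; then
    last=$(cat "$STAMP_FILE")
fi
btype=$(choose_type "$now" "$last" "$MAX_FULL_AGE_DAYS")

# Record the time of a full backup only after a real, successful run.
if run mmbackup "$FS_DEVICE" -t "$btype" && [ "$btype" = "full" ] && [ "$DRY_RUN" != "1" ]; then
    echo "$now" > "$STAMP_FILE"
fi
```

Deleting the stamp file forces a full backup on the next run, which fits the suggestion to take one before any hardware changes.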

    -Steve
  • TSM_man
    3 Posts

    Re: GPFS Backup questions (mmbackup)

    ‏2010-07-02T16:58:12Z  
    Steve,
    This explains all.
    I am grateful for your help.
    Is it possible to add this information to the original GPFS documentation, or at least to the IBM Technote? http://www-01.ibm.com/support/docview.wss?uid=swg21305169
    Even TSM Support could not answer any of these questions. Hope this will help them as well.
    Thank you again for all your help !

    TSM_man