Informix OnBar and TSM, Part 2

Configuring and using IDS/OnBar with Tivoli Storage Manager

This is the second part of a two-part article series discussing the backup of the IBM Informix Dynamic Server (IDS) using IBM Tivoli Storage Manager (TSM).

In the first installment, you learned about some common TSM terms and the IDS and TSM configuration steps needed to back up your database server using OnBar and TSM. This article gives you hints for avoiding known pitfalls and shows you how to enable useful tracing mechanisms.

IDS/TSM pitfalls

The following hints will help you avoid some common pitfalls when working with OnBar and TSM.

Timeout occurring

  • Problem

    Incremental backups are terminated because of timeout issues.

    During an incremental backup, IDS has to read every used page in order to decide if this page is a candidate for backup or not. If only a small subset of the data has changed, this might result in lengthy read operations without writing any data to the TSM server. Timeouts are possible under these circumstances.

  • Solution

    Your TSM administrator should increase the TSM parameters IDLETIMEOUT and COMMTIMEOUT to reasonable values in order to avoid the termination of incremental OnBar backups.
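
    As a rough illustration, the two options could be raised in the TSM server options file (dsmserv.opt) as sketched below. The values are placeholders, not recommendations. COMMTIMEOUT is specified in seconds and IDLETIMEOUT in minutes, so the example allows one hour for each. Your TSM administrator should choose values that cover the longest read-only phase you expect during an incremental backup.

```
* dsmserv.opt (TSM server options file); example values only
COMMTIMEOUT 3600
IDLETIMEOUT 60
```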

XBSA Error 133 (0x85) returned

  • Problem

    The XBSA 133 (0x85) error code means BAR_NO_BSALIB. It indicates a problem with the XBSA library.

  • Solution

    Make sure that the $ONCONFIG parameter BAR_BSALIB_PATH points to the correct XBSA library and that the user informix is able to load the library. To verify this, check the permissions of the library and of the directories in its path.

    Check your OnBar activity log for the exact error message:

    Error message in OnBar activity log
    2005-01-01 11:31:12 68140  39848 ERROR: An unexpected error occurred:  Could not load 
    module /usr/tivoli/tsm/client/informix/bin/libTDPinf.a.  The module has an invalid 
    magic number. Exec format error.

    The above error message indicates that you might need to extract the shared object bsashr10.o from the shared library libTDPinf.a. This is an AIX®-specific issue and was discussed in the first article.
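
    The permission check described above can be scripted. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes a POSIX system and that you run it as user informix. It reads BAR_BSALIB_PATH from the text of an $ONCONFIG file and lists obvious access problems for the library and for every directory in its path. It does not detect the AIX shared-object issue shown in the log excerpt.

```python
import os


def read_onconfig_param(onconfig_text, name):
    """Return the value of parameter `name` from ONCONFIG text, or None if absent."""
    for line in onconfig_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split(None, 1)
        if parts[0] == name:
            value = parts[1] if len(parts) > 1 else ""
            return value.split("#", 1)[0].strip()  # drop a trailing comment
    return None


def library_access_problems(path):
    """List reasons why the current user might be unable to load `path`."""
    problems = []
    path = os.path.abspath(path)
    if not os.path.isfile(path):
        problems.append("not a regular file or not reachable: " + path)
    elif not os.access(path, os.R_OK):
        problems.append("not readable: " + path)
    # Every directory on the way to the library needs search (x) permission.
    current = os.path.dirname(path)
    while True:
        if os.path.isdir(current) and not os.access(current, os.X_OK):
            problems.append("directory not searchable: " + current)
        parent = os.path.dirname(current)
        if parent == current:
            break  # reached the filesystem root
        current = parent
    return problems
```

    A typical use would be to pass the contents of $INFORMIXDIR/etc/$ONCONFIG to read_onconfig_param() and hand the returned path to library_access_problems(). An empty list means that no obvious permission problem was found.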

XBSA Error 96 (0x60) returned

  • Problem

    OnBar terminates a backup and returns the XBSA error code 96 (0x60). This is a general error code that could have several causes.

  • Solution

    Most of the time the XBSA 96 (0x60) error is returned because of missing or invalid entries in the inclexcl.def file.

    If something in inclexcl.def is wrong, you should see the following or a similar message in your OnBar activity log:

    Error message in OnBar activity log
    2005-01-01 13:16:02 48788  68348 XBSA Error (BSACreateObject): An unspecified 
    XBSA error has occurred: 96

    The abort of the BSACreateObject() call indicates a possible inclexcl.def misconfiguration.

    Another cause for the XBSA 96 (0x60) error might be a missing or invalid TDP/Informix license file in IDS versions before V10 (in IDS V10, a license file is not required):

    Error message in OnBar activity log
    2005-01-01 11:34:02 68212  48844 XBSA Error (BSAInit): An unspecified 
    XBSA error has occurred: 96

    Here the BSAInit() call aborted. Check whether a logfile named tsmlic.log has been generated in your current working directory. This logfile contains further information about the license problem.

    The license file is normally stored in the default installation path for TDP/Informix:

    • AIX: /usr/tivoli/tsm/client/informix/bin[64]/tdpi.lic
    • Solaris: /opt/tivoli/tsm/client/informix/bin[64]/tdpi.lic

Unique backup object names for logical logs

  • Problem

    Transaction logs always have a unique name because the log number is part of the object name. Here is an example of a backup object name for an IDS instance named ids10, servernumber 67 and logical log ID 99:

    • /ids10/ids10/67/99

    The consequence of this naming convention is that logical logs never expire: because each log always has a unique name, its backup object never becomes inactive. The exception to this rule is a log salvage operation, where an already saved logical log is backed up again under the same object name, which inactivates the previous version. However, the new version of the backup object (the salvaged log) then takes over the active status.

  • Solution

    For the inactivation of logical logs, Informix delivers the onsmsync (Online Storage Manager Synchronization) utility. Onsmsync can be used for the inactivation of logical logs in TSM and will also remove old entries from the sysutils database and the emergency bootfile, keeping them in sync.

    You might also use the 'dsmc expire' command to inactivate your logical logs. However, keep in mind that in this case no synchronization with the sysutils database and the emergency bootfile takes place. You are then responsible for keeping this information in sync.

Duplicate backup object names for dbspaces

  • Problem

    A Level-0-Backup of the root dbspace for the IDS instance ids10 will be named:

    • /ids10/ids10/rootdbs/0

    Incremental backups (Level 1 and 2) will have the following backup object names associated with them:

    • /ids10/ids10/rootdbs/1
    • /ids10/ids10/rootdbs/2

    Dbspace backup objects will automatically become inactive because a new backup of the same level (for example, Level-0, Level-1, Level-2) will have exactly the same backup object name as before and thus the previous version will become inactive.

    At first glance, reusing backup object names for dbspaces seems to be a good approach because the previous version automatically switches to the inactive state. TSM is then responsible for expiring those objects based on the settings of the underlying backup copy group.

    However, there is a pitfall in this technique that comes into play when you want to store backups with different management classes. Imagine you make a weekly Level-0-backup, and daily Level-1-backups, which are both kept for one month (example TSM management classname MC31). At the end of a quarter, you want to take a Level-0-backup, which should be kept for one year (example TSM management classname MC365). A possible approach would be to work with two different inclexcl.def files, where the default inclexcl.def would have the following entry for the daily backups:

    • include /ids10/.../[0-2] MC31

    For the quarterly backups, you would replace the above entry in inclexcl.def with the following:

    • include /ids10/.../[0-2] MC365

    The problem that now arises is that TSM rebinds the management class for all backup objects with the same name. Thus the previously performed weekly Level-0-backups are now rebound to management class MC365. Note that all active and inactive versions with the same backup object name are rebound.

    This is not pretty, but the real problem occurs later, when you take the next weekly Level-0-backup with the original inclexcl.def entry. The quarterly Level-0-backup is then rebound from management class MC365 to MC31, so it will likely no longer be available when you need to restore it in the future.
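
    The rebinding effect can be made concrete with a toy model. The Python sketch below is not TSM code. It only mimics the behavior described above: one management class is bound per object name and applies to all versions of that name. The class ToyNode and the helper dbspace_object_name() are hypothetical, and the name layout is inferred from the example object names in this article.

```python
def dbspace_object_name(instance, dbspace, level):
    """Build a dbspace backup object name following the pattern of the examples."""
    return "/{0}/{0}/{1}/{2}".format(instance, dbspace, level)


class ToyNode:
    """Toy model of one TSM nodename: object name -> versions and bound class."""

    def __init__(self):
        self.versions = {}     # object name -> list of backup labels, oldest first
        self.bound_class = {}  # object name -> management class of ALL versions

    def backup(self, name, label, management_class):
        # The previous version becomes inactive; the new one is the active one.
        self.versions.setdefault(name, []).append(label)
        # Rebinding: every version with this name follows the latest include rule.
        self.bound_class[name] = management_class

    def class_of(self, name, label):
        if label not in self.versions.get(name, []):
            raise KeyError(label)
        return self.bound_class[name]


name = dbspace_object_name("ids10", "rootdbs", 0)

# One nodename for all backups: the quarterly backup loses its class.
shared = ToyNode()
shared.backup(name, "weekly-1", "MC31")
shared.backup(name, "quarterly", "MC365")
shared.backup(name, "weekly-2", "MC31")
print(shared.class_of(name, "quarterly"))  # MC31
```

    After the second weekly backup, the quarterly version is governed by MC31 again, which is exactly the situation that makes the quarterly backup expire too early.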

  • Solution

    To prevent the above-described behavior, you have to register a dedicated nodename for the quarterly backups. Normally the TSM nodename corresponds to the hostname of your client machine, but you are allowed to register several nodenames for the same client machine. Your TSM administrator has to perform the registration of the new nodename on the TSM server. After the registration has been done, you are able to perform your quarterly backup under the dedicated TSM nodename.

    Because there is no way to tell OnBar to use this dedicated nodename, you have to change the client system options file (dsm.sys). Change (or add) the NODENAME parameter in the dsm.sys file to the dedicated nodename. Afterwards, you can perform your Level-0-backup using OnBar. This procedure ensures that the backup is done under the dedicated nodename and prevents the rebinding of management classes for the existing backup objects.

    You should perform your quarterly backups as whole-system backups by specifying the OnBar option '-w'. A whole-system backup ensures that you are able to restore this backup later without applying logical logs. Your modified inclexcl.def should contain the following two entries:

    • exclude /.../*
    • include /ids10/.../[0-2] MC365

    The exclude-entry ensures that logical logs are not backed up under the dedicated nodename, as this would cause trouble during a restore operation of your daily backups. In that case, you would have some logical logs stored under the normal nodename and others under the dedicated nodename. During the logical restore, OnBar would only be able to find the logical logs stored under the normal nodename, not those stored under the dedicated nodename, unless you granted the appropriate access permissions using 'dsmc set access...' (see the section on Imported restore).

    It is advisable to always store your logical logs under the normal nodename: omit the include entry for the logs and specify an exclude-entry in your modified inclexcl.def. As a consequence, the following error message appears in your OnBar activity log if a log switch occurs during the Level-0-Backup and the ALARMPROGRAM is triggered to back up unsaved logical logs:

    Error message in OnBar activity log
    ... Successfully connected to Storage Manager.
    ... XBSA Error (BSACreateObject): An unspecified XBSA error has occurred: 96
    ... /usr/informix/bin/onbar_d: process exit 96 (0x60)

    This error message can be ignored. The logical log will be saved later, after the Level-0-Backup is done and you have restored your original nodename in dsm.sys. The next log switch will back up all unsaved logical logs using the ALARMPROGRAM mechanism (for example, 'onbar -b -l') under the original nodename. Ensure that you have enough logical log space configured to prevent a log full condition during your quarterly backup.

    It is also possible to create a completely decoupled dsm.sys configuration that is used only for this quarterly backup. The logical logs can then still be backed up by the ALARMPROGRAM while the special quarterly Level-0-Backup is running. To achieve this, create a dedicated directory that contains symbolic links to all files in the original DSMI_DIR directory except dsm.sys. In this new directory, create a modified version of the dsm.sys file (changing the NODENAME and INCLEXCL entries), and then set the DSMI_DIR environment variable to this directory before starting your quarterly backup.

    This ensures that the modified dsm.sys settings are only used for this special quarterly Level-0-Backup and that the logical logs are still backed up with the ALARMPROGRAM under the original nodename. The danger of a log full condition during the quarterly backup is eliminated. You will still receive the above-described XBSA 96 (0x60) error once at the end of the quarterly backup because the OnBar process performing the Level-0-Backup tries to send the current logical log to TSM (this OnBar process uses the dedicated nodename environment). However, as already mentioned, there is no need to worry about this because the affected logical log will be saved by the ALARMPROGRAM as soon as the next log switch occurs.
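
    Setting up such a dedicated directory is mechanical and can be scripted. The following Python sketch shows one possible way, assuming a POSIX system. The function name build_decoupled_dsmi_dir() and its arguments are hypothetical, and the content of the modified dsm.sys is left to you.

```python
import os


def build_decoupled_dsmi_dir(original_dir, new_dir, dsm_sys_text):
    """Create new_dir with symlinks to everything in original_dir except dsm.sys.

    A separate dsm.sys containing dsm_sys_text is written into new_dir.
    """
    original_dir = os.path.abspath(original_dir)
    os.makedirs(new_dir, exist_ok=True)
    for entry in sorted(os.listdir(original_dir)):
        if entry == "dsm.sys":
            continue  # the only file that must differ from the original
        link_path = os.path.join(new_dir, entry)
        if not os.path.lexists(link_path):
            os.symlink(os.path.join(original_dir, entry), link_path)
    # Write the modified dsm.sys (changed NODENAME and INCLEXCL entries).
    with open(os.path.join(new_dir, "dsm.sys"), "w") as handle:
        handle.write(dsm_sys_text)
    return new_dir
```

    Before starting the quarterly backup, you would point DSMI_DIR at the returned directory in the environment of that OnBar session only, so that the ALARMPROGRAM keeps using the original settings.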

    The above-described procedure does not prevent you from using these logical logs together with your quarterly backup performed under the dedicated nodename in order to perform a Point-In-Time-Restore. To accomplish this task, you need to split your restore process into a physical and a logical restore, for example:

    1. Set nodename to the dedicated nodename used for your quarterly backups
    2. Perform a physical restore only
      • 'onbar -r -w -p -t <timestamp_of_quarterly_backup>'
    3. Reset nodename to the normal nodename
    4. Continue your restore operation with the logical restore
      • 'onbar -r -l -t <timestamp_for_point_in_time>'

    If you don't want to split the restore operation, granting the appropriate access permissions using 'dsmc set access....' (see the section on Imported restore) might be an alternative.

    You don't need to specify the '-w' flag if you want to restore a Whole-System-Backup. This is a common misunderstanding. Specifying the '-w' flag during a restore means that a sequential restore (such as dbspace after dbspace) should be performed by OnBar.

    If you omit the '-w' option, OnBar is still capable of restoring a Whole-System-Backup. In this case, only the root dbspace is restored sequentially. After that, OnBar forks several additional OnBar processes (depending on the $ONCONFIG parameter BAR_MAX_BACKUP) and restores the remaining dbspaces in parallel.

    LTAPEDEV has to be set to a value other than /dev/null in your $ONCONFIG before starting such a parallel restore. A parallel restore reduces your restore time significantly if the infrastructure (disks, network, TSM server) is adequate. At the end of a restore you will see a message like the following in the OnBar activity log:

    Warning message in OnBar activity log
    ...WARNING: Physical restore complete. Logical restore required before work can continue.

    This is misleading. You do not need to restore any logs because you have restored a Whole-System-Backup (in parallel mode), not a Parallel-Backup. You can bring your IDS database server online at this point by executing the 'onmode -m' command. This process might take a few minutes because the physical log and the logical logs will be cleared. Execute 'onstat -D -r' to monitor the write operations on the associated dbspaces/chunks.

    Whole-System-Backups are identified by a 1 in the fourth position of your emergency bootfile. Parallel-Backups have a 0 at this position:

    Emergency Boot File
    ids10             rootdbs            R  1 117   0 0  .........
    ids10             rootdbs            R  0 112   0 0  .........
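
    Checking the flag by eye is easy to get wrong on long lines. As a small illustration, the following Python sketch (the function name is hypothetical) splits an emergency bootfile line on whitespace and tests the fourth field, as described above:

```python
def is_whole_system_backup(bootfile_line):
    """Return True if the fourth field of an emergency bootfile line is 1."""
    fields = bootfile_line.split()
    if len(fields) < 4:
        raise ValueError("expected at least four fields: %r" % bootfile_line)
    return fields[3] == "1"


# The two example entries from above, without the elided remainder.
lines = [
    "ids10             rootdbs            R  1 117   0 0",
    "ids10             rootdbs            R  0 112   0 0",
]
for line in lines:
    kind = "whole-system" if is_whole_system_backup(line) else "parallel"
    print(line.split()[1], kind)
```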

    The difference between a sequential and a parallel restore is shown by the following excerpts from the OnBar activity log:

    Sequential physical restore ('onbar -r -w -p')
    2006-01-01 17:00:25 45242  38228 Completed cold level 0 restore rootdbs.
    2006-01-01 17:00:25 45242  38228 Begin cold level 0 restore logdbs.
    2006-01-01 17:00:25 45242  38228 Completed cold level 0 restore logdbs.
    2006-01-01 17:00:25 45242  38228 Begin cold level 0 restore physdbs.
    2006-01-01 17:00:25 45242  38228 Completed cold level 0 restore physdbs.
    2006-01-01 17:00:26 45242  38228 Completed whole system restore.

    Dbspace after dbspace is restored from a single OnBar process. The child process ID is shown in the third position and remains the same during a sequential restore.

    Parallel physical restore ('onbar -r -p')
    2006-01-01 17:04:02 42612  25106 Begin cold level 0 restore rootdbs.
    2006-01-01 17:05:27 42612  25106 Completed cold level 0 restore rootdbs.
    2006-01-01 17:05:27 52488  42612 Process 52488  42612 successfully forked.
    2006-01-01 17:05:27 56058  42612 Process 56058  42612 successfully forked.
    2006-01-01 17:05:27 52488  42612 Successfully connected to Storage Manager.
    2006-01-01 17:05:27 56058  42612 Successfully connected to Storage Manager.
    2006-01-01 17:05:27 52488  42612 Begin cold level 0 restore logdbs.
    2006-01-01 17:05:27 56058  42612 Begin cold level 0 restore physdbs.
    2006-01-01 17:05:29 56058  42612 Completed cold level 0 restore physdbs.
    2006-01-01 17:05:29 52488  42612 Completed cold level 0 restore logdbs.
    2006-01-01 17:05:29 56058  42612 Process 56058  42612 completed.
    2006-01-01 17:05:29 52488  42612 Process 52488  42612 completed.
    2006-01-01 17:05:29 42612  25106 WARNING: Physical restore complete. Logical restore required 
    before work can continue.

    After restoring the root dbspace, OnBar forks additional OnBar processes and the remaining dbspaces are processed in parallel mode (up to BAR_MAX_BACKUP). Multiple process IDs are contained in the third position of the OnBar activity log file. The misleading warning message is displayed after the physical restore has completed. Executing 'onmode -m' (omitting logical restore) after the physical restore will lead to the following messages in the online.log:

    Messages in online.log after executing 'onmode -m'
    17:06:54  No logical log restore will be performed.
    17:06:54  Clearing the physical and logical logs has started
    17:07:26  Cleared 129 MB of the physical and logical logs in 31 seconds
    17:07:26  Physical Recovery Started.
    17:07:26  Physical Recovery Complete: 0 Pages Restored.
    17:07:26  Logical Recovery Started.
    17:07:28  Logical Recovery Complete.
              0 Committed, 0 Rolled Back, 0 Open, 0 Bad Locks
    17:07:29  Bringing system to On-Line Mode with no Logical Restore.
    17:07:30  On-Line Mode

    IDS will be in online mode after the physical log and the logical logs have been completely cleared.
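
    The sequential and parallel activity log excerpts shown earlier differ in the number of distinct process IDs in the third position. The following Python sketch uses that observation to classify a restore from its log lines. It is a heuristic for illustration only. The function names are hypothetical, and the sample constants are shortened copies of the excerpts above.

```python
SEQUENTIAL_LOG = """\
2006-01-01 17:00:25 45242  38228 Begin cold level 0 restore logdbs.
2006-01-01 17:00:25 45242  38228 Completed cold level 0 restore logdbs.
2006-01-01 17:00:26 45242  38228 Completed whole system restore.
"""

PARALLEL_LOG = """\
2006-01-01 17:05:27 42612  25106 Completed cold level 0 restore rootdbs.
2006-01-01 17:05:27 52488  42612 Begin cold level 0 restore logdbs.
2006-01-01 17:05:27 56058  42612 Begin cold level 0 restore physdbs.
"""


def restore_process_ids(activity_log_text):
    """Collect the process IDs (third column) of lines that mention a restore."""
    pids = set()
    for line in activity_log_text.splitlines():
        fields = line.split()
        # Wrapped continuation lines have no numeric third field and are skipped.
        if len(fields) >= 4 and fields[2].isdigit() and "restore" in line:
            pids.add(int(fields[2]))
    return pids


def restore_mode(activity_log_text):
    """Classify a restore as sequential, parallel, or unknown."""
    count = len(restore_process_ids(activity_log_text))
    if count == 0:
        return "unknown"
    return "parallel" if count > 1 else "sequential"


print(restore_mode(SEQUENTIAL_LOG))  # sequential
print(restore_mode(PARALLEL_LOG))    # parallel
```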

TSM export and import

  • Problem

    You have performed a TSM export and import operation, and after that a restore of the IDS database server no longer works. Backup objects requested by OnBar are no longer found on the TSM server.

  • Solution

    There is no direct solution to this problem. Each backup object gets a unique object ID from TSM. OnBar stores this unique object ID in the sysutils database and also in the emergency bootfile. During a restore operation the desired backup objects are retrieved by executing the XBSA call BSAGetObject() and providing this object ID.

    TSM guarantees the uniqueness and durability of an object ID for the lifecycle of a backup object. The exception to this rule is a TSM export/import operation: after an export/import operation, the object IDs no longer match the values stored in the sysutils database or the emergency bootfile.

    This is an OnBar design problem. The TSM API documentation suggests using the XBSA call BSAQueryObject() for locating an object and performing the BSAGetObject(), providing the object ID received from the first call. OnBar normally only executes the BSAGetObject() call, except during the end of a logical restore, where it calls BSAQueryObject() to see if there are any additional logical logs that are not contained in sysutils database or the emergency bootfile.

    The data is not lost. It is possible to write a program that queries the sysutils database, obtains the new object ID from TSM by calling BSAQueryObject() with the appropriate query parameters, and writes the received object ID back to the sysutils database. Afterwards, you can run 'onsmsync -b' to regenerate the emergency bootfile from the sysutils database. If the sysutils database is not available, you can parse the emergency bootfile instead and obtain the new object ID based on this information and write it back to the emergency bootfile.

    The above-described procedure for acquiring the new object IDs is feasible but definitely not an easy task. The better way is to avoid TSM export/import as long as there are IDS backups that you might need to restore.

Imported restore

  • Problem

    An imported restore means the restoration of your IDS database server on a second machine. This is a quite common requirement for setting up a test environment or a disaster-recovery site.

  • Solution

    Performing a restore using OnBar and TDP/Informix on a secondary machine is called an imported restore. Consider the following points for such restores:

    • Ensure that the underlying operating systems and hardware architectures are the same.
    • Use the same version of IDS on both machines. If you are using a version of IDS before version 10, also make sure that the same version of TDP/Informix is installed on both machines.
    • Create the same disk layout on the second machine. At least the size and name of the chunks must be the same as on the primary machine; you are allowed to use symbolic links. If you are on IDS V9.4 or higher, you can use the Redirected Restore feature, which allows you to perform a restore without having the same disk layout.
    • Copy the following files from your $INFORMIXDIR/etc directory to the second machine:
      • $ONCONFIG
      • $INFORMIXSQLHOSTS
      • oncfg_<servername>.<servernum>
      • sm_versions
      • ixbar.<servernum>
    • Substitute the hostname in your $INFORMIXSQLHOSTS file with the hostname of the secondary machine.
    • Make the necessary entries in your /etc/services file
    • Make sure that your dsm.opt and dsm.sys files contain appropriate entries:
      • Ensure that the same TSM server is contacted (TCPSERVERADDRESS, TCPPORT)
      • Set PASSWORDACCESS to GENERATE
    • Allow the secondary node to access objects backed up on the primary node (this has to be done on the primary machine):
      • 'dsmc set access backup /ids10/*/* <second_node> informix'
      • Check if the permission has been granted using 'dsmc query access'

    These should be the major points to consider before performing an imported restore. After the above preparations have been done, start the restore on the secondary machine using the appropriate OnBar command.
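
    One of the preparation steps above, substituting the hostname in the copied $INFORMIXSQLHOSTS file, is easy to script. The following Python sketch is an illustration under the assumption that each entry has the usual whitespace-separated fields (dbservername, nettype, hostname, servicename, optional options). The function name, hostnames, and service name are hypothetical.

```python
def replace_sqlhosts_hostname(sqlhosts_text, old_host, new_host):
    """Replace old_host with new_host in the hostname (third) field of each entry."""
    result = []
    for line in sqlhosts_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 4 and not line.lstrip().startswith("#")
        if is_entry and fields[2] == old_host:
            fields[2] = new_host
            line = " ".join(fields)  # note: column alignment is not preserved
        result.append(line)
    return "\n".join(result) + "\n"


original = "# primary machine\nids10 onsoctcp primaryhost ids10_svc\n"
print(replace_sqlhosts_hostname(original, "primaryhost", "secondhost"), end="")
```

    Only the hostname field is compared, so a server name or service name that happens to contain the old hostname is left untouched.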

Other important files to back up

  • Problem

    A disaster occurred. You have to restore your IDS instance on a secondary machine (see the section on Imported restore). You forgot to back up some essential files like the emergency bootfile.

  • Solution

    Besides the dbspaces and logical logs backed up by OnBar, you should always back up the following files and information. This ensures that all necessary information will be available if you have to restore the database server in an emergency situation:

    • $INFORMIXDIR/etc/$ONCONFIG
    • $INFORMIXSQLHOSTS
    • $INFORMIXDIR/etc/ixbar.<servernum>
    • $INFORMIXDIR/etc/oncfg_<servername>.<servernum>
    • $INFORMIXDIR/etc/sm_versions
    • inclexcl.def, dsm.sys, dsm.opt
    • 'onstat -d' output
    • Important environment settings

    You can use the 'dsmc archive' command to back up these additional files and information. Make sure that you use an appropriate management class, so that the files are still available in TSM when you need them.
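
    A small helper can assemble the Informix file paths from the list above and report any that are missing before you archive them. The following Python sketch is an assumption-laden illustration: the function names are hypothetical, the TSM configuration files are left out because their location depends on your installation, and the '-archmc' option used to select the management class should be verified against the documentation of your TSM client version.

```python
import os


def essential_files(env, servername, servernum):
    """Return the Informix files from the list above as full paths."""
    etc = os.path.join(env["INFORMIXDIR"], "etc")
    return [
        os.path.join(etc, env["ONCONFIG"]),
        env["INFORMIXSQLHOSTS"],
        os.path.join(etc, "ixbar.%s" % servernum),
        os.path.join(etc, "oncfg_%s.%s" % (servername, servernum)),
        os.path.join(etc, "sm_versions"),
    ]


def missing_files(paths):
    """Return the paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]


def archive_command(path, management_class):
    """Build (but do not run) a 'dsmc archive' command line for one file."""
    return ["dsmc", "archive", path, "-archmc=" + management_class]
```

    You could call essential_files(os.environ, "ids10", 67), stop if missing_files() returns anything, and then pass each command list to subprocess.run().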

    Exercise restore operations on a regular basis. Having the right procedures in place and the necessary experience helps you to survive an emergency situation.

    Ensure that your backups are consistent and reliable. OnBar offers this functionality with the '-v' (verify) switch. Perform regular checks of your backups using this helpful OnBar functionality.

Trace mechanisms

If something is not working as expected, you can trace the backup operation. This should help you identify the source of the problem. Both OnBar and the TSM API offer tracing methods.

OnBar tracing

Useful information about OnBar problems can be either found in the online.log ($ONCONFIG parameter: MSGPATH) or in the OnBar activity log ($ONCONFIG parameter: BAR_ACT_LOG).

If this information is not satisfactory, you can set the following two $ONCONFIG parameters for debugging the OnBar process:

  1. BAR_DEBUG_LOG
    • Full pathname of the debug logfile to be created by OnBar
  2. BAR_DEBUG
    • Amount of debug information to be generated. Possible values are 1 to 9

Both parameters can be set without restarting the database server. The OnBar process extracts these settings from your $ONCONFIG during the next run. In newer IDS versions (V7.31.UD5 and higher and V9.40.UC1 and higher), OnBar also checks for the existence of BAR_DEBUG in the $ONCONFIG file every 120 seconds. This means you can set it on the fly, after the OnBar process has already been started.

Debugging generates a lot of information and also slows down your backup/restore process significantly. Set BAR_DEBUG only if you want to analyze a problem, never during normal operation. A debug level of 7 should normally be adequate. Debug level 9 generates a lot of information because it dumps the page headers of all pages.
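
As an illustration, the two parameters might appear in your $ONCONFIG as follows while you analyze a problem. The log path is a hypothetical example; choose a file system with enough free space. When you are done, set BAR_DEBUG back to 0 to switch debugging off again.

```
# $ONCONFIG excerpt; the log path is a hypothetical example
BAR_DEBUG_LOG  /tmp/bar_dbug.log
BAR_DEBUG      7
```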

TSM API tracing

Information about the TSM API can be found in the TSM API log. The name of this logfile is dsierror.log, and it is normally generated in the directory from which OnBar was started. You can also set the environment variable DSMI_LOG to the desired directory. If you are not sure where your dsierror.log is located, execute the following find command:

  • 'find / -name dsierror.log -ls'

The modification time should help you find the correct dsierror.log. Setting the environment variable DSMI_LOG might be the better way.

If the information in dsierror.log is not sufficient, tracing the TSM API is the next step. Tracing can be enabled by setting the following parameters in your client user options file dsm.opt:

  • tracefile
    • The full pathname of the file storing the trace information.
  • traceflag
    • A comma-separated list of different trace categories. Useful categories for IDS are:
      • appl
      • api_detail
      • verbdetail
      • timestamp

A detailed list of trace categories can be found in the TSM product manuals.
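
Put together, the trace settings described above might look like this in dsm.opt. The trace file path is a hypothetical example. Remove or comment out the entries after the analysis, because tracing produces large files.

```
* dsm.opt excerpt; the trace file path is a hypothetical example
tracefile /tmp/tsmapi_trace.out
traceflag appl,api_detail,verbdetail,timestamp
```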


publish-date=03092006