What's new in Informix MACH 11 high-availability features for secondary servers

Reduce the load on your primary server and recover quickly from problems

Harshavardhan Changappa
Harshavardhan Changappa is a certified systems administrator for Informix databases. He has been working with the Informix development team for over four years now. He has worked with quality assurance and development teams and has been instrumental in developing functional and Integration test cases for various features of IDS. He's widely recognized for his contribution in the areas of IDS replication and high-availability, like flexible grid, transactional survival, ER log lag action, etc.
Bharath Sriram (bharath.sriram@gmail.com), Software Developer, Epic
Bharath Sriram
Bharath Sriram worked with the integration team and on OAT while at IBM. He is certified in systems administration for IBM Informix Dynamic Server V11. He holds a master's degree in computer science from Ohio State University. His research interests include information retrieval and text mining in social networks.

Summary:  IBM® Informix® V11.50.xC6 brought several enhancements for Informix multi-node active clusters for high availability, called MACH 11. In this article, get an overview of some of the new features, such as the ability to stop or suspend replication in the Informix cluster environment, and the ability to perform an external backup on a remote stand-alone (RS) secondary server. You'll also learn how to avoid blocking checkpoints on primary servers in a high-availability data replication (HDR) environment. Explore detailed steps for implementing the new enhancements and learn how they work. Frequently asked questions at the end of the backup and checkpoint sections address common concerns.

Date:  21 Jul 2011
Level:  Intermediate

Overview

Beginning with V11, Informix includes a group of high-availability and clustering features called MACH 11. These features ensure that data remains available regardless of problems that may occur. In V11.50.xC6, there are several enhancements for secondary servers that enable you to reduce the load on the primary server and provide help in recovering more quickly from problems. The purpose of this article is to give you the information you need to implement these new features in your own environment.


Stop or suspend replication in the Informix cluster environment

Starting with Informix V11.50.xC6, you can use the delayed-apply feature to suspend or stop the application of logical logs at remote stand-alone secondary (RS secondary) nodes. Because the secondary then lags behind the primary, a DBA can recover quickly from a problem such as data corruption or an accidental deletion by extracting the data from the secondary node as it was before the problem occurred.

Enabling and disabling the delayed-apply feature

You can delay the application of logical logs that flow from the primary to the secondary by setting the DELAY_APPLY configuration parameter in the onconfig file. You can edit the configuration file manually or change the value of the parameter dynamically using these commands:

  • onmode -wf DELAY_APPLY=[value]: Updates the value of the specified configuration parameter in the ONCONFIG file
  • onmode -wm DELAY_APPLY=[value]: Dynamically sets the value of the specified configuration parameter in memory

You can specify a time delay in terms of days, hours, minutes, and seconds, indicated as follows in the DELAY_APPLY parameter:

  • s/S: seconds
  • m/M: minutes
  • h/H: hours
  • d/D: days

The following examples show how to delay the application of log files on the RS secondary server.

To delay for four hours:

  • onmode -wf DELAY_APPLY=4H

To delay for one day:

  • onmode -wf DELAY_APPLY=1D

To delay for 30 minutes:

  • onmode -wf DELAY_APPLY=30M

To delay for 15 seconds:

  • onmode -wf DELAY_APPLY=15S

To turn off DELAY_APPLY, disabling the delay of logical log files applying on the RS secondary, use the onmode command as follows: onmode -wf DELAY_APPLY=0.
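The unit suffixes above can be checked before a value is handed to onmode. The following is a minimal Python sketch, not part of Informix: the function name parse_delay_apply is hypothetical, and rejecting bare numbers other than 0 is a choice made for this sketch rather than documented server behavior.

```python
import re
from datetime import timedelta

# Unit suffixes listed in the article (accepted in either case).
_UNITS = {"S": "seconds", "M": "minutes", "H": "hours", "D": "days"}


def parse_delay_apply(value: str) -> timedelta:
    """Convert a DELAY_APPLY setting such as '4H' or '30m' to a timedelta.

    '0' means the delay is turned off. Any other value without a unit
    suffix is rejected; that strictness is an assumption of this sketch.
    """
    text = value.strip()
    if text == "0":
        return timedelta(0)
    match = re.fullmatch(r"(\d+)([sSmMhHdD])", text)
    if match is None:
        raise ValueError(f"unrecognized DELAY_APPLY value: {value!r}")
    amount = int(match.group(1))
    unit = match.group(2).upper()
    return timedelta(**{_UNITS[unit]: amount})


if __name__ == "__main__":
    for setting in ("4H", "1D", "30M", "15S", "0"):
        print(setting, "->", parse_delay_apply(setting))
```

A script that wraps onmode -wf could call this first so that a typo such as 5X is caught before the configuration is touched.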

Before you set DELAY_APPLY, the LOG_STAGING_DIR parameter must be set in the configuration file. LOG_STAGING_DIR names the directory on the secondary node in which the logs arriving from the primary server are staged.

You can manually edit the configuration file or dynamically change the value of the parameter using these commands:

  • onmode -wf LOG_STAGING_DIR=[Valid_Dir_Path]: Updates the value of the specified configuration parameter in the ONCONFIG file
  • onmode -wm LOG_STAGING_DIR=[Valid_Dir_Path]: Dynamically sets the value of the specified configuration parameter in memory

Before setting this parameter, ensure that the directory is created with the following permissions:

  • Owner and Group should be user "informix"
  • Must not have public read, write, or execute permission (i.e., 0770)
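The two requirements above can be verified with a short script before LOG_STAGING_DIR is set. This is a sketch under stated assumptions: the helper names are hypothetical, it relies on the POSIX-only pwd and grp modules, and it checks only what the article lists (owner, group, and no public permission bits).

```python
import grp
import os
import pwd
import stat


def public_permission_bits(mode: int) -> int:
    """Return the 'other' (public) permission bits of a file mode."""
    return mode & stat.S_IRWXO


def check_staging_dir(path: str, owner: str = "informix",
                      group: str = "informix") -> list:
    """Return a list of problems found with a candidate staging directory."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    problems = []
    info = os.stat(path)
    if public_permission_bits(info.st_mode):
        problems.append("directory grants public read, write, or execute permission")
    try:
        actual_owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:  # uid without a passwd entry
        actual_owner = str(info.st_uid)
    try:
        actual_group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:  # gid without a group entry
        actual_group = str(info.st_gid)
    if actual_owner != owner:
        problems.append(f"owner is {actual_owner}, expected {owner}")
    if actual_group != group:
        problems.append(f"group is {actual_group}, expected {group}")
    return problems
```

An empty list means the directory meets the listed requirements; a mode of 0770 passes the permission check, while 0775 does not.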

Steps to configure delayed apply

The following steps describe how to configure delayed apply and show how it works.

1. Create a log staging directory with proper permissions

The example below shows that the directory logs_staged_dir has been created under /work/harsha and that the configuration parameter LOG_STAGING_DIR is set to the valid path /work/harsha/logs_staged_dir.


Listing 1. Directory listing for logs_staged_dir

ls -ld /work/harsha/logs_staged_dir
drwxrwx---  2 informix informix   512 Jun 1 01:52 /work/harsha/logs_staged_dir/

onmode -wf LOG_STAGING_DIR=/work/harsha/logs_staged_dir

Listing 2 shows the message written to the RS secondary server's online log when LOG_STAGING_DIR is set on that server.


Listing 2. Remote secondary server online log message

RS secondary server Online log:
----------------------------------------------------------------------------------
01:47:57  DR: RSS secondary server operational
01:48:11  Checkpoint Completed:  duration was 0 seconds.
01:48:11  Tue Jun  1 - loguniq 4, logpos 0x49e018, timestamp: 0x27bf0 Interval: 21

01:48:11  Maximum server connections 0 
01:48:11  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0,
 Plog used 143, Llog used 0

01:48:12  Logical Log 4 Complete, timestamp: 0x2802b.
01:48:18  Checkpoint Completed:  duration was 1 seconds.
01:48:18  Tue Jun  1 - loguniq 5, logpos 0x10018, timestamp: 0x28091 Interval: 22

01:48:18  Maximum server connections 0 
01:48:18  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0,
 Plog used 49, Llog used 0

01:54:06  Value of LOG_STAGING_DIR has been changed to /work/harsha/logs_staged_dir.

The logical logs in Listing 3 are at the same position on both servers (log 7 is current, with 1111 pages used), which means the data is in sync between the primary and the RS secondary server.


Listing 3. Logical logs at primary and RS secondary server

Logical Logs at Primary Server:

Logical Logging
Buffer bufused  bufsize  numrecs    numpages   numwrits   recs/pages pages/io
  L-1  0        32       209379     17044      4683       12.3       3.6     
        Subsystem    numrecs    Log Space used
        OLDRSAM      209236     28540032      
        SBLOB        20         1008          
        HA           123        5412          

address     number   flags    uniqid   begin       size     used    %used
8759bf50    1        U-B----  1        1:7763      3000     3000   100.00
8759bfb8    2        U-B----  2        1:10763     3000     3000   100.00
87582f50    3        U-B----  3        1:13763     3000     2458    81.93
87582fb8    4        U-B----  4        1:16763     3000     1476    49.20
87456b48    5        U-B----  5        1:19763     3000     3000   100.00
87456bb0    6        U-B----  6        1:22763     3000     3000   100.00
87456c18    7        U---C-L  7        1:25763     3000     1111    37.03
87456c80    8        A------  0        1:28763     3000        0     0.00


Logical Logs at Secondary Server:

Logical Logging
Buffer bufused  bufsize  numrecs    numpages   numwrits   recs/pages pages/io
  L-1  0        32       0          0          0          0.0        0.0     
        Subsystem    numrecs    Log Space used

address     number   flags    uniqid   begin       size     used    %used
87455f98    1        F------  0        1:7763      3000        0     0.00
8759bf50    2        F------  0        1:10763     3000        0     0.00
8759bfb8    3        F------  0        1:13763     3000        0     0.00
87582f50    4        U-B----  4        1:16763     3000     1476    49.20
87582fb8    5        U-B----  5        1:19763     3000     3000   100.00
87456518    6        U-B----  6        1:22763     3000     3000   100.00
87456580    7        U---C-L  7        1:25763     3000     1111    37.03
874565e8    8        A------  0        1:28763     3000        0     0.00
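The comparison made by eye in Listing 3 can also be scripted. The sketch below is an illustration only: it assumes the eight-column log descriptor layout shown in the listing, treats the row whose flags contain C as the current log, and uses hypothetical function names. The sample rows are copied from Listing 3.

```python
import re


def current_log(onstat_l_output: str):
    """Return (uniqid, pages_used) for the log flagged C (current), or None."""
    for line in onstat_l_output.splitlines():
        fields = line.split()
        # Descriptor rows: address number flags uniqid begin size used %used
        if len(fields) != 8:
            continue
        flags = fields[2]
        if re.fullmatch(r"[A-Z-]{7}", flags) and "C" in flags:
            return int(fields[3]), int(fields[6])
    return None


def in_sync(primary_output: str, secondary_output: str) -> bool:
    """True when both servers report the same current log and pages used."""
    primary = current_log(primary_output)
    return primary is not None and primary == current_log(secondary_output)


PRIMARY_ROWS = """\
87456bb0    6        U-B----  6        1:22763     3000     3000   100.00
87456c18    7        U---C-L  7        1:25763     3000     1111    37.03
"""
SECONDARY_ROWS = """\
87456518    6        U-B----  6        1:22763     3000     3000   100.00
87456580    7        U---C-L  7        1:25763     3000     1111    37.03
"""

if __name__ == "__main__":
    print(current_log(PRIMARY_ROWS), in_sync(PRIMARY_ROWS, SECONDARY_ROWS))
```

Comparing pages used is a coarse check; it says the servers are on the same log page count, not that every record has been applied.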


2. Set DELAY_APPLY configuration parameter at RS secondary server

Use the onmode utility as follows to set DELAY_APPLY to 5 minutes, which delays the application of logical logs at the RS secondary server by 5 minutes:

onmode -wf DELAY_APPLY=5M

Once you set the DELAY_APPLY parameter, the server automatically creates a subdirectory named after its SERVERNUM, ifmxlog_[server_num], under LOG_STAGING_DIR. The logical logs arriving from the primary server are staged there.

The next example shows the RS secondary server online log when DELAY_APPLY is set. The server automatically creates a staging area "ifmxlog_14" under LOG_STAGING_DIR and logical log 7 is staged.


Listing 4. Logs when DELAY_APPLY is set
			
RS secondary server Online log
-----------------------------------------------------------------------------
02:12:47  Maximum server connections 1 
02:12:47  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0,
 Plog used 1053, Llog used 0

02:15:20  Secondary Delay or Stop Apply: 
Using the directory /work/harsha/logs_staged_dir/ifmxlog_14.
02:15:20  A Request to reset the log position to 7:0 was sent 
to the primary server.
02:15:21  Value of DELAY_APPLY has been changed to 5M.
--------------------------------------------------------------------

ls -l /work/harsha/logs_staged_dir/ifmxlog_14
total 4464
-rw-rw----   1 root     informix 2273280 Jun  1 02:15 ifmxUniqueLog_7


Primary Server Online log
------------------------------------------------------------------------------
02:12:47  Maximum server connections 2 
02:12:47  Checkpoint Statistics - Avg. Txn Block Time 0.009, # Txns blocked 1,
 Plog used 1050, Llog used 1760

02:15:20  DELAY_APPLY has been set to 5M on server m64a1_c2
------------------------------------------------------------------------------
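The naming pattern seen in Listing 4 (ifmxlog_[server_num] for the directory and ifmxUniqueLog_[n] for each staged log) can be captured in a few helpers, for example to monitor how many logs are waiting to be applied. This is a sketch with hypothetical function names; the patterns are inferred from the listing, not from a documented interface.

```python
import re
from pathlib import PurePosixPath


def rss_staging_subdir(log_staging_dir: str, servernum: int) -> PurePosixPath:
    """Path of the subdirectory the RS secondary creates for staged logs."""
    return PurePosixPath(log_staging_dir) / f"ifmxlog_{servernum}"


def staged_log_file(log_staging_dir: str, servernum: int,
                    log_uniqid: int) -> PurePosixPath:
    """Path of the staging file for one logical log."""
    return rss_staging_subdir(log_staging_dir, servernum) / f"ifmxUniqueLog_{log_uniqid}"


def staged_log_ids(filenames) -> list:
    """Sorted unique IDs of the staged logs found in a directory listing."""
    ids = []
    for name in filenames:
        match = re.fullmatch(r"ifmxUniqueLog_(\d+)", name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    print(staged_log_file("/work/harsha/logs_staged_dir", 14, 7))
```

In practice the file names would come from os.listdir on the staging subdirectory.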
			

Performing transactions at the primary server

The next example drops the customer table in the stores_demo database on the primary to show that the transaction is not applied immediately at the RS secondary.


Listing 5. Transactions at primary server after DELAY_APPLY is set

--------------------------------------------------------------------------------
>dbaccess stores_demo -

Database selected.

> begin work;

Started transaction.

> info tables;


Table name

call_type          catalog            classes            cust_calls        
customer           employee           ext_customer       items             
manufact           orders             state              stock             
tab                warehouses         

> drop table customer;

Table dropped.

> commit work;

Data committed.

> close database;

Database closed.
----------------------------------------------------------------------------------

Now look up the commit time of the above transaction using the onlog utility, as shown below.


Listing 6. Inspecting commit time of transactions

onlog -n 7
----------------------------------------------------------------------------------
addr     len  type     xid      id link    
4611f8   56   COMMIT   34       0  4611cc   06/01/2010 02:18:36
462018   44   HA       33       0  45f648   SDSCYCLE 106     

If you look for the customer table on the RS secondary node within 5 minutes of the commit, you see that the logical log records have not yet been applied. The log containing the committed transaction is staged in LOG_STAGING_DIR on the RS secondary server.

The example below shows that the customer table in stores_demo is still available at the RS secondary server.


Listing 7. stores_demo database at secondary server

>dbaccess stores_demo -

Database selected.

> info tables;

Table name

call_type          catalog            classes            cust_calls        
customer           employee           ext_customer       items             
manufact           orders             state              stock             
tab                warehouses         

> 

Directory where logical logs are staged:
/work/harsha/logs_staged_dir/ifmxlog_14> ls -l
total 4512
-rw-rw----   1 root     informix 2299904 Jun  1 02:18 ifmxUniqueLog_7

Five minutes after the transaction's commit time, the secondary node reads the transaction from the staging directory LOG_STAGING_DIR and applies it.


Listing 8. Calculating the apply time

APPLY TIME = COMMIT TIME + DELAY_TIME
           = 02:18:36 + 5 mins
           = 02:23:36
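The same arithmetic can be expressed in Python, which is convenient when you need to know how long a bad transaction will remain unapplied on the secondary. This is a minimal sketch; the function names are hypothetical, and the commit timestamp is the one reported by onlog in Listing 6.

```python
from datetime import datetime, timedelta


def apply_time(commit_time: datetime, delay: timedelta) -> datetime:
    """Earliest time the RS secondary applies a transaction: commit + delay."""
    return commit_time + delay


def time_left(commit_time: datetime, delay: timedelta, now: datetime) -> timedelta:
    """Time remaining before the transaction is applied (never negative)."""
    remaining = apply_time(commit_time, delay) - now
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    commit = datetime.strptime("06/01/2010 02:18:36", "%m/%d/%Y %H:%M:%S")
    print(apply_time(commit, timedelta(minutes=5)))  # 2010-06-01 02:23:36
```

The window returned by time_left is the time a DBA has to extract data from the secondary or stop the apply before the change arrives.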

Now query the stores_demo database at the RS secondary server and look for the customer table once 5 minutes have passed since the commit time.

The example below shows that the customer table in stores_demo is no longer available at the RS secondary server, which means the transaction has been applied.


Listing 9. Customer table at secondary server after transaction applied

> dbaccess stores_demo -

Database selected.

> info tables;

Table name

call_type         catalog           classes          cust_calls        
employee          ext_customer      items            manufact          
orders            state             stock            tab               
warehouses         

> select * from customer;

  206: The specified table (customer) is not in the database.

  111: ISAM error:  no record found.
Error in line 1
Near character position 22
> 


External backup on remote stand-alone secondary

This feature allows the DBA to reduce the load on the primary server. RS secondary nodes often carry a lighter workload, which makes them good candidates for performing backups.

One advantage is that performing an external backup using shadow copies on the secondary server doesn't require any special hardware. You only need to be able to copy the chunks, possibly in parallel, using your own archive programs such as dd and gzip.

There are a few things to note when planning a backup on an RS secondary:

  • Currently, only external backup is supported on an RS secondary.
  • The feature works similarly to a backup of a primary server.
  • The primary server is not blocked or affected by the external backup performed on RS secondary nodes.
  • The backup obtained from the secondary server is restorable to any other server in the cluster.
  • The backup obtained from the secondary server cannot be restored in combination with Level 1 or Level 2 backups made on a different server in the cluster.

Prerequisites

The following are required to perform external backup on RS secondary:

  • LOG_STAGING_DIR must be set to a valid value on the RS secondary server.
  • STOP_APPLY must not be active when you start an external archive.
  • DELAY_APPLY may be active, but it is recommended that you disable it, because the delay may prevent the server from reaching the archive checkpoint in a timely manner.

Steps for taking an external backup

The steps to take an external backup are:

  1. Block the RS secondary server using onmode -c block [timeout].
  2. Perform the backup using operating system commands such as tar, gzip, cp, and dd.
  3. Unblock the RS secondary server using onmode -c unblock.

We'll walk through these steps in detail in an upcoming section.
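The three steps can be wrapped so that the server is always unblocked, even if the copy fails partway through. The following Python sketch is one way to do that, under stated assumptions: the function names are hypothetical, the chunks are assumed to be ordinary files that can be copied with a file copy, and only the onmode -c block and onmode -c unblock commands come from this article. The command runner and the copy function are parameters so that the flow can be exercised without a server.

```python
import shutil
import subprocess
from pathlib import PurePosixPath


def run_command(cmd: list) -> None:
    """Run a command and raise if it exits with a nonzero status."""
    subprocess.run(cmd, check=True)


def external_backup(chunk_paths, backup_dir: str, timeout: int = 60,
                    run=run_command, copy=shutil.copy2) -> list:
    """Block the RS secondary, copy its chunks, and always unblock it.

    Returns the list of destination paths that were copied.
    """
    copied = []
    # Step 1: block the server. If this fails, nothing has been changed.
    run(["onmode", "-c", "block", str(timeout)])
    try:
        # Step 2: copy every chunk while the server is blocked.
        for chunk in chunk_paths:
            target = str(PurePosixPath(backup_dir) / PurePosixPath(chunk).name)
            copy(chunk, target)
            copied.append(target)
    finally:
        # Step 3: unblock even when a copy raised an exception.
        run(["onmode", "-c", "unblock"])
    return copied
```

The try/finally matters because a server left blocked keeps staging logs without applying them; a failed copy should not leave it in that state.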

How external backup works

Figure 1 shows the primary, HDR, and RS secondary nodes and illustrates the steps of an external backup on RSS.


Figure 1. Primary, HDR, and RSS
Image shows pictorial representation of primary, HDR, and RSS secondary, and RSS secondary has logical log staging area

As soon as you block the RS secondary server using the onmode -c block command, the RS secondary requests a checkpoint from the primary server. Once the checkpoint arrives at the RS secondary, the secondary starts staging the log records coming from the primary in its log staging area. It continues to receive logical logs from the primary but does not apply them, and the primary server is unaware that the logs are being staged. Once the archive checkpoint completes on the RS secondary, it stops applying any further log records, which leaves the server in a consistent state for an external utility to copy the chunks.

After the chunks have been copied, running onmode -c unblock causes the RS secondary to start applying log records from the staged logical logs.

Once the staged logical log records have been applied, the server resumes applying log records as soon as they are received from the primary.


Detailed steps for performing external backup on RS secondary node

The following steps describe how to perform an external backup on an RS secondary.

1. Create a log staging directory with proper permissions

The example below shows that the directory logs_staged_dir has been created under /work/harsha and that the configuration parameter LOG_STAGING_DIR is set to the valid path /work/harsha/logs_staged_dir.


Listing 10. Log staging directory

ls -ld /work/harsha/logs_staged_dir
drwxrwx---   2 informix informix     512 Jun  1 01:52 /work/harsha/logs_staged_dir/

onmode -wf LOG_STAGING_DIR=/work/harsha/logs_staged_dir

Listing 11 shows the message written to the RS secondary server's online log when LOG_STAGING_DIR is set on that server.


Listing 11. RS secondary server online log

onstat -
IBM Informix Dynamic Server Version 11.50.FC6     -- Updatable (RSS) -- Up 04:14:04 --
 165576 Kbytes

06:38:53  Maximum server connections 1 
06:38:53  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0,
 Plog used 2, Llog used 0

06:39:54  Value of LOG_STAGING_DIR has been changed to /work/harsha/logs_staged_dir.

2. Block the server

Block the server using the onmode utility: onmode -c block 60. When this command is issued, the RS secondary requests a checkpoint from the primary and waits for it, as the online logs in Listing 12 show.


Listing 12. Primary and secondary logs after the server is blocked
			
Primary Server Online Log
-----------------------------------------------------------------------------------
06:40:41  A checkpoint request for an archive was received from a secondary server.
06:40:41  Checkpoint Completed:  duration was 0 seconds.
06:40:41  Fri Jun  3 - loguniq 4, logpos 0xe9018, timestamp: 0x3542e Interval: 30


RS Secondary Server Online Log
-----------------------------------------------------------------------------
06:38:53  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0,
 Plog used 2, Llog used 0

06:39:54  Value of LOG_STAGING_DIR has been changed to /work/harsha/log_staged_dir.
06:40:41  Secondary Delay or Stop Apply: 
Using the directory /work/harsha/log_staged_dir/ifmxlog_64.
06:40:41  A Request to reset the log position to 4:0 was sent to the primary server.
06:40:42  Staging of logical logs successfully started.
06:40:42  Checkpoint Completed:  duration was 0 seconds.
06:40:42  Fri Jun  3 - loguniq 4, logpos 0xe9018, timestamp: 0x35431 Interval: 31

06:40:42  Maximum server connections 1 
06:40:42  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0,
 Plog used 0, Llog used 0

06:40:43  The external backup is blocking the database server at checkpoint
 4:0xe9018.
06:40:43  Dynamic Server blocked.

 onstat -

IBM Informix Dynamic Server Version 11.50.FC7     -- Updatable (RSS) -- Up 04:16:53
 -- 165576 Kbytes
Blocked:ARCHIVE 
			

3. Copy all RSS chunks

Now make a copy of all RSS chunks using OS commands such as cp, tar, or gzip. During the external backup, logical logs coming from the primary are staged at the RS secondary server in a directory named ifmxlog_XX, which the server creates automatically.

In Listing 13, the RS secondary server was blocked while log 4 was its current logical log (flag C), whereas the primary has already moved on to log 5.


Listing 13. Logs being staged
			
Primary Logical Logs
-----------------------------------------------------------------------------
Logical Logging
Buffer bufused  bufsize  numrecs    numpages   numwrits   recs/pages pages/io
  L-2  0        32       217224     17834      5488       12.2       3.2     
        Subsystem    numrecs    Log Space used
        OLDRSAM      217141     29202988      
        HA           83         3652          

address       number   flags    uniqid   begin         size     used    %used
4b7ccf50      1        U------  1        1:25263       5000     5000   100.00
4b7ccfb8      2        U------  2        1:30263       5000     5000   100.00
4b7b3f50      3        U------  3        1:35263       5000     2030    40.60
4b7b3fb8      4        U-----L  4        1:40263       5000     5000   100.00
4b687b48      5        U---C--  5        1:45263       5000      804    16.08
4b687bb0      6        A------  0        1:50263       5000        0     0.00
 6 active, 6 total


RS Secondary Logical Logs
-----------------------------------------------------------------------------
Logical Logging
Buffer bufused  bufsize  numrecs    numpages   numwrits   recs/pages pages/io
  L-1  0        32       0          0          0          0.0        0.0     
        Subsystem    numrecs    Log Space used

address       number   flags    uniqid   begin         size     used    %used
4b686f98      1        F------  0        1:25263       5000        0     0.00
4b7ccf50      2        F------  0        1:30263       5000        0     0.00
4b7ccfb8      3        U------  3        1:35263       5000     2030    40.60
4b7b3f50      4        U---C-L  4        1:40263       5000      280     5.60
4b7b3fb8      5        A------  0        1:45263       5000        0     0.00
4b687518      6        A------  0        1:50263       5000        0     0.00
 6 active, 6 total
	

Here, logs 4 and 5 arriving from the primary server are staged at the RS secondary server.


Listing 14. File listing showing logs 4 and 5
			
/work/harsha/log_staged_dir/ifmxlog_64> ls -lrt
total 11636
-rw-rw---- 1 root informix 10240000 Jun  3 06:50 ifmxUniqueLog_4
-rw-rw---- 1 root informix  1652736 Jun  3 06:51 ifmxUniqueLog_5
			

4. Unblock RS secondary server using onmode utility

When the server is unblocked, the RS secondary applies the staged logical log files and then removes them automatically.


Listing 15. Using the onmode utility to unblock the applying of logs

onmode -c unblock

IBM Informix Dynamic Server Version 11.50.FC7     -- Updatable (RSS) -- Up 04::47
 -- 165576 Kbytes

Message Log File: /usr2/MAIN1110/dbspaces/demo/rss/online.log
06:56:33  Dynamic Server unblocked.
06:56:34  Checkpoint Completed:  duration was 0 seconds.
06:56:34  Fri Jun  3 - loguniq 4, logpos 0x82e018, timestamp: 0x3c940 Interval32


/usr2/MAIN1110/dbspaces/demo/rss/log_staged_dir/ifmxlog_64> ls -lrt
total 0

Restoring external backup taken on RS secondary server

An external backup taken on an RS secondary server can be restored just like a backup taken on a stand-alone or primary server.

An external backup taken on an RS secondary server:

  • Can be restored to create a primary server.
  • Can be restored to create an RS secondary server.
  • Can be restored to create an HDR secondary server.
  • Can be restored to create a stand-alone server.

Example 1: Restoring a backup taken on RS secondary to create new RS secondary server

  1. Copy all backed-up chunks to ROOTPATH location using UNIX® commands such as cp, tar, and unzip.
  2. Use the ontape or onbar utility to restore.


    Listing 16. Using ontape or onbar to restore
    
    $ ontape -p -e        (or: onbar -r -p -e)
    Physical restore complete. Logical restore required before work can continue.
    Program over.
    
    Server online Log:
    ---------------------------------------------------------------------
    IBM Informix Dynamic Server Version 11.50.FC6     -- Fast Recovery -- 
    Up 00:00:15 -- 149192 Kbytes
    04:25:36  Event notification facility epoll enabled.
    04:25:36  IBM Informix Dynamic Server Version 11.50.FC6     
    Software Serial Number AAA#B000000
    04:25:37  IBM Informix Dynamic Server Initialized -- Shared Memory Initialized.
    
    04:25:37  Started 1 B-tree scanners.
    04:25:37  B-tree scanner threshold set at 5000.
    04:25:37  B-tree scanner range scan size set to -1.
    04:25:37  B-tree scanner ALICE mode set to 6.
    04:25:37  B-tree scanner index compression level set to med.
    04:25:37  Data replication type and state information reset. To start DR, use
              the 'onmode -d' command and wait for the pair to be operational,
              before shutting down the database server
    
    04:25:37  Physical Recovery Started at Page (1:926).
    04:25:37  Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored.
    04:25:37  Dataskip is now OFF for all dbspaces
    04:25:37  Restartable Restore has been ENABLED
    04:25:37  Recovery Mode
    04:25:38  External restore started.
    04:25:38  External restore  Completed.
    

  3. Run the onmode utility: onmode -d RSS [primary server alias].

Example 2: Creating an HDR secondary server from a backup taken on RSS

  1. Follow Steps 1 and 2 from Example 1.
  2. Run the onmode utility: onmode -d secondary [primary server alias].

Important: You cannot perform an external backup on an RSS server that contains non-logged objects such as non-logged blobspaces or non-logged databases. onmode -c block fails if the RSS server has non-logged objects.


Listing 17. Attempt to block a server that has a nonlogging database

onmode -c block
onmode could not block server.

07:08:48  Maximum server connections 1 
07:08:48  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0,
 Plog used 21, Llog used 0

07:08:48  The nonlogging database, 'data', is preventing the backup 
on the secondary server.

Questions

What happens when the primary server fails while logs are staged at the RS secondary during an external backup?

Failover of the primary to a new node (SDS or HDR secondary) neither affects nor is affected by an external archive being taken on an RSS node. The SDS or HDR secondary node can take over as primary, and the RS secondary server continues to receive logical logs from the new primary server.

How is an external backup taken on the RS secondary server restored?

An external backup taken on the RS secondary server can be restored just like an external backup taken on a stand-alone or primary server, using the ontape or onbar utility. Example: ontape -p -e or onbar -r -e.


How to avoid blocking checkpoints on primary in an HDR environment

To understand how to avoid blocking checkpoints, you first need to know how they occur. For background on how HDR works and when blocking checkpoints occur, refer to the article by Madison Pruet, Informix replication and availability architect (see Resources).

Blocking checkpoints occur when the secondary cannot process incoming log data efficiently: the backlog on the secondary flows back to the primary and blocks it. To avoid blocking checkpoints, enable log staging at the HDR secondary nodes. The following sections explain how to enable log staging and how it works.

Prerequisites

The following prerequisites are important for enabling logical log staging during checkpoint:

  • LOG_STAGING_DIR must be set to a valid value on the HDR secondary server.
  • If the directory specified by LOG_STAGING_DIR does not exist, the server creates it automatically.

Note: You cannot demote an HDR secondary to an RS secondary while logical logs are staged.

How log staging works during checkpoint

Figure 2 shows the primary and HDR secondary nodes and illustrates how log staging works during a checkpoint.


Figure 2. Pictorial diagram of primary and HDR server
Diagram shows primary and HDR secondary server, which has logical log staging area; diagram explains how log staging happens during checkpoint on HDR secondary

  • When the secondary encounters a checkpoint, it enters buffering mode. While in buffering mode, the secondary stages any log page data coming from the HDR primary into files in the staging directory. It acknowledges receipt of each log data buffer to the primary immediately, before it actually applies or stages the data. The onstat -g dri ckpt command shows the log staging statistics, as shown in Listing 18.
  • When checkpoint processing at the secondary completes, the secondary switches to drain mode. In this mode, it reads data from the staging files while continuing to receive new data from the primary, which it appends to the log staging area.
  • The secondary keeps the data for each log file in a separate staging file and deletes each file as soon as it has read all of its data during drain mode.
  • Once the staging area is empty, the secondary resumes normal operation.
  • The secondary may encounter another checkpoint log record while it is in drain or buffering mode, in which case the number of pending checkpoints increases. If it encounters a checkpoint log record while in drain mode, it goes back to buffering mode. The state may therefore change from buffering to drain, from drain to buffering, or from drain to normal.
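The mode changes described above can be summarized as a small state machine. The Python sketch below is a simplified model of that description only, not of the server's internals: the mode and event names are hypothetical, and the pending-checkpoint counter is deliberately left out.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    BUFFERING = "buffering"
    DRAIN = "drain"


class Event(Enum):
    CHECKPOINT_RECORD = "checkpoint log record encountered"
    CHECKPOINT_DONE = "checkpoint processing completed"
    STAGING_EMPTY = "staging area fully read"


# Transitions named in the text; any other combination keeps the mode.
TRANSITIONS = {
    (Mode.NORMAL, Event.CHECKPOINT_RECORD): Mode.BUFFERING,
    (Mode.BUFFERING, Event.CHECKPOINT_DONE): Mode.DRAIN,
    (Mode.DRAIN, Event.CHECKPOINT_RECORD): Mode.BUFFERING,
    (Mode.DRAIN, Event.STAGING_EMPTY): Mode.NORMAL,
}


def next_mode(mode: Mode, event: Event) -> Mode:
    """Return the mode that follows an event in the current mode."""
    return TRANSITIONS.get((mode, event), mode)


def replay(events, start: Mode = Mode.NORMAL) -> list:
    """Return the sequence of modes visited, beginning with the start mode."""
    history = [start]
    for event in events:
        history.append(next_mode(history[-1], event))
    return history
```

Replaying a checkpoint, its completion, a second checkpoint during the drain, its completion, and finally an empty staging area yields normal, buffering, drain, buffering, drain, normal.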

Listing 18. Buffering

During normal mode:
----------------------
onstat -g dri ckpt

Data Replication at 0x4c057028: 
Type           State   Paired server   Last DR CKPT (id/pg)   Supports Proxy Writes
HDR Secondary  on      mysore          16 / 694               N

DRINTERVAL 30 
DRTIMEOUT 30 
DRAUTO 0 
DRLOSTFOUND /vobs/tristarp/sqldist/etc/dr.lostfound 
DRIDXAUTO 0 
ENCRYPT_HDR 0 

DR Checkpoint processing:
Save State N
Pages Saved 0
Save Area none
Received log id, page 17,68
Saved log id, page 0,0
Drain log id, page 0,0
Processed log id, page 17,68
Pending checkpoints 0

[pinch-cdrkojinnbc] (cdrstat) 1025 % 


During buffering and draining mode:
---------------------------------------
onstat -g dri ckpt 

Type           State   Paired server   Last DR CKPT (id/pg)   Supports Proxy Writes
HDR Secondary  on      mysore          19 / 318               N

DRINTERVAL 30 
DRTIMEOUT 30 
DRAUTO 0 
DRLOSTFOUND /vobs/tristarp/sqldist/etc/dr.lostfound 
DRIDXAUTO 0 
ENCRYPT_HDR 0 

DR Checkpoint processing:
Save State D
Pages Saved 25147
Save Area /notnfs/pinch/hdrservers/dbs/amsterdam_sec_drckptlogs/ifmxhdrstage_72
Received log id, page 53,29
Saved log id, page 53,29
Drain log id, page 19,383
Processed log id, page 19,382
Pending checkpoints 2
Pending ckpt log id, page 36,1 53,1

[pinch-cdrkojinnbc] (cdrstat) 1029 %   
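For monitoring, the DR Checkpoint processing section of the output above can be turned into a dictionary. This is a sketch that assumes the field labels appear exactly as in Listing 18; the function names are hypothetical, and the only Save State values it knows about are the two shown in the listing (N in normal mode, D while draining).

```python
# Field labels as they appear in Listing 18.
LABELS = (
    "Save State",
    "Pages Saved",
    "Save Area",
    "Received log id, page",
    "Saved log id, page",
    "Drain log id, page",
    "Processed log id, page",
    "Pending checkpoints",
    "Pending ckpt log id, page",
)


def parse_dr_ckpt(output: str) -> dict:
    """Extract label/value pairs from the 'DR Checkpoint processing' section."""
    stats = {}
    in_section = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("DR Checkpoint processing"):
            in_section = True
            continue
        if not in_section:
            continue
        for label in LABELS:
            if line.startswith(label + " "):
                stats[label] = line[len(label):].strip()
                break
    return stats


def is_staging(stats: dict) -> bool:
    """True when Save State is anything other than N (normal)."""
    return stats.get("Save State", "N") != "N"


SAMPLE = """\
DR Checkpoint processing:
Save State D
Pages Saved 25147
Received log id, page 53,29
Pending checkpoints 2
Pending ckpt log id, page 36,1 53,1
"""

if __name__ == "__main__":
    print(parse_dr_ckpt(SAMPLE))
```

A check such as is_staging(parse_dr_ckpt(text)) could gate operations that should wait until the secondary has returned to normal mode.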

Questions

What happens when the primary fails while logs are staged at the HDR secondary?

With manual failover (onconfig parameter DRAUTO set to 0): if the HDR secondary has logs staged and the primary server goes down while the secondary is in drain mode, onmode -d standard on the HDR secondary does not complete until the secondary has finished reading (draining) the staged logs, that is, until it returns to normal mode.

With automatic failover (onconfig parameter DRAUTO set to 1): if the HDR secondary has logs staged and the primary server goes down while the secondary is in drain mode, the HDR secondary switches to standard mode after it has read all of the staged logs.

What happens when the HDR secondary is restarted while it has staged log data?

Restarting the HDR secondary (onmode -kuy followed by oninit) while it has staged data discards the staged data; the secondary then requests the logs from the primary.
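
Because a restart discards staged data, it can be worth checking for staged logs before shutting the secondary down. The sketch below is one possible way to do that in Python and is an illustration only: has_staged_logs and read_dri_ckpt are hypothetical helper names, and the check assumes that a nonzero Pages Saved value in the onstat -g dri ckpt output (as in Listing 18) means that log data is currently staged.

```python
import re
import subprocess


def read_dri_ckpt():
    """Run 'onstat -g dri ckpt' on the secondary and return its output."""
    result = subprocess.run(
        ["onstat", "-g", "dri", "ckpt"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def has_staged_logs(onstat_text):
    """Return True if the 'Pages Saved' counter is greater than zero.

    Assumption: a nonzero counter means staged log pages are present.
    """
    m = re.search(r"^\s*Pages Saved\s+(\d+)", onstat_text, re.MULTILINE)
    if m is None:
        raise ValueError("no 'Pages Saved' line found in the output")
    return int(m.group(1)) > 0
```

A maintenance script could call has_staged_logs(read_dri_ckpt()) and postpone onmode -kuy while it returns True.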

What happens if LOG_STAGING_DIR is changed to a new directory while the HDR secondary has logs staged?

If LOG_STAGING_DIR is changed while the HDR secondary has logs staged, the secondary continues to stage in the old directory until the next checkpoint (or until the old area is completely drained), and then stages in the new directory.


Conclusion

This article provided an overview of the enhancements to the secondary servers of an Informix MACH 11 cluster. We explained how replication in an Informix cluster can be stopped or suspended, including the command syntax, which helps DBAs recover quickly from problems. We also explained how an external backup can be taken on an RS secondary server, which primarily helps the DBA reduce the load on the primary server. We concluded with a detailed explanation of blocking checkpoints on the primary in an HDR environment.




ArticleTitle=What's new in Informix MACH 11 high-availability features for secondary servers
publish-date=07212011
author1-email=vardhan.harsha@in.ibm.com
author2-email=bharath.sriram@gmail.com