IBM Support

50 DB2 Nuggets #4 : Tech Tip - Explaining Blocked Transaction on HADR Primary - Brewer's CAP Theorem

Technical Blog Post



In an HADR environment, transactions on the Primary are sometimes blocked (applications seem to hang). This happens when the HADR database is in peer state and a network event occurs: network congestion, a network outage, or the Primary not receiving an ACK from the Standby (for example, because the Standby's log buffer is full). This behavior can be explained with the help of Brewer's CAP theorem.
 
Brewer's CAP Theorem:
Brewer's CAP theorem states that, in a distributed computer system, we can guarantee only two of the following properties simultaneously:
  • Consistency  (all nodes see the same data at the same time)
  • Availability (a guarantee that every request receives a response about whether it was successful or failed)
  • Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
Figure 1. Venn diagram showing that only two of Consistency (C), Availability (A), and Partition tolerance (P) can be guaranteed at a given time

 

Now putting that in terms of DB2 HADR for a LOGGED OPERATION:
 
Consistency (C): Data is the SAME on the Primary and the Standby
Availability (A): Clients are ALWAYS able to connect and get a response (i.e., no blocking)
Partition tolerance (P): Network/communication failure (think of network latency, or not receiving an ACK from the Standby where one is required)
      
 
Basically, the theorem says that in the presence of a partition event (such as a network outage between the two HADR nodes), there MUST BE a tradeoff between availability and consistency.
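The tradeoff can be made concrete with a toy model of a single logged commit. This is a minimal Python sketch, not DB2 code; the `Preference` enum and the `commit_outcome` function are hypothetical names used only for illustration.

```python
from enum import Enum


class Preference(Enum):
    """Which property a (hypothetical) HADR setup keeps during a partition."""
    CONSISTENCY = "C"   # wait for the Standby: block rather than diverge
    AVAILABILITY = "A"  # answer the client even if the Standby falls behind


def commit_outcome(partitioned: bool, preference: Preference) -> str:
    """Toy model of one logged commit on the Primary.

    Without a partition, the Standby acknowledges in time and both C and A
    hold. During a partition, one of the two has to be given up.
    """
    if not partitioned:
        return "committed; Primary and Standby consistent"
    if preference is Preference.CONSISTENCY:
        return "blocked; waiting for Standby"      # keeps C, loses A
    return "committed; Standby may be behind"      # keeps A, loses C


if __name__ == "__main__":
    for partitioned in (False, True):
        for pref in Preference:
            print(partitioned, pref.value, "->", commit_outcome(partitioned, pref))
```

Once `partitioned` is true, no value of `preference` returns both a prompt answer and a guaranteed-consistent Standby.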
 
HADR sync modes are the primary dial for controlling the C vs. A tradeoff:
 
Sync Modes:
A transaction is committed on the Primary after the relevant logs have been written to disk on the Primary and:

Sync Mode  | Implication on Standby                                 | Relation to CAP
SYNC       | Logs have been written to disk on the Standby as well  | Highest C (data guaranteed on Standby)
NEARSYNC   | Logs have been received into memory on the Standby     | More C
ASYNC      | Logs have been sent over the network (no wait for ACK) | More A
SUPERASYNC | No wait for the Standby at all                         | Highest A (no guarantee)
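One way to read the sync mode table is as a list of milestones each commit waits for. The Python sketch below encodes that reading; the milestone strings and the `needs_standby_ack` helper are illustrative assumptions, not a DB2 interface.

```python
# Milestones along the log path that a commit on the Primary can wait for.
# These labels are hypothetical, taken from the wording of the table.
PRIMARY_DISK = "written to disk on Primary"
SENT = "sent to Standby over the network"
STANDBY_MEMORY = "received into memory on Standby"
STANDBY_DISK = "written to disk on Standby"

# What each sync mode waits for before a commit returns, per the sync mode table.
WAITS_FOR = {
    "SYNC": (PRIMARY_DISK, STANDBY_DISK),
    "NEARSYNC": (PRIMARY_DISK, STANDBY_MEMORY),
    "ASYNC": (PRIMARY_DISK, SENT),
    "SUPERASYNC": (PRIMARY_DISK,),
}


def needs_standby_ack(sync_mode: str) -> bool:
    """True if the commit cannot return until the Standby replies with an ACK."""
    standby_side = {STANDBY_MEMORY, STANDBY_DISK}
    return any(step in standby_side for step in WAITS_FOR[sync_mode.upper()])


if __name__ == "__main__":
    for mode, steps in WAITS_FOR.items():
        print(mode, needs_standby_ack(mode), "waits for:", ", ".join(steps))
```

Only SYNC and NEARSYNC depend on an ACK from the Standby, which is why those are the modes in which a missing ACK can block the Primary.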
 
In SYNC mode, the total time for a log write = Primary_Log_Write + Log_Send + Standby_Log_Write + Ack_Message
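A worked example of the SYNC formula shows how quickly the network terms can dominate. The millisecond values below are hypothetical and chosen only for illustration; the function simply adds the four components named in the formula.

```python
def sync_log_write_ms(primary_log_write: float, log_send: float,
                      standby_log_write: float, ack_message: float) -> float:
    """Total SYNC-mode log write time in milliseconds, as the sum of the
    four components Primary_Log_Write + Log_Send + Standby_Log_Write + Ack_Message."""
    return primary_log_write + log_send + standby_log_write + ack_message


if __name__ == "__main__":
    # Hypothetical timings, in milliseconds, for a single log write.
    healthy = sync_log_write_ms(2.0, 0.5, 2.0, 0.5)
    congested = sync_log_write_ms(2.0, 40.0, 2.0, 40.0)  # slow network
    print(healthy, congested)  # 5.0 84.0
```

Disk time is identical in both cases; the roughly 17x increase comes entirely from Log_Send and Ack_Message, which is the P side of the theorem showing up as latency.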

Figure 2. The various HADR sync modes

 
Further fine-tuning of C vs. A:

 

Parameters:
Config Parameter     | Relation to CAP | Implication
HADR_TIMEOUT         | More C          | How long to wait before considering communication with the partner failed. Until this time is reached, the pair is still considered to be in PEER state.
HADR_PEER_WINDOW     | More C          | If the connection is lost while in PEER state, how long the Primary keeps behaving as if it were still in PEER state (blocking transactions) before it stops waiting for the Standby.
HADR_PEER_WAIT_LIMIT | More A          | How long transactions on the Primary may stay BLOCKED waiting for the Standby before the Primary disconnects from the Standby.
 

When both HADR_PEER_WINDOW and HADR_PEER_WAIT_LIMIT are set:
The maximum time before the Primary disconnects from the Standby = HADR_PEER_WINDOW + HADR_PEER_WAIT_LIMIT
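Because the two values add up, a generous HADR_PEER_WINDOW lengthens the worst-case wait even when HADR_PEER_WAIT_LIMIT is small. The sketch below just evaluates that rule; the settings are hypothetical, and treating a value of 0 as "not set" is an assumption of this sketch.

```python
def max_time_before_disconnect(hadr_peer_window: int, hadr_peer_wait_limit: int) -> int:
    """Upper bound, in seconds, before the Primary disconnects from the Standby,
    using the rule HADR_PEER_WINDOW + HADR_PEER_WAIT_LIMIT (both values set)."""
    # Assumption of this sketch: a value <= 0 means the parameter is not set.
    if hadr_peer_window <= 0 or hadr_peer_wait_limit <= 0:
        raise ValueError("the rule applies only when both parameters are set (> 0)")
    return hadr_peer_window + hadr_peer_wait_limit


if __name__ == "__main__":
    # Hypothetical settings in seconds: a C-leaning and an A-leaning configuration.
    print(max_time_before_disconnect(300, 120))  # 420
    print(max_time_before_disconnect(30, 10))    # 40
```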

 

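Additional Registry Variables:
Registry Variable | Implication
DB2_HADR_BUF_SIZE | Increases the HADR log receive buffer size on the Standby
DB2_HADR_SOSNDBUF | Sets the TCP socket send buffer size for the HADR connection, to maximize TCP throughput
DB2_HADR_SORCVBUF | Sets the TCP socket receive buffer size for the HADR connection, to buffer HADR log shipping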
 
 
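Larger values of DB2_HADR_BUF_SIZE and DB2_HADR_SOSNDBUF/DB2_HADR_SORCVBUF let a burst of log data or a brief network slowdown be absorbed in buffers, so it does not turn into a partition event. With no P, we can have both C and A.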
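To see why bigger buffers keep a short burst from becoming a partition event, consider a simplified, tick-based model. This is a hypothetical Python sketch, not how DB2 manages DB2_HADR_BUF_SIZE internally; the function name, page counts, and drain rate are all invented for illustration.

```python
from typing import Sequence


def blocked_ticks(incoming: Sequence[int], drain_per_tick: int, buffer_pages: int) -> int:
    """Count ticks in which the Primary must wait because the Standby's
    receive buffer cannot take all pending log pages.

    Hypothetical model: on each tick the Standby first drains up to
    `drain_per_tick` pages from its buffer, then accepts as many pending
    pages (new pages plus any earlier backlog) as fit in the free space.
    """
    buffered = 0  # pages currently held in the Standby's buffer
    backlog = 0   # pages the Primary has not been able to ship yet
    blocked = 0
    for new_pages in incoming:
        buffered = max(0, buffered - drain_per_tick)
        pending = backlog + new_pages
        accepted = min(pending, buffer_pages - buffered)
        buffered += accepted
        backlog = pending - accepted
        if backlog > 0:
            blocked += 1  # the Primary is left waiting this tick
    return blocked


if __name__ == "__main__":
    burst = [10, 10, 10, 0, 0, 0]  # hypothetical log pages per tick
    print(blocked_ticks(burst, drain_per_tick=4, buffer_pages=8))   # 6
    print(blocked_ticks(burst, drain_per_tick=4, buffer_pages=32))  # 0
```

With the same burst and drain rate, the small buffer leaves the Primary waiting on every tick, while the larger buffer absorbs the burst entirely. The model only helps with temporary bursts: if the Standby drains more slowly than the Primary produces log on a sustained basis, any finite buffer eventually fills.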
 
When we see transactions being blocked on the Primary, it is because the HADR setup favors C over A. For example:
 
1. Sync Mode = SYNC/NEARSYNC
2. There is a Partition event (standby log buffer full/network failure)
 
Then transactions block on the Primary, i.e., there is no A. In such situations, we can tune:
       - HADR_PEER_WAIT_LIMIT                 -  (to get AP)
       - DB2_HADR_BUF_SIZE                   -  (to get CA)
       - DB2_HADR_SOSNDBUF/DB2_HADR_SORCVBUF  -  (to get CA)
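The checklist above can be sketched as a small decision function. This Python sketch is a simplification that only checks the two conditions listed; `primary_blocks`, `suggest_tuning`, and the knob list are hypothetical names for illustration, not a DB2 tool.

```python
BLOCKING_MODES = {"SYNC", "NEARSYNC"}

# Tuning knobs from the list above and the tradeoff each one moves toward.
TUNING_OPTIONS = [
    ("HADR_PEER_WAIT_LIMIT", "AP"),
    ("DB2_HADR_BUF_SIZE", "CA"),
    ("DB2_HADR_SOSNDBUF/DB2_HADR_SORCVBUF", "CA"),
]


def primary_blocks(sync_mode: str, partition_event: bool) -> bool:
    """Both conditions in the example must hold for transactions to block."""
    return sync_mode.upper() in BLOCKING_MODES and partition_event


def suggest_tuning(sync_mode: str, partition_event: bool) -> list:
    """Return the knobs worth looking at, or an empty list if nothing blocks."""
    if not primary_blocks(sync_mode, partition_event):
        return []
    return list(TUNING_OPTIONS)


if __name__ == "__main__":
    print(suggest_tuning("NEARSYNC", partition_event=True))
    print(suggest_tuning("ASYNC", partition_event=True))  # []
```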
 
 
Note: Starting with v10.1, we have introduced a new parameter, HADR_SPOOL_LIMIT. If the Standby falls behind in log replay, log data sent by the Primary is written, or spooled, to disk on the Standby, and the Standby later reads that log data from disk. This keeps a full Standby log buffer from blocking the Primary, so it is another way to avoid a partition event.
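The effect of spooling on blocking can be sketched by extending the buffer idea: pages that do not fit in the receive buffer go to disk instead of being refused. This is a simplified, hypothetical Python model; `accept_log_pages` and its arguments are illustrative names, and counting the spool limit in pages is an assumption of the sketch.

```python
from typing import Tuple


def accept_log_pages(pages: int, buffer_free: int,
                     spool_used: int, spool_limit: int) -> Tuple[int, int, int]:
    """Hypothetical model of where incoming log pages go on the Standby.

    Pages go to the in-memory receive buffer first; whatever does not fit
    is spooled to disk, up to `spool_limit` pages in total. Anything left
    over is refused, which is what would leave the Primary waiting.
    Returns (buffered, spooled, refused).
    """
    buffered = min(pages, buffer_free)
    remaining = pages - buffered
    spooled = min(remaining, max(0, spool_limit - spool_used))
    refused = remaining - spooled
    return buffered, spooled, refused


if __name__ == "__main__":
    print(accept_log_pages(10, buffer_free=4, spool_used=0, spool_limit=0))    # (4, 0, 6)
    print(accept_log_pages(10, buffer_free=4, spool_used=0, spool_limit=100))  # (4, 6, 0)
```

In this model, a spool limit of 0 leaves the Primary waiting on the 6 pages that did not fit, while spool space lets the Standby keep accepting log data.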
 
Bada Bing Bada Boom!! :)

 

