
The Replication Roundtable: replication solutions available with the Informix Dynamic Server

Madison is a Senior Technical Staff Member (STSM) with IBM and is the replication architect for the Informix Dynamic Server. He has not only been responsible for the development of much of the ER and MACH11 functionality, but he has also played significant roles for non-replication functionality such as large chunk and network encryption. He lives in Flower Mound, Texas with his wife, Colleen.



Saturday July 04, 2009

IDS 11.50xC4 Enhancements

ER Enhancements in 11.50xC4


XML ATS/RIS files
New Event Alarms
Delete Wins
Background Sync / Check
Named Tasks
Cdr Stats
Parallel Sync / Check
Checks with in-flight data
Repair Verification
Role Separation

We are getting fairly close to releasing IDS 11.50xC5, and it contains some fairly significant replication enhancements.  However, before I get into the new items in 11.50xC5, I want to spend a bit of time discussing the things which were added in 11.50xC4.  We could think of this as a sort of 'prelude' to the new stuff which will be in 11.50xC5.  In this posting, I'm only going to give a brief description of each of the enhancements.  Later postings will go into more detail on how to use each of them.

XML ATS/RIS files

When a transaction cannot be fully applied on a node, an ATS or RIS file can be generated.  This file identifies the rows which could not be applied and the nature of the failure.  Up to now, the ATS/RIS file has been a simple text file.  It contains the information needed to analyze and repair the failed apply, but because it is plain text, it is a bit tedious for a user application to parse.  In 11.50xC4, we have made it possible to create an XML document instead of (or in addition to) the basic text file.  This makes it easier for applications to process the ATS/RIS file, especially for a Java application.

This enhancement is activated by a new option on the cdr define server and cdr modify server commands.

-X        --atsrisformat=[text|xml|both] ATS and RIS file format


New Event Alarms

We have added several new event alarms to ER.  These include state change alarms as well as alarms which fire as part of the creation of an ATS/RIS file.  By combining the XML ATS/RIS files with the new alarms, it is fairly easy to add user-written hooks which will automatically perform analysis on any apply failure.

Delete Wins

With timestamp conflict resolution, if a row is missing on a target node and an update operation is received, the apply will convert the update into an insert operation.  This may be fine in most cases, but in others it means that a row may re-appear after it has been deleted.  To prevent such a re-appearance of a deleted row, the replicate can be defined with conflict resolution set to 'deletewins'.  Other than the fact that deletewins does not convert an update into an insert, it behaves the same as timestamp conflict resolution.
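The difference between the two rules can be modelled in a few lines of Python.  This is only a toy sketch of the behavior described above, not ER's implementation: the `Row` class, the `apply_update` function and the "strictly newer timestamp wins" tie handling are all assumptions made for illustration, and what ER actually does with an update that is not applied is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Row:
    value: str
    timestamp: int  # time of the last change applied to this row


def apply_update(table: Dict[int, Row], key: int, value: str,
                 timestamp: int, rule: str) -> str:
    """Apply a replicated UPDATE to an in-memory table (hypothetical model).

    Returns a short description of what the apply did.
    """
    if rule not in ("timestamp", "deletewins"):
        raise ValueError("unknown conflict resolution rule: " + rule)

    existing = table.get(key)
    if existing is None:
        if rule == "timestamp":
            # Missing row: the update is converted into an insert.
            table[key] = Row(value, timestamp)
            return "inserted"
        # deletewins: never resurrect a row that is not there.
        return "skipped"

    # Row exists: both rules behave alike (assumption: strictly newer wins).
    if timestamp > existing.timestamp:
        table[key] = Row(value, timestamp)
        return "updated"
    return "skipped"
```

With an empty target table, the same update yields "inserted" under 'timestamp' and "skipped" under 'deletewins', which is exactly the re-appearance the new rule is meant to prevent.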


Background Sync/Check

The cdr check and cdr sync commands can now be executed as a background task using the server admin component.  Up to now, the cdr check and cdr sync commands ran as a foreground task which tied up the invoker's session.  To request that the check and/or sync be done as a background task, the user simply includes the --background (-B) option on the cdr check or cdr sync command.  This option is available for both replicate and replset commands.  If doing a background sync/check, it is wise to also run the check and/or sync as a named task (see below).

Named Tasks


By making the sync/check a named task, we make it possible to view the progress of the sync or check command when it is performed as a background task.  To name the sync or check command, use the --name=<task_name> option on the sync or check command.  To monitor the progress of the task, use the cdr stats check or cdr stats sync command (see below).

cdr stats check/sync

The cdr stats check and cdr stats sync commands allow the user to monitor the progress of the running check and/or sync named task.  As part of the cdr stats command, we also display an estimated completion time based on the rate that we are processing data and the amount of data to be processed.  Since there is a repeat option as part of the command, we can also see a running progress indication as the work is being done.
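The kind of estimate described above can be sketched as simple rate arithmetic.  The function below is an illustration only; its name and parameters are hypothetical, and the real cdr stats calculation may well use a different unit of work or smooth the rate over time.

```python
from typing import Optional


def estimate_seconds_remaining(rows_done: int, rows_total: int,
                               elapsed_seconds: float) -> Optional[float]:
    """Estimate the time left from the processing rate observed so far.

    Returns None when no rate can be computed yet.
    """
    if rows_done <= 0 or elapsed_seconds <= 0:
        return None
    rate = rows_done / elapsed_seconds        # rows processed per second
    remaining = max(rows_total - rows_done, 0)
    return remaining / rate


if __name__ == "__main__":
    # 250 of 1000 rows in 50 seconds is 5 rows/second, so 750 rows remain.
    print(estimate_seconds_remaining(250, 1000, 50.0))
```

Re-evaluating the estimate at each repeat interval gives the running progress indication: as the observed rate changes, so does the projected completion time.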

Parallel  sync/check of replsets

When performing a sync or check of replsets, we have added the ability to perform the operation in parallel.  We do not perform the check of an individual replicate in parallel, but we do perform the check/sync of multiple replicates within a replset in parallel.  This enhancement is invoked by using the --process=### (-p) option on the cdr sync replset or cdr check replset command.  The parameter to the --process option is the number of processes which will be spawned to perform the check or sync.  As a rule of thumb, the number of processes should not exceed the number of processors available.
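The shape of this parallelism (each replicate handled serially, several replicates handled at once) can be sketched as follows.  This is a model under stated assumptions: `check_replset`, `check_one` and `suggested_processes` are hypothetical names, and a Python thread pool stands in for the processes that the cdr utility spawns.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def suggested_processes(requested: int, processors: int) -> int:
    """Rule of thumb from the text: do not exceed the processor count."""
    return max(1, min(requested, processors))


def check_replset(replicates: List[str],
                  check_one: Callable[[str], int],
                  processes: int) -> Dict[str, int]:
    """Run check_one on every replicate, at most `processes` at a time.

    Each individual replicate is still checked serially by one worker;
    only different replicates overlap.  Returns a mapping from replicate
    name to whatever check_one reported for it.
    """
    if processes < 1:
        raise ValueError("processes must be at least 1")
    with ThreadPoolExecutor(max_workers=processes) as pool:
        results = list(pool.map(check_one, replicates))
    return dict(zip(replicates, results))
```

A replset containing a single very large replicate gains nothing from this arrangement, since that one replicate is still processed by a single worker.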

In-Flight Data

One of the problems with performing cdr check is that it does not consider in-flight operations.  It needs to be understood that on an active system, data will always be in a state of flux and that data on the various nodes is always somewhat 'out of sync'.  The degree of this is more or less determined by the latency between the nodes.  Up to now, cdr check did not consider in-flight operations and would report rows as being out of sync when in fact the only problem was that the update operation had simply not yet been received on one of the nodes.

We have added a 'recheck' for rows which we think are out-of-sync.  By default, we only recheck once and then only after one second has passed.  This recheck can be adjusted by using the --inprogress=### (-i) option, where ### represents the number of seconds that we will attempt to retry the check before we consider the row to be truly out-of-sync.  Using this option should reduce, if not eliminate, the false failures that would otherwise be reported from a cdr check operation.
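The recheck idea can be sketched as a retry loop with a deadline.  The code below is a simplified model, not the cdr check logic: the function name, the `rows_match` callback and the polling interval are assumptions, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def row_is_out_of_sync(rows_match: Callable[[], bool],
                       inprogress_seconds: float = 1.0,
                       poll_interval: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Report a row as out-of-sync only if it stays mismatched.

    rows_match() compares the row on the reference and target nodes.
    A mismatch is rechecked until inprogress_seconds have passed, giving
    an in-flight update time to arrive.
    """
    if rows_match():
        return False
    deadline = clock() + inprogress_seconds
    while clock() < deadline:
        sleep(poll_interval)          # wait for in-flight data to land
        if rows_match():
            return False              # it was only replication latency
    return True                       # still different: truly out-of-sync
```

With the defaults, the sketch rechecks once after one second, which mirrors the default behavior described above.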

Repair Verification

When performing a check with repair (cdr check repl/replset --repair),  we now perform verification of any out-of-sync rows which were repaired.  This is subject to the --inprogress option (see above).  If we are not able to verify any of the repair work which was done, then we will display the rows which could not be repaired and successfully verified.

Role Separation

Up to now, all ER administrative commands had to be performed by user informix.  With 11.50xC4, we have extended that to support any user having a DBA role.  


Categories : [   ER  ]

Jul 04 2009, 02:59:43 PM EDT



Wednesday July 01, 2009

Constraints and ER


One of the characteristics of ER is that it uses deferred constraint checking.  This means that constraints are checked as part of the commit and not as part of the update operation.  Deferred constraint checking has a huge advantage for ER because it means we can dynamically increase the parallelism of the apply while still supporting constraints such as referential integrity.  For instance, this allows us to apply a transaction which creates a parent row and a separate transaction which creates a child row at the same time.  All we have to do is ensure that the 'parent transaction' commits prior to the 'child transaction'.  Since ER uses deferred constraint checking, by coordinating the order of commits, we can make sure that the parent row will exist before the deferred constraint checking is performed by the child transaction.

This significantly improves the performance of the ER apply as it allows the apply to take maximum advantage of the resources which are available on the target node.  The transactions on the source were governed by the application, which controlled the order of the activities on the source node.  That means that resource usage on the source was limited to what the application could take advantage of.  Usually this will mean serialized operations.  However, the target has no such restriction and thus can take advantage of all resources.  That's why ER can generally catch up after an outage in a much shorter time than the original activity took.

Figure 1: Source Activity

Let's see how this works.  Figure 1 shows what will generally occur on a source node when a session performs two transactions in which the first inserts a parent row into one table and then commits.  The session then opens a subsequent transaction in which it inserts a child row into another table which has a relationship with the parent table (referential integrity).  This requires that two serialized transactions be executed.

Figure 2: ER Apply

On the other hand, Figure 2 describes what will happen on the target node.  ER does not know that the original work was done by a single session within two transactions, nor does it care.  The goal is to apply the operations on the target as quickly as possible.  That means that the ER apply transactions must take advantage of all of the resources available.  So what we do is sense that there is a referential integrity relationship between the parent and child tables and then guarantee that the parent transaction will commit prior to the child transaction.  Since ER is using deferred constraint checking, the constraint rules are checked during the commit.  Therefore, the apply is able to overlap the transactions and only serialize the commits.
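The "overlap the work, serialize the commits" idea can be sketched with two threads and an event.  This is a toy model only: the function and message names are hypothetical, the "transactions" just append to a list, and the real ER apply coordinates commit order inside the server rather than with Python threading primitives.

```python
import threading
from typing import List


def apply_parent_and_child() -> List[str]:
    """Apply a parent and a child transaction in parallel.

    The row work of the two transactions may interleave in any order,
    but the child commit always waits for the parent commit.
    """
    events: List[str] = []
    events_lock = threading.Lock()
    parent_committed = threading.Event()

    def record(message: str) -> None:
        with events_lock:
            events.append(message)

    def parent_transaction() -> None:
        record("parent: insert row")
        record("parent: commit")          # deferred constraints checked here
        parent_committed.set()

    def child_transaction() -> None:
        record("child: insert row")       # no constraint check yet
        parent_committed.wait()           # commits are serialized
        record("child: commit")           # parent row now exists

    # The child is deliberately started first to show that start order
    # does not matter; only the commit order is enforced.
    threads = [threading.Thread(target=child_transaction),
               threading.Thread(target=parent_transaction)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events
```

The order of the two insert events can differ from run to run, but "parent: commit" always precedes "child: commit".  With immediate constraint checking, the child insert itself would have had to wait for the parent.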

It is generally best for systems using ER to use constraints, especially if there are unique indexes besides the primary key.  By making those unique indexes into unique constraints, ER is better able to ensure that the transaction will be successfully applied when the unique columns are being updated.  In recent versions of ER, we have added startup warnings if we detect that a unique index exists which is not part of a unique constraint.


Categories : [   ER  ]

Jul 01 2009, 11:19:11 AM EDT



Wednesday June 24, 2009

HDR Performance Thoughts


Thoughts on HDR Performance

The Spice Must Flow
Indexed Spices
Half 'n Half
Minimize the Secondary Checkpoint
Maximize Parallelism

High-availability Data Replication (HDR) has been part of IDS since 6.0.  It has proved successful and is deployed at many customer sites.  Yet I still occasionally hear comments about performance issues when HDR is turned on.  So I thought I'd discuss some thoughts about how to improve the performance of the HDR pair.


"The Spice Must Flow"

Frank Herbert - 'Dune'


In the novel, "Dune", there is a phrase which is constantly repeated - "The spice must flow".  Well - a similar thing must happen with HDR, except with HDR it's "The Logs Must Flow".   Let me explain by examining the following diagram.

Figure: HDR Design

HDR works by transferring the logs to the secondary where the recovery component applies those logs.  The secondary is in perpetual recovery mode.

The logs are transferred to the secondary by copying the log buffer into an HDR transmit buffer as part of the flush of the log buffer to disk.  If using the synchronous mode of HDR, then the HDR transmit buffer is immediately scheduled for transmission and the thread which caused the log flush is held until the ACK of that transmission is received from the secondary.

If using asynchronous HDR, then the HDR transmit buffer is not scheduled for transmission until either the transmit buffer is full or until it has aged to the DRINTERVAL time limit.
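The two scheduling rules described above can be summarized as a small decision function.  This is only a sketch of the rules as stated in the text; the function name and its parameters are hypothetical, and the actual server logic has more states than this.

```python
def should_transmit(mode: str, bytes_used: int, buffer_size: int,
                    age_seconds: float, drinterval: float) -> bool:
    """Decide whether an HDR transmit buffer is ready to be scheduled.

    mode is "sync" or "async"; drinterval models the DRINTERVAL limit.
    """
    if bytes_used <= 0:
        return False                      # nothing to send
    if mode == "sync":
        return True                       # scheduled immediately on flush
    if mode == "async":
        buffer_full = bytes_used >= buffer_size
        aged_out = age_seconds >= drinterval
        return buffer_full or aged_out
    raise ValueError("mode must be 'sync' or 'async'")
```

In synchronous mode every log flush triggers a send and the flushing thread waits for the ACK; in asynchronous mode a partially filled buffer can sit until it fills or ages out.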

While the HDR transmit buffer is the same size as the log buffer, it is not a 1 to 1 relationship.  If a log buffer is flushed with only part of the buffer being filled (as is often the case with unbuffered committed transactions), then only part of the HDR transmit buffer is used.  When the next log flush occurs, we can then copy the new log pages into the remainder of that HDR transmit buffer.

The HDR transmit buffer is sent to the secondary using a half-duplex protocol.  That means that we cannot send the next buffer until we receive an ACK for the previous transmission.  While this may increase the delay in sending a log buffer, it is also necessary to ensure one of the main characteristics of HDR: the assurance that the secondary can take on the role of the primary with no loss of committed transactions.

In order to receive the HDR transmit buffer, the HDR receive thread must first obtain a buffer from the HDR receive buffer pool.  If it cannot obtain a buffer, it will wait until the recovery threads place an empty buffer into the pool.  When the receive thread receives a buffer from the primary, it will ACK the buffer and then queue the received buffer of log pages to the recovery component.  While it is a bit more complex, for our purposes we will think of the recovery threads as consisting of a main recovery thread and a bunch of worker recovery threads.

The main recovery thread will split each log page into log records and then queue each log record to one of the worker recovery threads based on the partition number of that log record.  All of the log records for a given partition will be applied serially by the same worker recovery thread.  Some log records are processed by the main recovery thread only after all of the worker threads have processed the log records queued to them.  These can be considered globally serialized log records.  The checkpoint log record is one such log record.
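The routing rule can be sketched as follows.  The text only says that records are queued "based on the partition number"; the modulo mapping used here, along with the `dispatch` name and the tuple layout of a log record, are assumptions chosen to keep the illustration short.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (partition number, description of the logged operation)
LogRecord = Tuple[int, str]


def dispatch(records: List[LogRecord],
             worker_count: int) -> Dict[int, List[LogRecord]]:
    """Queue each log record to a worker chosen by its partition number.

    Records for the same partition always land on the same worker, in
    their original order, so each partition is applied serially.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    queues: Dict[int, List[LogRecord]] = defaultdict(list)
    for record in records:
        partition_number, _ = record
        queues[partition_number % worker_count].append(record)
    return dict(queues)


if __name__ == "__main__":
    log = [(10, "insert"), (11, "update"), (10, "delete"), (12, "insert")]
    for worker, queue in sorted(dispatch(log, 2).items()):
        print(worker, queue)
```

One consequence of routing by partition is that all activity against a single partition is bound to one worker, no matter how many workers exist.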

After the main recovery component is finished with an HDR buffer, that buffer is placed in the receive HDR buffer pool.

From this we can see that if we don't process things efficiently on the secondary, then we will not be able to return an HDR receive buffer to the HDR receive buffer pool quickly.  If we can't return receive buffers to the pool quickly, then the receive thread will not easily be able to get a buffer in which to receive the next transmission.  If we don't receive a transmission quickly, then we can't send the next buffer from the source because of the half-duplex protocol.  If we can't send a buffer from the source, then we can't return that full HDR transmit buffer to the HDR transmit buffer pool.  If we can't return the HDR transmit buffer to the pool quickly, then the log flush logic is unable to get an empty HDR transmit buffer easily.  If the log flush logic is not able to easily get an empty HDR transmit buffer, then logging will be blocked.  In short, a problem with the apply on the secondary can back flow and impact the primary.  So as Mr. Herbert said, "The spice must flow".

Indexed Spices

There is an additional consideration, and that's what happens when an index is created.  At the end of the index creation, we transfer the index to the secondary server.  This is done by having the thread which created the index copy the newly created index into an HDR transfer buffer and send the index to the secondary.  Because of the increased usage of the transmission buffers by the index transfer, there can be some degradation in the log transfer when an index is created.  One of the things that we did with IDS 11 was to implement a feature called Index Page Logging.  This allows the transfer of the index to be done by placing the index pages into the log itself.  To avoid a long transaction, the index page logging is actually done within multiple transactions.

While this does increase the log consumption, it also ensures that the index transfer does not use all of the transfer buffers and allows non-index log pages to be intermixed with the index log pages.  This tends to equalize the impact of the index transfer with the user esql threads.

Half 'n Half

As was mentioned earlier, the transmission of HDR buffers to the secondary uses a half-duplex protocol.  This is absolutely critical in order to support failover with no loss of committed transactions, but is also critical if we want to do a 'flip-flop'.  That is the case when the secondary becomes the primary and the primary becomes the secondary.  With IDS 11, we implemented the RSS secondary, which does not use the half-duplex protocol, but instead uses full duplex with flow control.  This eliminates the impact of the half-duplex protocol.  Additionally, the RSS node does not block the log flush threads.  Because of this, an RSS node is worth considering in cases where you would otherwise be willing to run HDR in asynchronous mode.

Minimize the Secondary Checkpoint Time

Since the checkpoint is a globally serialized operation on the secondary, care must be taken that the checkpoint does not cause a back flow.  When the checkpoint is applied on the secondary, it is a blocking checkpoint.  Most of the work of the secondary checkpoint is in flushing all of the dirty buffers.

There are two main flushing algorithms that we use.  The checkpoint will use 'chunk writes', which involves first sorting the pages for a specific chunk.  By doing that, we are able to take advantage of write buddy bunching and thus flush the buffers quicker.  The other form of flushing is called LRU writes.  These are not as efficient as chunk writes, but are performed in between checkpoints to keep the buffers relatively clean.  You can reduce the time that it takes to perform the checkpoint by decreasing the lru_min_dirty and lru_max_dirty values in the BUFFERPOOL.  While this will decrease the work done during a checkpoint, it will also increase the work done in between checkpoints.  Also, to maximize performance, you might want to consider making the number of lrus about 2 times the number of CPUVPS.  This should make it a bit easier to perform LRU page flushing.  We are currently examining ways in which we can minimize the impact of the checkpoint on the secondary server.

Maximize Parallelism

We might get the log pages to the secondary really quickly and with very few bottlenecks, but if we can't apply the log records on the secondary as fast as they are generated on the primary, then we will run into back flow.  So it is very important to take advantage of all of the resources that the secondary has so that its performance will at least match the primary.  That means that if we have 12 CPUVPS on the primary, we probably need to have 12 CPUVPS on the secondary.  But there is an additional consideration.  We need to be able to utilize the parallel recovery apply to make it easier to maintain a balance.

The number of recovery threads is determined by the onconfig parameter OFF_RECVRY_THREADS.  As a rule of thumb, there should be at least 3 times as many recovery threads as there are CPUVPS.  The reason is that 1) the log records are spread across all of the recovery threads and 2) there is an increased probability of having to do a read into the buffer in order to process a log record.  If we only have as many recovery threads as CPUVPS, then there is going to be a lot of time spent waiting for read completion.  By increasing the number of recovery threads to at least 3X the number of CPUVPS, we increase the probability of being able to work on another log record while waiting for the IO completion on the first.
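The two rules of thumb mentioned in this post (about 2 lrus per CPU VP, and at least 3 recovery threads per CPU VP) are simple arithmetic, sketched below.  The function name is hypothetical, and the results are starting points to tune from rather than prescribed settings.

```python
from typing import Dict


def secondary_rules_of_thumb(cpu_vps: int) -> Dict[str, int]:
    """Starting values for an HDR secondary, derived from the CPU VP count.

    lrus: about 2 times the number of CPU VPs (BUFFERPOOL lrus field).
    OFF_RECVRY_THREADS: at least 3 times the number of CPU VPs.
    """
    if cpu_vps < 1:
        raise ValueError("cpu_vps must be at least 1")
    return {
        "lrus": 2 * cpu_vps,
        "OFF_RECVRY_THREADS": 3 * cpu_vps,
    }


if __name__ == "__main__":
    # The 12 CPU VP example used earlier in this post.
    print(secondary_rules_of_thumb(12))
```

For the 12 CPU VP example, this gives 24 lrus and a minimum of 36 recovery threads.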

If there are indexes on a table, then we should make sure that each index is located in a different partition than the data (i.e. the IDS 6 style of indexes).  The reason for this is that with the version 5 style of indexes, the index is in the same partition as the data.  Since the log records are handed out based on the partition, the index log records and the data log records then have to be processed by the same recovery thread, which decreases the ability to utilize all of the resources on the secondary.  In other words, if the indexes are in the same partition as the data pages, then the apply of the index and data changes is done serially.

Another thing that can be done to improve apply is to partition tables which are highly updated.  Again, by adding fragmentation of both the data and the indexes, we can maximize the degree of parallelism.  



Categories : [   HDR  |  MACH11  ]

Jun 24 2009, 09:53:51 PM EDT
