Data resilience

Data resilience is the availability of the data that is needed in a production environment. Several technologies address the data resilience requirements that are described in the “Benefits of High Availability” section. On IBM® i, these technologies fall into two main categories: logical (software) replication and hardware (disk) replication.

Logical replication

Logical replication is a widely deployed multisystem data resiliency topology for high availability (HA) in the IBM i space. It is typically deployed through a product that is provided by a high availability independent software vendor (ISV). Replication is performed in software at the object level: changes to objects (for example, a file, member, data area, or program) are replicated to a backup copy. For journaled objects, replication occurs in real time (with synchronous remote journaling) or in near real time. Typically, if an object such as a file is journaled, replication is handled at the record level. For objects that are not journaled, such as user spaces, replication is typically handled at the object level. In this case, the entire object is replicated after each set of changes to the object is complete.

Most logical replication solutions allow for additional features beyond object replication. For example, you can achieve additional auditing capabilities, observe the replication status in real time, automatically add newly created objects to those being replicated, and replicate only a subset of objects in a given library or directory.

To build an efficient and reliable multisystem HA solution using logical replication, synchronous remote journaling as a transport mechanism is preferable. With remote journaling, IBM i continuously moves the newly arriving data in the journal receiver to the backup server journal receiver. At this point, a software solution is employed to “replay” these journal updates, placing them into the object on the backup server. After this environment is established, there are two separate yet identical objects, one on the primary server and one on the backup server.
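The replay step can be pictured with a small model. The following Python sketch is illustrative only and does not use any IBM i or ISV API: `JournalEntry`, `Replica`, and `replay` are hypothetical names, and a dictionary stands in for the records of a journaled file. It shows the general idea of applying journal entries in sequence order and skipping entries that were already applied.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class JournalEntry:
    """Hypothetical stand-in for one journal entry received from the primary."""
    seq: int               # sequence number assigned on the primary
    key: str               # record key within the replicated file
    value: Optional[str]   # new record image; None means the record was deleted


@dataclass
class Replica:
    """Hypothetical backup copy of a journaled file."""
    records: Dict[str, str] = field(default_factory=dict)
    last_applied: int = 0  # highest sequence number already replayed

    def replay(self, entries: Iterable[JournalEntry]) -> None:
        # Apply entries in sequence order so the backup sees changes
        # in the same order in which they occurred on the primary.
        for entry in sorted(entries, key=lambda e: e.seq):
            if entry.seq <= self.last_applied:
                continue  # already replayed; applying a batch twice is harmless
            if entry.value is None:
                self.records.pop(entry.key, None)
            else:
                self.records[entry.key] = entry.value
            self.last_applied = entry.seq


backup = Replica()
backup.replay([
    JournalEntry(1, "CUST001", "Smith"),
    JournalEntry(2, "CUST002", "Jones"),
    JournalEntry(3, "CUST001", None),  # delete of CUST001
])
print(backup.records)
```

Tracking the last applied sequence number is one simple way to make replay restartable after an interruption; actual products may use different mechanisms.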

With this solution in place, you can rapidly activate your production environment on the backup server by doing a role-swap operation.

A key advantage of this solution category is that the backup database file is live. That is, it can be accessed in real time for backup operations or for other read-only application types such as building reports. In addition, a live backup copy normally means that minimal recovery is needed when you switch over to it.

The challenge with this solution category is the complexity that can be involved with setting up and maintaining the environment. One of the fundamental challenges lies in strictly policing modification of the live copies of objects that reside on the backup server. Failure to enforce such discipline can lead to instances in which users and programmers make changes against the live copy so that it no longer matches the production copy. If this happens, the primary and the backup versions of your files are no longer identical.
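One way to picture how such divergence might be detected is to compare content digests of the objects on each side. The following Python sketch is an assumption made for illustration, not a description of how any HA product audits its replicas: `object_digests` and `find_drift` are hypothetical helpers, and byte strings stand in for object contents.

```python
import hashlib
from typing import Dict, List


def object_digests(objects: Dict[str, bytes]) -> Dict[str, str]:
    """Compute a SHA-256 digest for each object, keyed by object name."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in objects.items()}


def find_drift(primary: Dict[str, str], backup: Dict[str, str]) -> List[str]:
    """Return the names of objects whose digests differ or that exist on one side only."""
    names = sorted(set(primary) | set(backup))
    return [name for name in names if primary.get(name) != backup.get(name)]


primary_objects = {"ORDERS": b"record-1;record-2", "PRICES": b"10;20;30"}
backup_objects = {"ORDERS": b"record-1;record-2", "PRICES": b"10;20;99"}  # changed on the backup

drifted = find_drift(object_digests(primary_objects), object_digests(backup_objects))
print(drifted)
```

Comparing digests rather than full contents means that only a short summary per object has to cross the network, which matters when the two systems are far apart.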

Another challenge that is associated with this approach is that objects that are not journaled must go through a checkpoint, be saved, and then be sent separately to the backup server. Therefore, the granularity of the real-time nature of the process might be limited to the granularity of the largest object being replicated for a given operation.

For example, a program updates a record residing within a journaled file. As part of the same operation, it also updates an object, such as a user space, that is not journaled. The backup copy becomes completely consistent only when the user space is entirely replicated to the backup system. Practically speaking, if the primary system fails and the user space object is not yet fully replicated, a manual recovery process is required to reconcile the state of the non-journaled user space to match the last valid operation whose data was completely replicated.

Logical replication solutions can typically cover all types of outages, depending on the implementation. Recovery point objective (RPO) can be 0 if the distance between systems allows for synchronous remote journaling and all replicated objects are journaled. Using asynchronous remote journaling and having objects that must be replicated from the audit journal increases the RPO.
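The relationship between these factors and the RPO can be expressed as a rough model. The following Python sketch is a simplification under stated assumptions, not a sizing tool: `estimate_rpo_seconds` and its parameters are hypothetical, and it treats the RPO as the larger of the journal transport lag and the longest replication lag among objects that are not journaled.

```python
from typing import Iterable


def estimate_rpo_seconds(
    synchronous_journaling: bool,
    async_transport_lag_seconds: float,
    nonjournaled_object_lags_seconds: Iterable[float],
) -> float:
    """Rough RPO estimate: the worst-case age of changes missing on the backup."""
    # With synchronous remote journaling, journaled changes reach the backup
    # before the operation completes, so they contribute no data loss.
    journal_rpo = 0.0 if synchronous_journaling else async_transport_lag_seconds
    # Objects that are not journaled are replicated as whole objects after the
    # fact, so the slowest one bounds how stale the backup can be.
    object_rpo = max(nonjournaled_object_lags_seconds, default=0.0)
    return max(journal_rpo, object_rpo)


# Synchronous journaling with every replicated object journaled.
print(estimate_rpo_seconds(True, 5.0, []))
# Asynchronous journaling, plus two non-journaled objects with their own lag.
print(estimate_rpo_seconds(False, 5.0, [12.0, 3.0]))
```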

Another possible challenge that is associated with this approach lies in the latency of the replication process, that is, the lag between the time at which changes are made on the source system and the time at which those changes become available on the backup system. Synchronous remote journaling can mitigate this to a large extent. Regardless of the transmission mechanism that is used, you must adequately project your transmission volume and size your communication lines and speeds properly to help ensure that your environment can manage replication volumes when they reach their peak. In a high-volume environment, replay backlog and latency might be an issue on the target side even if your transmission facilities are properly sized.
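The sizing concern can be illustrated with simple arithmetic. The following Python sketch is a toy model with made-up numbers: `simulate_backlog` is a hypothetical helper that, given the volume of journal data generated in each time interval and a fixed capacity per interval, tracks how much data is waiting to be processed. The same calculation applies whether the bottleneck is the communication line or the replay process on the target.

```python
from typing import Iterable, Tuple


def simulate_backlog(
    generated_mb_per_interval: Iterable[float],
    capacity_mb_per_interval: float,
) -> Tuple[float, float]:
    """Return (peak backlog, final backlog) in MB for a fixed-capacity stage."""
    backlog = 0.0
    peak = 0.0
    for produced in generated_mb_per_interval:
        # New data is added to the backlog; up to one interval's worth of
        # capacity is drained. The backlog cannot go below zero.
        backlog = max(0.0, backlog + produced - capacity_mb_per_interval)
        peak = max(peak, backlog)
    return peak, backlog


# Hypothetical workload: a quiet period, a two-interval peak, then quiet again.
workload = [10.0, 50.0, 50.0, 10.0, 10.0]
peak, remaining = simulate_backlog(workload, capacity_mb_per_interval=30.0)
print(peak, remaining)
```

In this example the average volume (26 MB per interval) is below the capacity (30 MB per interval), yet a backlog still builds during the peak, which is why sizing for peak rather than average volume matters.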

Hardware replication

Hardware replication is done at the operating system or disk level instead of at the object level. An advantage of these technologies over logical replication is that the replication is done at a lower level, and when done synchronously, there is a guarantee that both copies of the data are identical. The disadvantage of the technology is that the data is only accessible from one copy, and the second copy cannot be used during active replication.

Within hardware replication, there are again two categories: independent auxiliary storage pool (IASP) replication and full system replication. IBM PowerHA® SystemMirror® for i delivers several hardware replication technologies that are based on IASPs. An IASP is a set of disk units that can be configured separately from a specific host system and can be independently varied on or off. An IASP is used to segregate application data from the operating system. Thus, the application data can be replicated by using hardware replication without replicating the operating system. The IBM i implementation of IASPs supports both directory objects (such as objects in the integrated file system (IFS)) and library objects (such as database files). Although migrating the application data into the IASP is a separate step in setting up the environment, there are several advantages to replicating only the data and not the operating system. Planned and unplanned switches to the backup system are faster than if the entire system is replicated. The backup system contains a separate copy of the operating system and can be used for other work while it also serves as a backup system for production. These technologies can also be used for planned operating system upgrades, because there are two copies of the operating system.

If migrating the application data into an IASP is not feasible, it is also possible to use hardware replication at the system level, typically called full system replication. Geographic mirroring, which is an IBM i replication technology, can be used in an IBM i hosted environment to replicate a production system. The replication technologies that are provided by the IBM storage systems can also be used to replicate an entire system. Although full system replication is easier to set up initially, it requires more bandwidth than IASP-based replication. Full system replication is considered more of a disaster recovery technology than a high availability technology, because there is only one production environment, and it must be IPLed on another physical system for a planned or unplanned outage. Tools and service agreements are available from IBM Lab Services to help automate and customize a full system replication environment if needed.
