High availability and disaster recovery considerations
You can connect IBM Robotic Process Automation on premises to your high availability (HA) environment. You can wire IBM RPA to your HA resources and manage your IBM RPA management servers by using techniques like load balancers, server scalability, and health monitoring.
For IBM RPA use cases that require minimal downtime, it is desirable to configure the software and infrastructure to avoid outages. To determine whether your environment needs to be implemented for high availability, consider the impact of a failure of one or more system components or of the entire system. High availability ensures that the system is available as often as possible, and provides the means to recover from unexpected failures and avoid data loss. If your IBM RPA on premises environment fails, you can face significant impacts to your business continuity. To be successful, you must implement a business continuity plan that includes a high availability (HA) and disaster recovery (DR) solution.
Analyze the risks and impacts of each failure, and determine which kinds of failures the system can and cannot tolerate. Your analysis can dictate which HA topology best suits your needs.
Pre-installation HA considerations
Consider the following topics before you start the installation. Correct planning is an essential part of the project. A planning error can cause undesirable downtime and can be difficult to correct.
Table of contents:
- HA topology concepts
- Active-active HA topology
- Hot stand-by HA topology
- Message queue provider
- Sizing guidance
HA topology concepts
You can define your HA topology depending on your organization's needs. Two common approaches are the active-active and the hot stand-by HA topologies. Your risk assessment should provide you with the information to decide which approach is better.
Choose a topology that meets needs such as performance, maintainability, security, high availability, and scalability. Before installing IBM RPA, configure the environments where the client and server will be deployed. Follow the HA recommendations specified by the vendors of each of the services that must be previously installed. The following list defines the services you must configure for HA before installing IBM RPA:
- Redis™
- Microsoft™ SQL Server
- Messaging provider
- Network load-balancers and scaling tier
With these services validated, you can define which HA topology to use, and identify which resources you need to guarantee HA for IBM RPA. See how to install the IBM RPA server-side components at Installing the server. After installing the server-side components, install the IBM RPA client-side components and run a simple script to validate connectivity. See how to install the IBM RPA client-side components at Installing the client.
After installing and validating the environment, install the server on the other hosts to ensure high availability. You can hook IBM RPA to one of the following topologies to enable high availability.
Active-active HA topology
The active-active topology needs multiple servers to operate simultaneously. In this case, the network load-balancer determines which workstations the robots are directed to run on. The number of servers depends on the needs of each environment, and you need to take into account the incremental cost of each server that is added to the topology.
Hot stand-by HA topology
The hot stand-by topology does not keep all servers operating simultaneously. A server is ready to be used, but it is configured in the network load-balancer so that it is used only if necessary. In other words, servers are in stand-by mode and become operational ("hot") when needed.
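The difference between the two topologies can be sketched as a routing rule. The following Python example is only an illustration of the concept, not IBM RPA or load-balancer code: the `Server` class, the `route` function, and the host names `rpa-a` and `rpa-b` are all hypothetical. It assumes that the load balancer already knows the health of each server.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str               # hypothetical host name
    healthy: bool = True    # assumed result of the load balancer's health check
    standby: bool = False   # True marks a hot stand-by server


def eligible_servers(servers):
    """Return the servers that may receive traffic right now.

    Healthy non-stand-by servers always take traffic. Stand-by servers
    are used only when no other server is healthy.
    """
    active = [s for s in servers if s.healthy and not s.standby]
    if active:
        return active
    return [s for s in servers if s.healthy and s.standby]


def route(servers, n_requests):
    """Assign n_requests to the eligible servers in round-robin order."""
    pool = eligible_servers(servers)
    if not pool:
        raise RuntimeError("no healthy server available")
    return [pool[i % len(pool)].name for i in range(n_requests)]


# Active-active: both servers share the load.
active_active = [Server("rpa-a"), Server("rpa-b")]
print(route(active_active, 4))

# Hot stand-by: rpa-b receives traffic only after rpa-a fails.
hot_standby = [Server("rpa-a"), Server("rpa-b", standby=True)]
print(route(hot_standby, 2))
hot_standby[0].healthy = False
print(route(hot_standby, 2))
```

In a real deployment this rule lives in the configuration of your network load-balancer. Follow the vendor's guidance for the exact settings.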
Message queue provider
For high availability environments, you must configure the external services with disaster recovery and high availability in mind. One important service is the message queue provider.
The IBM RPA server uses message queue providers for two purposes: one to communicate with the system's API, and the other to control script runtimes and schedules. The two are independent, and each can be served by a different message queue provider. They are called the system queue provider and the default queue provider, and are defined as follows:
-
System queue provider
This is the message queuing provider that manages system messages. Once the IBM RPA server is installed, you cannot change the system queue provider.
You can configure IBM MQ as the system queue provider during the installation of IBM RPA on premises server. You must configure your own IBM MQ implementation to do so. See the IBM MQ documentation and the Installing the server topic for more information. It is recommended that you use IBM MQ as the system queue provider to achieve high availability.
-
Default queue provider
This is the message queuing provider that manages scripts in operation, orchestration, and schedules. You can change the default queue provider on the tenant configuration page.
By default, the default queue provider is the same as the system queue provider defined during the server installation. You can choose either Microsoft Message Queuing™ (MSMQ) or IBM MQ for the system queue provider.
Sizing guidance
IBM RPA has hardware and software requirements for each of the IBM RPA versions. Refer to IBM Robotic Process Automation detailed system requirements for details.
For products of other vendors that IBM RPA depends on, you need to follow the vendors' guidance to configure HA. Refer to their documentation for instructions.
Post-installation considerations
After defining the topology used to maintain high availability and configuring the system, you need to define strategies to maintain the environment and recover from possible disasters.
Table of contents:
- Maintenance considerations
- Disaster recovery
- Data in question
- Checkpoint consistency
- Infrastructure consistency
Maintenance considerations
Follow the guidance for each release on client and server maintenance.
- The customer's client and bot agents should stay within two minor releases of the server release.
- Subsequent releases of the server can cause extremely back-level clients to stop functioning.
- IBM RPA follows the IBM PSIRT process and product security processes. Staying updated with software maintenance ensures that customers have the most secure practices.
- New features might not be available on earlier client versions.
- As always, exceptional cases can require specific procedures.
Client-side maintenance can be installed independently when convenient for the device, device user or IT support team.
Server-side maintenance is performed by applying the maintenance release to a specific installation on a specific host.
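The two-minor-release guideline can be checked automatically across a fleet of clients. The following Python sketch is an assumption-laden illustration: it assumes version strings of the form `major.minor[.patch]`, treats a different major version as out of range, and uses made-up version numbers that do not correspond to real IBM RPA releases. Check how your IBM RPA release numbers map to "minor releases" before you rely on a rule like this.

```python
def parse_version(text):
    """Parse 'major.minor[.patch]' into a (major, minor) tuple of integers."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def within_supported_skew(client, server, max_minor_gap=2):
    """Return True if the client is within max_minor_gap minor releases.

    Assumption: a client on a different major version is always
    considered out of range.
    """
    c_major, c_minor = parse_version(client)
    s_major, s_minor = parse_version(server)
    if c_major != s_major:
        return False
    return abs(s_minor - c_minor) <= max_minor_gap


# Hypothetical fleet inventory: host name -> installed client version.
clients = {"bot-01": "1.5.0", "bot-02": "1.3.2", "bot-03": "1.1.0"}
server_version = "1.5.1"

outdated = sorted(
    host for host, version in clients.items()
    if not within_supported_skew(version, server_version)
)
print(outdated)
```

With the sample inventory, only `bot-03` is reported, because it is four minor releases behind the server.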
Disaster recovery
Various strategies can be employed to resume operations after the loss of a data center or availability zone. Cost is a significant factor in selecting the most appropriate DR strategy for your deployment.
-
RTO – Recovery Time Objective
The Recovery Time Objective (RTO) is the time that is required to reestablish the entire system before the execution of processes resumes. In short, it is the acceptable length of time for the system to be stopped with the least possible loss.
-
RPO – Recovery Point Objective
The Recovery Point Objective (RPO) is the amount of data that can be lost during system downtime. The actual amount of data lost spans the period from the last system backup to the time the system is restored. The more recent the last backup, the smaller the amount of data lost, but frequent backups have both a computational and a financial cost. It is crucial to measure the actual cost of potential data loss that your company can bear. With this RPO in mind, you can make the best decision about the backup or disaster recovery solutions you need.
Knowing your RPO helps you schedule data backups at the correct time interval. Backups can be automated, which makes the strategy easier to implement. The RTO strategy, however, can be more complex because it is directly related to the complete restoration of the system.
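The relationship between the backup schedule and these two objectives can be expressed as simple arithmetic. The following Python sketch assumes a fixed backup interval and a measured restore duration. The function names and the sample values are hypothetical and serve only to illustrate the calculation.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup, failure):
    """Return the time span of changes that no backup contains."""
    if failure < last_backup:
        raise ValueError("failure time precedes the last backup")
    return failure - last_backup


def meets_rpo(backup_interval, rpo):
    """Worst case: the failure occurs just before the next backup,
    so up to one full backup interval of data is lost."""
    return backup_interval <= rpo


def meets_rto(restore_duration, rto):
    """The system must be fully restored within the RTO."""
    return restore_duration <= rto


# Hypothetical values for illustration only.
last_backup = datetime(2024, 1, 1, 2, 0)
failure = datetime(2024, 1, 1, 5, 30)
print(data_loss_window(last_backup, failure))                   # 3:30:00

print(meets_rpo(timedelta(hours=24), rpo=timedelta(hours=4)))   # False
print(meets_rpo(timedelta(hours=1), rpo=timedelta(hours=4)))    # True
print(meets_rto(timedelta(hours=2), rto=timedelta(hours=3)))    # True
```

In this example, a daily backup cannot satisfy a four-hour RPO, whereas an hourly backup can. This shows why a shorter RPO implies more frequent, and therefore more costly, backups.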
Data in question
You must identify which data is most important for system recovery and which data needs to be moved from one data center to another. Define your replication strategy and assess the costs that the loss of each kind of data can cause. You must also comply with the legal requirements in force in your country regarding which data and information need to be stored.
-
SQL Server
There are several databases that are maintained by IBM RPA. They contain not only operational data like jobs, but also resource configuration and all IBM RPA Control Center settings.
-
System Queue Provider
The queue configurations must be replicated. The content of messages on the queues might also be crucial to operational restoration.
-
File system
IBM RPA uses the file system to store log files, browser plug-ins, and optionally, audit logs.
-
Resource Managers
IBM RPA allows business logic (WAL Scripts) to access queue providers and databases that can be required for operational restoration.
Checkpoint consistency
At any given point in time, data in one storage can relate to the data in other stores. Consider an IBM RPA bot that is scheduled to check email accounts for specific content, and then update a local application, which has data that is stored in a database. In this scenario:
- The IBM RPA SQL Server databases record the job details and the robot's log.
- The email account might have been updated to move or delete processed emails.
- The local database for the local application has been updated.
Backup strategies for these discrete systems need to be coordinated or at least considered when planning DR. IBM RPA bots developed by your organization may change the complexity of your DR considerations and developers should be informed of your DR strategy.
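One way to reason about coordinated backups is to compare the backup timestamps of the related stores. The following Python sketch is a simplified illustration. The store names and the tolerance are hypothetical, and timestamps that are close together do not by themselves guarantee that the data is logically consistent.

```python
from datetime import datetime, timedelta


def checkpoint_spread(backup_times):
    """Return the gap between the oldest and newest backup in the set."""
    times = list(backup_times.values())
    return max(times) - min(times)


def is_coordinated(backup_times, tolerance):
    """Return True if all stores were backed up within the tolerance."""
    return checkpoint_spread(backup_times) <= tolerance


# Hypothetical latest backup time for each store in the email scenario.
backups = {
    "rpa_sql_server": datetime(2024, 1, 1, 2, 0),
    "mailbox": datetime(2024, 1, 1, 2, 10),
    "local_app_db": datetime(2023, 12, 31, 23, 0),
}

print(checkpoint_spread(backups))                          # 3:10:00
print(is_coordinated(backups, timedelta(minutes=30)))      # False
```

Here the local application database was backed up more than three hours before the mailbox. After a restore, the mailbox might show emails as processed even though the restored local database does not contain the corresponding updates.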
Infrastructure consistency
IBM RPA infrastructure settings are stored in the database. A failure in the database can cause these settings to be lost. Consider this information when analyzing which data and systems need a backup system in another environment.
-
Client computer hostnames and configuration
The robots run on client resources, and these computers are known to IBM RPA by their hostnames and the certificates they contain, which uniquely identify them. Make sure that your DR strategy includes a device address recovery strategy. For example, a schedule has a set of computers and groups assigned, and the robot is to run on that set of resources. Are these computers available in your DR topology?
-
Client computer software
IBM RPA client computers can be configured with specific software distributions that the bot relies on, for example, Microsoft™ Office or SAP GUI™. Symmetric configurations between the normal and DR sites are crucial to successful moves. You can leverage a common client ISO install image for simplicity, or manually inventory the software and ensure that your DR configurations are symmetric.
-
Resource Manager Configuration
Queue providers and other configuration details are managed by the IBM RPA Control Center. The management features allow you to monitor the functioning of the whole system rather than evaluating one machine at a time. This can help you observe which environment is having problems while the bots are running. For consistent results, you can use the same hostnames and credentials.
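The hostname and software symmetry checks described in this section can be approximated by comparing inventories of the two sites. The following Python sketch assumes that you can export the list of computers assigned to a schedule and a software inventory for each client. The function names, host names, and version strings are hypothetical.

```python
def missing_in_dr(schedule_hosts, dr_hosts):
    """Return the scheduled hosts that the DR site does not provide.

    Hostnames are compared case-insensitively.
    """
    available = {host.lower() for host in dr_hosts}
    wanted = {host.lower() for host in schedule_hosts}
    return sorted(wanted - available)


def software_drift(primary, dr):
    """Compare {package: version} inventories of the two sites.

    Returns {package: (primary_version, dr_version)} for every package
    that differs. None means the package is not installed at that site.
    """
    packages = sorted(set(primary) | set(dr))
    return {
        name: (primary.get(name), dr.get(name))
        for name in packages
        if primary.get(name) != dr.get(name)
    }


# Hypothetical inventories for illustration only.
print(missing_in_dr(["BOT-01", "bot-02"], ["bot-01"]))

primary_site = {"Office": "16.0", "SAP GUI": "7.70"}
dr_site = {"Office": "16.0", "SAP GUI": "8.00"}
print(software_drift(primary_site, dr_site))
```

An empty result from both functions indicates that, for the inventoried items, the DR site mirrors the normal site. It does not cover certificates or credentials, which you need to verify separately.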