What is High Availability?
Wikipedia describes high availability as a characteristic of a system that describes the duration (length of time) for which the system is operational. For the curious, the Wikipedia page also lists the downtime that corresponds to common availability goals.
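To make the link between an availability goal and its downtime concrete, here is a minimal sketch in Python. The function name and the choice of a 365-day year are assumptions made for illustration; they are not taken from the Wikipedia page.

```python
def downtime_hours_per_year(availability_percent, hours_per_year=365 * 24):
    """Return the downtime budget, in hours per year, for an availability goal.

    Assumes a 365-day year (8760 hours); leap years are ignored.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return hours_per_year * (1 - availability_percent / 100)


# Print the yearly downtime budget for a few common availability goals.
for goal in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(goal)
    print(f"{goal}% available: {hours:.2f} hours ({hours * 60:.1f} minutes) of downtime per year")
```

Under these assumptions, a 99.9% goal leaves roughly 8.76 hours of downtime per year, which is a useful number to hold against your SLA.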
In a survey report published earlier in 2015, IDC stated: ‘Unplanned application downtime costs the Fortune 1000 from $1.25 billion to $2.5 billion every year’.
This article discusses the high availability considerations for the WebSphere Commerce solution components, with a focus on DB2 HADR as the key high availability mechanism at the database layer.
High Availability of the WebSphere Commerce system components
A typical WebSphere Commerce system spans several tiers: Web server, application servers, search servers, and database server. It talks to several downstream systems, such as Order Fulfillment, Payment and many others, for different purposes.
In general, High Availability is achieved by making systems redundant. Each tier of the product has its own specific solution to achieve High Availability. For example, at the Web server tier, a load balancer is commonly used to distribute traffic across a Web server cluster. At the application server tier, WebSphere Application Server federation and clustering are managed by the Network Deployment Manager. You need to know and understand the high availability capability of all the downstream systems to be able to determine the high availability of your commerce system.
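The effect of redundancy, and of depending on downstream systems, can be illustrated with a little arithmetic. The sketch below assumes that failures are independent, that a redundant tier is up as long as one of its instances is up, and that the overall system needs every tier to be up. The function names and the sample figures are hypothetical, not measured values for any WebSphere Commerce deployment.

```python
from math import prod


def redundant_tier(instance_availability, instances):
    """Availability of a tier where any one surviving instance is enough.

    Assumes instances fail independently of each other.
    """
    return 1 - (1 - instance_availability) ** instances


def end_to_end(tier_availabilities):
    """Availability of a chain of tiers that must all be up at the same time."""
    return prod(tier_availabilities)


# Hypothetical figures, for illustration only.
tiers = {
    "web": redundant_tier(0.99, 2),
    "application": redundant_tier(0.99, 2),
    "search": redundant_tier(0.99, 2),
    "database": 0.99,  # a single active database with no standby
}
print(f"End-to-end availability: {end_to_end(tiers.values()):.4%}")
```

With numbers like these, the one tier without redundancy dominates the result, which is why the database tier gets most of the attention in the rest of this article.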
The commerce system is most likely to have one active database, which makes it a single point of failure unless it is set up for High Availability Disaster Recovery (HADR) in the case of DB2, or a similar high availability setup in the case of an Oracle database. Without HADR, a partial site failure requires restarting the database management server that contains the database. The length of time that it takes to restart the database and the server where it is located is unpredictable, and it can take several minutes before the database is brought back to a consistent state and made available. DB2 HADR can be set up so that a standby database takes over in seconds. Further, you can redirect the clients that used the original primary database to the new primary database by using automatic client reroute or retry logic in the application. The DB2 HADR feature provides a high availability solution for both partial and complete site failures. HADR protects against data loss by replicating data changes from a source database, the primary, to one or more standby servers. DB2 HADR supports up to three remote standby servers and is available in all DB2 editions.
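As an illustration of the "retry logic in the application" option mentioned above, here is a minimal sketch in Python. It is not the DB2 driver's automatic client reroute implementation. The function connect_with_retry, its parameters, and the stand-in connect callable are all hypothetical; a real application would pass in whatever function opens its database connection.

```python
import time


def connect_with_retry(connect, servers, max_retries=3, retry_interval=1.0, sleep=time.sleep):
    """Try each server in order, repeating the whole list up to max_retries times.

    `connect` is any callable that takes a server name and either returns a
    connection or raises ConnectionError. The wait between rounds is a fixed
    retry_interval in seconds.
    """
    last_error = None
    for attempt in range(max_retries):
        for server in servers:
            try:
                return connect(server)
            except ConnectionError as error:
                last_error = error  # remember the failure and try the next server
        if attempt < max_retries - 1:
            sleep(retry_interval)
    raise ConnectionError(f"no server reachable after {max_retries} rounds") from last_error


def demo_connect(server):
    """Stand-in for a real connection call: the primary is down, the standby is up."""
    if server == "primary-db":
        raise ConnectionError("primary-db is not responding")
    return f"connection to {server}"


print(connect_with_retry(demo_connect, ["primary-db", "standby-db"], retry_interval=0))
```

The design choice here is to walk the full server list before waiting, so that a healthy standby is reached without any delay when the primary is down.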
DB2 High Availability is about ensuring that a database system or other critical server functions remain operational during both planned and unplanned outages, such as maintenance operations, or hardware or network failures. Reduced database downtime enables you to meet strict SLAs with no loss of data during infrastructure failures. DB2 provides database clustering as well as high availability and disaster recovery capabilities designed to maximize data availability during both planned and unplanned events. It also allows you to quickly and easily adapt to changing workloads with minimal involvement from database administrators, and frees application developers from the underlying complexities of database design and architecture. Mobile, online and enterprise applications need continuously available data to keep transactional workflows and analytics operating at maximum efficiency. Any downtime can leave mission-critical databases inaccessible and applications unresponsive. IBM DB2 pureScale helps change the economics of continuous data availability. DB2 pureScale is designed for organizations that require high availability, reliability and scalability for online transaction processing (OLTP) to meet stringent SLAs.
Key considerations - Planning for High Availability and Disaster Recovery
- Site location: Is there a single site, two sites, or more, and what are the network bandwidth and connectivity between the sites? The answers help you lay out the strategy: whether a single site serves the main traffic while the other is used only in case of site failure, or whether both sites handle the traffic in parallel. If the sites are geographically co-located or connected by a high-speed network, they can share traffic with ease.
- Active-Active or Active-Passive: The Web tier and the application tiers (commerce and search) can be configured as active-active or active-passive; the commerce database, however, is likely to be set up as active-passive. If the sites do not have the required high-speed connectivity, this becomes a challenge, because the application servers at one site have to talk to a database that is remote to them.
- Single-cell or dual-cell: The commerce and search application servers can be set up in a single-cell or dual-cell topology. The dual-cell topology enables updates, rollouts and application deployments to be applied with reduced downtime. With version 8 of WebSphere Commerce, there is more and more focus on zero-downtime deployments.
- Failover/Disaster Recovery Capacity: The capacity that will be available to serve traffic in case of a site outage needs to be planned. The IBM tech sizing document makes a recommendation on the failover capacity, and this data can be used to work out a starting configuration for the percentage. The failover capacity can be planned at 25%, 50%, 75% or 100%, or any value in between.
- Search Application server Managed Configuration: WebSphere Commerce v8 introduced an advanced configuration topology that allows the search server to be in a Managed Configuration, which sets up the search master, subordinate and repeater servers with templates that can be managed by the deployment manager.
- Database High Availability topology: The database backup needs to be considered based on the native capabilities of the database and the site location. Take DB2 as the example database, with a two-site setup in which the sites are physically remote from each other and not connected with the required bandwidth. DB2 would have an onsite hot standby at its primary site using the NEARSYNC synchronization mode, and an offsite disaster recovery standby using SUPERASYNC. Takeover to the hot standby would be automatic, whereas takeover to the offsite standby would be under manual control. A DB2 High Availability setup has two options, operating system clustering (AIX/Linux) or native DB2 HADR, and the choice must be made based on the governing factors. The bandwidth calculation for HADR depends on the peak incremental data per hour.
- DB2’s geographically dispersed DB2® pureScale™ cluster (GDPC) capability: If you are looking for an active-active, highly available system, consider the pureScale capability, where GDPC provides the scalability and application transparency of a regular single-site DB2 pureScale cluster across sites. As described in the developerWorks article, ‘this is the active-active system, where the pureScale members at both sites are sharing the workload between them as usual, with workload balancing (WLB) maintaining an optimal level of activity on all members, both within and between sites. This means that the second site is not a standby site, waiting for something to go wrong. Instead, the second site is pulling its weight, returning value for investment even during day-to-day operation.’
- Automatic Client Reroute options on your application: Once you have agreed upon your database high availability topology, you will be leveraging automatic client reroute, popularly referred to as ACR. With ACR, client connections are automatically rerouted to an alternate server when the primary server fails. It is the preferred reroute method in DB2 HADR. ACR can be configured in multiple ways, and lets you set properties such as maxRetriesForClientReroute and retryIntervalForClientReroute. The ACR functionality is entirely separate from HADR: it can be used between any two servers, not just an HADR primary and standby. It is up to the administrator to set up replication to ensure that the two servers have the same data content. Other replication methods, such as CDC or Q-rep, can also be used to keep the two servers in sync.
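The failover capacity and HADR bandwidth items in the list above both come down to simple arithmetic, sketched below in Python. This is only a starting point under stated assumptions: the function names are hypothetical, "peak incremental data per hour" is interpreted as the volume of change data to ship to the standby in the busiest hour, decimal units are used (1 GB = 1000 MB), and the headroom factor of 1.5 is an arbitrary illustrative value rather than an IBM recommendation.

```python
import math


def failover_servers(primary_servers, failover_percent):
    """Servers needed at the failover site to carry a percentage of primary capacity.

    Assumes servers of equal size at both sites; rounds up to whole servers.
    """
    return math.ceil(primary_servers * failover_percent / 100)


def hadr_bandwidth_mbps(peak_gb_per_hour, headroom=1.5):
    """Rough network bandwidth, in megabits per second, for the peak hourly data volume.

    headroom is an assumed safety factor for bursts and protocol overhead.
    """
    megabits = peak_gb_per_hour * 1000 * 8  # GB to megabits, decimal units
    return megabits / 3600 * headroom


# Hypothetical inputs: 8 application servers at the primary site, 50% failover
# capacity, and 9 GB of changed data in the busiest hour.
print("Failover servers:", failover_servers(8, 50))
print(f"HADR bandwidth: {hadr_bandwidth_mbps(9):.1f} Mbit/s")
```

Treat the result as a first estimate to validate against the sizing recommendation and against measurements from your own peak periods.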
High Availability for a commerce production system depends on several factors. Availability requirements, planning for high availability and disaster recovery, site planning, integrated systems, network bandwidth and database topology are a few of those to consider.
The goal of this article is to help you think through your key considerations as you plan, prepare, design, implement and test your different tiers for high availability.
Credits: Anbu Ponniah is our most regular reviewer and provides feedback to ensure that we cover points which will be of interest to our readers. Thank you, Anbu, for your valuable insights, always.
About the authors:
Pravin Kedia is an Analytics Solution Architect with IBM, helping customers with Data Warehouse and Database Solutions. He is passionate about IBM technologies and shares his insights through blogs on developerWorks.
Shweta Gupta is a WebSphere Commerce consultant with IBM, helping customers with their ecommerce journey. She is passionate about the performance of systems. She shares her insights through blogs on developerWorks. Read about her other publications on LinkedIn.