Karen Reed is a certified IT systems engineering professional with experience in a broad range of hardware and software architectures. She has 20 years of experience planning and supporting the implementation of IBM system monitoring, analytics, and automation software on System z for clients in the western USA. Her areas of expertise include analyzing client business needs, designing IT solutions, and project management. She has presented IBM software solutions for systems management and automation at SHARE, CMG, and IBM Cloud conferences.
Today’s applications are built upon an infrastructure of servers, routers, disk storage, and network components. The complexity of this IT infrastructure affects application availability: every additional component an application depends on is one more thing that can fail, increasing the likelihood of an outage.
Few events hurt a company as much as an IT outage, especially one that then gets reported across the internet and in newspapers. Customers, employees, and suppliers expect to be able to do business with you around the clock and from around the world. Maintaining high availability in normal day-to-day operations is fundamental to success. To improve the availability of business operations, a risk analysis of critical applications helps identify single points of failure.
A single point of failure (SPOF) exists when the failure of one hardware or software component can make an application unavailable to users. Highly available systems avoid single points of failure by building redundancy into every part of the operation.
Consider the following diagram of an end-to-end application. Users reach the application and database servers through the internet, routers, and firewalls. Every piece of the path is required to complete a transaction.
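One way to find single points of failure in a path like this is to model the components as a graph and test each one in turn: remove it, then check whether users can still reach the database. The Python sketch below does this by brute force with a breadth-first search. The topology, component names, and helper functions are hypothetical, chosen only to mirror the path described above. The check covers the components between the two ends; a lone database server at the far end is, of course, also a single point of failure.

```python
from collections import deque

# Hypothetical topology mirroring the described path:
# users -> internet -> router -> firewall -> app server -> database server.
# Each entry lists the components a node is directly linked to (undirected).
TOPOLOGY = {
    "users": ["internet"],
    "internet": ["users", "router"],
    "router": ["internet", "firewall"],
    "firewall": ["router", "app_server"],
    "app_server": ["firewall", "db_server"],
    "db_server": ["app_server"],
}


def reachable(graph, start, goal, removed):
    """Breadth-first search from start to goal, treating `removed` as failed."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, source, target):
    """Components whose failure alone cuts source off from target."""
    return [
        node
        for node in graph
        if node not in (source, target)
        and not reachable(graph, source, target, removed=node)
    ]


print(single_points_of_failure(TOPOLOGY, "users", "db_server"))
# prints ['internet', 'router', 'firewall', 'app_server']

# Add a second, hypothetical router linked to the same neighbors.
redundant = {name: list(links) for name, links in TOPOLOGY.items()}
redundant["router_b"] = ["internet", "firewall"]
redundant["internet"].append("router_b")
redundant["firewall"].append("router_b")
print(single_points_of_failure(redundant, "users", "db_server"))
# prints ['internet', 'firewall', 'app_server']
```

Adding the second router takes routing off the list, while the internet link, firewall, and application server remain single points of failure. This one-at-a-time check is simple but repeats a full search per component; for large networks, graph algorithms for articulation points (cut vertices) find similar weak spots more efficiently.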
Applications such as this, with no redundancy in hardware and network components, rely on the durability of each piece of hardware. However, hardware does not run forever, and networks can fail due to weather, construction, and load. A risk analysis of the components identifies those most likely to fail, the cost of duplicating each one, and the potential increase in availability. End-to-end applications, common in today’s environment, can have long pathways that traverse many pieces of hardware, network, and software, and each one is another point where a transaction can fail. Companies that depend on complex IT infrastructure should prepare for both small and large disasters.
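The trade-off this risk analysis weighs can be put into numbers. When every component is required, end-to-end availability is the product of the component availabilities; duplicating a component, assuming the two copies fail independently, raises that tier's availability from a to 1 - (1 - a)^2. The Python sketch below applies both rules to rank which single component is worth duplicating first. The availability figures and relative costs are made up for illustration, not measurements, and the function names are hypothetical.

```python
from math import prod

# Hypothetical availabilities (fraction of time each component is up).
AVAILABILITY = {
    "internet": 0.999,
    "router": 0.995,
    "firewall": 0.998,
    "app_server": 0.99,
    "db_server": 0.99,
}
# Hypothetical relative cost of adding one redundant copy of each component.
DUPLICATE_COST = {
    "internet": 50,
    "router": 10,
    "firewall": 20,
    "app_server": 30,
    "db_server": 80,
}


def series(avails):
    """Every component is required, so availabilities multiply."""
    return prod(avails)


def parallel(a, copies=2):
    """The tier is down only if all copies are down (independent failures)."""
    return 1 - (1 - a) ** copies


def duplication_options(avail, cost):
    """End-to-end gain from duplicating each component on its own,
    sorted by gain per unit of cost, best value first."""
    base = series(avail.values())
    options = []
    for name in avail:
        improved = series(parallel(a) if n == name else a for n, a in avail.items())
        gain = improved - base
        options.append((name, gain, gain / cost[name]))
    return base, sorted(options, key=lambda opt: opt[2], reverse=True)


base, options = duplication_options(AVAILABILITY, DUPLICATE_COST)
print(f"end-to-end availability: {base:.4%}")
for name, gain, per_cost in options:
    print(f"duplicate {name:<10} +{gain:.4%}  ({per_cost:.2e} per cost unit)")
```

With these made-up numbers the whole path is up only about 97.2% of the time, lower than any single component, which is why long pathways are fragile. Duplicating any one component raises end-to-end availability by the baseline times that component's unavailability, so the least reliable servers give the largest raw gain; once cost is factored in, the inexpensive router comes out as the best first investment.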
In every organization, there are critical applications (necessary for daily business) and other applications that, while necessary, can withstand some downtime. Efforts to improve availability and IT resiliency should focus on the critical applications and their infrastructure first. Increasing availability by avoiding painful outages helps businesses thrive.