Karen Reed is a certified IT systems engineering professional with experience in a broad range of hardware and software architectures. She has 20 years of experience in planning and implementation support of IBM system monitoring, analytics and automation software on System z, supporting clients in the western USA. Her areas of expertise include analyzing client business needs and designing IT solutions, and project management. She presented IBM software solutions for systems management and automation at SHARE, CMG and IBM Cloud conferences.
Stage 1 - Denial
The first time a user calls to report that an application is not working, we wonder if they made a mistake. Did they go to the wrong website, or enter the wrong password or access information? Surely the application server is up and running, right? We have them try again, then restart the browser, or reboot their system, until the easy options are exhausted. The modern IT infrastructure is generally reliable, so denial of a system outage is our first reaction.
Stage 2 - Anger
Wasting time on failed attempts to access a favorite application, or to get our work done, is frustrating and annoying. Why do things always break when we need them the most? In this day and age of mobile devices, widespread Wi-Fi networks, and mobile apps, people are constantly using the internet. An outage that prevents us from using those systems can make us angry.
Worse yet is the impact on businesses when customers and employees cannot access their applications. Maintaining high availability in day-to-day IT operations is fundamental to the success of modern businesses. An outage can have an immediate impact on today’s revenue and a negative effect on a company’s image as news of outages quickly spreads across the internet.
Stage 3 - Bargaining
An outage of corporate systems during the work day disrupts our work plans, impacts productivity, and causes us to miss deadlines. Users contact the IT department repeatedly, or keep retrying applications, adding more load to an already stressed network. Wishing for the systems to return to us or pleading with the heavens for resolution never helps, but at times desperation makes even the most logical thinker want to try bargaining.
End-to-end applications have many hardware, software and networking components, which increases the complexity of implementing high availability. System z provides solutions for hosting end-to-end applications on Linux and z/OS. System z’s reliability can improve the availability of applications whether the hardware is in one location or spread across distances through a Geographically Dispersed Parallel Sysplex (GDPS).
Stage 4 - Quick Fix
When we are in the middle of an application outage, it is a high-priority event. A quick fix appears to be just as good as a long-term resolution, because everyone just wants the problem to go away. And once the systems are returned to normal, we might forget about the cause of the outage. However, the period right after an outage is a good time to focus on improving the availability of critical applications. Analyzing single points of failure and taking action to eliminate them will help prevent future outages and make applications more highly available.
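As a rough illustration of what analyzing single points of failure can look like, here is a minimal Python sketch. It assumes a very simple model in which an application is a set of tiers and any tier with fewer than two instances counts as a single point of failure. The inventory, the tier names and the function name are hypothetical, made up for this example, and a real analysis would also consider shared dependencies such as power, storage and network paths.

```python
from typing import Dict, List


def single_points_of_failure(tiers: Dict[str, List[str]]) -> List[str]:
    """Return the names of tiers that have fewer than two instances."""
    return sorted(name for name, instances in tiers.items() if len(instances) < 2)


# Hypothetical inventory of an end-to-end application (illustrative only).
app_tiers = {
    "web server": ["web1", "web2"],
    "application server": ["app1"],
    "database": ["db1", "db2"],
    "network switch": ["sw1"],
}

if __name__ == "__main__":
    for tier in single_points_of_failure(app_tiers):
        print(f"Single point of failure: {tier}")
```

In this made-up inventory, the application server and the network switch each have only one instance, so they would be the first candidates for adding redundancy.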
Stage 5 - Acceptance of Your Need for High Availability
System z offers a highly available environment where applications can run on redundant Linux systems, z/OS systems, or both. Complex end-to-end applications residing on a System z platform can take advantage of the native Linux HA functions and the additional HA capabilities of GDPS.