IBM Systems Lab Services

Building a culture of high availability

What customer doesn’t expect the applications they use to work all the time?

What business doesn’t want high availability from its IT systems?

In today’s world, both customers and businesses have high expectations. Customers want to bank and shop when it’s convenient for them, and business leaders want the systems providing these services to be always available. But high availability can remain elusive, even with the latest advances in technology, because technology alone can’t provide it.

Yes, the mean time between failures of some IT components can now be measured in decades, but we should never forget the adage that “eventually all hardware will break, and eventually all software will work.” Businesses invest in reliable technology, but some stop there: their expectation that the technology will be reliable becomes their availability plan. We come to depend on those services and expect them to always be there, so what happens when something does break? Is your business ready for that?

If simply buying better technology isn’t the answer, what is? It’s simple, really. Assume everything can fail, and that some of it will. Create an IT culture in which everyone keeps asking (and answering) these questions:

  • What can I do to minimize failures?
  • What will happen if something does fail?
  • How do I minimize the impact when it fails?

High availability requires the right mix of technology, people and processes, built around a culture that not only supports high availability but strives for it. Without this, IT organizations will forever be putting a band-aid on their outages.

How to create a culture of high availability

A culture of high availability has to start at the top of the business, with organizational objectives that support the goal of near-zero downtime. It means everyone in IT, from architecture and design to application development, system administration and operations, is driven to achieve zero downtime. An application team that’s driven only to roll out new features won’t be focused on exploiting the resiliency built into their technology. A system administration team that isn’t given maintenance windows won’t be able to keep firmware current. An operations team that only monitors individual components may not discover a broken service until a customer calls.

This doesn’t mean everyone in IT owns service availability. There should be a role within the IT organization for that. That owner needs to be proactively focused on building, maintaining and delivering highly available services.

Organizations must make the proper investment to achieve zero downtime. Not all applications are created equal, and they don’t necessarily require the same investment, but the business and IT must clearly understand what value each service brings and how much to invest in each to deliver what’s expected. Does the business side of your company understand the cost of downtime for the services it deems critical? If not, how can it be sure it has made the right investment? The services that the business expects to be “always on” require the right investment.

Continuous improvement

A strong service management framework can help set the stage for continuous improvement; however, truly achieving it will depend on the culture of the organization. Does your organization have objectives that lead to continuous improvement? Do those targets change over time? Are all failures (even those that don’t trigger a service outage) inspected to see what can be improved? Does your business have the right metrics in place to provide a warning when things are headed in the wrong direction before an outage occurs?

Experience has shown that many IT outages can be traced back to process errors. In some cases, process errors account for as many as 50 percent of outages. Ignoring, or failing to fix, the process errors or gaps you have already experienced will only allow them to recur.

Although there are several service management frameworks, a common one seen in IT today is the Information Technology Infrastructure Library (ITIL). While ITIL may not provide all the answers, adopting a particular service management framework with proper education and strong management support can help speed the creation of the right culture for achieving high availability.

Where to start

One way you can assess where your business stands is to use an independent team to review and assess your technology landscape and service management framework. This can help you identify gaps and single points of failure and determine what actions will close those gaps.

The High Availability Center of Competency (HACoC) in IBM Systems Lab Services is a team built exactly for this purpose. To contact us, please send us an email.