When “good enough” isn’t good enough – Building a hybrid multicloud architecture for mission-critical workloads

By ibmblogs, May 22, 2023

We’ve all heard the buzz about hyperscalers, and for good reason. A hyperscale cloud environment offers a low-cost entry point with out-of-the-box availability and scalability. While not the ideal solution for every company and every problem, it’s certainly good enough for businesses looking for a fast way to create and scale an application.

But ‘good enough’ isn’t good enough for government organizations looking to migrate their mission-critical workloads. From border services to policing to revenue and tax collection, the systems that protect and serve citizens cannot fail. Errors, outages, downtime, and data breaches are simply not an option.

Running mission-critical applications on a hybrid multicloud infrastructure allows government departments to access and modernize those applications and their data, and to respond quickly and securely to changing citizen needs.

However, organizations must ensure the design of their hybrid multicloud architecture supports the requirements of mission-critical workloads. For example, does it deliver the availability and performance needed? Does it provide the security and regulatory compliance required?

An architecture that is greater than the sum of its parts

In a traditional infrastructure, individual units – such as hardware, network nodes and storage – form the building blocks of the operating environment. Each unit comes with its own standards of reliability, availability, and resiliency.

As government organizations adopt a hybrid multicloud environment, the principles of resiliency, availability and security remain crucially important. Yet, individual component parts in the cloud environment are not guaranteed to provide these operational characteristics.

So, how can you achieve the high availability and resiliency required? What are the key considerations in building a hybrid multicloud architecture that is greater than the sum of its parts?

That’s where a “fit-for-purpose” hybrid multicloud architecture comes into play. To create the right architecture, organizations must clearly understand the operational characteristics they require. To achieve high resiliency and high availability, for example, the design must take advantage of the newer software capabilities of the hybrid multicloud environment, such as clustering and redundancy management.

For example, if an individual server provides 90% availability, clustering servers together with the right architecture could provide 99.99% availability. If one server fails, the business keeps running on the other servers while the problematic one is taken offline for recovery. All of this is transparent to the user when the latest architecture and technology are applied.
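The arithmetic behind that jump can be sketched in a few lines of Python. This is a simplified model, not a description of any particular product: it assumes server failures are independent, that the cluster stays up as long as at least one server is up, and that failover is instantaneous. The function names are illustrative.

```python
from fractions import Fraction


def cluster_availability(node_availability: Fraction, nodes: int) -> Fraction:
    """Availability of a cluster that is up while at least one node is up.

    Assumes node failures are independent (a simplifying assumption).
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if not 0 <= node_availability <= 1:
        raise ValueError("availability must be between 0 and 1")
    # The cluster is down only when every node is down at the same time.
    return 1 - (1 - node_availability) ** nodes


def nodes_needed(node_availability: Fraction, target: Fraction) -> int:
    """Smallest node count whose modelled availability reaches the target."""
    if not 0 < node_availability < 1 or not 0 <= target < 1:
        raise ValueError("node availability must be in (0, 1), target in [0, 1)")
    nodes = 1
    while cluster_availability(node_availability, nodes) < target:
        nodes += 1
    return nodes


if __name__ == "__main__":
    per_server = Fraction("0.90")
    for n in range(1, 5):
        print(n, float(cluster_availability(per_server, n)))
    print("servers needed:", nodes_needed(per_server, Fraction("0.9999")))
```

Under these assumptions, four servers at 90% each reach 99.99%. Real clusters share power, networking and software, so failures are correlated and the achieved figure is likely to be lower than this model suggests.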

Bottom line? The hybrid multicloud architecture ensures that the sum of the building blocks provides the breadth and depth of operational characteristics needed.

Operational readiness begins with collaboration

Through step-by-step collaboration, IBM works with government clients to design a hybrid multicloud architecture that far exceeds “good enough.” We assess the business implications of downtime, create a solid set of operational requirements, and then conduct a failure analysis which evaluates readiness in three key areas: technology, processes, and people skills.

Often, the business implication – and the cost – of the failure of a given business process is not well understood. Let’s take an example. An IT leader may be looking for 99.99% availability, which translates to only about 53 minutes of allowed downtime per year.
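A quick way to sanity-check a downtime budget is to compute it directly from the availability target. The sketch below assumes a 365-day year and counts all downtime, planned or not; the function name is illustrative.

```python
from decimal import Decimal

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year, an assumption


def allowed_downtime_seconds(availability_percent: str) -> Decimal:
    """Yearly downtime budget, in seconds, for an availability target in percent."""
    target = Decimal(availability_percent)
    if not Decimal(0) <= target <= Decimal(100):
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (Decimal(100) - target) / Decimal(100)
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    for percent in ("99", "99.9", "99.99", "99.999"):
        seconds = allowed_downtime_seconds(percent)
        print(f"{percent}% -> {seconds / 60:.1f} minutes per year")
```

For a 99.99% target this works out to 3,153.6 seconds, or roughly 52.6 minutes, across an entire year. Each extra nine cuts the budget by a factor of ten.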

Leveraging IBM Technology best practices, we may ask things like:

Can the organization survive the failure scenarios? If yes, for how long?
Does the downtime create security risks?
What citizen services would be affected, and how? For example, will citizens still receive their government cheques on time? Would trucks importing fresh fruits and vegetables be stalled at the border?
Does the availability requirement apply to all transactions, or just some?
If there is catastrophic failure at one site, will the second site offer the availability required? If so, at what cost?

Counting the cost of 99.99% uptime

A key consideration for clients is understanding the cost associated with 99.99% availability. Typically, each additional 0.001% of availability beyond 99% can drive costs up exponentially, adding millions of dollars, because it affects all three key elements: technology, processes and people skills.

Consider a scenario. If a solution in one geographical region is on the cloud, we ask customers to look at their cloud provider's statistics to see how many times the provider has failed in that region. Historically, the chosen provider may have failed three times and taken 72 hours to recover each time.

Clearly, this scenario would not meet their uptime requirements, and a second or third region may be needed. However, this can easily double or triple the cost, because automation technology and processes are now needed to transfer the workload and data to the other region(s) automatically. This involves integrating the latest redundancy, clustering and availability software into the hybrid cloud environment.
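The gap in this scenario can be made concrete with a short calculation. The sketch below is illustrative only: the one-year observation window is an assumption (the scenario does not say over what period the three failures occurred), and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumed observation window: one non-leap year


def observed_availability(outage_hours: list[float], window_hours: float) -> float:
    """Fraction of the window during which the region was up."""
    if window_hours <= 0:
        raise ValueError("the observation window must be positive")
    downtime = sum(outage_hours)
    if downtime > window_hours:
        raise ValueError("total downtime cannot exceed the window")
    return 1.0 - downtime / window_hours


def meets_target(outage_hours: list[float], window_hours: float, target: float) -> bool:
    """True if the observed availability reaches the target (as a fraction)."""
    return observed_availability(outage_hours, window_hours) >= target


if __name__ == "__main__":
    outages = [72.0, 72.0, 72.0]  # three failures, 72 hours to recover each
    availability = observed_availability(outages, HOURS_PER_YEAR)
    print(f"observed availability: {availability:.2%}")
    print("meets 99.99%:", meets_target(outages, HOURS_PER_YEAR, 0.9999))
```

With three 72-hour outages in an assumed one-year window, the single region comes in at roughly 97.5%, far short of 99.99%. That shortfall is what brings a second or third region, and the automation to fail over between them, into the discussion.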

In addition, no human operator can perform that recovery at a keyboard within so small a downtime budget. Only skilled programmers can create that type of automation, so skills transformation is also required.

Helping you make informed choices

Through our collaborative process, clients can better understand the operational requirements and their associated cost. This gives them the tools needed to make an informed decision about whether the cost can be justified against the consequences of downtime. Clients may reconsider the 99.99% uptime if the cost is millions more than they anticipated – or they may decide to go ahead with it. The choice is theirs.

One thing is certain: When looking for a cloud solution for mission-critical workloads, governments and other regulated enterprises cannot compromise on data security and regulatory compliance. Purposeful collaboration can help you determine the operational requirements needed to design the unique hybrid multicloud architecture for your organization.

In our next blog in this series, IBM Canada’s Client Engineering Leader Michelle Zulauf will continue the collaboration conversation with a look at how we co-create with clients. In the meantime, I welcome your feedback!