High availability (HA) is a term that refers to a system’s ability to be accessible and reliable close to 100% of the time.
Highly available systems must be able to withstand outages, including scheduled downtime and site-wide disasters. Typically, HA systems meet two characteristics:
With the growth of digital transformation initiatives and the subsequent move of many services to the cloud, high availability solutions are now offered by many tech and software as a service (SaaS) companies, including Microsoft, Amazon (AWS), IBM®, Red Hat® and more.
High availability of IT systems is particularly important in industries where critical applications rely on having little or no system downtime. For example, in hospitals and data centers, users depend on high availability solutions to perform many routine, daily functions. If users can’t access a system for any reason, it is deemed ‘unavailable.’ The period of time that a system is unavailable to users is known as downtime.
Disaster recovery (DR) consists of IT infrastructure technologies and best practices designed to prevent or minimize data loss and business continuity disruption resulting from catastrophic events. High availability (HA), on the other hand, typically concerns smaller failures or faults that might impact a system’s availability.
Even though they are different, DR and HA both share the goal of minimizing disruption to IT systems, and both typically employ redundant components and redundant systems as part of an overall strategy. Also, both DR and HA use data backups to make data available in case of a wide range of problems, including hardware failures, software failures and power outages.
Fault tolerance is a system’s ability to operate continuously after one or more of its critical components fail. Like HA, fault tolerance can help make a system available during or after a disruptive event.
However, where fault tolerance and HA differ is in the way they treat downtime. While HA seeks to have as little downtime as possible, the goal of fault tolerance is zero downtime, a goal it can only achieve through redundancy, having a backup or secondary copy of every single component in the infrastructure.
With enterprises relying more than ever on online services and cloud and hybrid cloud architectures to deliver critical applications and services, infrastructure demands are rising, making high availability a priority. Here are some of the most common enterprise benefits of highly available systems.
With digital transformation a key objective for most companies, high availability of systems is critical to giving employees and customers unlimited access to critical applications [1].
System failures that cause hours or even minutes of downtime can cause public relations nightmares for enterprises across a broad range of industries, including SaaS, aviation and mobile technology [2]. High availability infrastructure helps ensure a brand’s reputation doesn’t suffer due to an outage or unexpected downtime.
Managed Service Providers (MSPs) must deliver high availability of networks or risk not fulfilling their service level agreements (SLAs). HA systems help MSPs deliver networks that their most valuable clients can depend on, like one that helps autonomous vehicles drive safely or a healthcare facility manage patient records.
Whether seeking to achieve zero downtime in an industry like healthcare or finance, or simply looking for ways to avoid reputational damage from outages, businesses looking for high availability typically follow a 4-step process.
Many HA systems use load balancing, the process of distributing traffic among multiple servers to optimize application availability. For example, with a high-traffic website or cloud service, a system receives millions of user requests every day. Load balancing ensures that applications can deliver content from web servers to users promptly and without interruption. Load balancing, especially the use of many load balancers at once, can help ensure that no single component in a system is overwhelmed and becomes a single point of failure that might cause downtime or an outage.
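As a rough illustration of the idea, here is a minimal round-robin sketch in Python. The class name `RoundRobinBalancer`, its methods and the server names are hypothetical and chosen only for this example. Production load balancers also run active health checks and weight traffic, which this sketch leaves out.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any marked unhealthy.

    Hypothetical helper for illustration; not a real library API.
    """

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        # Stop routing traffic to a server that failed a health check.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Resume routing traffic to a known server once it recovers.
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Try each server at most once per call so the loop always ends.
        for _ in range(len(self.servers)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
first_pass = [balancer.next_server() for _ in range(3)]

# Simulate one server failing: later requests are spread over the rest.
balancer.mark_down("web-2")
after_failure = [balancer.next_server() for _ in range(4)]
print(first_pass, after_failure)
```

Requests keep flowing to the remaining servers after `web-2` is marked down, so no single server failure interrupts service in this sketch.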
Redundancy—having a secondary or backup component available to take over when a primary one fails—is an important part of a high availability system. Redundancy enables databases to remain available to users and applications even when a component isn’t functioning. If a component in a system is not redundant, that component would be considered a single point of failure, as losing it might potentially stop the whole system from working.
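The failover behavior described above can be sketched in a few lines of Python. This is an illustrative sketch only: `call_with_failover`, `primary_db` and `replica_db` are hypothetical names, and the two "databases" are plain functions standing in for real components.

```python
def call_with_failover(components, request):
    """Try each redundant component in order and return the first success."""
    errors = []
    for component in components:
        try:
            return component(request)
        except ConnectionError as exc:
            # This component is unavailable; fall through to the next one.
            errors.append(exc)
    raise RuntimeError(f"all {len(components)} components failed: {errors}")


def primary_db(query):
    # Stand-in for a primary database that is currently down.
    raise ConnectionError("primary database is down")


def replica_db(query):
    # Stand-in for a healthy backup copy of the same database.
    return f"replica answered: {query}"


result = call_with_failover([primary_db, replica_db], "SELECT 1")
print(result)
```

If the list held only `primary_db`, the call would raise, which is what makes a non-redundant component a single point of failure.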
High availability clusters, also known as high availability clustering, are groups of connected machines that work together as a single system. When one machine in a cluster fails, cluster management software transfers its workloads onto another machine. Within a high availability cluster, shared storage among the nodes (computers) ensures zero data loss if a single node stops functioning.
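To make the workload transfer concrete, the following Python sketch models a cluster as a mapping from nodes to workloads. The `Cluster` class, the least-loaded placement rule and the node and workload names are assumptions made for this example; real cluster managers use more sophisticated scheduling and rely on shared storage to preserve data.

```python
class Cluster:
    """Toy model of a cluster manager (hypothetical, for illustration)."""

    def __init__(self, nodes):
        self.assignments = {node: [] for node in nodes}

    def place(self, workload):
        # Assumed policy: put the workload on the least-loaded node.
        node = min(self.assignments, key=lambda n: len(self.assignments[n]))
        self.assignments[node].append(workload)
        return node

    def fail_node(self, node):
        # Refuse to drop the last node, since nothing could take over.
        if len(self.assignments) <= 1:
            raise RuntimeError("no surviving nodes to take over workloads")
        orphaned = self.assignments.pop(node)
        # Transfer every workload from the failed node to the survivors.
        for workload in orphaned:
            self.place(workload)
        return orphaned


cluster = Cluster(["node-a", "node-b", "node-c"])
for name in ["api", "db", "cache", "queue"]:
    cluster.place(name)

moved = cluster.fail_node("node-a")
running = sorted(w for ws in cluster.assignments.values() for w in ws)
print(moved, running)
```

After `node-a` fails, all four workloads are still running on the two surviving nodes.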
High availability is measured in relation to a system being 100% operational, or never having a single outage. While no system can be 100% operational, setting that as a goal helps in measuring how available a system is over a period. The most common metric for high-availability systems and services is something called five nines availability.
Five nines availability means that a system can run and perform 99.999% of the time. Typically, only systems in highly critical industries, such as healthcare, transportation, finance or government, require five nines availability. These systems are important to people’s lives, access to food and shelter and economic well-being.
Systems that don’t operate in these highly critical industries typically don’t require as much operational availability and can make do with “three or four nines” (99.9% or 99.99%) availability. Another way this is frequently described is to say a highly available system has “99.9/99.999% uptime.”
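To see what these percentages mean in practice, the short Python sketch below converts an availability target into the downtime it allows per year. The function names are illustrative, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent):
    """Yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def measured_availability(uptime_minutes, downtime_minutes):
    """Availability as the share of total time the system was up."""
    return 100 * uptime_minutes / (uptime_minutes + downtime_minutes)


for target in (99.9, 99.99, 99.999):
    budget = allowed_downtime_minutes(target)
    print(f"{target}% allows {budget:.2f} minutes of downtime per year")
```

Under these assumptions, three nines allows roughly 8.8 hours of downtime per year, four nines about 53 minutes and five nines a little over 5 minutes.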
In addition to five nines availability, IT systems managers use several other key metrics to measure how available their systems are:
As organizations across many industries undertake broad digital transformation initiatives, the availability demands on their infrastructure are increasing. Remote work and the spread of 5G networks have made it normal for users to expect to be able to access data and applications from anywhere at any time. That is possible only if the underlying systems powering the applications and regulating access to the data are available. Here are some examples of highly available systems that help modern enterprises thrive:
Gone are the days when a doctor flipped through files in a cabinet to find the date of your last vaccination. Today, if you show up at the emergency room or a specialist’s office, it’s almost certain your doctor will access your records online. Because of the critical and private nature of this kind of information, electronic health records (EHRs) are an example of a highly available system that can securely deliver accurate information within seconds with close to zero downtime.
Driverless, or autonomous, vehicles, such as cars, drones and others, rely on fast, powerful internet connections so the artificial intelligence (AI) that controls them can function. When an autonomous vehicle pulls up to a stop light, for example, tens of thousands of pieces of data are being processed in near real time so that it stops at the light where it is supposed to and proceeds on to its destination. High availability is crucial to the safe operation of autonomous vehicles of all kinds.
The Internet of Things (IoT) is a network of physical devices, vehicles, appliances and other objects that are embedded with sensors connected to the internet that allow them to collect and share data. As the IoT ecosystem expands into roads, waterways, home appliances, weather monitoring and more, millions and millions of devices are relying on networks. High availability helps ensure that networks supporting IoT devices run smoothly and without interruption.
As enterprises find more ways to use the massive amounts of data they generate in the digital age, high availability is essential to efficient, effective data processing. Data centers and complex analytics platforms perform continuous data processing and real-time analysis, and downtime can set back projects by months. HA solutions help enterprises have 24/7/365 access to their most important data.
1. "Gartner says 89% of Board Directors Say Digital is Embedded in All Business Growth Strategies," Gartner, October 19, 2022.
2. "The Global IT Outage Provides Several Crisis Management Lessons," Forbes, July 19, 2024.