Computing clusters typically consist of servers, workstations and personal computers (PCs) that communicate over a local area network (LAN) or a wide area network (WAN).
Cluster computing is a form of distributed computing, which links computers together on a network to perform computational tasks, increase computational power and function as a single system. Each computer, or “node,” in the cluster has its own operating system (OS) and central processing unit (CPU) core that handle the tasks required for the software to run properly.
Because of its high performance and high availability, cluster computing has many applications, including cloud computing, artificial intelligence (AI), scientific research and big data analytics.
At its most fundamental level, cluster computing uses a LAN to connect multiple, independent computers in a network. In the architecture of the cluster, each computer on the network is referred to as a “node” and is controlled by middleware, software that enables communication among the machines. Users of the cluster can draw on each computer’s resources as though they were a single machine, rather than individual machines connected via a LAN.
A computing cluster can connect as few as two nodes or as many as thousands. For example, a Beowulf cluster typically uses commodity-grade PCs connected via a LAN and can be a relatively affordable alternative to a supercomputer for certain tasks.1
Kubernetes, by contrast, the open-source container orchestration platform that is essential to cloud computing, supports clusters of up to 5,000 separate but connected nodes. Kubernetes is used in many kinds of cloud deployments, including hybrid cloud and multicloud architectures, as well as DevOps and application modernization.
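To make the idea concrete, here is a minimal sketch that uses the official Kubernetes Python client to list the nodes in a cluster and report whether each is ready. It assumes the `kubernetes` package is installed and that a valid kubeconfig points at a running cluster; the details are illustrative, not a production pattern.

```python
# List the nodes in a Kubernetes cluster with the official Python client.
# Assumes `pip install kubernetes` and a valid kubeconfig (e.g., ~/.kube/config).
from kubernetes import client, config

config.load_kube_config()          # read cluster credentials from kubeconfig
v1 = client.CoreV1Api()

for node in v1.list_node().items:  # one item per node in the cluster
    name = node.metadata.name
    # Each node reports conditions such as "Ready"; find its current status.
    ready = next(
        (c.status for c in node.status.conditions if c.type == "Ready"),
        "Unknown",
    )
    print(f"{name}: Ready={ready}")
```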
Cluster computing architectures consist of a group of interconnected, individual computers working together as a single machine. Each computing resource is linked via a high-speed connection, such as a LAN, and is referred to in the system’s architecture as a node. Each node has an OS, memory and input/output (I/O) functions.
There are two types of cluster architectures: open and closed. In an open cluster, each node has its own IP address that is reachable over the internet. In a closed cluster, the nodes are hidden behind a gateway node that controls all access to them. Because internet-visible IP addresses can be probed and attacked, closed clusters pose less of a security risk than open clusters.
In addition to cluster computing, there are two other commonly used types of distributed computing that also feature connected networks of computers: grid computing and peer-to-peer computing.
Grid computing: In computer science, a grid computing infrastructure is set up to combine compute resources that are in different physical locations. The available compute resources from the different machines are combined and used together to solve a problem. Like clustering, grid computing uses the resources of multiple, interconnected computers.
However, unlike clustering, only the unused resources on the computers connected via grid architecture are utilized. SETI@home, a project of the Search for Extraterrestrial Intelligence (SETI), was a famous example of grid computing: the unused compute resources of many volunteers’ computers were used to analyze radio signals from deep space for signs of extraterrestrial life.2
Peer-to-peer computing: Peer-to-peer (P2P) computing, or networking, requires two or more computers to be connected as “peers” on a network, meaning they have equal power and permissions. Unlike cluster computing, a P2P architecture doesn’t require a centralized management approach.
On a P2P network, each node acts as both a client machine (a computer that needs access to a service) and a server (a computer that provides a service). Every peer node makes resources available to others on the network, including storage, memory, bandwidth and more.
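The sketch below illustrates this dual role in Python: a single peer process both listens for requests from other peers and issues its own. The localhost addresses and port numbers are hypothetical stand-ins for real peers on a network.

```python
# Minimal illustrative peer: each process both serves requests and issues them.
# Hypothetical two-peer setup on localhost; ports are arbitrary for the sketch.
import socket
import threading

def serve(port: int) -> None:
    """Act as a server: answer any peer that connects."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"pong (got: " + data + b")")

def ask(peer_port: int, message: bytes) -> bytes:
    """Act as a client: request a resource from another peer."""
    with socket.create_connection(("127.0.0.1", peer_port)) as conn:
        conn.sendall(message)
        return conn.recv(1024)

# Start this peer's server side in the background, then query another peer.
threading.Thread(target=serve, args=(9001,), daemon=True).start()
# e.g., print(ask(9002, b"ping"))  # works once a second peer listens on 9002
```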
Cluster computing was invented in the 1960s as a method of distributing compute tasks and data storage across multiple computers. In the 1980s, developments in several adjacent technologies—including PCs, versatile processors and LANs—had significant implications for cluster computing.
Perhaps the most significant development was the use of multiprocessor computing nodes in high-performance computing (HPC). As HPC use cases grew, so did use cases for cluster computing. Today, those uses include the automotive and aeronautics industries, data analysis from satellites and telescopes and the diagnosis of dangerous diseases, among others.
Today, cluster computing is used in many of the most advanced technologies pushing our world forward, such as artificial intelligence (AI), machine learning (ML) and cloud computing. The largest businesses in the world use cluster computing to move workloads to the cloud, increase processing speed, improve data integrity and more. At the enterprise level, computer clusters are often given a specific task—for example, load balancing, high availability or the large-scale processing of data in a data center.
Compute clusters are primarily designed to be higher performing and more reliable than other kinds of compute architectures, making them indispensable to the modern enterprise. For example, modern clusters have built-in fault tolerance, the capacity to continue functioning even when a single node in the network fails.
Also, large computer clusters rely on distributed file systems (DFS) and redundant arrays of independent disks (RAID), which allow the same data to be stored in different locations on multiple hard disk drives. Cluster computing benefits the modern enterprise in many ways; here are some examples:
Because of their reliance on parallelism, computer clusters are considered high performing and can typically process data faster and handle larger workloads than a single computer.
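As a loose single-machine analogy for this parallelism, the Python sketch below splits one CPU-bound job across several worker processes, much as a cluster splits work across nodes. The prime-counting task and chunk sizes are arbitrary illustrations, not real cluster code.

```python
# Toy illustration of why parallelism speeds up processing: a CPU-bound
# task split across worker processes, as a cluster splits work across nodes.
from multiprocessing import Pool

def count_primes(bounds: tuple) -> int:
    """Count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    def is_prime(n: int) -> bool:
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(is_prime(n) for n in range(lo, hi))

if __name__ == "__main__":
    # Split one large range into four chunks and process them in parallel.
    chunks = [(i, i + 250_000) for i in range(2, 1_000_002, 250_000)]
    with Pool(processes=4) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(f"primes found: {total}")
```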
Cluster computing is considered highly reliable because of its incorporation of DFS and RAID technologies. In a computer cluster, even if a single node fails, the network continues to function, and DFS and RAID will continue to ensure that data is backed up in multiple places.
In addition to being highly reliable, cluster computing is also considered highly available due to its ability to recover quickly from the failure of a single node. If a cluster is working properly, when one node fails, its work is seamlessly transferred to another in the cluster without a disruption in service.
Cluster computing is highly scalable because cluster nodes can be added at any time to increase performance. The ability to dynamically adjust resources within a cluster means that the cluster can scale up or down, depending on demand.
Because clusters can be built from commodity hardware, cluster computing is often more cost-effective than comparable monolithic systems. Many modern enterprises rely on cluster computing to improve their IT infrastructure’s performance, scalability and availability at an affordable price.
Compute clusters vary greatly in both complexity and purpose. A relatively simple dual-node cluster connects only two computers, while the Aurora supercomputer connects more than 10,000 nodes.3
Clusters have many business use cases due to their high performance, scalability and flexibility, but they are also used by universities and medical schools for scientific research. Based on their characteristics, compute clusters are broken down into three different types—high availability, load balancing and high performance.
High-availability clusters automatically transfer tasks from a node that’s failed unexpectedly to another node on the network that’s still functioning. This ability to quickly and easily adapt when one node fails makes them ideal for workloads where avoiding service interruption is critical.
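The Python sketch below shows, in highly simplified form, the heartbeat-and-failover logic a high-availability cluster manager might apply: nodes periodically report in, and tasks on any node that goes silent are reassigned to a survivor. The node names, timeout and data structures are all illustrative assumptions.

```python
# Sketch of heartbeat-based failover: if a node misses its heartbeat
# deadline, its work is reassigned to a node that is still healthy.
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds a node may stay silent before failover

last_seen = {"node-a": time.time(), "node-b": time.time()}
assignments = {"web-service": "node-a"}   # task -> node currently running it

def record_heartbeat(node: str) -> None:
    """Called whenever a node checks in."""
    last_seen[node] = time.time()

def check_and_failover() -> None:
    """Reassign tasks away from any node that has gone silent."""
    now = time.time()
    dead = {n for n, t in last_seen.items() if now - t > HEARTBEAT_TIMEOUT}
    for task, node in assignments.items():
        if node in dead:
            # Pick any surviving node and move the task to it.
            survivor = next(n for n in last_seen if n not in dead)
            assignments[task] = survivor
            print(f"failover: {task} moved {node} -> {survivor}")
```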
Load-balancing clusters, or simply load balancers, ensure that work is distributed fairly across the nodes in a cluster. Without load balancing, some nodes would be overwhelmed by the tasks they were assigned and would fail more frequently. There are different kinds of load balancing for different purposes. One of the best known, the Linux Virtual Server, is free and open source and is used to build high-performance, highly available servers based on clustering technology.
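A minimal round-robin dispatcher, one of the simplest load-balancing strategies, can be sketched in a few lines of Python. Real load balancers also weigh node health and current load, which this illustration omits; the node names are hypothetical.

```python
# Minimal round-robin load balancer: incoming requests are handed to
# cluster nodes in turn so no single node is overwhelmed.
import itertools

nodes = ["node-1", "node-2", "node-3"]
next_node = itertools.cycle(nodes).__next__   # rotate through nodes forever

def dispatch(request_id: int) -> str:
    """Assign one incoming request to the next node in the rotation."""
    target = next_node()
    print(f"request {request_id} -> {target}")
    return target

for rid in range(6):
    dispatch(rid)   # requests land on node-1, node-2, node-3, node-1, ...
```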
HPC clusters are networks of powerful processors that can process massive, multidimensional data sets, also known as big data, at extremely high speed. They require high-bandwidth networks and ultralow latency to move files between nodes.
Compared with load-balancing and high-availability clusters, HPC clusters offer far more processing power and are purpose-built for data analysis, such as diagnosing diseases, analyzing vast amounts of financial data and sequencing genomes. HPC clusters also use the Message Passing Interface (MPI), a protocol for parallel computing architectures that enables communication between nodes.
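The canonical MPI pattern is for each process to compute a partial result that a collective operation then combines. The sketch below uses the mpi4py bindings to approximate pi with a parallel reduction; it assumes an MPI runtime is available and would be launched with a command such as `mpirun -n 4 python pi_reduce.py` (the filename is arbitrary).

```python
# Classic MPI pattern on an HPC cluster: each process (one per node or core)
# computes a partial result, then a reduction combines the results.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()    # this process's ID within the cluster job
size = comm.Get_size()    # total number of processes

# Each rank sums a disjoint slice of the Leibniz series for pi/4.
n_terms = 1_000_000
local = sum((-1) ** k / (2 * k + 1) for k in range(rank, n_terms, size))

# Combine every rank's partial sum on rank 0.
total = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print(f"pi is approximately {4 * total}")
```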
AI clusters are computing clusters specially built for AI and ML workloads, such as facial and speech recognition, natural language processing (NLP) and autonomous driving. AI clusters are specially designed to run the algorithms that AI models train on.
Computing clusters have a wide range of uses. From enterprise applications like cloud computing and data analytics to software that creates eye-popping 3D special effects for film, here are some examples:
Cluster computing can process large volumes of data swiftly and efficiently, making it an ideal tool for big data analysis. From Google’s powerful search engine to software that analyzes the stock market to sentiment analysis on social media, cluster computing’s applications in data analytics are many and varied.
Cluster computing’s parallel processing capabilities power the most advanced graphics in video games and film. Using a cluster of independent nodes, each with its own graphics processing unit (GPU), cluster computing rendering (or cluster rendering) generates a calibrated, single image across multiple screens. This process considerably lessens the amount of time that it takes to build high-quality 3D images.
In AI and ML workloads, cluster computing helps process and analyze vast datasets quickly and accurately, an essential part of the training of powerful AI models that power popular applications such as ChatGPT.
Insurance companies and financial trading firms use cluster computing to analyze big data sets and quantify the risks of buying certain stocks or insuring certain customers, extracting meaningful insights that support more informed business decisions.
1 “Beowulf clusters make supercomputing more accessible,” NASA, 2020.
2 “SETI computing project paused after 20 years,” The Guardian, March 5, 2020.
3 “Intel says its supercomputer broke the exascale barrier,” TechRadar, May 24, 2024.