Computing clusters typically consist of servers, workstations and personal computers (PCs) that communicate over a local area network (LAN) or a wide area network (WAN).
Cluster computing is a form of distributed computing, which links computers together on a network to perform computational tasks, increase computational power and function as a single system. Each computer, or “node,” in the cluster has its own operating system (OS) and central processing unit (CPU) core that handle the tasks required for the software to run properly.
Because of its high performance and high availability, cluster computing has many applications, including cloud computing, artificial intelligence (AI), scientific research and big data analytics.
At its most fundamental level, cluster computing uses a LAN to connect multiple, independent computers in a network. In the architecture of the cluster, each computer on the network is referred to as a “node” and is controlled by middleware, software that enables communication among the machines. Users of the cluster can draw on each computer’s resources as though they were a single machine, rather than individual machines connected via a LAN.
A computing cluster can connect as few as two nodes or as many as thousands. For example, a Beowulf cluster typically uses commodity-grade PCs connected via a LAN and can be a relatively affordable alternative to a supercomputer for certain tasks.1
Kubernetes, by contrast, the open-source container orchestration platform that is essential to cloud computing, supports clusters of up to 5,000 separate but connected nodes. Kubernetes is used in many kinds of cloud deployments, including hybrid cloud and multicloud architectures, as well as DevOps and application modernization.
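To make the idea concrete, here is a minimal sketch that uses the official Kubernetes Python client to list the nodes in a cluster and report whether each is ready. It assumes the `kubernetes` package is installed and that a valid kubeconfig points at a running cluster; the details are illustrative, not a production pattern.

```python
# List the nodes in a Kubernetes cluster with the official Python client.
# Assumes `pip install kubernetes` and a valid kubeconfig (e.g., ~/.kube/config).
from kubernetes import client, config

config.load_kube_config()          # read cluster credentials from kubeconfig
v1 = client.CoreV1Api()

for node in v1.list_node().items:  # one item per node in the cluster
    name = node.metadata.name
    # Each node reports conditions such as "Ready"; find its current status.
    ready = next(
        (c.status for c in node.status.conditions if c.type == "Ready"),
        "Unknown",
    )
    print(f"{name}: Ready={ready}")
```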
Cluster computing architectures consist of a group of interconnected, individual computers working together as a single machine. Each computing resource is linked via a high-speed connection, such as a LAN, and is referred to in the system’s architecture as a node. Each node has an OS, memory and input/output (I/O) functions.
There are two types of cluster architectures: open and closed. In an open cluster, each node has its own IP address that is reachable over the internet. In a closed cluster, the nodes are hidden behind a gateway node that controls all access to them. Because internet-visible IP addresses can be probed and attacked, closed clusters pose less of a security risk than open clusters.
In addition to cluster computing, there are two other commonly used types of distributed computing that also feature connected networks of computers: grid computing and peer-to-peer computing.
Grid computing: In computer science, a grid computing infrastructure is set up to combine compute resources that are in different physical locations. The available compute resources from the different machines are combined and used together to solve a problem. Like clustering, grid computing uses the resources of multiple, interconnected computers.
However, unlike clustering, only the unused resources on the computers connected via grid architecture are utilized. SETI@home, a project of the Search for Extraterrestrial Intelligence (SETI), was a famous example of grid computing: the unused compute resources of many volunteers’ computers were used to analyze radio signals from deep space for signs of extraterrestrial life.2
Peer-to-peer computing: Peer-to-peer (P2P) computing, or networking, requires two or more computers to be connected as “peers” on a network, meaning they have equal power and permissions. Unlike cluster computing, a P2P architecture doesn’t require a centralized management approach.
On a P2P network, each node acts as both a client machine (a computer that needs access to a service) and a server (a computer that provides a service). Every peer node makes resources available to others on the network, including storage, memory, bandwidth and more.
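The sketch below illustrates this dual role in Python: a single peer process both listens for requests from other peers and issues its own. The localhost addresses and port numbers are hypothetical stand-ins for real peers on a network.

```python
# Minimal illustrative peer: each process both serves requests and issues them.
# Hypothetical two-peer setup on localhost; ports are arbitrary for the sketch.
import socket
import threading

def serve(port: int) -> None:
    """Act as a server: answer any peer that connects."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"pong (got: " + data + b")")

def ask(peer_port: int, message: bytes) -> bytes:
    """Act as a client: request a resource from another peer."""
    with socket.create_connection(("127.0.0.1", peer_port)) as conn:
        conn.sendall(message)
        return conn.recv(1024)

# Start this peer's server side in the background, then query another peer.
threading.Thread(target=serve, args=(9001,), daemon=True).start()
# e.g., print(ask(9002, b"ping"))  # works once a second peer listens on 9002
```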
Cluster computing was invented in the 1960s as a method of distributing compute tasks and data storage across multiple computers. In the 1980s, developments in several adjacent technologies—including PCs, versatile processors and LANs—had significant implications for cluster computing.
Perhaps the most significant development was the use of multiprocessor computing nodes in high-performance computing (HPC). As HPC use cases grew, so did use cases for cluster computing. Today, those uses include the automotive and aeronautics industries, data analysis from satellites and telescopes and the diagnosis of dangerous diseases, among others.
Today, cluster computing is used in many of the most advanced technologies pushing our world forward, such as artificial intelligence (AI), machine learning (ML) and cloud computing. The largest businesses in the world use cluster computing to move workloads to the cloud, increase processing speed, improve data integrity and more. At the enterprise level, computer clusters are often given a specific task—for example, load balancing, high availability or the large-scale processing of data in a data center.
Compute clusters are primarily designed to be higher performing and more reliable than other kinds of compute architectures, making them indispensable to the modern enterprise. For example, modern clusters have built-in fault tolerance, the capacity to continue functioning even when a single node in the network fails.
Also, large computer clusters rely on distributed file systems (DFS) and redundant arrays of independent disks (RAID), which allow the same data to be stored in different locations on multiple hard disk drives. Cluster computing benefits the modern enterprise in many ways; here are some examples:
Because of their reliance on parallelism, computer clusters are considered high performing and can typically process data faster and handle larger workloads than a single computer.
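As a loose single-machine analogy for this parallelism, the Python sketch below splits one CPU-bound job across several worker processes, much as a cluster splits work across nodes. The prime-counting task and chunk sizes are arbitrary illustrations, not real cluster code.

```python
# Toy illustration of why parallelism speeds up processing: a CPU-bound
# task split across worker processes, as a cluster splits work across nodes.
from multiprocessing import Pool

def count_primes(bounds: tuple) -> int:
    """Count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    def is_prime(n: int) -> bool:
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(is_prime(n) for n in range(lo, hi))

if __name__ == "__main__":
    # Split one large range into four chunks and process them in parallel.
    chunks = [(i, i + 250_000) for i in range(2, 1_000_002, 250_000)]
    with Pool(processes=4) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(f"primes found: {total}")
```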
Cluster computing is considered highly reliable because of its incorporation of DFS and RAID technologies. In a computer cluster, even if a single node fails, the network continues to function, and DFS and RAID will continue to ensure that data is backed up in multiple places.
In addition to being highly reliable, cluster computing is also considered highly available due to its ability to recover quickly from the failure of a single node. If a cluster is working properly, when one node fails, its work is seamlessly transferred to another in the cluster without a disruption in service.
Cluster computing is highly scalable because cluster nodes can be added at any time to increase performance. The ability to dynamically adjust resources within a cluster means that the cluster can scale up or down, depending on demand.
Because clusters can be built from commodity hardware, cluster computing is often more cost-effective than comparable monolithic systems. Many modern enterprises rely on cluster computing to improve their IT infrastructure’s performance, scalability and availability at an affordable price.
Compute clusters vary greatly in both complexity and purpose. A relatively simple dual-node cluster connects only two computers, while the Aurora supercomputer connects more than 10,000 nodes.3
Clusters have many business use cases due to their high performance, scalability and flexibility, but they are also used by universities and medical schools for scientific research. Based on their characteristics, compute clusters are broken down into three different types—high availability, load balancing and high performance.
High-availability clusters automatically transfer tasks from a node that’s failed unexpectedly to another node on the network that’s still functioning. This ability to quickly and easily adapt when one node fails makes them ideal for workloads where avoiding service interruption is critical.
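The Python sketch below shows, in highly simplified form, the heartbeat-and-failover logic a high-availability cluster manager might apply: nodes periodically report in, and tasks on any node that goes silent are reassigned to a survivor. The node names, timeout and data structures are all illustrative assumptions.

```python
# Sketch of heartbeat-based failover: if a node misses its heartbeat
# deadline, its work is reassigned to a node that is still healthy.
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds a node may stay silent before failover

last_seen = {"node-a": time.time(), "node-b": time.time()}
assignments = {"web-service": "node-a"}   # task -> node currently running it

def record_heartbeat(node: str) -> None:
    """Called whenever a node checks in."""
    last_seen[node] = time.time()

def check_and_failover() -> None:
    """Reassign tasks away from any node that has gone silent."""
    now = time.time()
    dead = {n for n, t in last_seen.items() if now - t > HEARTBEAT_TIMEOUT}
    for task, node in assignments.items():
        if node in dead:
            # Pick any surviving node and move the task to it.
            survivor = next(n for n in last_seen if n not in dead)
            assignments[task] = survivor
            print(f"failover: {task} moved {node} -> {survivor}")
```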
Load-balancing clusters, or simply load balancers, ensure that work is distributed fairly across the nodes in a cluster. Without load balancing, some nodes would be overwhelmed by the tasks they were assigned and would fail more frequently. There are different kinds of load balancing for different purposes. One of the best known, the Linux Virtual Server, is free and open source and is used to build high-performance, highly available servers based on clustering technology.
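A minimal round-robin dispatcher, one of the simplest load-balancing strategies, can be sketched in a few lines of Python. Real load balancers also weigh node health and current load, which this illustration omits; the node names are hypothetical.

```python
# Minimal round-robin load balancer: incoming requests are handed to
# cluster nodes in turn so no single node is overwhelmed.
import itertools

nodes = ["node-1", "node-2", "node-3"]
next_node = itertools.cycle(nodes).__next__   # rotate through nodes forever

def dispatch(request_id: int) -> str:
    """Assign one incoming request to the next node in the rotation."""
    target = next_node()
    print(f"request {request_id} -> {target}")
    return target

for rid in range(6):
    dispatch(rid)   # requests land on node-1, node-2, node-3, node-1, ...
```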
HPC clusters are networks of powerful processors that can process massive, multidimensional data sets, also known as big data, at extremely high speed. They require high-bandwidth networks and ultralow latency to move files between nodes.
Compared with load-balancing and high-availability clusters, HPC clusters offer far more processing power and are purpose-built for data analysis, such as diagnosing diseases, analyzing vast amounts of financial data and sequencing genomes. HPC clusters also use the Message Passing Interface (MPI), a protocol for parallel computing architectures that enables communication between nodes.
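The canonical MPI pattern is for each process to compute a partial result that a collective operation then combines. The sketch below uses the mpi4py bindings to approximate pi with a parallel reduction; it assumes an MPI runtime is available and would be launched with a command such as `mpirun -n 4 python pi_reduce.py` (the filename is arbitrary).

```python
# Classic MPI pattern on an HPC cluster: each process (one per node or core)
# computes a partial result, then a reduction combines the results.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()    # this process's ID within the cluster job
size = comm.Get_size()    # total number of processes

# Each rank sums a disjoint slice of the Leibniz series for pi/4.
n_terms = 1_000_000
local = sum((-1) ** k / (2 * k + 1) for k in range(rank, n_terms, size))

# Combine every rank's partial sum on rank 0.
total = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print(f"pi is approximately {4 * total}")
```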
AI clusters are computing clusters specially built for AI and ML workloads, such as facial and speech recognition, natural language processing (NLP) and autonomous driving. AI clusters are specially designed to run the algorithms that AI models train on.
Computing clusters have a wide range of uses. From enterprise applications like cloud computing and data analytics to software that creates eye-popping 3D special effects for film, here are some examples:
Cluster computing can process large volumes of data swiftly and efficiently, making it an ideal tool for big data analysis. From Google’s powerful search engine to software that analyzes the stock market to sentiment analysis on social media, cluster computing’s applications in data analytics are many and varied.
Cluster computing’s parallel processing capabilities power the most advanced graphics in video games and film. Using a cluster of independent nodes, each with its own graphics processing unit (GPU), cluster computing rendering (or cluster rendering) generates a calibrated, single image across multiple screens. This process considerably lessens the amount of time that it takes to build high-quality 3D images.
In AI and ML workloads, cluster computing helps process and analyze vast datasets quickly and accurately, an essential part of the training of powerful AI models that power popular applications such as ChatGPT.
Insurance companies and financial trading firms use cluster computing to analyze big data sets and quantify the risks of buying certain stocks or insuring certain customers, extracting meaningful insights that support more informed business decisions.
1 “Beowulf clusters make supercomputing more accessible,” NASA, 2020.
2 “SETI computing project paused after 20 years,” The Guardian, March 5, 2020.
3 “Intel says its supercomputer broke the exascale barrier,” TechRadar, May 24, 2024.