Systems

Correlated system failures occur when an event causes many systems in physical proximity to fail.

Servers that are part of the same storage system are often kept in close physical proximity, for example within the same rack at the same site. This proximity can lead to correlated unavailability, unreachability, or destruction. Events that cause correlated system failures include the following:
  • Environmental (heating or cooling) failures
  • Network connectivity loss in a data center
  • Switch or router failure in a rack
  • Power supply failure in a rack
  • Site destruction (fire, earthquake, flood)
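
As a rough illustration of why proximity matters, the following Python sketch groups servers by the rack and site they share and lists which servers a single rack-level or site-level event would take down together. The `Server` type, the `affected_by` helper, and the server, rack, and site names are all hypothetical; they are not part of any product interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    site: str
    rack: str


def affected_by(servers, site, rack=None):
    """Return the names of servers that fail together when an event hits
    a whole site, or a single rack within that site when `rack` is given."""
    return sorted(
        s.name
        for s in servers
        if s.site == site and (rack is None or s.rack == rack)
    )


# Hypothetical layout: three servers share one rack, a fourth sits in a
# second rack at the same site, and a fifth is at a different site.
SERVERS = [
    Server("s1", "site-a", "rack-1"),
    Server("s2", "site-a", "rack-1"),
    Server("s3", "site-a", "rack-1"),
    Server("s4", "site-a", "rack-2"),
    Server("s5", "site-b", "rack-1"),
]

# A rack-level event, such as a power supply failure in rack-1 at site-a.
rack_event = affected_by(SERVERS, "site-a", "rack-1")
# A site-level event, such as network connectivity loss at site-a.
site_event = affected_by(SERVERS, "site-a")

print(rack_event)
print(site_event)
```

In this assumed layout, the rack-level event removes three servers at once and the site-level event removes four, while the server at the second site is unaffected by either.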

Although a system can be deployed entirely within one physical site, doing so increases the chance of unavailability or data loss, because a network failure at that site could make the data unavailable. Multiple power outages within a site can also cause loss of recently written data. Methods for mitigating these risks through an appropriate system design are discussed in Avoiding and mitigating failures.

In a system using Concentrated Dispersal, the failure of a single device can have a greater impact because the device can contain more than one slice.
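
The effect can be sketched with simple arithmetic. The example below assumes that an object is split into `width` slices, that any `threshold` of those slices are enough to read it, and that slices are spread as evenly as possible across devices. These parameter names, the helper functions, and the values 12 and 8 are illustrative assumptions, not figures taken from this section.

```python
def slices_per_device(width, devices):
    """Worst-case number of slices of one object held by a single device,
    assuming slices are spread as evenly as possible (ceiling division)."""
    return -(-width // devices)


def readable_after_failures(width, threshold, devices, failed_devices):
    """Return True if at least `threshold` slices survive in the worst case
    after `failed_devices` devices fail."""
    lost = min(width, failed_devices * slices_per_device(width, devices))
    return width - lost >= threshold


# Illustrative values only: 12 slices, any 8 of which can rebuild the object.
WIDTH, THRESHOLD = 12, 8

# One slice per device: three device failures lose three slices.
spread_out = readable_after_failures(WIDTH, THRESHOLD, devices=12, failed_devices=3)

# Two slices per device: the same three failures lose six slices.
concentrated = readable_after_failures(WIDTH, THRESHOLD, devices=6, failed_devices=3)

print(spread_out, concentrated)
```

Under these assumptions, the same number of device failures leaves the object readable when each device holds one slice but not when each device holds two, which is why device placement across racks and sites matters more in a concentrated layout.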