IBM Support

Basic Guide for Troubleshooting Cluster Failover Issues

How To


Summary

In a Windows Server 2012 R2 cluster, unstable network connectivity between cluster nodes, or between the nodes and the File Share Witness (FSW) host, can lead to quorum loss. The Cluster Service may then fail over unexpectedly between nodes (for example, from Node A to Node B) or shut down clustered resources entirely. The FSW is a quorum resource: if it cannot come online because of network issues or an unavailable share, the cluster may be unable to maintain quorum.
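
To make the quorum arithmetic concrete, here is a minimal sketch of a plain majority-vote check. It assumes a static vote count (each node and the witness hold one vote) and deliberately ignores the dynamic quorum and dynamic witness adjustments that Windows Server 2012 R2 applies. The function name `has_quorum` is illustrative only and is not part of any Microsoft API.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Return True when a strict majority of the configured votes is reachable."""
    return votes_online * 2 > total_votes


# Assumption: two nodes plus a File Share Witness, so three votes in total.
TOTAL_VOTES = 3

# Both nodes can see each other but the witness is unreachable: 2 of 3 votes.
print("Witness down, nodes connected:", has_quorum(2, TOTAL_VOTES))

# A node is cut off from both its partner and the witness: 1 of 3 votes.
print("Node isolated from partner and witness:", has_quorum(1, TOTAL_VOTES))
```

Under this simplified model, a node that loses contact with both its partner and the witness holds only one of three votes, which is why network instability on either path can cost it quorum.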

Objective

How to troubleshoot a Cluster Service that frequently switches between nodes, disrupting high availability and potentially causing downtime or degraded performance.

Environment

Windows Server 2012 R2 and newer

Steps

  1. Review Logs:
    • Check system, application, and cluster logs for errors like “Remote endpoint unreachable,” “quorum loss,” or “File Share Witness failed.”
    • Note timestamps to correlate with failover events.
  2. Run Validation Tools:
    • Use the Validate a Configuration wizard in Failover Cluster Manager to check network, storage, and node health.
    • Look for warnings about network adapters or witness access.
  3. Update Drivers:
    • Ensure network card drivers and hypervisor agents (for VMs) are current on all nodes and the witness host.
  4. Check Antivirus:
    • Verify that antivirus software isn’t blocking port 3343, which the Cluster Service uses for communication between nodes. To test, temporarily disable the antivirus software in a non-production environment.
  5. Inspect Network Infrastructure:
    • Confirm stable connectivity between nodes and the witness host.
    • Check for firewall blocks or issues in switches/hubs.
  6. Monitor Packet Loss:
    • Use Performance Monitor to track “Network Interface\Packets Received Discarded” on nodes and the witness host.
    • If the counter keeps rising, the adapter is discarding inbound packets; review and adjust the network adapter settings on the affected host.
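
The timestamp correlation in step 1 can be sketched in a few lines of Python. This is only an illustration: it assumes the log entries have already been exported and parsed into `(timestamp, message)` pairs, the sample data is hypothetical, and the two-minute look-back window is an arbitrary choice rather than a documented value.

```python
from datetime import datetime, timedelta


def errors_near_failovers(errors, failovers, window=timedelta(minutes=2)):
    """Map each failover time to the errors logged within `window` before it."""
    matches = {}
    for failover_time in failovers:
        matches[failover_time] = [
            (ts, msg)
            for ts, msg in errors
            # Keep only errors at or before the failover, inside the window.
            if timedelta(0) <= failover_time - ts <= window
        ]
    return matches


# Hypothetical sample data; real entries come from the exported logs.
errors = [
    (datetime(2025, 8, 1, 10, 14, 30), "File Share Witness failed"),
    (datetime(2025, 8, 1, 10, 15, 10), "Remote endpoint unreachable"),
    (datetime(2025, 8, 1, 13, 0, 0), "Remote endpoint unreachable"),
]
failovers = [datetime(2025, 8, 1, 10, 15, 40)]

for failover_time, related in errors_near_failovers(errors, failovers).items():
    print(f"Failover at {failover_time:%H:%M:%S}")
    for ts, msg in related:
        print(f"  {ts:%H:%M:%S}  {msg}")
```

Errors that cluster shortly before each failover are the most likely candidates for the trigger, while entries far from any failover (such as the 13:00 entry in the sample data) can be set aside for a first pass.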

Additional Information

For additional information, search online for the following Microsoft articles:

"Failover Cluster Troubleshooting"

"Unexpected cluster failover troubleshooting guidance"

Document Location

Worldwide


Document Information

Modified date:
29 August 2025

UID

ibm17243573