Deploying Vault on OpenShift Container Platform
Deploy a production-ready HashiCorp Vault cluster on OpenShift Container Platform (OCP) using Vault Enterprise s390x images with integrated Raft storage, high availability, and disaster recovery replication.
Deployment architecture
This procedure deploys a Vault Enterprise cluster with 3 Vault pods, each running on a separate OpenShift compute node. This ensures high availability and fault tolerance by preventing a single node failure from impacting multiple Vault instances.
Pod anti-affinity is configured to ensure that OpenShift schedules each Vault pod on a different node, eliminating single points of failure and improving cluster resilience.
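As a minimal sketch of the anti-affinity rule described above, the following Helm values fragment asks the scheduler to place at most one Vault server pod per node. It assumes the Vault Helm chart's server.affinity value and the pod labels app.kubernetes.io/name: vault and component: server. These label values are illustrative and should be matched to your release.

```yaml
# Sketch only: label values are assumptions; match them to your release.
server:
  ha:
    enabled: true
    replicas: 3
  affinity:
    podAntiAffinity:
      # "required" makes the rule a hard constraint at scheduling time.
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: vault
              component: server
          # One Vault pod per node hostname.
          topologyKey: kubernetes.io/hostname
```

Because the rule is a hard requirement, a Vault pod stays in Pending state if fewer than three schedulable compute nodes are available.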
Prerequisites
Before you begin, ensure you have the following:
- OCP 4.22 or later (Kubernetes 1.35) with the PodTopologyLabels admission controller.
- Network TCP ports 8200 and 8201 open between primary and DR clusters
- Kubernetes secret containing your Vault Enterprise license
- For production deployments, TLS encryption is mandatory. Create a Kubernetes secret containing your CA certificate, TLS certificate, and private key.
- Priority class created to ensure Vault pods are treated as high priority and are less likely to be evicted during node resource pressure.
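The priority class prerequisite could look like the following manifest. This is a sketch, not a prescribed configuration: the name vault-high-priority and the value 1000000 are hypothetical and should be chosen to fit the priority scheme already used in your cluster.

```yaml
# Sketch only: the name and value are illustrative assumptions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: vault-high-priority
# Higher values mean higher scheduling priority.
value: 1000000
# Do not apply this priority to pods that omit priorityClassName.
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Priority class for Vault server pods."
```

The Vault Helm chart accepts the class name through the server.priorityClassName value.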
Procedure
The primary Vault cluster is production-ready.
To set up disaster recovery, see Configuring disaster recovery replication.
For performance replication across multiple regions, see Configuring performance replication.
Vault Disaster Recovery (DR) on OpenShift
A Vault cluster can perform DR replication to any Vault cluster on any platform. This procedure focuses on deployment of a Vault DR cluster that runs on Red Hat OpenShift on IBM Z and LinuxONE. It follows HashiCorp Validated Design (HVD) recommendations and highlights key networking, security, topology, and operational considerations specific to Red Hat OpenShift Container Platform (OCP).
Prerequisites
Before you begin, ensure the following prerequisites are met:
- The DR cluster must run the same version of Vault as the primary cluster.
- Network connectivity between primary and secondary clusters on ports 8200 (API) and 8201 (replication).
- DNS or IP resolution configured between clusters.
- LoadBalancer service configured for both API and cluster ports.
- TLS certificates properly configured for end-to-end encryption.
Procedure
DR replication is now configured between the primary and secondary Vault clusters. The secondary cluster maintains a near-real-time copy of the primary cluster's data and can be promoted to active status, if needed.
Key design consideration
For DR replication, do not use an OpenShift Route for port 8201.
| Traffic type | Port | Allowed exposure |
|---|---|---|
| API traffic | 8200 | Route or LoadBalancer |
| Replication traffic | 8201 | LoadBalancer only |
Reasons for using LoadBalancer for replication traffic:
- Vault replication uses gRPC over HTTP/2 on port 8201.
- Replication requires end-to-end TCP passthrough.
- OpenShift Routes (HAProxy) may break HTTP/2 unless carefully configured.
- Routes operate on port 443, requiring port mapping (443 to 8201).
LoadBalancer service for Vault:
Create a LoadBalancer service (Layer 4 TCP) to expose both API and cluster
ports:
apiVersion: v1
kind: Service
metadata:
  name: vault-active-lb
  namespace: vault
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: vault
    vault-active: "true"
  ports:
    - name: api
      port: 8200
      targetPort: 8200
      protocol: TCP
    - name: cluster
      port: 8201
      targetPort: 8201
      protocol: TCP
The LoadBalancer must allow TCP passthrough. Any provider integrating with the
Kubernetes LoadBalancer Service type is sufficient, including cloud-native providers, F5
BIG-IP, and MetalLB.
Networking requirements:
Ensure the following network configuration between primary and secondary clusters:
- Allow ingress and egress traffic on TCP port 8200 (API).
- Allow ingress and egress traffic on TCP port 8201 (replication).
- Ensure DNS or IP resolution between clusters.
- Avoid TLS termination between clusters. Use TLS passthrough.
- Manage network latency. Keep latency below 8 ms between Raft peers within a cluster. Replication between clusters is asynchronous, but sustained latency increases replication lag.
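If the Vault namespace uses default-deny network policies, the in-cluster side of the port requirements above might be expressed as a NetworkPolicy like the following. This is a sketch under assumptions: the policy name, the vault namespace, and the pod labels are illustrative. It covers ingress to the Vault pods only. Egress rules and any firewalls between the two clusters must be configured separately.

```yaml
# Sketch only: name, namespace, and pod labels are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-vault-api-and-replication
  namespace: vault
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: vault
      component: server
  policyTypes:
    - Ingress
  ingress:
    # No "from" clause: traffic from any source is allowed on these ports.
    - ports:
        - protocol: TCP
          port: 8200   # API
        - protocol: TCP
          port: 8201   # replication and Raft
```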
Internal cluster networking for Raft peer communication:
Vault pods use port 8201/TCP for Raft consensus traffic, including log
replication, heartbeats, and leader election. The headless service
<release>-internal provides DNS records for Raft peer discovery.
Use auto_join with the Kubernetes provider (go-discover) for dynamic peer
discovery:
storage "raft" {
  path = "/vault/data"
  retry_join {
    auto_join = "provider=k8s namespace=<namespace> label_selector=\"app.kubernetes.io/name=<chart-name>,component=server\""
    auto_join_scheme = "https"
    leader_tls_servername = "vault.apps.example.com"
    leader_ca_cert_file = "/vault/userconfig/vault-tls/ca.crt"
  }
}
When embedding in Helm ha.raft.config, use Helm template expressions:
auto_join = "provider=k8s namespace={{.Release.Namespace}} label_selector=\"app.kubernetes.io/name={{ template \"vault.name\" . }},component=server\""
Verify DNS resolution before cluster initialization:
oc exec vault-0 -n vault -- nslookup vault-0.vault-internal.vault.svc
oc exec vault-1 -n vault -- nslookup vault-1.vault-internal.vault.svc
oc exec vault-2 -n vault -- nslookup vault-2.vault-internal.vault.svc
Service topology
In high availability (HA) mode, the Vault Helm chart creates the following services:
| Service | Purpose |
|---|---|
| <release> | All Vault server pods. Exposes ports 8200 and 8201. Use for client API traffic. |
| <release>-active | Active Vault pod only. Use for replication traffic on port 8201. |
| <release>-standby | Standby pods only. Use only when you need to specifically target standbys. |
| <release>-internal | Headless service for Raft peer DNS discovery. Do not use for client traffic. |
OpenShift Routes:
The Vault Helm chart creates a route when server.route.enabled is set to true.
Default recommendation: Passthrough TLS termination:
server:
  route:
    enabled: true
    activeService: false
    host: vault.apps.example.com
    tls:
      termination: passthrough
Setting activeService: false routes traffic to all Vault pods. With Vault
Enterprise, performance standbys serve most read operations locally. Distributing traffic across all
pods improves throughput.
| Strategy | How it works | When to use |
|---|---|---|
| Passthrough (recommended) | Router forwards raw TLS to Vault. End-to-end TLS preserved. | Default for production. Required when unbroken TLS is a compliance requirement. |
| Re-encrypt | Router terminates client TLS, re-encrypts to backend. | When organization policy requires router to terminate TLS, or Layer 7 HTTP visibility is needed. |
| Edge | Router terminates TLS and forwards plaintext HTTP. | Not applicable for production Vault. |
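For comparison with the passthrough default, a re-encrypt route could be sketched as the Route manifest below. The route name, the vault service name, and the certificate placeholder are assumptions for illustration. The destination CA must be the one that signed the Vault server certificate so that the router can verify the backend.

```yaml
# Sketch only: route name, service name, and CA content are assumptions.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: vault-reencrypt
  namespace: vault
spec:
  host: vault.apps.example.com
  to:
    kind: Service
    name: vault
  port:
    targetPort: 8200
  tls:
    # Router terminates client TLS, then opens a new TLS session to Vault.
    termination: reencrypt
    insecureEdgeTerminationPolicy: Redirect
    destinationCACertificate: |
      -----BEGIN CERTIFICATE-----
      <CA certificate that signed the Vault server certificate>
      -----END CERTIFICATE-----
```

With re-encrypt, the router also presents its own certificate to clients, so clients validate the router certificate instead of the Vault server certificate.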
Best practices
- Always use LoadBalancer for port 8201 in production DR replication.
- Use TLS end-to-end with no interception or termination on the replication path.
- Keep clusters time-synchronized. Time synchronization is critical for token validity and Raft elections.
- Use passthrough TLS termination for Routes that expose API traffic (recommended).
- Rotate secondary tokens securely and promptly (default time to live is approximately 30 minutes).
- Deploy Vault on dedicated infrastructure nodes for production workloads.
- Automate Vault Raft snapshots with secure off-cluster backups and periodic restore validation.
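The recommendation to run Vault on dedicated infrastructure nodes can be sketched with the Helm chart's server.nodeSelector and server.tolerations values. The node label and taint shown here are assumptions about how infrastructure nodes are labeled and tainted in your cluster. Substitute the keys your cluster actually uses.

```yaml
# Sketch only: the node label and taint key are assumptions.
server:
  # Schedule Vault pods only on nodes that carry this label.
  nodeSelector:
    node-role.kubernetes.io/infra: ""
  # Allow Vault pods onto nodes tainted for infrastructure workloads.
  tolerations:
    - key: node-role.kubernetes.io/infra
      operator: Exists
      effect: NoSchedule
```

At least three nodes must match the selector so that pod anti-affinity can still place each Vault pod on a separate node.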