Deploying Vault on OpenShift Container Platform

Deploy a production-ready HashiCorp Vault on OpenShift Container Platform (OCP) using Vault Enterprise s390x images with integrated Raft storage, high availability, and disaster recovery replication.

Deployment architecture

This procedure deploys a Vault Enterprise cluster with 3 Vault pods, each running on a separate OpenShift compute node. This ensures high availability and fault tolerance by preventing a single node failure from impacting multiple Vault instances.
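
The fault tolerance of a three-pod cluster follows from Raft quorum arithmetic: a cluster of n voters needs floor(n/2) + 1 of them available, and tolerates the loss of the rest. The following is a minimal sketch of that calculation; the function names are illustrative and are not part of any Vault tooling.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names) for Raft quorum arithmetic.

# Number of voters that must be available in a cluster of $1 voters.
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of voters that can fail while the cluster still keeps quorum.
raft_failure_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for voters in 3 5; do
  echo "voters=$voters quorum=$(raft_quorum "$voters") tolerated_failures=$(raft_failure_tolerance "$voters")"
done
```

With three voters the quorum is two, so one pod (or the node it runs on) can fail without loss of availability.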

Pod anti-affinity is configured to ensure that OpenShift schedules each Vault pod on a different node, eliminating single points of failure and improving cluster resilience.
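
Once the pods are running, the effect of pod anti-affinity can be checked by listing the node of each Vault pod and looking for duplicates. This is a sketch under the assumption that the pods carry the app.kubernetes.io/name=vault label used elsewhere in this procedure; the function names and the overridable OC variable are illustrative.

```shell
#!/bin/sh
# OC can be overridden (for example OC=echo) for a dry run without a cluster.
OC="${OC:-oc}"

# Print the node name of every Vault server pod, one per line.
vault_pod_nodes() {
  "$OC" get pods -n "${1:-vault}" -l app.kubernetes.io/name=vault \
    -o 'jsonpath={range .items[*]}{.spec.nodeName}{"\n"}{end}'
}

# Read node names on stdin; succeed only if no name appears more than once.
nodes_are_distinct() {
  [ -z "$(sort | uniq -d)" ]
}
```

For example, `vault_pod_nodes vault | nodes_are_distinct && echo "each pod runs on its own node"` prints the message only when no two pods share a node.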

Figure 1. Deployment architecture
OpenShift Container Platform deployment architecture showing Vault pods distributed across multiple nodes

Prerequisites

Before you begin, ensure you have the following:

  • OCP 4.22 or later (Kubernetes 1.35) with the PodTopologyLabels admission controller
  • Network TCP ports 8200 and 8201 open between the primary and DR clusters
  • Kubernetes secret containing your Vault Enterprise license
  • Kubernetes secret containing your CA certificate, TLS certificate, and private key. TLS encryption is mandatory for production deployments.
  • Priority class that marks Vault pods as high priority, so that they are less likely to be evicted during node resource pressure. Steps 1 and 2 of the procedure create it.
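
The two secrets can be created with oc create secret. The sketch below uses the secret names and keys that the values.yaml file in this procedure expects (vault-ent-license with key license, and vault-tls with keys ca.crt, vault.crt, and vault.key). The function name, the OC override, and the local file paths are assumptions for illustration.

```shell
#!/bin/sh
# OC can be overridden (for example OC=echo) to print the commands instead.
OC="${OC:-oc}"

# create_vault_secrets NAMESPACE LICENSE_FILE CA_FILE CERT_FILE KEY_FILE
create_vault_secrets() {
  ns=$1 license=$2 ca=$3 cert=$4 key=$5

  # License secret: name and key match server.enterpriseLicense in values.yaml.
  "$OC" create secret generic vault-ent-license -n "$ns" \
    --from-file=license="$license" || return 1

  # TLS secret: each key becomes a file under /vault/userconfig/vault-tls/.
  "$OC" create secret generic vault-tls -n "$ns" \
    --from-file=ca.crt="$ca" \
    --from-file=vault.crt="$cert" \
    --from-file=vault.key="$key"
}
```

A call such as `create_vault_secrets vault ./vault.hclic ./ca.crt ./vault.crt ./vault.key` creates both secrets in the vault namespace, which must already exist.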

Procedure

  1. Create a priority class file named priorityclass.yaml:
    
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: vault-priority
    value: 100000
    globalDefault: false
    description: "Priority class for Vault pods"
    preemptionPolicy: Never
    
  2. Apply the priority class:
    
    oc apply -f priorityclass.yaml
    
  3. Add the OpenShift Helm charts repository, which hosts the Vault chart, and update the local cache:
    
    helm repo add openshift-helm-charts https://charts.openshift.io/
    helm repo update
    
  4. Create a file named values.yaml with the Helm configuration:
    
    # ─── GLOBAL SETTINGS ───────────────────────────────────────────────────────────
    # Enables the Helm chart, activates OpenShift-specific resources (Routes, SCCs),
    # and enforces TLS across all Vault communication.
    global:
      enabled: true
      openshift: true
      tlsDisable: false
    
    # ─── SERVER IMAGE ──────────────────────────────────────────────────────────────
    # Pulls the Vault Enterprise image from the Red Hat container registry
    # (registry.connect.redhat.com).
    # IfNotPresent avoids unnecessary re-pulls in air-gapped or bandwidth-limited environments.
    server:
      image:
        repository: registry.connect.redhat.com/hashicorp/vault-enterprise
        tag: 2.0.0-ent.hsm-ubi
        pullPolicy: IfNotPresent
    
      # ─── ENTERPRISE LICENSE ──────────────────────────────────────────────────────
      # References a pre-existing Kubernetes Secret that holds the Vault Enterprise
      # license key. The Secret must exist before the chart is installed.
      enterpriseLicense:
        secretName: vault-ent-license
        secretKey: license
    
      # ─── EXTRA ENVIRONMENT VARIABLES ─────────────────────────────────────────────
      # Injects VAULT_CACERT so Vault's own CLI and SDK trust the internal CA
      # when making TLS calls (e.g. health checks, raft join).
      extraEnvironmentVars:
        VAULT_CACERT: /vault/userconfig/vault-tls/ca.crt
    
      # ─── EXTRA VOLUMES ────────────────────────────────────────────────────────────
      # Mounts the vault-tls Secret (containing ca.crt, vault.crt, vault.key) into
      # the pod so the listener and retry_join stanzas can reference those files.
      # The Secret keys must match these file names. A kubernetes.io/tls Secret
      # stores its certificate and key as tls.crt and tls.key, so either create a
      # generic Secret with the key names above or adjust the paths in the config.
      extraVolumes:
        - type: secret
          name: vault-tls
    
      # ─── SECURITY CONTEXT ─────────────────────────────────────────────────────────
      # Hardens the pod: prevents running as root, pins the UID to 100 (vault user),
      # and sets the fsGroup so mounted volumes are writable by the vault process.
      # Required on OpenShift because the platform enforces restricted SCCs by default.
      securityContext:
        runAsNonRoot: true
        runAsUser: 100
        fsGroup: 1000
    
      # ─── RESOURCE REQUESTS & LIMITS ───────────────────────────────────────────────
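      # Requests guarantee the scheduler places the pod on a node with headroom.
      # Because requests are set but do not equal the limits (memory) or have no
      # limit at all (CPU), the pod gets Burstable QoS, which reduces eviction risk
      # during node memory pressure compared to BestEffort (no requests). Guaranteed
      # QoS would require CPU and memory requests equal to their limits.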
      # The memory limit caps runaway growth; no CPU limit avoids throttling during
      # leader elections and crypto-heavy seal/unseal operations. On shared nodes,
      # add a CPU limit sized from vault-benchmark measurements to cap noisy-neighbour
      # impact. Refer to Hardware sizing for Vault servers for baseline sizing guidance.
      resources:
        requests:
          cpu: "500m"
          memory: "500Mi"
        limits:
          memory: "1000Mi"
    
      # ─── PRIORITY CLASS ───────────────────────────────────────────────────────────
      # Assigns a high-priority class so the Vault pod is the last to be evicted
      # under node memory pressure. The PriorityClass object must exist in the cluster
      # before this chart is installed. Set preemptionPolicy: Never on the
      # PriorityClass to avoid preempting lower-priority application pods during
      # scheduling. System-critical DaemonSet pods (networking, monitoring) should
      # carry a higher PriorityClass value than Vault to protect them from eviction.
      priorityClassName: vault-priority
    
      # ─── LIVENESS PROBE ───────────────────────────────────────────────────────────
      # Restarts the container if Vault becomes unresponsive. The health endpoint
      # query parameters ensure the probe does not restart pods that are merely sealed
      # (204) or uninitialized (204), which are expected transient states rather than
      # hard failures. failureThreshold: 2 allows one transient failure before restart.
      # initialDelaySeconds: 60 gives Vault time to complete startup and join Raft
      # before the first check runs.
      livenessProbe:
        enabled: true
        path: "/v1/sys/health?standbyok=true&perfstandbyok=true&drsecondarycode=200&sealedcode=204&uninitcode=204"
        port: 8200
        initialDelaySeconds: 60
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 2
    
      # ─── READINESS PROBE ──────────────────────────────────────────────────────────
      # Gates traffic routing: the pod is removed from Service endpoints until this
      # probe passes. standbyok=true and perfstandbyok=true allow standby nodes to
      # receive traffic, which is correct for Vault Enterprise where performance
      # standbys serve most read requests locally. drsecondarycode=200 marks DR
      # secondary pods ready so they can serve metrics and accept replication traffic.
      # initialDelaySeconds: 15 is intentionally shorter than the liveness delay
      # because a failed readiness check only withholds traffic and never restarts
      # the container. Without standbyok=true, a standby returns 429 by default
      # and would be reported as not ready.
      readinessProbe:
        enabled: true
        path: "/v1/sys/health?standbyok=true&perfstandbyok=true&drsecondarycode=200"
        port: 8200
        initialDelaySeconds: 15
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 2
    
      # ─── DATA STORAGE ─────────────────────────────────────────────────────────────
      # Persistent volume for Raft's on-disk data (secrets, policies, tokens).
      # Ceph RBD (ReadWriteOnce) is suitable because each Raft peer owns its own data
      # directory; ReadWriteMany must NOT be used as integrated storage requires
      # exclusive access. Choose a StorageClass with low latency and high IOPS — slow
      # disk on any single voter delays the entire Raft commit pipeline and can trigger
      # unnecessary leader elections. Target p99 write latency < 10 ms on
      # vault.raft-storage.put; validate with vault-benchmark before production.
      dataStorage:
        enabled: true
        size: 100Mi
        mountPath: /vault/data
        storageClass: ocs-storagecluster-ceph-rbd
        accessMode: ReadWriteOnce
    
      # ─── AUDIT STORAGE ────────────────────────────────────────────────────────────
      # Separate persistent volume for the audit log device. Keeping audit logs on
      # their own PVC prevents a busy audit device from filling the data volume and
      # corrupting Raft state. Vault halts all client operations when every configured
      # audit device fails, so treat this PVC with the same reliability care as the
      # data PVC. Consider adding a second audit device (e.g. a socket device pointing
      # to an external TCP collector) so a single device failure does not block writes.
      auditStorage:
        enabled: true
        size: 100Mi
        mountPath: /vault/audit
        storageClass: ocs-storagecluster-ceph-rbd
        accessMode: ReadWriteOnce
    
      # ─── STANDALONE MODE ──────────────────────────────────────────────────────────
      # Disabled in favour of HA mode below. A standalone instance has no
      # replication and is unsuitable for production.
      standalone:
        enabled: false
    
      # ─── HIGH AVAILABILITY ────────────────────────────────────────────────────────
      # Runs three Vault replicas backed by Raft integrated storage. With Raft, one
      # node is the active leader and the remaining two are hot standbys (performance
      # standbys in Vault Enterprise) that can take over immediately on failure.
      # Three replicas provide a quorum of 2 — the cluster tolerates 1 simultaneous
      # pod failure.
      ha:
        enabled: true
        replicas: 3
    
        # ─── POD DISRUPTION BUDGET ─────────────────────────────────────────────────
        # Limits voluntary disruptions (node drains, cluster upgrades, manual pod
        # deletions) to one pod at a time, preserving the quorum of two nodes required
        # for Raft consensus in a 3-replica cluster. PDBs do not protect against
        # involuntary disruptions such as hardware failures. The Kubernetes API server
        # rejects eviction requests that would violate this budget, causing drain
        # operations to evict pods sequentially as PDB conditions allow.
        disruptionBudget:
          enabled: true
          maxUnavailable: 1
    
        # ─── RAFT INTEGRATED STORAGE ───────────────────────────────────────────────
        # Enables Raft as the storage backend — no external Consul cluster required.
        # setNodeId derives the Raft node ID from the pod name (e.g. vault-0) for a
        # stable identity that survives pod restarts and PVC reattachment.
        raft:
          enabled: true
          setNodeId: true
    
          config: |
            # Enable the Vault web UI (also toggled by the ui.enabled key below).
            ui = true
    
            # api_addr is the externally reachable address clients use to contact Vault.
            # Must match the Route hostname and a SAN on the TLS listener certificate;
            # TLS validation will fail for external clients if these do not align.
            # cluster_addr uses the pod's stable DNS name on the headless internal Service
            # for peer-to-peer Raft replication traffic on port 8201. The $(POD_NAME)
            # environment variable is injected by the StatefulSet controller.
            api_addr = "https://vault.ibm.com"
            cluster_addr = "https://$(POD_NAME).vault-internal:8201"
    
            # Registers the active pod with the Kubernetes API so the vault-active label
            # is updated automatically. The <release>-active Service selector uses this
            # label to direct replication traffic to the active node; requires RBAC
            # permissions (get, update, and patch on pods) for the Vault service account.
            service_registration "kubernetes" {}
    
            # ─── TELEMETRY ───────────────────────────────────────────────────────────
            # Exposes Prometheus-format metrics at /v1/sys/metrics. prometheus_retention_time
            # must exceed the Prometheus scrape interval to prevent series gaps; 12h is
            # recommended over the 24h default to reduce Vault's in-memory metric footprint.
            # disable_hostname prevents pod hostnames from being embedded in metric names,
            # which would break time-series continuity when pods are replaced. The
            # unauthenticated_metrics_access listener setting (below) must also be true
            # to allow scraping from standby and sealed pods without a Vault token.
            telemetry {
              prometheus_retention_time = "12h"
              disable_hostname = true
            }
    
            # ─── TCP LISTENER ────────────────────────────────────────────────────────
            # Binds on all interfaces (IPv4 + IPv6 via [::]) on the standard Vault
            # API port (8200) and cluster port (8201). Vault presents its certificate
            # to every client. Client certificates are optional: when a client presents
            # one, Vault validates it against the internal CA in tls_client_ca_file,
            # which the cert auth method relies on. Remove tls_client_ca_file if your
            # deployment does not use TLS certificate authentication.
            # The certificate SANs must include the external Route hostname (for
            # passthrough TLS and external clients), the internal Service DNS name
            # (vault.<namespace>.svc), and 127.0.0.1 (for in-pod Vault CLI calls).
            listener "tcp" {
              address         = "[::]:8200"
              cluster_address = "[::]:8201"
    
              tls_cert_file      = "/vault/userconfig/vault-tls/vault.crt"
              tls_key_file       = "/vault/userconfig/vault-tls/vault.key"
              tls_client_ca_file = "/vault/userconfig/vault-tls/ca.crt"
    
              # Allows Prometheus (and other unauthenticated collectors) to scrape
              # /v1/sys/metrics on every pod, including sealed and standby pods that
              # cannot validate tokens locally. Without this, standby pods redirect
              # metric requests to the active node, losing per-pod visibility.
              telemetry {
                unauthenticated_metrics_access = "true"
              }
            }
    
            # ─── RAFT STORAGE & AUTO-JOIN ────────────────────────────────────────────
            # Stores Raft data on the PVC mounted at /vault/data (see dataStorage above).
            # node_id is set from POD_NAME to give each peer a stable Raft identity.
            #
            # retry_join uses the Kubernetes go-discover provider to locate peer pods
            # by label selector, eliminating hardcoded pod IP lists and adapting
            # automatically when the StatefulSet scales. The Vault service account needs
            # RBAC get/list on pods in the release namespace for this to work.
            #
            # All join traffic uses HTTPS; leader_ca_cert_file verifies the leader's
            # TLS certificate against the internal CA. leader_tls_servername overrides
            # the hostname used for SNI validation during the bootstrap handshake — it
            # must match a SAN on the listener certificate because pod IPs are never
            # included as SANs. Without this field, TLS validation fails and cluster
            # formation is blocked.
            storage "raft" {
              path = "/vault/data"
              node_id = "$(POD_NAME)"
    
              retry_join {
                auto_join             = "provider=k8s namespace=vault label_selector=\"app.kubernetes.io/name=vault,component=server\""
                auto_join_scheme      = "https"
                leader_tls_servername = "vault.ibm.com"
                leader_ca_cert_file   = "/vault/userconfig/vault-tls/ca.crt"
              }
            }
    
      # ─── OPENSHIFT ROUTE ───────────────────────────────────────────────────────────
      # Exposes Vault externally via an OpenShift Route backed by the HAProxy router.
      # Passthrough termination forwards raw TLS to the Vault pod — the router does
      # not decrypt the stream, preserving end-to-end TLS between clients and Vault.
      # This is the recommended mode: it supports all Vault auth methods including
      # TLS certificate auth, and is consistent with HashiCorp's standard TLS guidance.
      #
      # activeService: false routes through the <release> Service, which targets
      # all pods. Set it explicitly: when the key is omitted, the upstream chart
      # default (true) points the Route at the <release>-active Service instead.
      # With Vault Enterprise performance standbys, distributing traffic across all
      # pods improves read throughput. Set activeService: true only for replication
      # endpoints that must reach the active node exclusively.
      #
      # NOTE: Replace "vault.ibm.com" with your actual Route hostname. This hostname
      # must also be present as a SAN in the TLS certificate (see the TLS volumes
      # above), otherwise TLS validation will fail for external clients. The same
      # hostname must match api_addr and leader_tls_servername in the Raft config.
      route:
        enabled: true
        activeService: false
        host: "vault.ibm.com"
        tls:
          termination: passthrough
    
      # ─── LOG FORMAT ────────────────────────────────────────────────────────────────
      # JSON format enables field-level filtering in log aggregation pipelines
      # (e.g. ClusterLogForwarder, Fluent Bit, Vector). Operational logs write to
      # stderr; audit logs are configured separately via Vault audit device commands
      # after initialization. Do not use debug or trace in production — they
      # significantly increase log volume and degrade Vault performance.
      logFormat: "json"
    
    # ─── WEB UI ──────────────────────────────────────────────────────────────────────
    # Enables the Vault browser UI at the api_addr host. Also toggled by the
    # `ui = true` directive in the HCL Raft config above; both must agree.
    ui:
      enabled: true
    
    # ─── AGENT INJECTOR ──────────────────────────────────────────────────────────────
    # Disabled: the Vault Agent Injector is not released yet.
    injector:
      enabled: false
    
    # ─── SERVER TELEMETRY ────────────────────────────────────────────────────────────
    # Controls Prometheus metrics collection and alerting rule deployment.
    serverTelemetry:
    
      # ─── SERVICE MONITOR ─────────────────────────────────────────────────────────
      # Creates a ServiceMonitor custom resource for Prometheus Operator (or the
      # OpenShift Cluster Monitoring Operator with user workload monitoring enabled).
      # interval: 30s provides sufficient granularity for alerting and dashboards
      # without adding excessive metric storage or CPU overhead.
      #
      # matchLabels overrides the default fallback of vault-active: "true", which
      # would scrape only the active pod and lose per-pod standby metrics. Setting
      # vault-internal: "true" targets the headless internal Service, ensuring all
      # pods — including performance standbys and DR secondaries — are scraped.
      # This is required to collect per-pod Raft health, read latency from standbys,
      # and write-forwarding latency signals.
      serviceMonitor:
        enabled: true
        interval: 30s
    
        matchLabels:
          vault-internal: "true"
    
      # ─── PROMETHEUS RULES ────────────────────────────────────────────────────────
      # Creates a PrometheusRule custom resource containing alert definitions.
      # These rules are evaluated by Prometheus (or the user-workload Prometheus
      # instance on OpenShift) and route through the configured Alertmanager.
      # On OpenShift with user workload monitoring enabled, alerts appear in the
      # Console under Observe > Alerting alongside platform alerts. Define thresholds
      # that detect degraded Vault health (e.g. sealed pods, leader election storms,
      # high Raft commit latency) before users are affected.
      prometheusRules:
        enabled: true
    
  5. Install Vault using Helm with the custom values file:
    
    helm install vault openshift-helm-charts/vault -n vault -f values.yaml
    
    Note: To upgrade an existing deployment, use: helm upgrade --install vault openshift-helm-charts/vault -n vault -f values.yaml.
  6. Verify that all pods are running:
    
    oc get pods -n vault -l app.kubernetes.io/name=vault
    

    Expected output:

    
    NAME      READY   STATUS
    vault-0   1/1     Running
    vault-1   1/1     Running
    vault-2   1/1     Running
    
  7. Initialize the Vault cluster, then unseal it:
    Important: Run initialization (vault operator init) only once, on a single pod.

    Vault always boots in a sealed state. With Shamir key shares, an operator must provide a threshold of unseal keys (by default, three of five) to each pod after every restart. This is impractical in a Kubernetes environment, where pods restart during upgrades, node drains, and after probe failures. Auto-unseal delegates the unseal-key reconstruction to a hardware (or cloud-managed) Key Management System, allowing each pod to self-unseal at startup using the protected key material.

    • Note: HSM is recommended for unsealing Vault. It provides multi-card redundancy, the HSM domain lifecycle is decoupled from the Vault pod lifecycle, and Vault pods can move freely between OpenShift compute nodes without losing their auto-unseal credential. See Auto-unseal with HSM.
    • For manual unseal, provide a threshold of Shamir key shares (default: 3 of 5) to each pod after every restart. Distribute the key shares across separate, trusted individuals. The following example unseals vault-0; repeat it for vault-1 and vault-2.
      
      oc exec -n vault -it vault-0 -- /bin/sh
      vault operator unseal <UNSEAL_KEY_1>
      vault operator unseal <UNSEAL_KEY_2>
      vault operator unseal <UNSEAL_KEY_3>
      vault status
      exit
      
      Warning: With manual unseal, every pod restart requires operator intervention. Ensure key holders are available and have a documented on-call process before using this in production.
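
The initialization and manual unseal flow of step 7 can be sketched as two small helpers. The vault operator init flags shown are the Shamir defaults (five shares, threshold of three); the function names and the OC override are illustrative. With auto-unseal, initialization produces recovery keys instead and the unseal helper is not needed.

```shell
#!/bin/sh
# OC can be overridden (for example OC=echo) to print the commands instead.
OC="${OC:-oc}"

# Initialize Vault once, on vault-0, with 5 key shares and a threshold of 3.
# The output contains the unseal keys and the initial root token: capture it
# once and store it securely.
vault_init() {
  "$OC" exec -n "${1:-vault}" vault-0 -- \
    vault operator init -key-shares=5 -key-threshold=3
}

# vault_unseal_pod NAMESPACE POD KEY...
# Submit each key share passed as an argument to the given pod.
vault_unseal_pod() {
  ns=$1 pod=$2
  shift 2
  for key in "$@"; do
    "$OC" exec -n "$ns" "$pod" -- vault operator unseal "$key" || return 1
  done
}
```

A typical sequence is `vault_init vault` once, followed by `vault_unseal_pod vault vault-0 <UNSEAL_KEY_1> <UNSEAL_KEY_2> <UNSEAL_KEY_3>` and the same call for vault-1 and vault-2. Keys passed as arguments can end up in the shell history, so an interactive session inside the pod may be preferable in production.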

The primary Vault cluster is production-ready.
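
To confirm the state of an individual pod, the HTTP status code of the /v1/sys/health endpoint (the same endpoint the probes use) can be interpreted. The sketch below maps the default codes, which apply when the request carries none of the override query parameters, to the state they signal. The function name is illustrative.

```shell
#!/bin/sh
# Map a default /v1/sys/health status code to the Vault state it signals.
vault_health_state() {
  case "$1" in
    200) echo "initialized, unsealed, active" ;;
    429) echo "unsealed, standby" ;;
    472) echo "DR secondary" ;;
    473) echo "performance standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}
```

For example, `vault_health_state "$(curl -s -o /dev/null -w '%{http_code}' --cacert ca.crt https://<route-host>/v1/sys/health)"` prints the state of whichever pod answers the request, where ca.crt is a local copy of the internal CA certificate.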

To set up disaster recovery, see Configuring disaster recovery replication.

For performance replication across multiple regions, see Configuring performance replication.

Vault Disaster Recovery (DR) on OpenShift

A Vault cluster can perform DR replication to any Vault cluster on any platform. This procedure focuses on deployment of a Vault DR cluster that runs on Red Hat OpenShift on IBM Z and LinuxONE. It follows HashiCorp Validated Design (HVD) recommendations and highlights key networking, security, topology, and operational considerations specific to Red Hat OpenShift Container Platform (OCP).

Restriction: Client implementation specifics and application integrations are out of scope. For consuming Vault secrets from OpenShift applications, see Vault Secrets Operator.

Prerequisites

Before you begin, ensure the following prerequisites are met:

  • The DR cluster must run the same version of Vault as the primary cluster.
  • Network connectivity between primary and secondary clusters on ports 8200 (API) and 8201 (replication).
  • DNS or IP resolution configured between clusters.
  • LoadBalancer service configured for both API and cluster ports.
  • TLS certificates properly configured for end-to-end encryption.
Important: DR replication is not possible from a cluster that runs a higher version of Vault to a cluster that runs a lower version. Keep this in mind when you promote or demote clusters.
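
The version prerequisite can be checked before replication is enabled by comparing the Version line that vault status prints on each cluster. This is a minimal sketch; the function names are illustrative, and it assumes the default table output of vault status.

```shell
#!/bin/sh
# Read `vault status` table output on stdin and print the Version value.
vault_version_from_status() {
  awk '$1 == "Version" { print $2 }'
}

# Succeed only if both version strings are non-empty and identical.
versions_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}
```

For example, capture `VAULT_ADDR=https://<primary-route-or-lb> vault status | vault_version_from_status` and the equivalent for the secondary in two variables, then pass both to versions_match.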

Procedure

  1. From your terminal, set the Vault address for the primary cluster:
    
    export VAULT_ADDR=https://<primary-route-or-lb>
    
  2. Enable DR primary replication on the primary cluster:
    
    vault write -f sys/replication/dr/primary/enable \
      primary_cluster_addr="https://<loadbalancer-address>:8201"
    
    Important: This operation briefly makes Vault unavailable during the configuration change.
    • The primary_cluster_addr parameter must point to the LoadBalancer address on port 8201.
    • Never use an OpenShift Route for this address.
  3. Generate a secondary token for the DR cluster:
    
    vault write sys/replication/dr/primary/secondary-token \
      id="dr-secondary"
    

    The command outputs a wrapping_token. Save this token securely.

    Attention: The token has a limited time to live (default approximately 30 minutes). Proceed promptly to the next step.
  4. Switch to the secondary cluster by setting the Vault address:
    
    export VAULT_ADDR=https://<secondary-route-or-lb>
    
  5. Enable DR secondary replication on the secondary cluster:
    
    vault write sys/replication/dr/secondary/enable \
      token="<wrapping_token>" \
      primary_api_addr="https://<primary-api-address>" \
      ca_file="/vault/userconfig/vault-tls/ca.crt"
    

    Parameter descriptions:

    Table 1. DR secondary enable parameters
    Parameter                             Points to                                     Port
    primary_api_addr                      Primary API endpoint (Route or LoadBalancer)  8200
    primary_cluster_addr (set in step 2)  Primary LoadBalancer                          8201
  6. Verify the replication status on both clusters:
    
    vault read sys/replication/dr/status
    

    Expected output:

    Table 2. Expected replication status
    Cluster    Mode
    Primary    mode = primary
    Secondary  mode = secondary

DR replication is now configured between the primary and secondary Vault clusters. The secondary cluster maintains a near-real-time copy of the primary cluster's data and can be promoted to active status, if needed.
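
The status check from step 6 can be scripted so that it fails loudly when a cluster reports an unexpected mode. The sketch below reads only the mode field with the -field flag of vault read. The function name and the VAULT override are illustrative, and VAULT_ADDR is assumed to point at the cluster being checked.

```shell
#!/bin/sh
# VAULT can be overridden to substitute a different binary or a test stub.
VAULT="${VAULT:-vault}"

# check_dr_mode EXPECTED_MODE
# Compare the DR replication mode the cluster reports with the expected one.
check_dr_mode() {
  actual=$("$VAULT" read -field=mode sys/replication/dr/status) || return 1
  if [ "$actual" = "$1" ]; then
    echo "ok: mode=$actual"
  else
    echo "mismatch: expected mode=$1, got mode=$actual" >&2
    return 1
  fi
}
```

Run `check_dr_mode primary` with VAULT_ADDR set to the primary cluster and `check_dr_mode secondary` with VAULT_ADDR set to the secondary cluster.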

Key design consideration

For DR replication, do not use an OpenShift Route for port 8201.

Table 3. Vault traffic types and exposure methods
Traffic type         Port  Allowed exposure
API traffic          8200  Route or LoadBalancer
Replication traffic  8201  LoadBalancer only

Reasons for using LoadBalancer for replication traffic:

  • Vault replication uses gRPC over HTTP/2 on port 8201.
  • Replication requires end-to-end TCP passthrough.
  • OpenShift Routes (HAProxy) may break HTTP/2 unless carefully configured.
  • Routes operate on port 443, requiring port mapping (443 to 8201).

LoadBalancer service for Vault:

Create a LoadBalancer service (Layer 4 TCP) to expose both API and cluster ports:


apiVersion: v1
kind: Service
metadata:
  name: vault-active-lb
  namespace: vault
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: vault
    vault-active: "true"
  ports:
    - name: api
      port: 8200
      targetPort: 8200
      protocol: TCP
    - name: cluster
      port: 8201
      targetPort: 8201
      protocol: TCP

The LoadBalancer must allow TCP passthrough. Any provider integrating with the Kubernetes LoadBalancer Service type is sufficient, including cloud-native providers, F5 BIG-IP, and MetalLB.
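
After the provider assigns an external address to the Service, that address forms the primary_cluster_addr value used when DR primary replication is enabled. The sketch below reads the address from the Service status and builds the URL. The Service name vault-active-lb comes from the manifest above; the function names and the OC override are illustrative.

```shell
#!/bin/sh
# OC can be overridden (for example OC=echo) for a dry run without a cluster.
OC="${OC:-oc}"

# lb_address [NAMESPACE] [SERVICE]
# Print the external IP or hostname assigned to the LoadBalancer Service.
# Providers populate either the ip or the hostname field.
lb_address() {
  "$OC" get service "${2:-vault-active-lb}" -n "${1:-vault}" \
    -o 'jsonpath={.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}'
}

# Build the primary_cluster_addr value from a LoadBalancer address.
primary_cluster_addr() {
  echo "https://$1:8201"
}
```

For example, `primary_cluster_addr "$(lb_address vault)"` prints the value to pass as primary_cluster_addr. An empty result from lb_address means that the provider has not assigned an address yet.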

Networking requirements:

Ensure the following network configuration between primary and secondary clusters:

  • Allow ingress and egress traffic on TCP port 8200 (API).
  • Allow ingress and egress traffic on TCP port 8201 (replication).
  • Ensure DNS or IP resolution between clusters.
  • Avoid TLS termination between clusters. Use TLS passthrough.
  • Keep network latency low. Raft peers within a cluster should see less than 8 ms of latency between them. Replication between clusters is asynchronous and tolerates higher latency, but sustained latency increases replication lag.
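
A quick reachability check of both ports from one cluster to the other can catch firewall gaps before replication is enabled. The sketch below assumes an nc (netcat) build that supports the -z and -w options; the function name and the PROBE override are illustrative.

```shell
#!/bin/sh
# PROBE is the command used to test one TCP port; it receives HOST and PORT.
PROBE="${PROBE:-nc -z -w 5}"

# check_vault_ports HOST
# Report whether ports 8200 (API) and 8201 (replication) accept connections.
check_vault_ports() {
  status=0
  for port in 8200 8201; do
    # $PROBE is intentionally unquoted so that its options split into words.
    if $PROBE "$1" "$port" >/dev/null 2>&1; then
      echo "$1:$port reachable"
    else
      echo "$1:$port unreachable"
      status=1
    fi
  done
  return "$status"
}
```

Run `check_vault_ports <loadbalancer-address>` from a host or pod in the other cluster. A successful TCP connection does not prove that TLS passes through unmodified, so it complements the replication status check rather than replacing it.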

Internal cluster networking for Raft peer communication:

Vault pods use port 8201/TCP for Raft consensus traffic, including log replication, heartbeats, and leader election. The headless service <release>-internal provides DNS records for Raft peer discovery.

Use auto_join with the Kubernetes provider (go-discover) for dynamic peer discovery:


storage "raft" {
  path = "/vault/data"
  retry_join {
    auto_join = "provider=k8s namespace=<namespace> label_selector=\"app.kubernetes.io/name=<chart-name>,component=server\""
    auto_join_scheme = "https"
    leader_tls_servername = "vault.apps.example.com"
    leader_ca_cert_file = "/vault/userconfig/vault-tls/ca.crt"
  }
}

When embedding in Helm ha.raft.config, use Helm template expressions:


auto_join = "provider=k8s namespace={{.Release.Namespace}} label_selector=\"app.kubernetes.io/name={{ template \"vault.name\" . }},component=server\""

Note: The Vault service account requires RBAC permissions to get and list pods in the namespace.
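
Whether the service account already holds these permissions can be checked with oc auth can-i and impersonation. This is a sketch; the function name and the OC override are illustrative, and the service account name passed in is an assumption that depends on your Helm release.

```shell
#!/bin/sh
# OC can be overridden (for example OC=echo) for a dry run without a cluster.
OC="${OC:-oc}"

# can_discover_pods NAMESPACE SERVICE_ACCOUNT
# Ask the API server whether the service account may get and list pods.
# oc auth can-i prints yes or no and exits non-zero when the answer is no.
can_discover_pods() {
  for verb in get list; do
    "$OC" auth can-i "$verb" pods -n "$1" \
      --as="system:serviceaccount:$1:$2" || return 1
  done
}
```

For example, `can_discover_pods vault vault` checks a service account named vault in the vault namespace. The caller needs permission to impersonate service accounts.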

Verify DNS resolution before cluster initialization:


oc exec vault-0 -n vault -- nslookup vault-0.vault-internal.vault.svc
oc exec vault-1 -n vault -- nslookup vault-1.vault-internal.vault.svc
oc exec vault-2 -n vault -- nslookup vault-2.vault-internal.vault.svc

Service topology

In high availability (HA) mode, the Vault Helm chart creates the following services:

Table 4. Vault services in HA mode
Service             Purpose
<release>           All Vault server pods. Exposes ports 8200 and 8201. Use for client API traffic.
<release>-active    Active Vault pod only. Use for replication traffic on port 8201.
<release>-standby   Standby pods only. Use only when you need to specifically target standbys.
<release>-internal  Headless service for Raft peer DNS discovery. Do not use for client traffic.
Important: Always use <release>-active for replication traffic. The <release> service distributes traffic to all pods. Standby nodes cannot process replication requests.
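
The naming scheme in the table can be expressed as a small lookup, together with a query for the pod that currently carries the vault-active label. This is a sketch; the function names, the purpose keywords, and the OC override are illustrative.

```shell
#!/bin/sh
# OC can be overridden (for example OC=echo) for a dry run without a cluster.
OC="${OC:-oc}"

# vault_service RELEASE PURPOSE
# Print the Service name to use for a given kind of traffic.
vault_service() {
  case "$2" in
    api)         echo "$1" ;;
    replication) echo "$1-active" ;;
    standby)     echo "$1-standby" ;;
    raft)        echo "$1-internal" ;;
    *)           echo "unknown purpose: $2" >&2; return 1 ;;
  esac
}

# Print the pod that the <release>-active Service currently selects.
active_pod() {
  "$OC" get pods -n "${1:-vault}" -l vault-active=true -o name
}
```

For a release named vault, `vault_service vault replication` prints vault-active, and `active_pod vault` shows which pod receives that traffic.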

OpenShift Routes: The Vault Helm chart creates a route when server.route.enabled is set to true.

Default recommendation (passthrough TLS termination):


server:
  route:
    enabled: true
    activeService: false
    host: vault.apps.example.com
    tls:
      termination: passthrough

Setting activeService: false routes traffic to all Vault pods. With Vault Enterprise, performance standbys serve most read operations locally. Distributing traffic across all pods improves throughput.

Table 5. TLS termination strategy comparison
Strategy                   How it works                                                 When to use
Passthrough (recommended)  Router forwards raw TLS to Vault. End-to-end TLS preserved.  Default for production. Required when unbroken TLS is a compliance requirement.
Re-encrypt                 Router terminates client TLS, re-encrypts to backend.        When organization policy requires router to terminate TLS, or Layer 7 HTTP visibility is needed.
Edge                       Router terminates TLS and forwards plaintext HTTP.           Not applicable for production Vault.
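
The termination strategy that a deployed Route actually uses can be read back from its spec. The sketch below assumes that the Route is named after the Helm release (vault in this procedure); the function names and the OC override are illustrative.

```shell
#!/bin/sh
# OC can be overridden (for example OC=echo) for a dry run without a cluster.
OC="${OC:-oc}"

# route_termination [NAMESPACE] [ROUTE]
# Print the TLS termination type configured on the Route.
route_termination() {
  "$OC" get route "${2:-vault}" -n "${1:-vault}" \
    -o 'jsonpath={.spec.tls.termination}'
}

# Succeed only if the given termination type is passthrough.
is_passthrough() {
  [ "$1" = "passthrough" ]
}
```

For example, `is_passthrough "$(route_termination vault)" || echo "Route does not use passthrough"` flags a Route that terminates TLS at the router.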

Best practices

  • Always use LoadBalancer for port 8201 in production DR replication.
  • Use TLS end-to-end with no interception or termination on the replication path.
  • Keep clusters time-synchronized. Time synchronization is critical for token validity and Raft elections.
  • Use passthrough TLS termination (recommended).
  • Rotate secondary tokens securely and promptly (default time to live is approximately 30 minutes).
  • Deploy Vault on dedicated infrastructure nodes for production workloads.
  • Automate Vault Raft snapshots with secure off-cluster backups and periodic restore validation.
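
As a starting point for snapshot automation, the sketch below wraps vault operator raft snapshot save and writes a timestamped file. The function name, the VAULT override, and the file naming scheme are assumptions; scheduling, off-cluster copy, and restore validation are left to your backup tooling.

```shell
#!/bin/sh
# VAULT can be overridden to substitute a different binary or a test stub.
VAULT="${VAULT:-vault}"

# save_raft_snapshot DIR
# Write a Raft snapshot with a UTC timestamp in its name and print its path.
# Requires VAULT_ADDR and a token that is allowed to take snapshots.
save_raft_snapshot() {
  file="$1/vault-raft-$(date -u +%Y%m%dT%H%M%SZ).snap"
  # Send any CLI output to stderr so that stdout carries only the path.
  "$VAULT" operator raft snapshot save "$file" >&2 || return 1
  echo "$file"
}
```

A scheduled job can call `save_raft_snapshot /backups` and then copy the printed file to storage outside the cluster.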