Container pod move actions

Move a pod that is controlled by a ReplicationController, ReplicaSet, or Deployment (through a ReplicaSet) to another node (VM) to address performance issues or improve infrastructure efficiency. For example, if a particular node is congested for CPU, you can move pods to a node with sufficient capacity. If a node is underutilized and is a candidate for suspension, you must first move the pods before you can safely suspend the node.

During a pod move, Turbonomic copies and persists all of the pod's labels, except for labels with the following keys:

  • pod-template-hash

  • deployment

  • deploymentconfig

If the pod move is successful, the copied labels are applied to the new pod.

The following items impact the generation and execution of pod move actions:

Placement policies

You can create placement policies to enforce constraints for pod move actions. For example, you can have a policy that allows pods to only move to certain nodes, or a policy that prevents pods from moving to certain nodes.

For more information, see Creating Placement Policies.

Taints and tolerations

Turbonomic considers taints and tolerations as constraints. For example, if a node has a taint that a pod does not tolerate, Turbonomic will not move that pod to the tainted node.
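
For example, a node taint and a matching pod toleration are defined as follows. This is a minimal sketch; the node name, taint key and value, and image are illustrative only.

    apiVersion: v1
    kind: Node
    metadata:
      name: worker-1                      # illustrative node name
    spec:
      taints:
        - key: dedicated
          value: database
          effect: NoSchedule              # pods without a matching toleration cannot be placed here
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: db-pod                        # illustrative pod name
    spec:
      tolerations:
        - key: dedicated
          operator: Equal
          value: database
          effect: NoSchedule              # this pod tolerates the taint, so it can be placed on worker-1
      containers:
        - name: app
          image: registry.example.com/app:latest   # placeholder image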

Node labels

Turbonomic imports node labels and treats them as constraints. For example, if a pod selects nodes by label (through a node selector), Turbonomic only moves that pod to a node with a matching label.
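
For example, a pod that selects nodes by label carries a node selector such as the following sketch; the label key and value are illustrative.

    apiVersion: v1
    kind: Pod
    metadata:
      name: fast-storage-pod              # illustrative pod name
    spec:
      nodeSelector:
        disktype: ssd                     # the pod can only be placed on nodes labeled disktype=ssd
      containers:
        - name: app
          image: registry.example.com/app:latest   # placeholder image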

Auto-generated policies for pods

For the following workloads, Turbonomic auto-generates groups and policies.

  • Pods with topology spread constraints that set whenUnsatisfiable to DoNotSchedule receive an auto-generated policy that disables pod move actions.

  • StatefulSet pods receive an auto-generated policy that sets the action acceptance mode for pod move actions to recommend-only.

For information on the action handling mechanism for these pods in Turbonomic, see this topic.
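
For reference, a topology spread constraint that triggers the auto-generated move-disabling policy sets whenUnsatisfiable to DoNotSchedule, as in the following sketch; the label selector and names are illustrative.

    apiVersion: v1
    kind: Pod
    metadata:
      name: spread-pod                    # illustrative pod name
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule        # this setting results in a policy that disables pod moves
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: registry.example.com/web:latest   # placeholder image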

Pod affinity and anti-affinity rules

Turbonomic recognizes pod affinity and anti-affinity rules.

Turbonomic uses a generic construct to represent entity-to-entity affinity and anti-affinity, using commodities bought and sold by the relevant entities. Special topologies, where the topology key used in the affinity or anti-affinity rule is not the hostname, are handled through an internal provider entity that represents a group of nodes. These groups are created by processing the labels on the cluster nodes.

Turbonomic uses a proprietary mechanism to process these commodities so that affinity and anti-affinity rules are respected while continuously driving the target environment toward the desired state. If the rules break, for example because labels changed or pods were recreated in the Kubernetes cluster, or because users created conflicting policies in the Turbonomic user interface, Turbonomic generates move actions to remedy the situation. To review these actions in Action Center, look for container pod move actions with an action category of Compliance and a risk of Unsatisfied Affinity Constraints or Unsatisfied Anti-Affinity Constraints.
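
For context, a pod affinity rule that uses a topology key other than kubernetes.io/hostname looks like the following sketch; the zone key, selector, and names are illustrative. In this case, Turbonomic represents the zone as an internal node-group provider as described above.

    apiVersion: v1
    kind: Pod
    metadata:
      name: web-pod                       # illustrative pod name
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: cache              # this pod must be co-located with pods labeled app=cache
              topologyKey: topology.kubernetes.io/zone   # non-hostname key: co-location is per zone, not per node
      containers:
        - name: web
          image: registry.example.com/web:latest   # placeholder image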

The following scenarios describe when Turbonomic generates an action to move a source or destination pod to another provider. If there is no valid provider in the cluster, Turbonomic generates a reconfigure action instead.

  • Scenario: Source pod to destination pod affinity with the topology key kubernetes.io/hostname. Source and destination pods are on the same node.
    Kubernetes action: Originally scheduled the pods correctly by placing both pods on the same node.
    Turbonomic action: No additional action.

  • Scenario: Source pod to destination pod affinity with the topology key kubernetes.io/hostname. Source and destination pods originally scheduled by Kubernetes are on the same node. Kubernetes restarted and scheduled the destination pod to another node without considering the source pod's affinities during scheduling.
    Kubernetes action: Scheduled the pods correctly when the source pod was scheduled, but might schedule pods incorrectly when only the destination pod is scheduled.
    Turbonomic action: Generate an action to move the source or destination pod to the same node.

  • Scenario: Source pod to destination pod affinity with a topology key that is not kubernetes.io/hostname. Source and destination pods are on the same topology.
    Kubernetes action: Scheduled the pods correctly by placing both pods on nodes on the same topology.
    Turbonomic action: No additional action.

  • Scenario: Source pod to destination pod affinity with a topology key that is not kubernetes.io/hostname. Source and destination pods originally scheduled by Kubernetes are on the same topology. Kubernetes restarted and scheduled the destination pod to another node without considering the source pod's affinities during scheduling.
    Kubernetes action: Scheduled the pods correctly when the source pod was scheduled, but might schedule pods incorrectly when only the destination pod is scheduled.
    Turbonomic action: Generate an action to move the source or destination pod to nodes on the same topology.

  • Scenario: Source pod to destination pod affinity with a topology key that is not kubernetes.io/hostname. Source and destination pods originally scheduled by Kubernetes are on different nodes but on the same topology. Labels on the nodes changed later, putting the nodes on two different topologies.
    Kubernetes action: Pods remain incorrectly placed.
    Turbonomic action: Generate an action to move the source or destination pod to nodes on the same topology.

  • Scenario: Source pod to destination pod anti-affinity with a topology key that is not kubernetes.io/hostname. Source and destination pods are on different topologies.
    Kubernetes action: Originally scheduled the pods correctly by placing them on different topologies.
    Turbonomic action: No additional action.

  • Scenario: Source pod to destination pod anti-affinity with a topology key that is not kubernetes.io/hostname. Source and destination pods originally scheduled by Kubernetes are on nodes on different topologies. Labels on the nodes changed later, putting the nodes on the same topology.
    Kubernetes action: Pods remain incorrectly placed.
    Turbonomic action: Generate an action to move the source or destination pod to nodes on different topologies.

  • Scenario: Source pod to destination pod anti-affinity with a topology key that is not kubernetes.io/hostname. Source and destination pods originally scheduled by Kubernetes are on nodes on different topologies. Labels on the nodes changed later, putting the nodes on the same topology. As a result, there is no valid topology node that the source or destination pod can move to. For example, nodes were distributed into type=type1 and type=type2, with the source pod on a type1 node and the destination pod on a type2 node. At some point, all type2 nodes were relabeled type1, so all cluster nodes are now on the type=type1 topology.
    Kubernetes action: Pods remain incorrectly placed.
    Turbonomic action: Generate a reconfigure action because there is no valid topology node that the source or destination pod can move to.

Eviction thresholds

Turbonomic considers the memory/storage eviction thresholds of the destination node to ensure that the pod can be scheduled after it moves. Eviction thresholds for imagefs and rootfs are reflected as node effective capacity in the market analysis.
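
For reference, these thresholds come from the kubelet configuration, as in the following sketch; the values are examples only, not recommendations.

    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    evictionHard:
      memory.available: "500Mi"           # reduces the node's effective memory capacity in the market analysis
      nodefs.available: "10%"             # rootfs eviction threshold
      imagefs.available: "15%"            # imagefs eviction threshold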

Temporary quota increases

If a namespace quota is already fully utilized, Turbonomic temporarily increases the quota to allow a pod to move.

For details, see Namespace Actions.
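
For context, namespace quotas are defined through ResourceQuota objects such as the following sketch; the names and limits are illustrative. When such a quota is fully utilized, Turbonomic temporarily raises it so that the moved pod can be admitted.

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota                 # illustrative name
      namespace: demo                     # illustrative namespace
    spec:
      hard:
        requests.cpu: "4"                 # total CPU requests allowed in the namespace
        requests.memory: 8Gi
        limits.cpu: "8"
        limits.memory: 16Gi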

Security context constraints (SCCs)

Red Hat OpenShift uses SCCs to control permissions for pods. These permissions determine what users can do within the pods' containers and what the processes running inside those containers are allowed to do.

When executing pod move actions, Kubeturbo normally runs with Red Hat OpenShift cluster administrator permissions to create a new pod and remove the old one. Because of this, the SCCs available to the new pod are those available to a cluster administrator, so the new pod might run with an SCC that has higher privileges than the old pod. For example, the old pod might have access to the restricted SCC, while the new one has access to the anyuid SCC. This introduces a privilege escalation issue.

To prevent privilege escalation when moving pods, Kubeturbo enforces user impersonation, which carries the user-level SCCs of the old pod over to the new pod. To enforce user impersonation, Kubeturbo performs the following tasks:

  • Create a user impersonation account for each SCC level.

  • Create a service account and treat it as a user account for each SCC level currently running in a given cluster.

  • Provide role-based access to SCCs used for impersonation via the service accounts. A service account is allowed to use only one SCC resource in the cluster.

  • Create a role binding resource to allow service account access to a particular role.

All resources created to enforce user impersonation are removed when Kubeturbo shuts down.
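
As a rough sketch of the resources involved, granting a service account use of a single SCC combines a Role that permits the use verb on exactly one SCC with a RoleBinding to that service account. All names and the namespace below are illustrative, not the names that Kubeturbo actually generates.

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: scc-anyuid-impersonator       # illustrative name
      namespace: turbo                    # illustrative namespace
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: use-scc-anyuid                # illustrative name
      namespace: turbo
    rules:
      - apiGroups: ["security.openshift.io"]
        resources: ["securitycontextconstraints"]
        resourceNames: ["anyuid"]         # the role grants use of exactly one SCC
        verbs: ["use"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: use-scc-anyuid-binding        # illustrative name
      namespace: turbo
    subjects:
      - kind: ServiceAccount
        name: scc-anyuid-impersonator
        namespace: turbo
    roleRef:
      kind: Role
      name: use-scc-anyuid
      apiGroup: rbac.authorization.k8s.io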

Be aware that by default, a pod does not know which namespace it runs in, and Kubeturbo requires this information to enforce user impersonation. To make the namespace available to Kubeturbo, it is recommended that you add an environment variable named KUBETURBO_NAMESPACE via the downward API. The standard Red Hat OpenShift installation methods add the following environment variable to the Kubeturbo deployment spec.

          env:
            - name: KUBETURBO_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace

With this environment variable, Kubeturbo can successfully create the resources needed to enforce user impersonation. Without this variable, Kubeturbo creates the resources in the namespace called default. This might cause issues if you need to run multiple instances of Kubeturbo in the same cluster. For example, one instance might run as an observer, and another as an administrator. To ensure multiple Kubeturbo instances within the same cluster do not conflict when creating and removing user impersonation resources, run the instances in separate namespaces.