Container pod move actions
Move a pod that is controlled by a ReplicationController, ReplicaSet, or Deployment (through a ReplicaSet) to another node (VM) to address performance issues or improve infrastructure efficiency. For example, if a particular node is congested for CPU, you can move pods to a node with sufficient capacity. If a node is underutilized and is a candidate for suspension, you must first move the pods before you can safely suspend the node.
During a pod move, Turbonomic copies and persists all labels, except for labels with the following keys:
- `pod-template-hash`
- `deployment`
- `deploymentconfig`

If the pod move is successful, the copied labels are applied to the new pod.
The following items impact the generation and execution of pod move actions:
Placement policies
You can create placement policies to enforce constraints for pod move actions. For example, you can have a policy that allows pods to only move to certain nodes, or a policy that prevents pods from moving to certain nodes.
For more information, see Creating Placement Policies.
Taints and tolerations
Turbonomic considers taints and tolerations as constraints. For example, if a node has a taint that a pod does not tolerate, Turbonomic will not move that pod to that node.
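As a minimal sketch of the kind of taint and toleration pairing that Turbonomic honors as a constraint (the taint key and value, node name, pod name, and image are hypothetical):

```yaml
# Taint a node so that only pods with a matching toleration can run there:
#   kubectl taint nodes worker-2 dedicated=analytics:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: analytics-worker                 # hypothetical pod name
spec:
  containers:
  - name: app
    image: registry.example.com/analytics:latest   # placeholder image
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "analytics"
    effect: "NoSchedule"
```

Turbonomic generates move actions for this pod only to nodes whose taints the pod tolerates.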
Node labels
Turbonomic imports node labels and treats them as constraints. For example, if a pod requires a specific node label, Turbonomic moves that pod only to a node with a matching label.
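For example, a pod that declares a node selector is only placed, and only moved, onto nodes that carry the matching label. A minimal sketch; the label, pod name, and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ssd-workload                     # hypothetical pod name
spec:
  nodeSelector:
    disktype: ssd                        # hypothetical node label
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
```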
Auto-generated policies for pods
Turbonomic auto-generates groups and policies for the following workloads:
- Pods with topology spread constraints that set `whenUnsatisfiable` to `DoNotSchedule` apply an auto-generated policy that disables pod move actions (see the sketch after this list).
- StatefulSet pods apply an auto-generated policy that sets the action acceptance mode for pod move actions to recommend-only.
For information on the action handling mechanism for these pods in Turbonomic, see this topic.
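The following sketch shows the kind of pod spec that triggers the first auto-generated policy above; the pod name, labels, and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spread-example                   # hypothetical pod name
  labels:
    app: web
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule     # triggers the policy that disables pod moves
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: app
    image: registry.example.com/web:latest   # placeholder image
```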
Pod affinity and anti-affinity rules
Turbonomic recognizes pod affinity and anti-affinity rules. It uses a generic construct to represent entity-to-entity affinity and anti-affinity through commodities bought and sold by the relevant entities. Special topologies, where the topology key used in the affinity or anti-affinity rule is not `hostname`, are handled through an internal provider entity that represents a group of nodes. These groups are created by processing the labels on the cluster nodes.
Turbonomic uses a proprietary mechanism to process the commodities such that the affinity and anti-affinity rules are respected while continuously driving the target environments toward the desired state. If the rules break because of changed labels or recreated pods in the Kubernetes cluster, or conflicting policies created by users in the Turbonomic user interface, Turbonomic generates move actions to alleviate the situation. To review these actions in Action Center, look for container pod move actions with an action category of Compliance and a risk of Unsatisfied Affinity Constraints or Unsatisfied Anti-Affinity Constraints.
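As a concrete reference for the table that follows, this is a minimal sketch of a required pod affinity rule with the `kubernetes.io/hostname` topology key; the pod name, label, and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-frontend                     # hypothetical source pod
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cache                   # label on the hypothetical destination pods
        topologyKey: kubernetes.io/hostname
  containers:
  - name: app
    image: registry.example.com/web:latest   # placeholder image
```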
The following table describes the scenarios where Turbonomic generates an action to move a source or destination pod to another provider. If there is no valid provider in the cluster, Turbonomic generates a reconfigure action.
Scenario | Kubernetes action | Turbonomic action |
---|---|---|
Source pod to destination pod affinity with the topology key `kubernetes.io/hostname`. Source and destination pods are on the same node. | Originally schedules the pods correctly by placing both pods on the same node. | No additional action |
Source pod to destination pod affinity with the topology key `kubernetes.io/hostname`. Source and destination pods originally scheduled by Kubernetes are on the same node. The destination pod restarted and Kubernetes scheduled it to another node without considering the source pod's affinities during scheduling. | Schedules the pods correctly when the source was scheduled, but might schedule them incorrectly when only the destination pod was scheduled. | Generates an action to move the source or destination pod to the same node. |
Source pod to destination pod affinity with a topology key that is not `kubernetes.io/hostname`. Source and destination pods are in the same topology. | Schedules the pods correctly by placing both pods on nodes in the same topology. | No additional action |
Source pod to destination pod affinity with a topology key that is not `kubernetes.io/hostname`. Source and destination pods originally scheduled by Kubernetes are in the same topology. The destination pod restarted and Kubernetes scheduled it to another node without considering the source pod's affinities during scheduling. | Schedules the pods correctly when the source was scheduled, but might schedule them incorrectly when only the destination pod was scheduled. | Generates an action to move the source or destination pod to nodes in the same topology. |
Source pod to destination pod affinity with a topology key that is not `kubernetes.io/hostname`. Source and destination pods originally scheduled by Kubernetes are on different nodes but in the same topology. Labels on the nodes changed later, putting the nodes in two different topologies. | Pods remain incorrectly placed. | Generates an action to move the source or destination pod to nodes in the same topology. |
Source pod to destination pod anti-affinity with a topology key that is not `kubernetes.io/hostname`. Source and destination pods are in different topologies. | Originally schedules the pods correctly by placing them in different topologies. | No additional action |
Source pod to destination pod anti-affinity with a topology key that is not `kubernetes.io/hostname`. Source and destination pods originally scheduled by Kubernetes are on nodes in different topologies. Labels on the nodes changed later, putting the nodes in the same topology. | Pods remain incorrectly placed. | Generates an action to move the source or destination pod to nodes in different topologies. |
Source pod to destination pod anti-affinity with a topology key that is not `kubernetes.io/hostname`. Source and destination pods originally scheduled by Kubernetes are on nodes in different topologies. Labels on the nodes changed later, putting the nodes in the same topology. As a result, there is no valid topology node that the source or destination pod can move to. | Pods remain incorrectly placed. | Generates a reconfigure action because there is no valid topology node that the source or destination pod can move to. |
Eviction thresholds
Turbonomic considers the memory and storage eviction thresholds of the destination node to ensure that the pod can be scheduled after it moves. Eviction thresholds for `imagefs` and `rootfs` are reflected in the node's effective capacity in the market analysis.
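For reference, eviction thresholds are defined in the kubelet configuration. A minimal sketch with assumed values:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"              # assumed value for illustration
  nodefs.available: "10%"                # node root filesystem (rootfs)
  imagefs.available: "15%"
```

Turbonomic subtracts thresholds such as these from the node's raw capacity when deciding whether a pod fits on the destination node.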
Temporary quota increases
If a namespace quota is already fully utilized, Turbonomic temporarily increases the quota to allow a pod to move.
For details, see Namespace Actions.
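For context, a namespace quota of the kind that can block a move looks like the following; the names and limits are hypothetical:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota                    # hypothetical quota name
  namespace: team-a                      # hypothetical namespace
spec:
  hard:
    limits.cpu: "8"
    limits.memory: 16Gi
```

If this quota is fully consumed, the temporary increase described above allows the replacement pod to start.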
Security context constraints (SCCs)
Red Hat OpenShift uses SCCs to control permissions for pods. This translates to permissions that users see within the containers of the pods, and the permissions for the processes running inside those pods.
When executing pod move actions, Kubeturbo normally runs with Red Hat OpenShift cluster
administrator permissions to create a new pod and remove the old one. Because of this, the SCCs for
the new pod are those that are available to a cluster administrator. It is therefore possible for
the new pod to run with an SCC that has higher privileges than the old pod. For example, an old pod might have `restricted` SCC access, while the new one might have `anyuid` SCC access. This introduces a privilege escalation issue.
To prevent privilege escalation when moving pods, Kubeturbo enforces user impersonation, which carries the user-level SCCs of the old pod over to the new pod. To enforce user impersonation, Kubeturbo performs the following tasks:
- Creates a user impersonation account for each SCC level.
- Creates a service account, and treats it as a user account, for each SCC level currently running in a given cluster.
- Provides role-based access to the SCCs used for impersonation via the service accounts. A service account is allowed to use only one SCC resource in the cluster.
- Creates a role binding resource to allow service account access to a particular role.
All resources created to enforce user impersonation are removed when Kubeturbo shuts down.
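A minimal sketch of the kind of RBAC objects this implies, assuming a hypothetical service account that impersonates the `restricted` SCC level; the names and namespace are illustrative, not the exact resources Kubeturbo creates:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scc-restricted-user              # hypothetical role name
  namespace: turbo                       # hypothetical Kubeturbo namespace
rules:
- apiGroups: ["security.openshift.io"]
  resources: ["securitycontextconstraints"]
  resourceNames: ["restricted"]          # one SCC per service account
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scc-restricted-user-binding      # hypothetical binding name
  namespace: turbo
subjects:
- kind: ServiceAccount
  name: scc-restricted-sa                # hypothetical impersonation service account
  namespace: turbo
roleRef:
  kind: Role
  name: scc-restricted-user
  apiGroup: rbac.authorization.k8s.io
```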
Be aware that by default, an arbitrary pod running in a given cluster does not recognize the
namespace it is configured to run in, which is a requirement for user impersonation enforcement. For
Kubeturbo to recognize the namespaces for pods, it is recommended that you add an environment variable named `KUBETURBO_NAMESPACE` via the downward API. The Red Hat OpenShift standard installation methods add the following environment variable to the Kubeturbo deployment spec:
```yaml
env:
- name: KUBETURBO_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
```
With this environment variable, Kubeturbo can successfully create the resources needed to enforce
user impersonation. Without this variable, Kubeturbo creates the resources in the `default` namespace. This might cause issues if you need to run multiple instances of Kubeturbo
in the same cluster. For example, one instance might run as an observer, and another as an
administrator. To ensure multiple Kubeturbo instances within the same cluster do not conflict when
creating and removing user impersonation resources, run the instances in separate namespaces.