Configure the MachineConfig
The applied machine configuration sets up the compute node to which the Spyre cards are attached.
Note: If you apply a MachineConfig to the worker role, it is applied to all compute nodes, and
each affected node is rebooted, including the compute node with the Spyre card attached.
To avoid impacting workloads on other compute nodes, assign a dedicated role to the compute node
with the Spyre card attached. Create a separate MachineConfigPool (MCP) for that role, and set the
machineconfiguration.openshift.io/role label in your MachineConfig. This ensures that
configuration changes and reboots apply only to the compute node with the Spyre card attached.
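Because the role label decides which pool, and therefore which nodes, a MachineConfig reaches, it can help to check the label in a file before applying it. The following is a minimal sketch, not part of the product tooling: the function names mc_role and mc_targets_role are hypothetical, and the sketch assumes the label appears on a single line of the YAML file.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers (names are illustrative only).

# Print the role that a MachineConfig file targets, taken from its
# machineconfiguration.openshift.io/role label. Assumes the label
# sits on one line, as in the examples in this section.
mc_role() {
    sed -n 's/^[[:space:]]*machineconfiguration\.openshift\.io\/role:[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$1" | head -n 1
}

# Succeed only when the file targets the expected role.
mc_targets_role() {
    [ "$(mc_role "$1")" = "$2" ]
}
```

For example, `mc_targets_role 99-vhostuser-bind.yaml spyre || echo "unexpected role"` would flag a file that still targets the worker role.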
Label the compute node
Label the compute nodes that have Spyre cards attached with the spyre role. The label identifies
the nodes as Spyre-enabled and ensures that the following MachineConfigs are applied only to the
compute nodes with Spyre cards attached.
- Apply the label by running the following command:
    oc label node <node-name> node-role.kubernetes.io/spyre=""
  Note: Replace <node-name> with the actual name of your compute node.
- Verify that the label is applied correctly by running the following command:
    oc get nodes
- Check the output:
    NAME                         STATUS   ROLES          AGE     VERSION
    bootstrap-0.ocp-ai-zvm.pok   Ready    spyre,worker   3h47m   v1.32.9
  The node shows both spyre and worker in the ROLES column.
- Create a YAML file named mcp.yaml:
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: spyre
    spec:
      machineConfigSelector:
        matchExpressions:
          - key: machineconfiguration.openshift.io/role
            operator: In
            values:
              - spyre
              - worker
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/spyre: ""
- Apply the MachineConfigPool to your cluster by running the following command:
    oc apply -f mcp.yaml
  The following output appears after creation:
    machineconfigpool.machineconfiguration.openshift.io/spyre created
- Verify the MachineConfigPool status by running the following command:
    oc get mcp
- The expected output is as follows:
    NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master   rendered-master-3e7ce7499da16f3d02d4a6984a1185b6   True      False      False      3              3                   3                     0                      42h
    spyre    rendered-spyre-7f5b176768608f5da2378bc237cbb2bc    True      False      False      0              0                   0                     0                      14s
    worker   rendered-worker-7f5b176768608f5da2378bc237cbb2bc   True      False      False      2              2                   2                     0                      42h
- The spyre MachineConfigPool should show:
    - UPDATED: True
    - UPDATING: False
    - DEGRADED: False
    - MACHINECOUNT matching the number of Spyre-labeled nodes
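The status check on the spyre pool can also be scripted. The sketch below reads the table printed by oc get mcp on standard input. The function names mcp_is_settled and mcp_machine_count are hypothetical, and the sketch assumes the column order shown in the expected output above, with a non-empty CONFIG column.

```shell
#!/bin/sh
# Hypothetical helpers that parse `oc get mcp` table output from stdin.
# Assumed column order: NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT ...

# Succeed only if the named pool reports UPDATED=True, UPDATING=False,
# and DEGRADED=False. Fails if the pool is not listed.
mcp_is_settled() {
    awk -v pool="$1" '
        NR > 1 && $1 == pool {
            found = 1
            ok = ($3 == "True" && $4 == "False" && $5 == "False")
        }
        END { if (found && ok) exit 0; exit 1 }
    '
}

# Print the MACHINECOUNT value of the named pool.
mcp_machine_count() {
    awk -v pool="$1" 'NR > 1 && $1 == pool { print $6 }'
}
```

A possible use is `oc get mcp | mcp_is_settled spyre && echo "spyre pool is settled"`. Compare the value from mcp_machine_count with the number of nodes that you labeled.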
Apply the MachineConfig
- Unbind the PCI devices from their default drivers.
- Bind the devices to the VFIO driver.
- To complete the pass-through setup, create a YAML file named 99-vhostuser-bind.yaml for the
  required symbolic links under /sys/bus/pci/devices:
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: spyre
        kubernetes.io/arch: s390x
      name: 99-vhostuser-bind
    spec:
      config:
        ignition:
          version: 3.4.0
        systemd:
          units:
            - name: vhostuser-bind.service
              enabled: true
              contents: |
                [Unit]
                Description=Vhostuser Interface vfio-pci Bind
                Wants=network-online.target
                After=network-online.target ignition-firstboot-complete.service
                ConditionPathExists=/etc/modprobe.d/vfio.conf
                [Service]
                Type=oneshot
                TimeoutSec=900
                ExecStart=/usr/local/bin/vhostuser
                [Install]
                WantedBy=multi-user.target
        storage:
          files:
            - contents:
                inline: vfio-pci
              filesystem: root
              mode: 0644
              path: /etc/modules-load.d/vfio-pci.conf
            - contents:
                source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKc2V0IC1lCgpQQ0lfREVWSUNFUz0kKGxzcGNpIC1uIC1kIDEwMTQ6MDZhNyB8IGN1dCAtZCAiICIgLWYxIHwgcGFzdGUgLXNkICIsIiAtKQoKZWNobyAiJFBDSV9ERVZJQ0VTIgpJRlM9JywnIHJlYWQgLXJhIERFVklDRVMgPDw8ICIkUENJX0RFVklDRVMiCgpmb3IgVkZJT0RFVklDRSBpbiAiJHtERVZJQ0VTW0BdfSI7IGRvCiAgICBjZCAvc3lzL2J1cy9wY2kvZGV2aWNlcy8iJHtWRklPREVWSUNFfSIgfHwgY29udGludWUKCiAgICBpZiBbICEgLWYgImRyaXZlci91bmJpbmQiIF07IHRoZW4KICAgICAgICBlY2hvICJGaWxlIGRyaXZlci91bmJpbmQgbm90IGZvdW5kIGZvciAke1ZGSU9ERVZJQ0V9IgogICAgICAgIGV4aXQgMQogICAgZmkKCiAgICBpZiAhIGVjaG8gLW4gInZmaW8tcGNpIiA+IGRyaXZlcl9vdmVycmlkZTsgdGhlbgogICAgICAgIGVjaG8gIkNvdWxkIG5vdCB3cml0ZSB2ZmlvLXBjaSB0byBkcml2ZXJfb3ZlcnJpZGUiCiAgICAgICAgZXhpdCAxCiAgICBmaQoKICAgIGlmICEgWyAtZiBkcml2ZXIvdW5iaW5kIF0gJiYgZWNobyAtbiAiJFZGSU9ERVZJQ0UiID4gZHJpdmVyL3VuYmluZDsgdGhlbgogICAgICAgIGVjaG8gIkNvdWxkIG5vdCB3cml0ZSB0aGUgVkZJT0RFVklDRTogJHtWRklPREVWSUNFfSB0byBkcml2ZXIvdW5iaW5kIgogICAgICAgIGV4aXQgMQogICAgZmkKCmRvbmUK
              filesystem: root
              mode: 0744
              path: /usr/local/bin/vhostuser
- Apply the YAML by running the following command:
    oc apply -f 99-vhostuser-bind.yaml
- Create a YAML file named 05-worker-aiu-kernel-vfiopci.yaml:
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        kubernetes.io/arch: s390x
        machineconfiguration.openshift.io/role: spyre
      name: 05-worker-aiu-kernel-vfiopci
    spec:
      config:
        ignition:
          version: 3.4.0
        storage:
          files:
            - contents:
                compression: gzip
                source: data:;base64,H4sIAAAAAAAC/2TMQQrDIBCF4X1O4QFaUBqMFDzLYOsUHqgjGRvw9oVCFyW7B+/nkz4gTc3xglz7EwZZo7NuvVuftstvheUUckuPwqQ75Iju7ydIrW8as7MzuSbiNvZJBRUj+s3ZEPx6FjP0SyIXpnyLk3X5BAAA//9a8ioOoAAAAA==
              mode: 420
              path: /etc/modprobe.d/vfio-pci.conf
            - contents:
                compression: ""
                source: data:,vfio-pci%0Avfio_iommu_type1%0A
              mode: 420
              path: /etc/modules-load.d/vfio-pci.conf
            - contents:
                compression: ""
                source: data:;base64,W2NyaW8ucnVudGltZV0KZGVmYXVsdF91bGltaXRzID0gWwogICJtZW1sb2NrPS0xOi0xIgpdCg==
              mode: 420
              path: /etc/crio/crio.conf.d/10-custom
            - contents:
                compression: ""
                source: data:,SUBSYSTEM%3D%3D%22vfio%22%2C%20MODE%3D%220666%22%0A
              mode: 420
              path: /etc/udev/rules.d/90-vfio-3.rules
            - contents:
                compression: ""
                source: data:,%40sentient%20-%20memlock%20134217728%0A
              mode: 420
              path: /etc/security/limits.d/memlock.conf
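The MachineConfigs above embed file contents as data URLs, several of them base64-encoded, so the files that land on the node are not readable at a glance. To review a payload before applying it, a small decoder along the following lines might help. This is a sketch only: decode_data_url is a hypothetical helper name, it handles only base64 payloads (not the percent-encoded data:, form), and it assumes that the base64 and gunzip utilities are available.

```shell
#!/bin/sh
# Hypothetical helper: decode the payload of a base64 data URL such as
# "data:;base64,AAAA" and print it to stdout.
# Pass "gzip" as the second argument for entries that set
# `compression: gzip` in the MachineConfig.
decode_data_url() {
    # Strip everything up to and including the "base64," marker.
    payload="${1#*base64,}"
    if [ "${2:-}" = "gzip" ]; then
        printf '%s' "$payload" | base64 -d | gunzip
    else
        printf '%s' "$payload" | base64 -d
    fi
}

# Example: the /etc/crio/crio.conf.d/10-custom entry from the YAML above.
decode_data_url 'data:;base64,W2NyaW8ucnVudGltZV0KZGVmYXVsdF91bGltaXRzID0gWwogICJtZW1sb2NrPS0xOi0xIgpdCg=='
```

The example call prints the CRI-O drop-in that sets the memlock limit to unlimited. The mode value 420 in these entries is decimal and corresponds to the octal file mode 0644.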
Apply SELinux Policy (for Spyre Operator version above 1.1.0)
The device plug-in requires the SELinux policy MachineConfig to work properly. Apply it from the root of the aiu-operator repository.
- Create a YAML file named 50-spyre-device-plugin-selinux-minimal.yaml:
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: spyre
      name: 50-spyre-device-plugin-selinux-minimal
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
            - contents:
                source: data:text/plain;charset=utf-8;base64,bW9kdWxlIHNweXJlX2RldmljZV9wbHVnaW5fbWluaW1hbCAxLjA7CgpyZXF1aXJlIHsKICAgIHR5cGUgY29udGFpbmVyX3Q7CiAgICB0eXBlIGNvbnRhaW5lcl9ydW50aW1lX3Q7CiAgICB0eXBlIGNvbnRhaW5lcl92YXJfcnVuX3Q7CiAgICBjbGFzcyB1bml4X3N0cmVhbV9zb2NrZXQgY29ubmVjdHRvOwogICAgY2xhc3Mgc29ja19maWxlIHdyaXRlOwp9CgojIEdyYW50IE9OTFkgdGhlIHNwZWNpZmljIHBlcm1pc3Npb25zIG5lZWRlZCBmb3IgQ1JJLU8gY29tbXVuaWNhdGlvbgphbGxvdyBjb250YWluZXJfdCBjb250YWluZXJfcnVudGltZV90OnVuaXhfc3RyZWFtX3NvY2tldCBjb25uZWN0dG87CmFsbG93IGNvbnRhaW5lcl90IGNvbnRhaW5lcl92YXJfcnVuX3Q6c29ja19maWxlIHdyaXRlOwo=
              mode: 0644
              path: /etc/selinux/spyre_device_plugin_minimal.te
        systemd:
          units:
            - contents: |
                [Unit]
                Description=Install minimal SELinux policy for spyre device plugin
                After=multi-user.target
                [Service]
                Type=oneshot
                ExecStartPre=/bin/bash -c 'if [ ! -f /etc/selinux/spyre_device_plugin_minimal.pp ]; then checkmodule -M -m -o /etc/selinux/spyre_device_plugin_minimal.mod /etc/selinux/spyre_device_plugin_minimal.te && semodule_package -o /etc/selinux/spyre_device_plugin_minimal.pp -m /etc/selinux/spyre_device_plugin_minimal.mod; fi'
                ExecStart=/bin/bash -c 'semodule -i /etc/selinux/spyre_device_plugin_minimal.pp || true'
                RemainAfterExit=true
                [Install]
                WantedBy=multi-user.target
              enabled: true
              name: install-spyre-selinux-minimal-policy.service
            - contents: |
                [Unit]
                Description=Setup device plugin directories with permissions and SELinux context
                After=network-online.target
                Before=kubelet.service
                [Service]
                Type=oneshot
                # Fix kubelet directory permissions for device plugin socket operations
                ExecStart=/usr/bin/chmod 770 /var/lib/kubelet/plugins_registry
                ExecStart=/usr/bin/chmod 770 /var/lib/kubelet/device-plugins
                # delete device plugin directories before creating it
                ExecStart=/usr/bin/rm -f /usr/local/etc/device-plugins/complete
                ExecStart=/usr/bin/rm -rf /usr/local/etc/device-plugins/spyre-config
                ExecStart=/usr/bin/rm -rf /usr/local/etc/device-plugins/spyre-metrics
                ExecStart=/usr/bin/rm -rf /usr/local/etc/device-plugins/spyre-sockets
                ExecStart=/usr/bin/rm -rf /usr/local/etc/device-plugins/metadata
                # Create device plugin directories
                ExecStart=/usr/bin/mkdir -p /usr/local/etc/device-plugins/spyre-config
                ExecStart=/usr/bin/mkdir -p /usr/local/etc/device-plugins/spyre-metrics
                ExecStart=/usr/bin/mkdir -p /usr/local/etc/device-plugins/spyre-sockets
                ExecStart=/usr/bin/mkdir -p /usr/local/etc/device-plugins/metadata
                # Set permissions for group write access (device plugin runs as UID 1001, GID 0)
                ExecStart=/usr/bin/chmod 770 /usr/local/etc/device-plugins
                ExecStart=/usr/bin/chmod 770 /usr/local/etc/device-plugins/spyre-config
                ExecStart=/usr/bin/chmod 770 /usr/local/etc/device-plugins/spyre-metrics
                ExecStart=/usr/bin/chmod 770 /usr/local/etc/device-plugins/spyre-sockets
                ExecStart=/usr/bin/chmod 770 /usr/local/etc/device-plugins/metadata
                # Fix SELinux context for container access
                ExecStart=/usr/bin/chcon -R -t container_file_t /usr/local/etc/device-plugins
                ExecStart=/usr/bin/chcon -R -t container_file_t /usr/local/etc/device-plugins/spyre-config
                ExecStart=/usr/bin/chcon -R -t container_file_t /usr/local/etc/device-plugins/spyre-metrics
                ExecStart=/usr/bin/chcon -R -t container_file_t /usr/local/etc/device-plugins/spyre-sockets
                ExecStart=/usr/bin/chcon -R -t container_file_t /usr/local/etc/device-plugins/metadata
                RemainAfterExit=true
                [Install]
                WantedBy=multi-user.target
              enabled: true
              name: setup-device-plugin-directories.service
- Apply the YAML by running the following command:
    oc apply -f 50-spyre-device-plugin-selinux-minimal.yaml
Note:
- When a machine config is applied, compute nodes become unschedulable.
- In a single-node compute cluster, do not taint the single compute node.
- Tainting the compute node can block workload scheduling and leave cluster workloads pending.
- You can taint the compute node by running the following command:
    oc adm taint nodes <node-name> ibm.com/spyre=:NoSchedule
- Applying the MachineConfig makes the node temporarily NotReady and unschedulable for about 10–15 minutes, so workloads might be evicted or stay pending.
- Wait until the node returns to a Ready state and scheduling is re-enabled before you proceed with further steps.
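The wait for the node to return can be scripted by checking the STATUS column of oc get nodes. The sketch below reads that output on standard input. The function name node_is_schedulable is hypothetical, and the sketch assumes the column layout shown earlier in this section, where a cordoned node reports Ready,SchedulingDisabled rather than Ready.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get nodes` output on stdin and succeed
# only if the named node has STATUS exactly "Ready". A cordoned node
# shows "Ready,SchedulingDisabled" and a rebooting node shows "NotReady",
# so both are treated as not yet schedulable.
node_is_schedulable() {
    awk -v node="$1" '
        NR > 1 && $1 == node { found = 1; ok = ($2 == "Ready") }
        END { if (found && ok) exit 0; exit 1 }
    '
}

# Possible polling loop (the 30-second interval is an arbitrary choice):
# until oc get nodes | node_is_schedulable "<node-name>"; do sleep 30; done
```

As an alternative to polling, oc wait --for=condition=Ready node/<node-name> blocks until the node reports Ready, although it does not check whether scheduling is re-enabled.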