IBM Support

Best Practice for Patching PCM Clusters with kusu-repopatch

Troubleshooting


Problem

Best Practice for Patching PCM Clusters with kusu-repopatch

Resolving The Problem

This article explains how to keep a PCM cluster up to date with the online updates provided by the operating system vendor.

Platform Cluster Manager (PCM) includes a tool called kusu-repopatch (hereinafter repopatch) for patching cluster nodes with the latest security updates from the operating system's online repository, for example Red Hat Network. This tool can patch all the Compute nodes as well as the Installer node, but it can cause problems if used incorrectly. This article documents the best practice for using repopatch, along with common mistakes and misunderstandings about how the tool works.

NOTE: repopatch performs a "bulk update" of your PCM repository, similar to using yum update to patch an individual RHEL server. There is no way to update individual packages with repopatch. The download that repopatch performs can also be accomplished by using the downloadonly plugin for yum.
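
As a rough illustration of the yum downloadonly plugin mentioned above, the following sketch downloads pending updates without installing them. It is not an official PCM procedure. It assumes the plugin package is named yum-downloadonly (as on RHEL 5); the staging directory, the `run` helper, and the `download_updates` function are hypothetical names. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch only, not an official PCM procedure: download all pending updates
# with yum without installing them.
# Assumptions: the plugin package is named yum-downloadonly (as on RHEL 5),
# and DOWNLOAD_DIR is a hypothetical staging directory.
DOWNLOAD_DIR="${DOWNLOAD_DIR:-/tmp/os-updates}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

download_updates() {
    run yum -y install yum-downloadonly
    run yum -y update --downloadonly --downloaddir="$DOWNLOAD_DIR"
}

download_updates
```

Setting DRY_RUN=0 in the environment would execute the two yum commands instead of printing them.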

NOTE: If you are using the Platform OFED kit and your PCM version is earlier than 2.1, you cannot use repopatch to patch your cluster. In this case, you must patch individual packages by following the steps in the "Best Practice for Patching Individual Packages in PCM" article. See the section "Special Considerations if Using Platform OFED Kit" below for more details.

Before Using repopatch

This section describes what you should do before you use repopatch for the first time. It is best-practice advice only; you may choose which of these steps to follow.

Provision Compute Nodes Using Certified Cluster Stack

You should always deploy a PCM cluster with the certified stack only. For example, if you are using PCM 2.0.1 for RHEL, you should use RHEL 5.5 to install the Installer node and to provision the compute nodes. Make sure that any configuration or customization of the compute nodes is done on the certified stack first. Proceed to apply the patches only if everything works.

Snapshot the Original OS Repository

Before using repopatch for the first time, you should create a snapshot of the original OS repository. To do this, use the repoman command like this:

# repoman -r rhel-5.5-x86_64 -s

This will create a snapshot repository with the date stamp in the name, like this:

# repoman -l
Repo name:      rhel-5.5-x86_64
Repository:     /depot/repos/1000
Installers:     172.20.1.1;172.27.1.23
Ostype:         rhel-5-x86_64
Kits:           base-2.0-1-x86_64, dell-vendor-5.5-1-x86_64,
               java-jre-1.5.0-16-x86_64, nagios-2.12-7-x86_64,
               PCM_GUI-2.0-1-x86_64, platform-hpc-2.0-3-x86_64,
               platform-isf-ac-1.0-1-x86_64, platform-lsf-7.0.6-1-x86_64,
               platform-mpi-7.1-1-x86_64, platform-ofed-1.5.1-1-x86_64,
               platform-rtm-2.0.1-1-x86_64, rhel-5.5-x86_64


Repo name:      rhel-5.5-x86_64-snapshot_Wed_Oct__6_17.04.18_2010
Repository:     /depot/repos/1001
Installers:     172.20.1.1;172.27.1.23
Ostype:         rhel-5-x86_64
Kits:           base-2.0-1-x86_64, dell-vendor-5.5-1-x86_64,
               java-jre-1.5.0-16-x86_64, nagios-2.12-7-x86_64,
               PCM_GUI-2.0-1-x86_64, platform-hpc-2.0-3-x86_64,
               platform-isf-ac-1.0-1-x86_64, platform-lsf-7.0.6-1-x86_64,
               platform-mpi-7.1-1-x86_64, platform-ofed-1.5.1-1-x86_64,
               platform-rtm-2.0.1-1-x86_64, rhel-5.5-x86_64
 

The above shows that you now have two repositories: the original one under /depot/repos/1000 and the new one under /depot/repos/1001.
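
To pick the snapshot name out of such a listing without retyping the date stamp, a small filter like the one below could be used. It is a sketch that assumes the "Repo name:" label format shown in the listing; the function name `list_snapshot_repos` is hypothetical and not part of PCM.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCM): print the names of snapshot
# repositories found in "repoman -l" output read from standard input.
# Assumes each repository entry starts with a "Repo name:" line.
list_snapshot_repos() {
    awk '/^Repo name:/ && $3 ~ /-snapshot_/ { print $3 }'
}

# Example with an abbreviated copy of the listing from this article:
sample_listing='Repo name:      rhel-5.5-x86_64
Repository:     /depot/repos/1000

Repo name:      rhel-5.5-x86_64-snapshot_Wed_Oct__6_17.04.18_2010
Repository:     /depot/repos/1001'

printf '%s\n' "$sample_listing" | list_snapshot_repos
```

On the Installer node the same filter could be fed directly, for example with repoman -l | list_snapshot_repos.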

Associate a Test Node Group with the Snapshot Repo

Next, you must associate at least one node group with the snapshot repo (id=1001). Use the kusu-ngedit tool (hereinafter ngedit) to create a new node group as a copy of the default node group for packaged provisioning.

NOTE: If you do not associate a node group with a repository before patching it, you can easily corrupt the original kernel and initrd images used for PXE boot under the /tftpboot/kusu/ directory!

The following command creates a node group called compute-patched-test that you can use to test the patches before applying them to your production repository.

# ngedit -c compute-rhel-5.5-x86_64 -n compute-patched-test

Next, use ngedit to associate this node group with snapshot repo 1001.

Register to Red Hat Network

If you are using a RHEL system, you need to register with Red Hat Network.

First, run rhn_register on your Installer node to register your cluster.

Second, update your /opt/kusu/etc/updates.conf file with a valid username and password for logging in to RHN, as well as the Server ID (also known as the RHN System ID).

Use repopatch for the First Time

Now you are ready to patch your cluster. Start by running repopatch on your snapshot repository, and then test the compute nodes with the patches applied.

Run repopatch

Use the -y option to automatically say yes to kernel updates.

NOTE: If you do not specify the -y option, you will need to answer the kernel update prompt ("yes" or "no") manually before repopatch can finish.

# kusu-repopatch -y -r rhel-5.5-x86_64-snapshot_Wed_Oct__6_17.04.18_2010
Getting updates for rhel-5-x86_64. This may take a while...

This step may take several hours to complete, depending on how many packages need to be patched and the speed of your network.

It is important to understand what happens during repopatch execution. The steps occur in the following order:


1. Repopatch scans the repository being patched and checks whether updates to its packages are available online.
2. Kernel packages are treated specially: repopatch asks whether you want to update the kernel RPMs. If you answer "no", the kernel packages are skipped; otherwise, they are included in the update. You can skip this prompt and automatically accept kernel updates with the -y flag.
3. Repopatch builds a kit called rhel-updates, adds it to the repository being patched, and associates its component with the node groups that use this repository.
4. If kernel packages are included in the update, repopatch calls kusu-driverpatch to create new PXE kernel and initrd images for the node group(s) using the repository. The new files are stored in the /tftpboot/kusu/ directory, and the database is updated automatically so that these node groups use the new kernel and initrd images. If kernel packages are not updated, kusu-driverpatch is never called.
5. The patched repository is refreshed.

Apply the Updates

At this point repopatch is done, and the only thing left to do is "apply the updates". repopatch has made sure that the patches are available in the repository for the node groups that use it, but it does not automatically trigger installation of the updates on the nodes. To do this, you need to run the cfmsync command manually, like this:

# kusu-cfmsync -n compute-patched-test -u

This updates all the nodes in the compute-patched-test node group. If kernel packages were patched, it is better to simply re-install the nodes so that they boot with the new kernel. You can use the kusu-boothost command for that:

# kusu-boothost -n compute-patched-test -r
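
The choice between the two commands above can be sketched as a small wrapper. The kusu-cfmsync and kusu-boothost invocations are the ones shown in this article; the `run` and `apply_updates` names and the yes/no argument are hypothetical. By default the wrapper only prints the command it would run.

```shell
#!/bin/sh
# Sketch: pick the follow-up command for a node group after repopatch.
# Commands are only printed (dry run) unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: apply_updates NODEGROUP yes|no
# The second argument says whether repopatch included kernel packages.
apply_updates() {
    nodegroup=$1
    kernel_updated=$2
    if [ "$kernel_updated" = "yes" ]; then
        # A new kernel: re-install the nodes so that they boot with it.
        run kusu-boothost -n "$nodegroup" -r
    else
        # No kernel change: install the updated packages on the nodes.
        run kusu-cfmsync -n "$nodegroup" -u
    fi
}

apply_updates compute-patched-test yes
```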


Test the Updates

Now you need to verify that the patches did not break any functionality. Test the following:


1. Re-installation of the nodes works, and the nodes boot with the new kernel and initrd PXE images without problems.
2. All hardware is detected correctly, as it was before the updates, and there are no errors in the output of dmesg.
3. Nodes are partitioned correctly, as before.
4. User authentication, passwordless SSH, NFS, and the automounter all work.
5. Nodes perform as well as before, or better, over an extended period of time: at least 2 days without any problems.
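
For the dmesg check in the list above, a coarse first pass could look like the following. It is only a sketch: the function name `count_dmesg_errors` is hypothetical, and matching on the words "error" and "fail" is an assumption that will also catch some harmless messages, so the matching lines still need to be read.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCM): count lines in dmesg-style text,
# read from standard input, that mention an error or a failure.
count_dmesg_errors() {
    # grep -c prints the number of matching lines; "|| true" keeps a zero
    # count from being reported as a failed command.
    grep -i -c -E 'error|fail' || true
}

# Example with made-up kernel messages:
printf '%s\n' 'eth0: link up' 'sda: I/O error' 'firmware load failed' |
    count_dmesg_errors
```

On a compute node the function could be fed with dmesg | count_dmesg_errors, and the count compared with the value recorded before patching.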

Patch the Production Repository

If testing of the patched repository is successful, you can proceed to patch your production repository. This example assumes that your production repository is the default PCM repository (id=1000) and that your production node group is a copy of the default compute-rhel-5.5-x86_64 node group that you created right after cluster installation. Best practice is to always leave the default node group unmodified, but this procedure also works if you use the default node group for production.

Add the Updates Kit to the Production Repository

Since you have verified that the updates do not break anything in PCM, you can use the kusu-repoman tool to add the updates kit to the production repository.

# kusu-repoman -r rhel-5.5-x86_64 -a -k rhel-updates

Don't forget to refresh the repo.

# kusu-repoman -r rhel-5.5-x86_64 -u

Associate the Updates Kit with Production Node Group

Follow these steps to patch your production node group:


1. Run the kusu-ngedit tool and select your production node group to edit.
2. On the Boot Time Parameters screen, change the PXE kernel and initrd images. The node group must use the new files created by repopatch.
3. On the Components screen, select the rhel-updates component and select Next.
4. On the Summary of Changes screen, accept the changes.
5. Say OK to perform cfmsync.
6. Exit kusu-ngedit.

NOTE: It is very important to select the new PXE kernel and initrd images in step 2. If you forget, you will corrupt the original PXE files in /tftpboot/kusu/ directory.

If you said OK to cfmsync in step 5, the updated packages should already be installed on the compute nodes. Otherwise, you need to run kusu-cfmsync -u manually on this node group.

Finally, if you have updated the kernel package, you will need to re-install the nodes to run the new kernel.
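
One way to confirm that a node really runs the new kernel is to compare the output of uname -r with the most recently installed kernel package. The helpers below are a sketch with hypothetical names. They assume that rpm -q --last kernel lists the newest installation first and prints names as kernel-VERSION-RELEASE without an architecture suffix. The version strings in the example are illustrative only.

```shell
#!/bin/sh
# Hypothetical helpers (not part of PCM) to check whether a node already
# runs the most recently installed kernel.

# Read "rpm -q --last kernel" output from standard input and print the
# version-release of the first (most recently installed) entry.
newest_installed_kernel() {
    awk 'NR == 1 { sub(/^kernel-/, "", $1); print $1 }'
}

# Succeed if the running kernel ($1, from "uname -r") equals the newest
# installed kernel ($2).
kernel_is_current() {
    [ "$1" = "$2" ]
}

# Example with illustrative version strings:
newest=$(printf '%s\n' \
    'kernel-2.6.18-238.el5    Wed 06 Oct 2010 05:30:00 PM EDT' \
    'kernel-2.6.18-194.el5    Mon 01 Mar 2010 10:00:00 AM EST' |
    newest_installed_kernel)
if kernel_is_current '2.6.18-194.el5' "$newest"; then
    echo "running the newest kernel"
else
    echo "re-install or reboot needed to run $newest"
fi
```

On a real node the two inputs would come from uname -r and rpm -q --last kernel instead of the literal strings.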

Patch the Installer Node

The last thing that remains is to patch your Installer node. To do this, follow the same six steps from the previous section for the Installer node group. Again, you MUST change the PXE kernel and initrd for the Installer node group as well; make sure they point to the new ones created by repopatch. Cfmsync triggers the updates to be installed "on-the-fly". You then need to reboot the Installer node to boot the new kernel.

NOTE: Before rebooting the Installer node, you may want to modify the /boot/grub/grub.conf file to add the old kernel/initrd to the Grub menu just as a precaution.
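
Before editing the Grub configuration by hand, it may be worth keeping a time-stamped copy of it. The helper below is a generic sketch; the name `backup_file` is hypothetical and not part of PCM. Pass it the path of the Grub configuration file on your Installer node.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCM): copy a file to a time-stamped
# backup next to it and print the path of the backup.
backup_file() {
    src=$1
    dest="$src.bak.$(date +%Y%m%d%H%M%S)"
    # -p preserves the mode, ownership and timestamps of the original.
    cp -p "$src" "$dest" && echo "$dest"
}
```

The printed path can be noted down so that the original file can be restored if the edited menu does not boot.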

Special Considerations if Using Platform OFED Kit

This section applies only if you are using a PCM version earlier than 2.1 together with the Platform OFED kit for InfiniBand support.

How to Determine if You Are Using Platform OFED Kit

To determine whether you are using the Platform OFED kit in your cluster, use the kusu-kitops -l command. If you are using the kit, you will see something like this:

Kit:            platform-ofed
Kid:            9
Version:        1.5.2
Release:        1
Architecture:   x86_64
Description:    Platform OFED Kit
Supported OS:   centos-5-x86_64, rhel-5-x86_64,
               scientificlinux-5-x86_64, scientificlinuxcern-5-x86_64
OS Kit:         No
Removable:      Yes
Repositories:   rhel-5.5-x86_64
Node Groups:    installer-rhel-5.5-x86_64, compute-rhel-5.5-x86_64,
               lsf-management-candidate, compute-packaged-produciton

This output shows that the Platform OFED kit is installed in your cluster and that it is used by the repository rhel-5.5-x86_64. This means that you cannot use repopatch to patch this repository. repopatch will download updates for the packages provided by the Platform OFED kit, but when you try to apply the updates, yum (which is called internally by cfmsync) will fail trying to update these packages. The reason is that component-platform-OFED requires all OFED packages to be at a specific version and does not allow updates. The Platform OFED kit was built this way to prevent any corruption of the OFED stack. Because yum cannot resolve all the dependencies, it will not apply any of the updates.
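
Checking the kit listing for this condition could be scripted along the following lines. This is a sketch that assumes the "Kit:" and "Repositories:" labels appear as in the output above; the function name `ofed_kit_repos` is hypothetical, and the sample input is an abbreviated, illustrative listing.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCM): read "kusu-kitops -l" output from
# standard input and print the Repositories value of the platform-ofed kit.
# Only the first line of a wrapped Repositories value is printed.
ofed_kit_repos() {
    awk '
        /^Kit:/ { in_ofed = ($2 == "platform-ofed") }
        in_ofed && /^Repositories:/ {
            sub(/^Repositories:[ \t]*/, "")
            print
        }
    '
}

# Abbreviated, illustrative listing with two kits:
sample_kits='Kit:            base
Repositories:   rhel-5.5-x86_64

Kit:            platform-ofed
Kid:            9
Repositories:   rhel-5.5-x86_64'

printf '%s\n' "$sample_kits" | ofed_kit_repos
```

If the function prints the name of the repository you intend to patch, the restriction described above applies to that repository.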

NOTE: You can still patch your cluster even if you are using the Platform OFED kit, but you will need to download the patches individually.

How to Remove Platform OFED Kit

The Platform OFED kit is installed by default, so it may be installed even though you are not using InfiniBand in your cluster. In this case, it is recommended that you remove the Platform OFED kit from the repository you want to patch. To do this, do the following:


1. Remove any Platform OFED kit components from the node groups that use the repository you want to patch.
2. Remove the Platform OFED kit from the repository you want to patch.

Please refer to the Platform Cluster Manager User Guide for details on how to remove kit components from node groups using the kusu-ngedit command.


Document Information

Modified date:
16 June 2021

UID

isg3T1015954