POWER8 Scale-Out models have Adapter Hot-Plug (Hot-Swap)
In the past with the POWER7 and POWER7+ based machines at the lower end like the Power710 to Power740 Models, users had to shut down the whole machine to add, remove or replace an adapter safely. In fact, some machines would power off as you removed the lid.
This is no longer the case with the whole POWER8 range, including the Scale-Out models like the S822 and S824. These machines do not have the blind-swap cassettes that allow adapters to be removed from the rear of the machine, so you must have the cable management arm set up to allow the machine to be pulled out at the front on the rails. Then you raise the lid and you will find a number of items inside with red/orange handles; these can be removed with the machine running: things like the fans at the front, the internal adapter cards and the power supplies round the back.
This is covered in the article below. Note: Emily Barrett and Gareth Coates did all the work and screen captures. I (Nigel) just edited the results into this blog.
POWER8 Concurrent Maintenance (hot swap or hot-plug)
By Emily Barrett & Gareth Coates - EMEA Power Advanced Technology Support
In the picture below, Emily is pointing to the rear of the POWER8 machine, a Model S824 and at the PCIe3 adapter that needs to be replaced. It is powered up with LED lights showing it is active.
The picture above right shows that the adapter slot is labelled "C5". We need this information when we access the Hardware Management Console (HMC), as we see below, to start a procedure on the adapter slot.
In the above screenshot of the HMC, first the whole machine is selected and then, from the button, the Properties panel is selected, then
Before we go further, the adapter has to be removed from the Logical Partition (LPAR), also called a Virtual Machine (VM), so that it is not active and no device drivers will be accessing it as it is being removed. In this case, the adapter has been assigned to the Virtual I/O Server (VIOS). So to investigate, the "padmin" user of the VIOS runs the lsdev command below (the output fields have been limited to just those we really need).
Log in to the VIOS as the padmin user
$ lsdev -field name physloc | grep C5
Above we have the two ports of the adapter, a number of device drivers providing features on top of them, and a Fibre Channel disk called hdisk0.
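One caveat with a plain grep for C5 is that it matches the string anywhere in the line, for example inside a unit serial number that happens to contain "C5". A minimal sketch in Python, assuming the two-column name and physical location listing has been saved as text, matches C5 only as a whole segment of the location code. The function name, device names and location codes below are invented for illustration and are not output from the machine in this article.

```python
def devices_in_slot(lsdev_text, slot):
    """Return device names whose location code has `slot` as a whole segment."""
    names = []
    for line in lsdev_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines and devices with no physical location
        name, physloc = fields
        if name == "name":
            continue  # skip the header line
        # Location codes are dash-separated, so compare whole segments only.
        if slot in physloc.split("-"):
            names.append(name)
    return names


# Hypothetical sample listing (made up for this sketch).
SAMPLE = """\
name physloc
ent0 U78C9.001.WZSC5AB-P1-C10-T1
fcs4 U78C9.001.WZSC5AB-P1-C5-T1
fcs5 U78C9.001.WZSC5AB-P1-C5-T2
fscsi4 U78C9.001.WZSC5AB-P1-C5-T1
hdisk0 U78C9.001.WZSC5AB-P1-C5-T1-W500507680210AC8C-L0
"""

print(devices_in_slot(SAMPLE, "C5"))
```

In this made-up sample, ent0 sits in slot C10 but its serial number contains "C5", so a plain grep would list it while the segment match does not.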
To illustrate what happens if you do not remove all these resources - let us try the HMC procedure to remove the device.
Go back to the HMC and find the Field Replaceable Unit (FRU) panel.
Whole Machine then:
Then as below, select the Main machine unit in which the adapter is found (the alternative would be a Remote I/O drawer unit) and then PCI Adapter Card
Then click on Next.
Above, the "C5" adapter slot position is selected; then click Add, and then click on "Launch Procedure".
Above, the HMC panel asks if you want to do the operation now or at a later time. Then it wants you to check whether any LPAR involved is running Linux. In this case it is the Virtual I/O Server (VIOS), so the answer is "no"; then click on Next.
But below, the HMC determines that the PCIe3 adapter card is still in use and needs you to fix that before proceeding:
So next we go back to the VIOS and remove all the devices. This stops the VIOS from trying to use the device while the adapter card is out or a replacement is in the adapter slot.
On the VIOS LPAR
$ lspath
status name parent connection
Enabled hdisk0 fscsi0 500507680210ac8c,0
Enabled hdisk0 fscsi0 500507680220ac8c,0
Enabled hdisk0 fscsi4 500507680210ac8c,0
Enabled hdisk0 fscsi4 500507680220ac8c,0
$ rmdev -dev fcs4 -recursive
$ rmdev -dev fcs5 -recursive
The -recursive option makes sure all related resources are also removed.
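Before running rmdev, it can be reassuring to see which disks depend entirely on the adapter being pulled. The sketch below, in Python, parses the path listing shown above (status, name, parent, connection) and reports any disk whose every path runs through the parents being removed. It assumes the listing has been saved as text and that fscsi4 is the protocol device belonging to the adapter in slot C5; the function names are made up for this illustration.

```python
from collections import defaultdict

# The path listing from this article, saved as text.
LSPATH = """\
status name parent connection
Enabled hdisk0 fscsi0 500507680210ac8c,0
Enabled hdisk0 fscsi0 500507680220ac8c,0
Enabled hdisk0 fscsi4 500507680210ac8c,0
Enabled hdisk0 fscsi4 500507680220ac8c,0
"""


def parse_lspath(text):
    """Turn the listing into (status, disk, parent, connection) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] != "status":
            rows.append(tuple(fields))
    return rows


def disks_losing_all_paths(rows, removed_parents):
    """Return disks that have no path left once the given parents are removed."""
    remaining = defaultdict(int)
    for _status, disk, parent, _conn in rows:
        remaining[disk] += 0  # make sure every disk appears in the result
        if parent not in removed_parents:
            remaining[disk] += 1
    return sorted(disk for disk, count in remaining.items() if count == 0)


print(disks_losing_all_paths(parse_lspath(LSPATH), {"fscsi4"}))
```

With the listing above, removing only fscsi4 leaves hdisk0 with its two paths through fscsi0, so the function reports no disks; passing both fscsi0 and fscsi4 would report hdisk0.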
Above, we are back on the HMC: select "Try Again" and click on Next. Read the two warning and information panels ...
Making sure we change the right adapter card in the right machine
Below, the HMC, via the service processor, switches on the Identify LED at both the machine's front Ops Panel and the rear adapter slot.
This is important - imagine if you had twenty racks of identical machines.
Next - back on the HMC it wants to check the Adapter width (single or double width). I have not seen a double width adapter for a long time. I guess it would want to power down both slots. So select "single slot" and Next.
Then the HMC will give you plenty of diagrams to find the correct adapter slot and how to unscrew the machine and pull it out on the rails - just Next through them if you are sure you know the adapter slot and how to do that:
Opening the box
So Emily gets busy with the screwdriver, releases the machine and slides it out on the rails - this assumes the cable management arm has been fitted at the back of the machine and ALL the cables are using the arm.
It is worth checking the cabling before pulling the POWER8 machine out at the front, to avoid breaking cables or damaging an adapter card!
The HMC then tells you how to unclick and remove the machine lid:
You guessed it ... Emily does the same below and we can see the light path LED lit up next to the adapter - it would be a shame to yank the wrong one out at this point! Note the LED is a triangle "arrow" pointing at the adapter card.
Removing the adapter card
On the HMC, below, it describes how to remove the adapter:
Oddly the diagram is actually removing the adapter card from the same adapter slot as Emily. That is a nice touch from the hardware documentation editors. The picture shows a white release clip, but it is really red/orange in colour. Emily carefully removes the adapter card - this is not a $10 PC Ethernet NIC adapter card but a rather expensive, super fast, enterprise-level card, so treat it with care. You can't quite see it, but near Emily's elbow she is wearing an approved electrostatic earth strap that is also clipped to the frame of the rack.
At last the job is done - - - well - - - not quite. This task is to replace the adapter with another one (yes, you might have guessed we are going to put the same adapter card back in the slot for this practice run). This could have been an Add adapter card or just a plain Remove adapter card task, but you get the idea, I am sure.
The HMC tells you how to finish off using the reverse procedure:
In the pictures below, Emily points out the little blue arm clips that you raise to release the drawer so it will slide back into the rack, and the adapter card is showing LEDs.
Below, the HMC plays it safe and waits a bit to let the adapter power up and complete its initialisation functions.
If we are happy, we can close the "Replace Hardware" Task as below:
Back on the Operating System, here the VIOS, we tell it to investigate its busses for new adapter cards.
Back on the VIOS as the padmin user
$ cfgdev
$ lspath
status name parent connection
Enabled hdisk0 fscsi0 500507680210ac8c,0
Enabled hdisk0 fscsi0 500507680220ac8c,0
Enabled hdisk0 fscsi4 500507680210ac8c,0
Enabled hdisk0 fscsi4 500507680220ac8c,0
$
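One way to confirm that nothing was lost is to compare the path listing taken before the adapter was removed with the one taken after cfgdev. Here is a minimal sketch in Python, assuming both listings were saved as text; the helper names are illustrative only and are not part of the VIOS or the HMC.

```python
def path_set(lspath_text):
    """Return {(disk, parent, connection): status} from a saved path listing."""
    paths = {}
    for line in lspath_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] != "status":
            status, disk, parent, conn = fields
            paths[(disk, parent, conn)] = status
    return paths


def paths_restored(before_text, after_text):
    """True if every path seen before is present and Enabled afterwards."""
    before = path_set(before_text)
    after = path_set(after_text)
    return all(after.get(key) == "Enabled" for key in before)


# The listing from this article, which is the same before and after.
BEFORE = """\
status name parent connection
Enabled hdisk0 fscsi0 500507680210ac8c,0
Enabled hdisk0 fscsi0 500507680220ac8c,0
Enabled hdisk0 fscsi4 500507680210ac8c,0
Enabled hdisk0 fscsi4 500507680220ac8c,0
"""
AFTER = BEFORE

print(paths_restored(BEFORE, AFTER))
```

If the fscsi4 paths were missing from the second listing, or were present in a state other than Enabled, the check would return False.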
The adapter card and its associated devices are ready to use again. Once you have done this a couple of times it is very simple to follow, and in any case the HMC guides you and makes sure you don't miss a step.
- - - The End - - -