Well hello again! Today, we're going to take our first dip into Power Firmware Maintenance. This is a fairly large topic, so I'm going to split it across several posts. This first one focuses on a general overview, the terminology, and how the release process works.
Power Firmware Maintenance Overview
There are millions and millions of lines of source code both within your Power Systems (FSP, PHYP, PFW) and outside them (HMC). The picture below shows these different subsystems in more detail. Firmware updates serve two main purposes:
1. Provide new function to the existing system or support a new system (Major Release)
2. Fix bugs (Service Pack)
- Release Level: A major, disruptive firmware upgrade that delivers new function (software or hardware enablement).
- Service Pack: Typically a minor bug-fix code update of one of the following types:
  - Concurrent: A code update that allows the operating system(s) running on the Power System to keep running while the update is installed and activated.
  - Deferred: A code fix that is installed concurrently but does not activate until the system has been rebooted. This type of fix is often related to chip initialization changes.
  - Partition Deferred: A code fix that is installed concurrently but does not activate until the partition is rebooted.
  - Disruptive: A code fix that requires the system to be rebooted during the code update process.
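To make the differences between these update types concrete, here is a small Python sketch. For each type, it records two things: whether the operating systems keep running during installation, and what has to reboot before the fix takes effect. The `UpdateType`, `Activation`, `ACTIVATION`, and `describe` names are hypothetical. They illustrate the definitions above and are not part of any IBM tool or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UpdateType(Enum):
    """Service pack update types from the list above (hypothetical names)."""
    CONCURRENT = "concurrent"
    DEFERRED = "deferred"
    PARTITION_DEFERRED = "partition deferred"
    DISRUPTIVE = "disruptive"


@dataclass(frozen=True)
class Activation:
    os_keeps_running: bool             # do the OSes stay up while the update installs?
    reboot_to_activate: Optional[str]  # what must reboot before the fix is active (None = nothing)


# One entry per update type, mirroring the definitions above.
ACTIVATION = {
    UpdateType.CONCURRENT: Activation(True, None),
    UpdateType.DEFERRED: Activation(True, "system"),
    UpdateType.PARTITION_DEFERRED: Activation(True, "partition"),
    UpdateType.DISRUPTIVE: Activation(False, "system"),
}


def describe(update_type: UpdateType) -> str:
    """Summarize how a fix of the given type is installed and activated."""
    a = ACTIVATION[update_type]
    install = "concurrent install" if a.os_keeps_running else "system reboot during install"
    if a.reboot_to_activate is None:
        return f"{update_type.value}: {install}, active immediately"
    return f"{update_type.value}: {install}, active after {a.reboot_to_activate} reboot"


for t in UpdateType:
    print(describe(t))
```

Only a disruptive fix takes the operating systems down during installation. The two deferred types differ only in what must eventually reboot.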
There are also various severities associated with service packs. The most important one to watch for is HIPER (High Impact, Pervasive). One of the worst feelings we have in development is seeing a customer's system hit a problem for which we had already released a fix. Just because you haven't hit the issue yet doesn't mean you won't, so update! Please review all HIPER fixes and install any applicable ones as soon as possible. There are all sorts of ways to stay informed and be notified when they come out (see links below).
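The advice to review all HIPER fixes can be illustrated with a short Python sketch. It filters a list of available service packs down to the HIPER ones that are newer than what is installed. The `ServicePack` record, its integer `level` field, the `pending_hiper` function, and the sample catalog are all made up for this example. Real firmware levels and severities are published through the channels listed below, not produced by this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServicePack:
    level: int      # hypothetical: higher number = newer service pack
    is_hiper: bool  # True if the pack contains HIPER (High Impact, Pervasive) fixes
    summary: str


def pending_hiper(installed_level: int, available: list[ServicePack]) -> list[ServicePack]:
    """Return HIPER service packs newer than the installed level, oldest first."""
    newer_hiper = [sp for sp in available if sp.is_hiper and sp.level > installed_level]
    return sorted(newer_hiper, key=lambda sp: sp.level)


# Made-up catalog for illustration only.
catalog = [
    ServicePack(40, True, "example HIPER fix A"),
    ServicePack(45, False, "example non-HIPER fixes"),
    ServicePack(52, True, "example HIPER fix B"),
]

for sp in pending_hiper(installed_level=45, available=catalog):
    print(f"Review and install: level {sp.level} ({sp.summary})")
```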
Under the Hood of Power Firmware Maintenance
The FSP runs an embedded operating system that hosts some extraordinary power firmware applications (I'm sure you can't guess which area I work in :), one of which is responsible for handling code updates. A firmware update reaches the FSP from the HMC (over Ethernet), from a USB stick, or from the OS running on the Power System. In every case, the firmware is sent to the FSP as a series of binary images, which are written into a special location in the FSP filesystem. This location is actually an alternate boot filesystem (i.e. the FSP has two boot locations to support firmware updates and redundancy). The images are not only for the FSP but also for PHYP, PFW, and other components. The FSP is then rebooted, and when it comes back up it is running the new firmware. Depending on the type of firmware update (concurrent or non-concurrent) or upgrade, the system is either power cycled (disruptive) or stays up throughout the entire update process (concurrent or deferred). If there is an error during the firmware update, the FSP automatically falls back to the previous firmware level. The technology involved in a concurrent code update to the PHYP firmware is pretty amazing stuff: your partition continues to run while the firmware underneath it updates itself.
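The two-boot-location scheme described above is an instance of the common A/B (dual-bank) update pattern. The toy Python model below assumes a processor with two boot sides. It writes the new images to the side that is not running, boots from that side, and falls back to the previous side if the boot check fails. The `ServiceProcessor` class, the side names `A`/`B`, and the `boot_ok` callback are hypothetical. This is a generic sketch of the pattern, not how the FSP firmware is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ServiceProcessor:
    """Toy model of a processor with two boot sides holding firmware levels."""
    sides: dict[str, str] = field(default_factory=lambda: {"A": "v1", "B": "v1"})
    booted: str = "A"  # side we are currently running from

    @property
    def running_level(self) -> str:
        return self.sides[self.booted]

    def alternate(self) -> str:
        return "B" if self.booted == "A" else "A"

    def update(self, new_level: str, boot_ok: Callable[[str], bool]) -> bool:
        """Install new_level on the alternate side; keep it only if it boots cleanly."""
        previous, target = self.booted, self.alternate()
        self.sides[target] = new_level  # write the new images to the side that is not running
        self.booted = target            # "reboot" onto the freshly written side
        if boot_ok(new_level):
            return True
        self.booted = previous          # error: fall back to the previous firmware level
        return False


fsp = ServiceProcessor()
print(fsp.update("v2", boot_ok=lambda level: True), fsp.running_level)   # True v2
print(fsp.update("v3", boot_ok=lambda level: False), fsp.running_level)  # False v2
```

The running side is never overwritten, so a failed update always leaves a known-good level to return to.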
The HMC code update process is a bit simpler because the HMC runs on a more standard computer system. It offers the same safety net as the FSP-based process: if something goes wrong during the update, you can always go back to the previous version.
The Power Firmware team averages two major releases a year. Although most releases include new hardware enablement, one of them focuses on new function. Because applying service packs has an impact on our customers, we've been lengthening the interval between service packs. Our goal is to release service packs for each release every six months, plus one service pack three months after the release first becomes available.
The release process is similar for major releases and service packs. Major releases (especially those that involve a new chip technology) rely on a lot more software simulation throughout the release process. Each developer has access to their own full system simulator in which they can run and validate code as they write it. Once the hardware arrives, we shift into test phases, which grow more complex and intricate as the release date approaches. Service packs follow a similar version of this process. Release readiness is based, in part, on comparisons with previous releases (i.e. P6, P5, ...). We analyze the types and numbers of defects the test teams are finding, as well as the interval between defect discoveries.
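One of the readiness signals mentioned above, the interval between defect discoveries, is simple to compute. The Python sketch below assumes two inputs: the dates on which test found defects, and a baseline interval taken from a previous release. The function names and the rule that defects must arrive no faster than the baseline are hypothetical simplifications. The actual readiness assessment also weighs defect types and counts.

```python
from datetime import date
from statistics import mean


def mean_days_between(discoveries: list[date]) -> float:
    """Average gap, in days, between consecutive defect discoveries."""
    ordered = sorted(discoveries)
    if len(ordered) < 2:
        raise ValueError("need at least two discoveries to measure an interval")
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return float(mean(gaps))


def looks_ready(discoveries: list[date], baseline_gap_days: float) -> bool:
    """Hypothetical rule: defects now arrive no faster than in a prior release at ship time."""
    return mean_days_between(discoveries) >= baseline_gap_days


# Made-up discovery dates from a late test phase.
found = [date(2024, 3, 1), date(2024, 3, 3), date(2024, 3, 7), date(2024, 3, 13)]
print(mean_days_between(found))                   # 4.0
print(looks_ready(found, baseline_gap_days=5.0))  # False
```

A widening gap between discoveries suggests the test teams are running out of new defects to find. That is why the interval is a useful complement to raw defect counts.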
Links and Tools
Fix Central: Location for all Power Firmware updates.
Fix Level Recommendation Tool (FLRT): A tool that compares your current firmware and software levels and recommends appropriate levels to update to.
Firmware Overview and Recommendations Presentation: A presentation from our Product Engineering team with more details and links.