AIX Down Under
AnthonyEnglish 270000RKFN Tags:  integrated_virtualization... sea management ethernet lpar virtual ivm id outage shared rob_mcnelly hmc v126.96.36.199 dlpar power server aix adapter ibm hardware_management_conso... migration 12,568 Views
In September 2009 Rob McNelly wrote on his AIXChange blog about Migrating from the IVM to the HMC. I have documented my own experience of this procedure. You can download it from here, at a very affordable price of USD 0.00 (no refunds).
The IVM, or Integrated Virtualization Manager, is a browser-based interface to the VIO server on smaller systems. It has HMC-like functionality, such as Dynamic LPAR and the ability to configure, stop and start LPARs.
The HMC (Hardware Management Console, as you know) is able to manage several physical servers and is mandatory for larger systems. It can also be used for smaller systems, and is a worthwhile investment, in my view, once you get beyond a single small server.
Two servers, two IVMs
I had a client who had bought a production Power6 550 and a P6-520 for Dev and Test. After some months of discussion, their Business Partner convinced them of the benefit of investing in an HMC to manage these two systems with their growing number of LPARs. The challenge was migrating each of the servers from being IVM-managed to the HMC. I have put together a document of my own experience of the migration. It doesn't attempt to be a step-by-step guide. It's more of a diary for my own benefit, but you may find it useful.
Forward planning brings us unstuck
We thought we were being safe by getting some work done ahead of the outage. We racked and cabled the HMC and put it on the network, in preparation for the scheduled outage two weeks hence. Problem was, no one told the HMC the planned go-live date. To our surprise, it immediately discovered the two servers. The HMC reported that both servers were in "Recovery" state, but it wouldn't take further control of the systems or their LPARs until the outage, which was scheduled for after a huge month end. The IVM had been effectively disabled, so any IVM-specific commands were out of bounds. No profile backups, no DLPAR, and no shutdown or activation of LPARs were permitted, either from the IVM or from the HMC. Nothing would undo it, not even powering off the HMC and disconnecting it from the network.
We had a VIO server, but no IVM and no HMC that we could do anything useful with. It was the technological equivalent of a hung parliament.
All's well that ends well
In the end, it all worked, and the customer has been running happily on the HMC for many months now. Still, it was a challenge. You can find my comments about the migration in "IVM to HMC Migration - A Customer's Experience".
Looking back, it was quite funny, I suppose. As long as you weren't me.
AnthonyEnglish 270000RKFN Tags:  powerha live lpm migrate cookbook high_availability aix hacmp workload_partition uptime migration wpar outage live_partition_mobility 2 Comments 11,791 Views
High Level high[-er] availability options
IBM HACMP (High Availability Cluster Multiprocessing) has been available since 1991. It was renamed IBM PowerHA and, more recently, PowerHA SystemMirror for AIX. While studying up on this recently, I came across some excellent comparisons between PowerHA, Live Partition Mobility and migration of Workload Partitions (WPARs). This is a high-level comparison of how the three might help with managing outages and make for higher availability, even if they aren't all PowerHA.
LPM and HA
First, Live Partition Mobility (LPM). This is the facility which allows you to migrate a running partition with its applications from one physical server to another without disrupting services. The Redbook for IBM PowerVM Live Partition Mobility explains:
"Live Partition Mobility increases global availability, but it is not a high availability solution. It requires that both source and destination systems be operational and that the partition is not in a failed state. In addition, it does not monitor operating system and application state and it is, by default, a user-initiated action."
So LPM is good if the outage is planned.
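To make the Redbook's conditions concrete, here is a minimal sketch in Python of the checks it describes. This is only my own illustration: the names MigrationContext and lpm_blockers are hypothetical, and nothing here talks to an HMC or comes from an IBM tool. It simply encodes the four conditions in the quote.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MigrationContext:
    """Hypothetical summary of the situation when you want to move a partition."""
    source_operational: bool       # source system is up and running
    destination_operational: bool  # destination system is up and running
    partition_failed: bool         # partition is in a failed state
    operator_initiated: bool       # a person is available to start the move


def lpm_blockers(ctx: MigrationContext) -> List[str]:
    """Return the reasons LPM cannot help; an empty list means it can."""
    blockers = []
    if not ctx.source_operational:
        blockers.append("source system is not operational")
    if not ctx.destination_operational:
        blockers.append("destination system is not operational")
    if ctx.partition_failed:
        blockers.append("partition is in a failed state")
    if not ctx.operator_initiated:
        # LPM does not monitor the OS or application, so by default
        # nothing starts a migration unless a user does.
        blockers.append("no user-initiated action to start the migration")
    return blockers


# A planned hardware maintenance window: everything is healthy.
planned = MigrationContext(True, True, False, True)

# An unplanned crash of the source frame at 3 a.m.
crash = MigrationContext(False, True, True, False)

print(lpm_blockers(planned))
print(lpm_blockers(crash))
```

The planned case comes back with no blockers, while the crash fails on three of the four checks, which is the gap a clustering product is meant to fill.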
"Live" for a reason
The difference between HA and LPM came home to me recently when I was watching a presentation by an IBMer on PowerHA. Shawn Bodily has worked on HACMP/PowerHA for over ten years. He was on the team that wrote the excellent PowerHA for AIX Cookbook and he presented webinars (see below) over three days in July 2009 on PowerHA (as it was then called). In the second of those days, Shawn answers a question about LPM as an alternative to PowerHA. Here is my transcript of the relevant section:
"When we talk about clustering we talk about reducing planned outages and unplanned outages. Most people associate it [HACMP] with unplanned outages but by far, most downtime today is still planned maintenance and Live Partition Mobility is great for planned maintenance if it's hardware related. If I'm upgrading a server and people can do firmware updates dynamically [i.e. concurrently] and some people choose not to. If I'm doing something like that, Live Partition Mobility is great. Even most of the maintenance is software maintenance. If you have to upgrade your application, update AIX, Live Partition Mobility does nothing for you. Here's why: the fact you're running the exact same rootvg and application on another frame."
WPARs for AIX update