AIX Down Under
Just a reminder that AIX 5.3 goes out of support on 30 April 2012. That means the end of standard support. If you must keep your AIX 5.3 systems and can't upgrade them (you probably can, more easily than you realise), you could create a versioned WPAR running AIX 5.3 on a Power7 system within an AIX 7.1 logical partition. You can build it all with a mksysb restore.
For more details, see this document.
AnthonyEnglish 270000RKFN Tags: upgrade hardware_management_conso... hmc lshmc firmware lsmcode ptrconf fix_central update ibm_web_site ibm lssyscfg support aix 13,395 Views
Where do you get HMC and firmware updates?
Recently some clients asked me to help them upgrade their HMC and system firmware. I was coming on site but I asked them to download the firmware updates ahead of time. They had a little trouble tracking them down. This post shows where (at the time of writing) you can get hold of HMC and firmware updates. It doesn't look at how to upgrade (or update) the HMC or system firmware. All I'm showing here is how to get hold of the firmware you're after.
Incidentally, if you prefer to upgrade your HMC over the network rather than burning it onto physical media and heading to a chilly data centre, join the club. You can skip this post and jump straight to Rob McNelly's AIXchange post on Remote HMC Upgrades. If you're not into remote network upgrades, or you want to upgrade system or device firmware as well, stay with us.
For those who are left behind
First, you'll need to log onto your HMC and check its version. You can see this in the HMC GUI or via the HMC command line using:
lshmc -V
You'll find it helpful to have a valid managed system model number (the Power system or System p server model), even if you're not upgrading the managed system's firmware. You can get it by running the prtconf command from any LPAR running AIX on a system which is managed by the HMC.
prtconf | more
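If you would rather pull just the machine type-model out of that output than page through it, a small filter along these lines may help. It is only a sketch: it assumes prtconf prints a line of the form "System Model: IBM,9117-MMA", and the function name and sample line are made up for illustration.

```shell
# Extract the machine type-model from prtconf-style output on stdin.
# Assumes a line such as "System Model: IBM,9117-MMA" (format assumed here).
model_from_prtconf() {
    sed -n 's/^System Model:[[:space:]]*\(IBM,\)\{0,1\}\(.*\)$/\2/p' | head -n 1
}

# On an AIX LPAR you might run:  prtconf | model_from_prtconf
# Illustrative sample line, not captured from a real system:
printf 'System Model: IBM,9117-MMA\n' | model_from_prtconf
```

The result is the value you would type into the IBM support pages as the machine type-model.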
If you want to update your managed system firmware, you'll need to find your current firmware version. Once again, from any AIX LPAR on the managed system, the command is
lsmcode -c
Getting down to business
Armed with this info, we're ready to scroll through a few IBM Support pages to find the way to the HMC firmware. If you want to skip the nice and fluffy intro, jump right ahead to the IBM Fix Central page, or scroll down below to the tour bypass for old and grumpy sys admins. If you're not that battle-scarred yet, stick with me for the scenic tour.
Tour through IBM support
First, we can go to the IBM web site:
Now dynamic organisations like IBM have a habit of changing and improving their web sites. These links are valid at the time of writing, but caveat emptor and all that.
Select Support & downloads and you can enter your list of products.
"What list of products?"
Thought you'd ask. If you type in "HMC" here, you're going to get a discouraging message:
When the support page asks for "products" it's really after a machine type or model number. You'll see I entered 9117-MMA, which is known in the business as the Power6 570. You'll need to enter your own.
You can get your managed system's model number in a few ways:
Did you note the Firmware and HMC updates link which I have conveniently circled? Click on it.
tour bypass for old and grumpy sysadmins:
IBM Fix Central
Welcome back to the scenic tour
Once you get to here, you still need to select your machine type. That's not the HMC. It's the managed system that the HMC manages - the server, if you like to call it that.
First you select your Product Group. If your managed system is a Power6 or Power7, select Systems then Power. If you're on something older than that, scroll further down in the drop down box to System p. I'm going to go with Power here, because I'm updating the HMC for a Power6 server.
Now you can choose from a few items, but the most common ones you'll need when you're on this web site are probably these:
Now it's time to select your machine type-model. Remember, you got this when you ran the AIX prtconf command.
As I've selected Power as the Product Group, the choices are all Power6 or Power7 servers. If you don't see your model number here, you may have to go back to the Product Group and make a different selection.
Once I select the model (in my case 8203-E4A, the p6-520) and hit Continue, I'm given a choice of
This next question is merely a test of your resolve:
They just want to see who's going to blink first.
You already ran the lsmcode -c and you know the version of firmware you're currently on. Mine was EL350_039, but yours may not start with EL.
Here you have to say the Installed lsmcode, in other words, the version of firmware that you are currently running:
If your managed system is managed by IVM rather than by an HMC, then all firmware updates are disruptive (system shutdown, I'm afraid). If your system firmware is jumping a major release, e.g. from EL340 something to EL350 something, that is also a disruptive update. But if you're going up within the same major release, and the managed system is managed by an HMC, this is a concurrent firmware update. No outage required.
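Those three rules can be written down as a small shell function. This is a sketch of the rules as stated in this post only, assuming levels look like EL350_039 (release, underscore, service pack). The function name and the EL340_039 level are hypothetical, and the notes published with each firmware level remain the final word on whether it can be applied concurrently.

```shell
# Classify a firmware change using only the rules described in this post.
# Arguments: installed level, target level, manager ("hmc" or "ivm").
# Levels are assumed to look like EL350_039 (release, underscore, service pack).
firmware_update_type() {
    installed_release=${1%%_*}   # e.g. EL350
    target_release=${2%%_*}
    manager=$3

    if [ "$manager" != "hmc" ]; then
        echo "disruptive"        # not HMC-managed (e.g. IVM): needs a shutdown
    elif [ "$installed_release" != "$target_release" ]; then
        echo "disruptive"        # jumping to a different release
    else
        echo "concurrent"        # same release, HMC-managed
    fi
}

firmware_update_type EL350_039 EL350_085 hmc   # same release
firmware_update_type EL340_039 EL350_085 hmc   # hypothetical older level
```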
Since the firmware can be installed concurrently, that's how I'd like to do it. At this point I download the system firmware and make it accessible for the HMC, for example on an FTP server which the HMC can access. There are other ways of doing this, such as getting the HMC to download the firmware updates directly from IBM. Now that you know where to get hold of the system firmware, you can install it from wherever you want.
In my case it's an update (within the same release), not an upgrade to a new release.
Back to the HMC
At last, we've got a list of System Firmware, HMC updates and Devices. There's the latest package - in this case EL350_085 - and then there's the recommended package, which is a little earlier: EL350_071. I'm actually more interested in the HMC updates, which, to my dismay, are V7R3.5.0M2. That's commonly known as 7.3.5 service pack 2, and I know I want to go to the latest HMC update, which is (as at 1 Feb 2011) V7R7.2.0M1 or 7.7.2 Service Pack 1.
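If you find the V7R3.5.0M2 style hard to read, the translation into the shorthand used in this post is mechanical. The sketch below assumes every level follows the same V, R, M pattern as the two quoted here; the function name is made up.

```shell
# Convert an HMC level such as V7R3.5.0M2 into the short form used in this post.
# The fourth number (the 0 in 3.5.0) is dropped, as in the post's shorthand.
hmc_short_version() {
    printf '%s\n' "$1" |
        sed -n 's/^V\([0-9]\{1,\}\)R\([0-9]\{1,\}\)\.\([0-9]\{1,\}\)\.[0-9]\{1,\}M\([0-9]\{1,\}\)$/\1.\2.\3 Service Pack \4/p'
}

hmc_short_version V7R3.5.0M2
hmc_short_version V7R7.2.0M1
```

Anything that doesn't match the pattern prints nothing, so an empty result means the level string needs a closer look.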
I'll actually track back a few pages to where we were offered all firmware components. Instead, I'll just go for the HMC Firmware, which may show me some more recent HMC levels.
Now we can see some HMC firmware release levels. Unlike the lsmcode we saw before, this shows the levels available which I may want to install. It's not asking you what HMC level you're at.
And here we are:
You can see there is a Recovery Image and a Release Update Package. The recovery image description file explains:
HMC, System firmware, device firmware, OS
There is a helpful advisory page on determining if your server firmware level and HMC machine code level are compatible. It gives this warning:
AnthonyEnglish 270000RKFN Tags: dump alert service anonymous snap pax flood firmware ftp aix error /tmp/ibmsupt hardware ibmsupt ibm ecurep support snap.pax.z request snapcore emea 2 Comments 37,456 Views
When you place a support call with IBM, you're likely to be asked to run the snap command. If there was a core dump, you may also be asked to run the snapcore command. We're going to look at them both now.
snap to it
The snap command, according to the snap command documentation, "
I have heard that it originated with an IBMer who got fed up with asking 20 questions every time a support call came across the desk. When you run snap, you save some poor soul from asking you:
So you may be asked to run snap with some of its flags. I'm often asked to run a snap -gc. The -g gathers general system information, as well as some stuff which will reproduce the operating system environment. The -c compresses the output of the snap command into a file called snap.pax.Z which by default goes to a directory called /tmp/ibmsupt. (This directory structure gets created if it doesn't exist.)
You could try to send that snap.pax.Z file via email, but mail administrators being the sort of people they are, it will probably bounce. So you may find yourself renaming the file to the PMR number (that's the number you were given when you placed the service request), followed by identifiers for your branch office, country and so on:
And if you ever wondered what your IBM country code is, you can find it here:
Just for curiosity, Australia's country code is 616.
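As a rough sketch of the renaming step, here is a helper that glues the pieces together. The dot separator and the order of the fields are assumptions on my part, the PMR number 12345 and branch 000 are invented, and the function name is hypothetical; follow whatever naming instructions IBM support gives you for your call.

```shell
# Build an upload name: PMR number, branch, country code, then the snap file name.
# The dot separator and field order are assumptions; follow IBM's instructions.
snap_upload_name() {
    pmr=$1; branch=$2; country=$3
    printf '%s.%s.%s.snap.pax.Z\n' "$pmr" "$branch" "$country"
}

# Hypothetical PMR 12345 and branch 000; 616 is Australia's country code.
snap_upload_name 12345 000 616

# To rename the file you could then run something like:
# mv /tmp/ibmsupt/snap.pax.Z "/tmp/ibmsupt/$(snap_upload_name 12345 000 616)"
```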
Anonymous ftp with -A
These days you don't have to use ftp or email, but if you do want to use anonymous ftp, your ftp client may support anonymous login using the -A switch. That way you don't have to enter "anonymous" and some dummy password such as your email address:
Alright. The TV version of this is:
snap -gc
and send your /tmp/ibmsupt/snap.pax.Z to IBM with the right naming convention.
More details are available from http://www-01.ibm.com/software/support/exchangeinfo.html#ecurep
Last week AIX turned 25 years old. I, like AIX, have been around a little while. (I'm older than I look). And in all of those years I have never been asked to run the snapcore command. Until this week. The snapcore command, so its documentation tells us,
So IBM support asked for a snap.pax.Z, which I gathered using snap -gc. After they looked at it, they asked me to run not snap, but snapcore with two arguments: the core file (full path) and the program which they identified as the problem child, which in this case was the slp_svrreg command. Once again, you have to specify the full path to the offending command or program, unless it happens to be in the $PATH.
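To avoid tripping over the full-path requirement, you could assemble the command line with a small helper first. This sketch only prints the command rather than running it; the function name and both paths in the usage line are placeholders, not the real locations from my system.

```shell
# Print (not run) a snapcore command line with full paths for both arguments.
# Usage: build_snapcore_cmd /full/path/to/core program_or_full_path
build_snapcore_cmd() {
    core=$1
    prog=$2
    case $core in
        /*) ;;
        *) echo "core file must be given as a full path" >&2; return 1 ;;
    esac
    case $prog in
        /*) ;;
        *) prog=$(command -v "$prog") || {
               echo "program not found in PATH; give its full path" >&2
               return 1
           } ;;
    esac
    printf 'snapcore %s %s\n' "$core" "$prog"
}

# Placeholder paths for illustration only:
build_snapcore_cmd /tmp/core /path/to/slp_svrreg
```

If the program is given by name only, the helper looks it up in the $PATH and substitutes what it finds, which is the same shortcut described above.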
I'm not sure why the slp_svrreg command should have caused an OS crash, but that's why we have a support contract with IBM who can find out and make some recommendations.
Floods and support calls
Incidentally, I like to shock the call centre operator when placing a call by saying it's non-urgent. I like to work under the hope (or delusion) that other customers with non-urgent problems do the same, so that when I (or they) have a genuine production system down problem, it might get treated with some attention. In fact, the "non-urgent" calls usually get dealt with very promptly anyway.
Well, as you know, we've had serious flooding in parts of Australia since Boxing Day, so I was pleased that IBM had taken account of this by asking me up front "are you in a flood-affected area?" I was also pleased that the answer was no.
AnthonyEnglish 270000RKFN Tags: remote disaster_recovery aix christchurch dr support new_zealand nz earthquake 7,946 Views
You would have heard by now of the earthquake in Christchurch, New Zealand on Saturday. It was 7.1 on the Richter scale and remarkably – I would say miraculously – there have been no deaths directly from the earthquake reported so far. As the Sydney Morning Herald reports,
Anywhere but here
Although Christchurch is the largest city on the South Island of New Zealand, from my experience, larger companies in NZ work primarily out of Auckland, the largest city in the North Island.
Very often NZ users connect to systems in Australia. Not too far away, as you can see on this map. NZ is two hours ahead of Sydney, where we often host systems and arrange occasional outages with hopefully not too much inconvenience to the Kiwis. They don't complain too much anyway. This remote hosting of systems has been crucial for some insurance companies working out of Australia and New Zealand, especially in the aftermath of the earthquake. Thankfully, there have been no deaths directly from the earthquake, but infrastructure has been hit hard. In an event like that, you'd be glad to know a company's main systems, and a reliable DR, are far away from you ... and from each other.
Over 50 of the buildings in the Christchurch Central Business District have been substantially damaged. Roads are often impassable, and water and electricity are still sporadic. With all of these factors, when you want to place a call to get some help from your insurance company, you hope their systems are not down. If your city has been hit by an earthquake, it's good to know that your company's Production systems are hosted somewhere else. A Disaster Recovery plan may not even need to be implemented. You call your insurance company and a friendly Aussie voice on the other end will hopefully be there to say Good day ("g'day"), ready to listen and help.
Some insurance companies I have worked with have taken great advantage of virtualisation. Comms links being what they are these days, you can easily consolidate and virtualise your systems to work out of a different country. When you have to place that sad call to say your house or business premises were wiped out by an earthquake, it’s a great relief to know that the insurers' systems are on the other side of the Tasman Sea, or “across the drink”, as we say.
“Good as gold”
Sometimes the work we do on AIX systems doesn’t seem so important. When a tragedy like this occurs, we realise that the work we did retiring a standalone box in a remote location, or migrating that web server and database to newer hardware in a secure data centre – means that real people can start the business of rebuilding their lives a little more easily. You also understand why the DR centre had to be distant from the prod site.
My prayers go to the many people who will have to rebuild their lives after this tragedy. The habitual cheerfulness and resilience of the Kiwis makes them a delight to work with. I am sure it's what will carry them through following the earthquake. As they like to say in NZ, I hope that before too long everything will be “good as gold.”
AnthonyEnglish 270000RKFN Tags: kernel support aix device sysdumpdev dumpcheck capture crash dump data operating_system system 9 Comments 27,319 Views
Care factor zero?
Now "automatically copying selected areas of kernel data" may not be right up there on the scale of life-changing news, so let's put it in layman's terms: you need to store a system dump in the event of a
First the good news ...
You almost certainly already have a system dump in place. Use the command sysdumpdev -l to list the primary dump device. If you prefer to avoid the command line you can use smitty dump.
In AIX 5.x the default dump device was /dev/hd6, except for servers or LPARs which at installation time had more than 4GB of memory. Then the default dump device was /dev/lg_dumplv. That's probably what you see as the primary dump device now.
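If you want to check the primary dump device from a script rather than by eye, a filter like the one below could sit behind sysdumpdev -l. It is a sketch that assumes the listing contains a line starting with the word "primary" followed by the device name; the function name and the sample lines are illustrative.

```shell
# Read sysdumpdev -l style output on stdin and print the primary dump device.
# Assumes a line that looks like "primary   /dev/lg_dumplv".
primary_dump_device() {
    awk '$1 == "primary" { print $2; exit }'
}

# On AIX you might run:  sysdumpdev -l | primary_dump_device
# Illustrative sample, not captured from a real system:
printf 'primary              /dev/lg_dumplv\nsecondary            /dev/sysdumpnull\n' | primary_dump_device
```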