AIX Down Under
Some time ago I experimented with removing the default paging space /dev/hd6 from an LPAR and creating a separate volume group for paging space. This worked ... until I had to restore a mksysb backup, as you may recall from this blog post.
Well, what about creating a separate paging space in rootvg instead of in a separate volume group, and then removing hd6? Not exactly sure why you'd do that, but I suppose if you were to create a smaller paging space to replace hd6, in case it's been extended too much, that might be a reason. Whatever your reason for renaming, removing or retiring hd6, if you decide to do it, you should read technote T1010881.
Didn't read it, did you? (Or if you did, you probably found it more interesting than coming back here.) Well, never mind. Here's the key sentence:
"Because certain scripts are currently hard-coded to activate /dev/hd6, it is recommended to create the new paging space under the same name."
If you change the paging space from hd6 to something else, you have to update /sbin/rc.boot, which is a script. The thought of editing /sbin/rc.boot horrifies me. It's just inviting trouble. Do changes get retained when you upgrade the OS to a new release? I hope so, but I don't know. Still, if you do decide to edit the rc.boot file, make sure you follow the instructions in that technote, or you risk a mksysb that can't be cloned or, at the very least, a system that doesn't activate paging space after the next reboot.
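Before touching anything, it can help to see where the hd6 name is actually referenced. Here is a minimal, read-only sketch, assuming the references appear as the literal string /dev/hd6. The function name paging_refs is hypothetical, and the technote remains the authority on what needs changing:

```sh
#!/bin/sh
# paging_refs FILE [NAME]
# Print the numbered lines of FILE that mention /dev/NAME (default: hd6).
# Read-only: nothing in FILE is changed.
paging_refs() {
    file=$1
    name=${2:-hd6}
    grep -n "/dev/$name" "$file"
}

# Example on AIX (not run here):
#   paging_refs /sbin/rc.boot
#   lsps -a        # list the paging spaces and whether they are active
```

The match is a plain substring, so a name such as hd60 would also show up. For a quick look that is probably good enough.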
My advice? Just keep hd6 as a default paging space, even if you have other paging spaces in other volume groups.
If you're still keen on resizing or relocating the hd6 paging space, see this document.
AnthonyEnglish 270000RKFN Tags: logical_volume_manager exportvg umount mount lvm varyoffvg 2 Comments 22,994 Views
One of the great strengths of AIX is the Logical Volume Manager (LVM). It may not be completely foolproof, but it does provide quite a lot of protection from administrator errors.
Here are a few common LVM tasks that I've deliberately attempted to make a mess of. See how the LVM reacts.
(Don't Try This at Home)
Let's start with three Physical Volumes. rootvg has one PV (booting from SAN) and datavg has two PVs.
hdisk0 00c5a47ecf9edd3e rootvg active
hdisk1 00c5a47ed021ea7d datavg active
hdisk2 00c5a47ef172e84d datavg active
So, let's try to do some damage:
Import a volume group that is already active:
# importvg -y datavg hdisk1
0516-360 getvgname: The device name is already used; choose a
0516-776 importvg: Cannot import hdisk1 as datavg.
"Just a moment, please!"
If you try to remove a disk from the operating system Object Data Manager (ODM) while it's still in an active volume group, you'll get this warning:
# rmdev -dl hdisk1
Method error (/etc/methods/ucfgdevice):
0514-062 Cannot perform the requested function because the
specified device is busy.
What about trying to remove an active disk from a volume group?
# reducevg datavg hdisk2
0516-016 ldeletepv: Cannot delete physical volume with allocated
partitions. Use either migratepv to move the partitions or
reducevg with the -d option to delete the partitions.
0516-884 reducevg: Unable to remove physical volume hdisk2.
"Sorry, I'm busy!"
Not getting very far, are we? Alright, let's try to export a volume group that's in use:
# exportvg datavg
0516-764 exportvg: The volume group must be varied off
OK, I'll vary it off then:
# varyoffvg datavg
0516-012 lvaryoffvg: Logical volume must be closed. If the logical volume contains a filesystem, the umount command will close the LV device.
So I have to unmount the file systems in the volume group first. But if they're in use, I'm also protected from doing damage:
umount: 0506-349 Cannot unmount /dev/lvmksysb: The requested resource is busy.
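The order of exporting a volume group is: unmount its file systems (umount), vary off the volume group (varyoffvg), then export it (exportvg).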
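As a sketch, the unmount, vary off, export sequence could be wrapped in a small function. The names run and export_vg, the DRY_RUN switch and the /data mount point are all made up for illustration. Only umount, varyoffvg and exportvg come from the errors above:

```sh
#!/bin/sh
# run CMD [ARGS...]
# Execute the command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# export_vg VG FS...
# Unmount each listed file system, vary off the volume group, then export it.
# Stops at the first step that fails, so the LVM's own checks still apply.
export_vg() {
    vg=$1
    shift
    for fs in "$@"; do
        run umount "$fs" || return 1
    done
    run varyoffvg "$vg" || return 1
    run exportvg "$vg"
}

# Example (prints the three commands without running them):
#   DRY_RUN=1; export_vg datavg /data
```

Trying it with DRY_RUN=1 first shows exactly what would be run, which seems in keeping with the spirit of this post.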
I can think of many other instances where the LVM protects me from my own stupidity. I don't have to remember to increase a logical volume first before expanding a file system. If I do try to overallocate beyond the MAX Logical Partitions (shown as MAX LPs in the output of the lslv command), I get an error.
Relax - this is AIX!
Good software should protect you (at least a little) from glaringly dumb mistakes. The AIX Logical Volume Manager does that. This can give you a little more confidence when you're breaking the mirror of a logical volume or adding the wrong PV to a volume group. You can still do damage (like running rm /dev/* - but don't say you saw it here), and in some cases there are flags which allow you to force the action, but the very existence of those flags is part of the checks and balances. That means you can be a bit more relaxed than you might have to be on other platforms.
The LVM is another reason why AIX is a true enterprise operating system.
AnthonyEnglish 270000RKFN 6,101 Views
There are no coincidences.
Whenever I hear of any sort of problem at all, I wait to see if an entirely unrelated problem is reported soon after. This is especially worthwhile for performance issues, or when something suddenly stops working.
I came across a site where a printer was down. Within half an hour a controller error was reported from a disk subsystem which wasn't used by the system that had the printer go down. Coincidence? I was suspicious.
Temporary problem: permanent solution
We soon found out that the printer was down because all printers on the virtual machine (the LPAR) were unable to print. And that was because you couldn't create any new files in /tmp, although you could update existing files. This indicated the file system inode map was corrupted. Fixing that would require an fsck on /tmp, which would need a reboot of the LPAR.
We could fix /tmp, but that still didn't address what had made the file system corrupt in the first place. Usually that indicates that a link to the storage had been interrupted or the storage subsystem itself was damaged.
Sure enough, we soon found that many other AIX virtual machines and other Power or Wintel systems had storage errors at about the same time. Most of them had redundancy to the disk subsystems, but the ones that didn't were the ones that suffered some sort of data or OS impact, such as the corrupt /tmp file system.
As it turned out, a SAN switch module had rebooted itself and was the cause of all the problems from disk subsystem error to file system corruption to printers being down. There was some work involved in repairing the switch module and removing it as a single point of failure. And it all began with the report that a printer was down.
AnthonyEnglish 270000RKFN Tags: vios thin_provisioning 2 Comments 9,896 Views
Nigel Griffiths (of IBM Power Systems Advanced Technology Support, Europe - but he is also known as Mr Nmon) gave a great presentation on Shared Storage Pools at Copenhagen. This was also presented in the excellent Webinar Series on Power Systems Virtualisation.
Here's a small extract on why you would use Shared Storage Pools:
In smaller companies, there is no SAN team. The server guys are the SAN guys. The idea of Shared Storage Pools is to allocate LUNs to the VIOS(s), after which a single VIOS command can allocate space to a Virtual Server (an LPAR). You can run a single command using cfgassist (VIOS's smitty) to allocate disk space. You can even use the Virtual Storage Management feature in the HMC GUI.
You can replay Nigel's webinar presentation in WMV format and download the presentation materials.
In the two years since this post was first written, there have been a number of other presentations on Shared Storage Pools. You can get hold of these from the Webinar series linked to in this post. (Links have been updated).
Whenever I see a script with a five or ten minute wait in it, I ask myself what it's waiting for. I recently saw a script which started a database, waited 10 minutes (using sleep 600), then started an app server (sleep 600), then started another application (sleep 600, then exit!). That script gets a lot of sleep.
The reason behind the delays was to wait for the previous process to complete. But that could be confirmed in a few ways. Ordinarily, the previous command will have an exit code which you can test for, or you could use the Logical AND (&&) operator.
cmd1 && cmd2
This means only run cmd2 if cmd1 completes successfully (exit code 0).
If the first process is run in background, then you may have reason to delay before proceeding to the next process. But a ten minute wait is no guarantee that the first process completed successfully, so it would be smarter to use a shorter delay, or even set up a loop which checks periodically that the first process has completed.
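Such a loop might look like the sketch below. The wait_for function is my own illustration, and the start scripts and db_is_up check in the usage comment are hypothetical stand-ins for whatever your site uses:

```sh
#!/bin/sh
# wait_for TIMEOUT INTERVAL COMMAND [ARGS...]
# Run COMMAND every INTERVAL seconds until it succeeds or TIMEOUT seconds pass.
# Returns 0 as soon as COMMAND succeeds, 1 if the timeout is reached first.
wait_for() {
    timeout=$1
    interval=$2
    shift 2
    waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Example with hypothetical start scripts and check command:
#   start_db && wait_for 600 10 db_is_up && start_app
```

The ten minutes becomes an upper limit rather than a fixed cost: if the database is up after 40 seconds, the app server starts after 40 seconds, and if it never comes up, the script stops instead of carrying on regardless.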
In the example script I gave above, the procedure delayed 600 seconds (that's 10 minutes if you're not a mathematician), then 600 seconds, then 600 seconds again, or 30 minutes in total. When the script was first written it was meant for a single database. But then the script was used for two other databases, and soon enough a restart would take 90 minutes out of an outage window.
If you really must do a sleep in your script, set the time as a variable at the start of the script, or pass it as an argument from the command line:
bigsleep=600 # Wait 10 mins for something (or nothing) to happen
Then you can do the sleep if you really need to:
sleep $bigsleep
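For the command-line variant, one possible sketch keeps the bigsleep name and falls back to 600 seconds when no argument is given. The pause helper and the restart.sh script name are assumptions for illustration only:

```sh
#!/bin/sh
# Use the first command-line argument as the delay in seconds, else 600.
bigsleep=${1:-600}

# pause: sleep for however long bigsleep currently says.
pause() {
    sleep "$bigsleep"
}

# Example: ./restart.sh 30   would make every pause last 30 seconds.
```

That way a test run can use a delay of a few seconds without anyone editing the script.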
If you've been following this blog for a while, you might remember a post on the Systems Management Interface Tool (SMIT), better known as SMITTY to his friends. That post covered the frequently-asked question about the difference between smit and smitty. It also provided some handy SMIT Keyboard shortcuts.
Well, here are some new keyboard shortcuts that should be real time savers for you. They're not strictly new, as they're about 25 years old, but some of them were new to me, and they may be to you. Some of them are real gems, such as the shortcuts to do with editing fields and moving to the top or bottom of the selection screen.
I've grouped the shortcuts into: Cursor movement, Selecting and editing, and Menu navigation / Miscellaneous. This is from a short article I wrote for PowerITPro. The PowerITPro AIX Tips & Tricks are currently published on the 2nd and 4th Tuesday of every month and you can subscribe to them here.
Here is a summary of the keyboard shortcuts. You can see the full article SMIT Keyboard Shortcuts on the PowerITPro web site.
It wasn't a hostile question from a bean counter looking to cut costs. It was from a visitor - a school student on work experience (a week or two of trying out life in "the real world"), and I was asked what I did. How could I answer that? "AIX"? Too vague and obscure for someone who's not in the industry. System administration? So does the guy who supports the laptops. How about "I manage stuff that no one understands"? How do you justify your existence in 30 seconds?
I looked over my day, and thought about how to answer. I tried to make myself sound intelligent (but not overbearing), and to make the work sound important (but I always had time for a friendly question from a visitor on work experience) and even urgent (but not so much that I left the impression that I wasn't prepared for it). In the end, the visitor left as confused about what I did as I was.
After a few nanoseconds of deep reflection, I had to admit the day included a few less glamorous jobs. Like crushing cardboard boxes. Sure, they had contained the latest and greatest equipment, but the boxes did need to be crushed and carted downstairs.
What else? I thought of saying "troubleshooting" but then that sounded like there might be a lot of trouble to be shot. "What sort of trouble?"
So what exactly did I do? Can I justify my pay packet in 30 seconds?
How about these?
And I chatted to someone whose AIX 5.2 legacy system with dedicated adapters and processors is about to be extinguished and tried to assure that person I wasn't just hanging out for the hardware the very moment the LPAR gets shut down.
Whose idea was it anyway?
I'm not a conspiracy theorist (usually) but I have to wonder whose idea it was to go asking people throughout the organisation what they did. I mean, techies like me are either going to be stumped for a way of summarising the job or enable the verbose flag and then bury the questioner with detail. If I was looking after the organisation's social media strategy I would have been ready for the question and I could tweet the answer, but instead I'm a techie.
When it comes to giving your job description out, it may not be all that exciting, at least in parts. It's hard to make it sound interesting, essential to the organisation, unique and yet not mysterious or suitable for eccentrics. Perhaps I should have stuck with the troubleshooting answer. Or maybe the cardboard box one.
If you mix production and non-prod LPARs on the same managed system, you may be worried that the non-prod LPARs could chew up much-needed production processing juice. What are your options here? Here are some common solutions:
Before looking at the Implicit capping, here are some of my thoughts on the first two options:
Separate shared processor pools
Separate shared processor pools are not available on all systems, and they also isolate the LPARs to their own shared processor pool, which is (if you'll pardon the momentary escape into a sporting analogy) a bit like a hard wall separating the lanes in a swimming pool. The two pools are separate, which can be a bit restrictive, especially if you're on the crowded side of the wall.
Capping of the non-prod LPARs
This is like a wall in the swimming pool for the non-prod swimmers. The prod swimmers are allowed to swim in and out of the non-prod area, but the non-prod guys have to stay where they are. Good for prod. If you prefer, here's a back yard analogy: you are allowed to wander into your neighbour's yard when you're pressed for space, but your neighbour isn't allowed into yours.
Now let's see what the Redpiece says about implicit capping.
Implicit capping of non-prod LPARs
First, a word about using DLPAR:
It's good to have the dynamic LPAR option, but as it requires user intervention or at least scheduling of the DLPAR operation (not available on all systems), you're better off leaving it to the hypervisor.
So here's the suggestion:
The number of VCPUs [they're referring to Virtual Processors] that are available to a partition limits the number of whole physical CPUs that this LPAR can address, because one virtual CPU represents at maximum one physical CPU. The LPAR is therefore effectively capped at a maximum physical processor utilization that is equal to its number of configured VCPUs.
The number of configured VCPUs is the number of Virtual Processors assigned to the LPAR.
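To make the arithmetic concrete, here is a rough sketch. The function name and the sample numbers are made up, and it assumes an uncapped LPAR whose only limits are its Virtual Processor count and the size of its shared processor pool (the pool limit is my assumption, not part of the quote):

```sh
#!/bin/sh
# implicit_cap VPS POOL_SIZE
# Print the most whole physical processors an uncapped LPAR can consume:
# one per virtual processor, but never more than the pool contains.
implicit_cap() {
    vps=$1
    pool=$2
    if [ "$vps" -lt "$pool" ]; then
        echo "$vps"
    else
        echo "$pool"
    fi
}

# A non-prod LPAR with 2 virtual processors in an 8-processor pool:
implicit_cap 2 8    # prints 2
```

So however busy that non-prod LPAR gets, it can never take more than two of the eight processors, and no explicit cap has been set.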
Here's what they say about the LPAR weight:
The advantage of implicit capping versus explicit capping is:
I'll have to think this through, but there's a lot of very helpful information on virtualisation in this SAP Applications on IBM PowerVM Redpiece. It's worth reading even if you're not running SAP systems. Be sure to read the Micropartitioning design option in section 4.2.
The team from IBM in the UK have been running some great webinars, and the next one promises to be a beauty. It's on Virtualisation (British spelling) Best Practices.
It's true in philosophy, it's true in maths (British usage), it's true in taking directions and it's true in setting up your virtualised environment. It's much easier to use good planning before implementation than to have to untangle everything afterward.
This webinar promises to cover how to:
The webinar is scheduled for Sept 14th from 10:00 to 11:00 BST (UK time). You can register for the free webinar, or download it afterwards, by connecting to:
I'm looking forward to it.
If you thought IBM Watson winning Jeopardy! was just a gimmick, think again. There are people who have seen Watson in action and say: "I want one!" IBM has now established a Watson Solutions team.
I understand that Watson is going to be back in the spotlight very soon, with a re-broadcast of the Jeopardy shows in the week of 12 September.
It sounds like there's a lot more planned to apply Watson's technology in various industries.
I found a video of how Watson answers questions very interesting. As they say in the video: "A computer understands code: ones and zeroes." So how does Watson do it? Have a look: