Shared Storage Pools 2 - Thin provisioning, monitoring free space + Alerts
nagger
Hi, I have just released a fifth hands-on movie this month, on this interesting topic.
You can find the movie here: Shared Storage Pools 2 - Thin provisioning, monitoring free space + Alerts
Shared Storage Pool thin provisioning is pretty cool and saves a lot of disk space. Effectively, the 100's of GBs of unused disk space in 100's of Virtual Machines (LPARs) are brought together in one place in the Shared Storage Pool and can then be used for real. All this without the use of clever disk subsystems. The risk is that you use all the disk space of the pool, and the next VM that tries to write to a new chunk of disk (a chunk is 64MB) fails to get it and gets disk errors. Thus monitoring the free space and getting alerts is very important. But the alerts may not work as you expected. The "alert" command sets the threshold simply enough but then it gets complicated:
Here is an example of the Alert error log entry (it is not pretty):
Note: There are other log entries that are very similar but have "Storage Pool Up Event" and don't say "Threshold Exceeded." They are NOT free space low alerts.
I keep thinking of those jokes about a black cat, at night, in a dark coal bunker! I am sure the developers have not tried to hide the alert messages but that is what we have got - high impact warning messages well hidden in the cluster.
Making the low free space issue more visible
So even if you know the free space has just gone below the limit, it is a lot of work finding the Alert - not what I would call pro-actively letting you know there is a problem that needs fixing urgently. Various ideas are covered in the movie; for example, some people escalate the VIOS system logs to other tools and could find the error condition there.
We could have cron-based scripts regularly sending Shared Storage Pool free space stats by email, either every time they run or only when the free space is low.
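To make the cron idea a bit more concrete, here is a minimal sketch in Python of the "always report" versus "only when low" decision. It is not the script from the movie. The function names, the email addresses and the idea of passing in the pool size and free space as plain numbers are all my assumptions for illustration; how you get those numbers out of the "lssp" output is left to you, and the threshold here is simply "minimum free space percentage".

```python
import smtplib
from email.message import EmailMessage


def build_report(pool_name, size_mb, free_mb, threshold_pct, only_when_low=True):
    """Return an EmailMessage describing pool free space, or None if nothing should be sent.

    size_mb and free_mb are assumed to have been obtained elsewhere
    (for example by parsing the pool listing); this sketch does no parsing.
    """
    if size_mb <= 0:
        raise ValueError("pool size must be positive")
    free_pct = 100.0 * free_mb / size_mb
    low = free_pct < threshold_pct
    if only_when_low and not low:
        return None  # free space is fine and we only report problems
    status = "LOW" if low else "OK"
    msg = EmailMessage()
    msg["Subject"] = f"SSP {pool_name} free space {status}: {free_pct:.1f}% free"
    msg["From"] = "vios@example.com"    # hypothetical sender address
    msg["To"] = "admin@example.com"     # hypothetical recipient address
    msg.set_content(
        f"Pool {pool_name}: {free_mb} MB free of {size_mb} MB "
        f"({free_pct:.1f}% free, threshold {threshold_pct}%)."
    )
    return msg


def send_report(msg, smtp_host="localhost"):
    """Hand the message to an SMTP server; assumes one is reachable at smtp_host."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```

A cron job would call build_report() with the current numbers and only call send_report() when it gets a message back. With only_when_low=False the same job becomes a regular status email instead.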
One alternative was using Systems Director 6.3 (ISD) - which I tried. I discovered all four VIOS, gained access and ran Inventory - this only took a few minutes. Then I set the free space threshold percentage just below the current use and started the client VMs writing to new large files. Before I could change to the Systems Director browser to check, it had already detected the alert event and reported it on the Problem panel. See below:
Alternatively we could use "cron" to regularly make checks and escalate email messages.
The movie also covers a script or two to reformat the "lssp" and "alert" command output to calculate percentages and amount of over-commit.
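The arithmetic behind such a script is simple enough to sketch. This is not the script from the movie, just an illustration in Python. The PoolStats name and its three fields are my own invention; in particular "provisioned_mb" is assumed to be the sum of the sizes of all the virtual disks handed out to client VMs, which you would have to total up yourself. The only number taken from above is the 64MB chunk size.

```python
from dataclasses import dataclass

CHUNK_MB = 64  # chunk size as stated in this post


@dataclass
class PoolStats:
    size_mb: int         # total pool size
    free_mb: int         # free space currently left in the pool
    provisioned_mb: int  # assumed: sum of virtual disk sizes given to client VMs

    @property
    def used_pct(self):
        return 100.0 * (self.size_mb - self.free_mb) / self.size_mb

    @property
    def free_pct(self):
        return 100.0 * self.free_mb / self.size_mb

    @property
    def overcommit_mb(self):
        # Space promised to VMs beyond what the pool physically holds.
        return max(0, self.provisioned_mb - self.size_mb)

    @property
    def overcommit_ratio(self):
        # Above 1.0 means the pool is over-committed.
        return self.provisioned_mb / self.size_mb

    @property
    def free_chunks(self):
        # Whole chunks that can still be allocated before writes start failing.
        return self.free_mb // CHUNK_MB


if __name__ == "__main__":
    # Made-up example numbers: 100 GB pool, 25 GB free, 150 GB promised to VMs.
    stats = PoolStats(size_mb=102400, free_mb=25600, provisioned_mb=153600)
    print(f"used {stats.used_pct:.1f}%  free {stats.free_pct:.1f}%")
    print(f"over-commit {stats.overcommit_mb} MB (ratio {stats.overcommit_ratio:.2f})")
    print(f"free chunks {stats.free_chunks}")
```

The over-commit figure is the interesting one with thin provisioning: once it is above zero, the VMs together could ask for more disk space than the pool actually has, which is exactly why the free space needs watching.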
The Alert BUG
The next movie may be for SSP2 snapshots and perhaps Live Partition Mobility. I would like to encourage lots of people to try out Shared Storage Pools - I know the technology was not developed just to help out poor hard-working Power Systems techies like myself, but it is so simple to use, quick compared to mucking around with LUNs and Zones, and very flexible too, as it opens up LPM to every LPAR I will create in 2012. Saves me time, saves disk space, makes for a flexible VM environment - it is a win:win:win.