Business hates risk. Everyone does. That's why we buy insurance, or faster, better, more reliable computers. It's also why we avoid spending. It's all about risk and how we manage it.
Now, if a business is running IBM Power Systems, there's a good chance they've made a significant investment in getting that hardware in place. So why is it that so many businesses expose themselves to some very basic risks? These risks are avoidable and, on the whole, don't require a huge investment in hardware.
Here are the seven top risks that I have identified, and why they're risks for a business. I think these are valuable talking points, but first let me give a small disclaimer.
Let me say from the outset that these aren't the only risks, and maybe I've overlooked some even more important and obvious ones. But these are the risks that keep appearing on the radar of businesses that are running Power.
I'd also point out that this is not a technical post as much as a business-focused post. The reason is that the CIO or IT manager or whoever controls the budget doesn't necessarily care too much about the technical side of things. For example, they probably won't know what redundant paths are for, but that decision-maker will understand if some key business reports are unavailable.
Risk #7: Redundancy is broken
Redundancy means that there is no single point of failure (SPOF). In layman's terms, that means that if this road gets blocked, there's another way we can get there. Redundancy for data can happen at the storage level (that's what the "R" in RAID is all about), but very often businesses are dependent on one cable, one switch, or one other component whose failure is enough to bring the business to its knees.
Often the broken redundancy is simply a matter of bad configuration on the operating system, and can very easily be fixed. I'm thinking of Shared Ethernet Adapters, Link Aggregation, MPIO configuration and so on.
Risk #6: Poor Performance Tuning
System performance is always a tricky topic, because you never quite know what success looks like. But you certainly know when the system isn't performing properly. So, rather than asking "Is our system optimised?", it's better to look at this as a business problem: what are the pain points for users?
For example, if a company has to lock out users every month end because the reports are too slow, the fix may be simpler than buying new hardware. I've seen cases where tuning the queue depth fixed exactly this problem. But the time to tweak this is not when month end is already falling over, and having an IT specialist spend a few hours diagnosing and tuning the system is less of a business impact than locking out hundreds of users every month.
Risk #5: No handoff
In smaller businesses, there is often no CIO or IT manager role. It's one person who knows it all: the operating system, storage, network setup, backups, database administration and a whole lot more. On the one hand, this is very impressive. On the other, it's a huge exposure for the business. People get sick, or leave, or get moved to another role within the company, or have some other reason why they can't be available 24x7 for your business.
In cases of the one person who is the jack of all trades, the business is at risk. Isn't it easier to bring someone on board for some knowledge transfer, even if it's an external vendor?
Risk #4: No Disaster Recovery Plan
Probably my favourite question is to ask a CIO: "if your mission-critical IBM Power System was down for a couple of weeks, would that have any impact on your business?" DR is simply a way of protecting the business from this very obvious single point of failure.
If a business has its system down, and if that means that revenue stops, staff are laid off, stores and distribution centres are closed, and business reputation is in freefall, just how much money are they saving by not putting in a DR strategy?
Risk #3: A woefully unrealistic DR plan
The good thing about this risk is that you can't have it at the same time as risk #4. For risk #3, there is a DR plan, but it either hasn't been properly tested, or it doesn't give users access to the data in anything near an acceptable time for the business. The same goes for restoring data. Many companies assume (wrongly) that the restore process is very fast. In practice, this is often not the case, and logistical reasons are often to blame.
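To put a rough number on the restore-time assumption, here is a minimal back-of-the-envelope sketch in Python. Every figure in it (the data size, the sustained throughput, the hours lost to logistics) is a hypothetical assumption chosen for illustration, not a measurement from any real site, and the function name is my own.

```python
def estimated_restore_hours(data_gb, throughput_mb_per_s, logistics_hours=0.0):
    """Rough restore estimate: raw transfer time plus non-technical delays.

    logistics_hours stands in for things like fetching media or getting
    people on site. All inputs are assumptions supplied by the caller.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    transfer_seconds = (data_gb * 1024) / throughput_mb_per_s
    return transfer_seconds / 3600 + logistics_hours


# Hypothetical example: 10 TB of data restored at a sustained 200 MB/s,
# with 8 hours of logistics before the restore can even start.
hours = estimated_restore_hours(10 * 1024, 200, logistics_hours=8)
print(f"Estimated restore time: {hours:.1f} hours")  # 22.6 hours
```

Even with these fairly generous made-up numbers, the business is without its data for most of a day. The useful exercise is to plug in your own figures and compare the result with what the business thinks is acceptable.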
Risk #2: No support agreements
It's unfortunately all too common to see that businesses assume they have hardware and software agreements in place, but they don't. This might be because the hardware is assumed to be under warranty, or (for software) because there is really no one to call on when things head south. In situations like this, you need to know who to call, even if it's just for some high-level advice. Businesses shouldn't rely only on the goodwill of vendors.
Risk #1: No support (even though there's an agreement!)
This is, in a way, even worse for a business than not having support agreements in place, for two reasons:
1. The business wrongly thinks they're running a supported configuration.
2. The business is paying for a support agreement, but they are not, in fact, supported.
So, how does this come about?
Generally, by doing nothing. If your policy is to run n-1 (the penultimate version of the software), then if you wait long enough and do nothing, your n-1 policy will become an n-25 policy.
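The drift described above can be sketched in a few lines of Python. This is only an illustration: the release list and version strings are made up, and the helper names are hypothetical rather than taken from any IBM tool.

```python
def versions_behind(installed, releases):
    """How many releases behind the newest one is `installed`?

    `releases` is assumed to be ordered oldest to newest. Raises
    ValueError if `installed` is not in the list.
    """
    return len(releases) - 1 - releases.index(installed)


def meets_policy(installed, releases, max_behind=1):
    """True if the installed version satisfies an n-`max_behind` policy."""
    return versions_behind(installed, releases) <= max_behind


# Hypothetical release history on the day the system was installed.
releases = ["1.0", "1.1"]
installed = "1.0"
print(meets_policy(installed, releases))     # True: this is n-1

# Time passes, new releases ship, and nothing is done on site.
releases += ["1.2", "1.3"]
print(versions_behind(installed, releases))  # 3
print(meets_policy(installed, releases))     # False: n-1 became n-3
```

Nothing changed on the system itself. The policy was broken purely by the release list growing, which is why this kind of check has to be repeated on a schedule rather than done once.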
Some prime examples for IBM Power Systems are:
- System firmware is no longer current
- Virtual I/O Server is way out of date and no longer supported
- AIX or other operating systems are in an unsupported state
- Applications have reached the end of life for their support
I'd say this is the highest risk because it's unfortunately very common, very expensive for the business (paying for support) and, if help is needed and the support isn't available, expensive for the business all over again.
It's all about the risk
I'm not sure that these are the only risks, or even the greatest risks, that businesses need to think about when they're running IBM Power Systems. In fact, most of these risks apply to other platforms as well. But my focus is on Power.
When you're running IBM Power Systems, it makes a lot of business sense to spend some time and money protecting your business from single points of failure, or going out of support. Most importantly, you don't want to risk an emergency upgrade.