I am looking for some high-level direction on the use of PowerHA in our environment.
Since we have LPM in place, we are generally covered for scheduled downtime by moving LPARs across frames. Our organization is OK with downtime for OS upgrades and the like, so we are not trying to be so highly available that application or OS upgrades can be done live.
In case of application failure, manually rebooting on the same server generally takes about as long as PowerHA takes to start services on the passive node. Also, many of our applications are not suitable for auto-start, as they have a lot of dependencies and sequencing issues with other servers.
In case of an LPAR crash, which is extremely rare in our case, rebooting the LPAR itself will again take a similar amount of time, give or take a few minutes. This is acceptable within our organization.
Given the above scenario, can the experts please chime in on whether there is a case for using PowerHA? Kindly provide a specific example, if possible, that highlights the need for PowerHA vs. manual failover, etc.
Thanks in advance and looking forward to your thoughts on this.
PowerHA vs. Live Partition Mobility vs. Manual failover
Updated on 2011-07-26T20:08:59Z by SystemAdmin
tony.evans 0600007X8X3 Posts
Re: PowerHA vs. Live Partition Mobility vs. Manual failover
2011-07-26T07:47:09Z
This is the accepted answer.
How long does it take from some monitoring noticing that your node is down to getting it restarted at 3am? Or outside working hours?
PowerHA has diminishing returns when used with highly available hardware, but there are still times when it can react faster than manual intervention.
Last week we had a hardware failure in some pSeries hardware which resulted in half the LPARs being down. They would remain down until the hardware component was replaced.
PowerHA moved the service to another server in a different building in 4 minutes from the moment of the hardware failure. Four minutes is less time than it takes to raise an incident ticket covering the detail of the problem.
It depends on a lot of factors: your hardware, your organisation, your application profile. There is no simple, single answer.
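One rough way to weigh these factors for your own site is to add up the steps of a manual recovery (detection, someone responding, any wait for a hardware repair, the restart itself) and set the total against the failover time. The Python sketch below does that arithmetic. Only the 4-minute failover figure comes from the incident described above; every other number, and the function and scenario names, are hypothetical placeholders to be replaced with your own measurements. It also assumes the same failover time for every scenario, which may not hold for your applications.

```python
# Rough comparison of manual recovery time vs. automated failover.
# All manual-path durations are hypothetical placeholders, not measurements.
# Only the 4-minute failover figure comes from the incident described above.

POWERHA_FAILOVER_MINUTES = 4


def manual_recovery_minutes(detect, respond, restart, repair=0):
    """Total outage in minutes when a person must notice, respond and restart.

    repair is the wait for a hardware fix when the node cannot simply be
    rebooted (0 for a plain application or LPAR crash).
    """
    return detect + respond + repair + restart


# Hypothetical scenarios: (detect, respond, restart, repair), in minutes.
SCENARIOS = {
    "app crash, office hours": (5, 5, 10, 0),
    "LPAR crash, 3am": (15, 45, 10, 0),
    "hardware failure, 3am": (15, 45, 10, 240),
}


def compare(scenarios, failover=POWERHA_FAILOVER_MINUTES):
    """Return {scenario: (manual_minutes, minutes_saved_by_failover)}."""
    rows = {}
    for name, (detect, respond, restart, repair) in scenarios.items():
        manual = manual_recovery_minutes(detect, respond, restart, repair)
        rows[name] = (manual, manual - failover)
    return rows


if __name__ == "__main__":
    for name, (manual, saved) in compare(SCENARIOS).items():
        print(f"{name}: manual {manual} min, saved by failover {saved} min")
```

With placeholder numbers like these, the gap is small for a daytime application crash and large when a failed hardware component keeps the LPAR down until it is replaced, which is the case the incident above illustrates.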