WebSphere Peformance - Alexandre Polozoff's Point of View
Matching: large X
With WebSphere Application Server v8 the IBM JRE provides a new garbage collection policy known as balanced. One should consider trying the balanced policy if running on the 64-bit JVM using a Java heap size over 4GB and still experiencing occasional long pauses with the gencon policy. It does impose a slight performance hit but based on how some applications are written or coded it may be necessary for the runtime operations team to try this policy in an attempt to avoid large pause times. The balanced policy can also take advantage of non-uniform memory access (NUMA) hardware architecture available on System x® and System p® using current versions of AIX®, Linux® or Windows®.
polozoff 110000N2A2 Tags:  machines sla small peformance planning nfr failover frame large capacity 2,470 Views
Yesterday at my "Performance Testing and Analysis" talk a question was asked is it better to have one large machine and virtualize the environment or to have lots of small machines?
Both strategies work. If deciding to go with large capacity frames then you want to have at least 3 frames. This way if one frame is taken out of service there are at least two other frames running. Otherwise, if one builds out only 2 large frames and one is taken out of service then the remaining frame becomes a single point of failure. I don't like SPOFs and so would have at least 3. This is if one frame can take the entire production load during the outage. That information has to be culled from the performance testing to see if there is enough capacity in one frame to carry the entire production load. If not then more likely than not there will need to be additional frames to be able to ensure that even if half the infrastructure is taken out of service the remaining frames can carry the entire production workload.
Likewise, having lots of smaller machines also works. Odds are less likely for a massive hardware outage with smaller machines so as one fails it can be safely taken out of service and replaced without impacting the production workload.
Which strategy is better? In my opinion they are both valid strategies as long as the proper capacity planning is conducted to ensure that when an outage occurs (and think worst case scenario here) that the remaining infrastructure is able to continue processing the production workload without impacting the SLA (Service Level Agreement including the non-functional requirements [i.e. response time, resource utilization, etc]). You do have a defined SLA, right?