Comments (11)

1 Stian Lund commented Permalink

Thanks for this very insightful article on a not very well documented "feature".

Apparently the JVM argument -Xgc:preferredHeapBase was introduced in Java 6 SR7; before that there was no way to avoid these native OOM crashes apart from possibly setting a higher Xmx.

However, could it not be argued that this argument should be the default for JVMs with large heaps? Otherwise it seems IBM is just saying "if you have a large heap, you will get random crashes unless you set this JVM argument".

We've had some problems with this exact issue and have had long discussions with IBM Java support about the best solution for these crashes. We have a lot of JVMs, and having to set it for all of them is quite a bit of manual work.

Do you know if this issue would affect the HotSpot JVM in the same way? If not, I think IBM should find a more permanent solution than having customers set this argument, such as changing the default behaviour of the IBM JVM.
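For anyone with many JVMs to check, here is a minimal sketch of how a process could report whether the option is set on itself. It uses the standard `RuntimeMXBean.getInputArguments()` call. The class and method names are hypothetical, and the `=<address>` value form of the option is an assumption, not something stated in this thread.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class HeapBaseAudit {
    // Option name as quoted in the comment above; the "=<address>" suffix is an assumption.
    static final String OPTION = "-Xgc:preferredHeapBase=";

    /** Returns the configured value of the option, or null if it is not present. */
    static String findPreferredHeapBase(List<String> jvmArgs) {
        String found = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(OPTION)) {
                // If the option is repeated, this sketch keeps the last occurrence.
                found = arg.substring(OPTION.length());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Arguments the running JVM was started with (excludes the main class arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String value = findPreferredHeapBase(jvmArgs);
        System.out.println(value == null
                ? "preferredHeapBase is not set"
                : "preferredHeapBase is set to " + value);
    }
}
```

This only reports what was passed on the command line; it does not show where the JVM actually placed the heap.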

2 kgibm commented Permalink

Hey Pathduck, I'm not sure I understand your question. It is (and has been) the default to move the heap base above 4GB if -Xmx is increased. The reason it doesn't always do this (i.e., for relatively small heaps) is that there's a performance boost from keeping the heap under 4GB. One option would be to set the heap base to 4GB by default and then leave it to customers to explicitly lower the heap base if they want the performance boost (and have the native memory to spare). Is this what you were getting at?

 
As for the HotSpot JVM, this page has some good details: https://wikis.oracle.com/display/HotSpotInternals/CompressedOops. The page says, "if java heap size < 4Gb and it can be moved into low virtual address space (below 4Gb) then compressed oops can be used without encoding/decoding." However, it also says, "The Hotspot VM's data structures to manage Java classes are not compressed." I expect this means that PermGen will be in the data segment, which is probably above 4GB by default, so presumably Oracle will not run into the same issues. However, I also expect this means that, ceteris paribus, IBM Java will be faster than HotSpot in this particular (64-bit, compressed native structures) dimension. Also note that I believe G1 significantly changes the design of PermGen, so I don't know whether the above applies to it.
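To illustrate why a heap below 4GB is cheaper to address, here is a rough sketch of how a 32-bit compressed reference could be turned back into a 64-bit address. This is an illustration only, not HotSpot or IBM J9 source; the class name, method names, and the specific base and shift values are assumptions.

```java
final class CompressedRefSketch {
    /** Decodes a 32-bit compressed reference into a 64-bit address. */
    static long decode(int compressed, long heapBase, int shift) {
        long unsigned = compressed & 0xFFFFFFFFL; // treat the 32 bits as unsigned
        return heapBase + (unsigned << shift);
    }

    /** Number of bytes a 32-bit reference can address with the given shift. */
    static long addressableBytes(int shift) {
        return (1L << 32) << shift;
    }

    public static void main(String[] args) {
        // Heap below 4GB: base 0 and shift 0, so the reference is the address itself
        // and no add or shift is needed at run time.
        System.out.println(Long.toHexString(decode(0x80000000, 0L, 0)));
        // Heap placed elsewhere: every dereference needs a shift and an add
        // (a shift of 3 assumes 8-byte object alignment).
        System.out.println(Long.toHexString(decode(0x80000000, 0x100000000L, 3)));
        System.out.println(addressableBytes(0) + " bytes addressable without a shift");
    }
}
```

With a shift of 0, a 32-bit reference covers exactly 4GB, which is why the under-4GB case can skip the decoding step altogether.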

3 Stian Lund commented Permalink

Hi Kevin, I think that's what I was aiming at: whether there is a way to avoid having to set this JVM argument at all. For those of us who have a lot of WAS instances, having to set this for every JVM to keep it from crashing is a lot of work. It would be better to have IBM Java behave by default in a way that does not cause this crash, even if it means a small performance loss. In my world at least, stability is a lot more important than performance.

I've actually opened a PMR with support to find out whether this has been solved in later versions of IBM Java; we're running 6.0 SR9.

Apparently it's supposed to be fixed in 6.0.1 SR6 and 7.0.0 SR5.
But I'm not sure what "fixed" actually means ...

Stian

4 kgibm commented Permalink

Hey Stian, A performance hit of a couple of percentage points is pretty significant. In my experience, while this issue does affect some customers, it is not terribly common. How many times has this affected you? What operating system are you on? Is there anything particular about your applications that might cause them to use more classes, threads (or large stack sizes), or monitors? Maybe we can make the default smarter by taking into account some of these variables that make the problem more likely to occur.

5 Stian Lund commented Permalink

This has happened to us a few times in production, maybe 4-5 times. It's nasty because there is no way to know in advance when it will happen, and it cannot be monitored the way heap OOMs can.

I do think the applications are mostly to blame, because they use a *lot* of JIT (we have increased the JIT cache a lot) and they have large heaps, 4-8GB. We've seen this on RHEL5, and for some reason only on physical machines; then again, the apps on VMware are a lot smaller.

I found this reference on the support site:
http://www-01.ibm.com/support/docview.wss?uid=swg1IV37797
Apparently it's been fixed, but possibly the fix is just the addition of the preferredHeapBase argument.

It states, among other things: "This problem can occur on any 64 bit JVM running in the compressed reference mode except 64 bit zLinux and 64 bit zOS JVM." So I guess memory handling is different enough on zOS for this not to be an issue. I was thinking maybe something similar could be done for the x86 JVM, but I'm not sure if that's possible.

Stian

6 kgibm commented Permalink

Hey Stian, Thank you for that APAR link. I will discuss with the Java team and get back to you...

7 kgibm commented Permalink

Hi Stian, I reviewed APAR IV37797 with the Java development team. It only covers the case where the Java heap cannot fit below 4GB; in that case the heap is now placed above 4GB. Before this APAR, part of the Java heap might still be below 4GB, thus squeezing that space.
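As a rough illustration of the "squeezing" described above, the sketch below computes how much of the 0-4GB range a heap occupies for a given base address and size. The names and example addresses are hypothetical, and the calculation ignores everything else mapped below 4GB, so treat the result as an upper bound on the free space rather than a real measurement.

```java
final class LowMemorySketch {
    static final long FOUR_GB = 1L << 32;

    /** Bytes of the 0-4GB range occupied by a heap at [heapBase, heapBase + heapSize). */
    static long heapBytesBelow4G(long heapBase, long heapSize) {
        long end = Math.min(heapBase + heapSize, FOUR_GB);
        return Math.max(0L, end - heapBase);
    }

    /** Bytes of the 0-4GB range left over, counting only the Java heap as a consumer. */
    static long freeBelow4G(long heapBase, long heapSize) {
        return FOUR_GB - heapBytesBelow4G(heapBase, heapSize);
    }

    public static void main(String[] args) {
        long sixGb = 6L << 30;
        // A 6GB heap that starts at 2GB straddles the boundary and takes half of the low range.
        System.out.println(freeBelow4G(2L << 30, sixGb));
        // The same heap based at 4GB leaves the whole low range to native structures.
        System.out.println(freeBelow4G(FOUR_GB, sixGb));
    }
}
```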

8 Stian Lund commented Permalink

Thanks for the update, Kevin; that's cleared it up a bit.
I opened a PMR earlier asking a similar question, so I guess the answer from there will be about the same with regard to fixes for the latest FP for WAS7.

Thanks for your help!

cheers,
Stian

9 kgibm commented Permalink

Hey Stian, Sure, thanks for your thoughts. I will start an internal discussion with the Java team to figure out if we should change the default to always start the Java heap outside 0-4GB (and let performance-conscious users change that), or find another creative solution. If I have any updates, I'll post them as a comment here...

10 kgibm commented Permalink

Hey Stian, I will be submitting a feature request to change the default as you requested. If you (or anyone else reading this) can please contact me (kevin.grigorenko@us.ibm.com) with your company information and the business impact these crashes have had for you, that will help justify the change. Thanks!

11 kgibm commented Permalink

A feature request has been opened and is being reviewed.