Comments (8)

1 reganjeremy commented Permalink

I was able to solve this issue with -qipa=level=0. I ended up using -O5 for every component except two, where I used -qipa=level=0 instead. The follow-up issue is that with -O5 the build takes about 35 hours, which is a bit high but not bad if you only make a performance build at the end of all other development activities (perhaps followed by another UAT run if it is considered high risk, but there do not seem to be any ill effects).

A main issue that I ran into was using the -qmkshrobj flag for the link step instead of makeC++SharedLib. I had problems because we use a "template"-style makefile that builds the commands for each component automatically: compilation and then the link step. I found out that I have to pass exactly the same libraries and includes to compilation and to linking, or there are problems with template method instantiation. It will look like it works, but at runtime, template methods that use static methods or similar will return null pointers.

Once I got the -qmkshrobj option working I could finish the -O5 build. We are now running performance tests and seeing a small improvement over O3, but maybe not enough to justify the longer compile time. We are hoping to either use the Quantify tool or some of the run-time tuning options, or follow up with our IBM contact regarding kernel or scheduler tuning.

I looked for a long time at the -qtempinc and -qtemplateregistry options. -qtempinc required me to change too many makefiles and gave very long compile times; I expected it to add an extra week of effort to convert our app to that style of template compilation. I couldn't get -qtemplateregistry to work on its own either, and I think, as the migration guide says, it only works if -qtempinc and -qnotempinc BOTH already work. Also, perhaps you can respond as to whether it is really true that -qtemplateregistry negates the use of -qipa, and how this affects O4 and O5? I am very interested in this piece of information.

Thanks for your articles, very informative.
Jeremy

2 Sean_Perry commented Permalink

Hi Jeremy.

Let me try to address the issues you have seen. Firstly, you noted that you need to pass the same compilation options to the link step as to the compile step. You don't always have to do this. You only need to do it if the link step will, or could, result in a recompilation of a source file. It appears that was happening for you. The -qtemplateregistry option is one that can cause compilation during the linking process. What options were you using? The options that "need" to be passed to the link step depend on the organization of your source code. If an option must be specified for compilation (like some -D or -I option), then it should also be specified for the link step if that link step can cause a recompilation.

You also mentioned exploring the -qtempinc and -qtemplateregistry options. These options are independent. I'll check the manuals to make sure they don't say you can't use -qtemplateregistry unless the code compiles with -qtempinc and -qnotempinc. The manuals should have an example of how to organize your code so you can use -qtempinc, -qnotempinc, -qtemplateregistry or another compiler's implicit instantiation method. Most code that already compiles and links is ready to use -qtemplateregistry. I think you should be able to use -qtemplateregistry without making any source changes.

You mentioned you are building shared libraries. These complicate managed template instantiation. If you don't share object files that contain template instantiations between shared libraries, then you can use the -qtemplateregistry option with a different registry for each shared library.

Finally, you can use -qtemplateregistry with -qipa/-O4/-O5.
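
To illustrate why a -D option can matter at the link step, here is a minimal sketch. The macro `POOL_CAPACITY` and the template `Pool` are hypothetical names invented for this example and do not come from the build discussed in the thread. The point is only that a template instantiation bakes in whatever macro values were visible when it was compiled, so an instantiation regenerated during linking without the original -D option could differ from the one the rest of the code expects.

```cpp
#include <cstddef>

// POOL_CAPACITY is a hypothetical macro that would normally be supplied on
// the command line (for example -DPOOL_CAPACITY=64). The fallback below is
// what the code silently gets when that option is missing.
#ifndef POOL_CAPACITY
#define POOL_CAPACITY 16
#endif

// Hypothetical template whose layout and behavior depend on the macro.
template <class T>
struct Pool {
    static std::size_t capacity() { return POOL_CAPACITY; }
    T slots[POOL_CAPACITY];
};

// Every instantiation of Pool<T> captures the value of POOL_CAPACITY that
// was visible at the time the instantiation was compiled.
inline std::size_t intPoolCapacity() { return Pool<int>::capacity(); }
```

If the compile step used -DPOOL_CAPACITY=64 and a recompilation triggered from the link step did not, the two would disagree about the size of `Pool<int>`, which is the kind of mismatch that passing the same options to both steps avoids.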

3 Sean_Perry commented Permalink

Does your build work without -qtempinc or -qtemplateregistry? The first tricky issue I see is making sure that your singletons that are template instantiations are only defined in one shared library. By defined, I mean at the object file level. You want to make sure that the singletons are only defined in and exported from one shared library, and that the rest import that definition. Once all that is straight, adding -qtemplateregistry should be fairly simple. I would recommend one registry per shared library, and not using a registry when building the static libraries. You can add the -qfuncsect option to all the compile commands to help eliminate the extra duplicate instantiations in the static libraries.

I'd recommend using -qtemplateregistry over -qtempinc. As you have noticed, -qtempinc requires source changes and can have problems when linking. The -qtemplateregistry option can cause recompiles during the link step, but those are rarer and only happen when doing an incremental make.
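
One source-level way to keep a template singleton defined in a single place is explicit instantiation. The sketch below assumes a compiler that accepts C++11 `extern template`, and the names `ObjectCache` and `Account` are hypothetical. It is shown as one file for brevity; in a real build the header part and the single defining source file would be separate, and export control for the shared library is not shown.

```cpp
#include <string>

// --- Would live in a shared header (hypothetical names) ---
template <class T>
class ObjectCache {
public:
    static ObjectCache& instance() { return instance_; }
    void setName(const std::string& n) { name_ = n; }
    const std::string& name() const { return name_; }
private:
    std::string name_;
    static ObjectCache instance_;  // the singleton object
};

template <class T>
ObjectCache<T> ObjectCache<T>::instance_;

struct Account {};  // hypothetical payload type

// Explicit instantiation declaration: translation units that see this line
// do not emit their own copy of ObjectCache<Account>::instance_.
extern template class ObjectCache<Account>;

// --- Would live in exactly one source file of the owning library ---
// Explicit instantiation definition: the one place the singleton is defined.
template class ObjectCache<Account>;
```

With this arrangement every user of `ObjectCache<Account>::instance()` refers to the single object emitted by the defining source file, rather than relying on the linker to merge duplicates.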

4 reganjeremy commented Permalink

Hi Sean

I think I can describe the code structure only up to a point, but enough for understanding. There is a root-level directory, and each folder in it is either a library or a server, but each server can share many of the libraries. So in the makefiles there is a generic "target" for each component and then a list of the libs it depends on. This list is used for both the includes in the compile step and the libs in the link step.

When I used tempinc, I specified one area, ../tempinc, from each component so that the libs and the servers all shared the same tempinc directory. As a test, I split our lowest-level template class, a persistent object factory class, into the .c and .h format. It looked good at first. What tipped me off was that, as I went up the list of components, the compiler kept building onto the same file, and the compile failures were due to missing symbols from #include files that could be reached from a particular template class instantiation. The tempinc/pfactimf.o it was building had grown to about 50 MB by the time I gave up. Another factor is that I had to change the makefiles in new ways, because a side effect of that incremental building is that each component now links against an object file that has symbols from every template class instantiation before it. This is not desirable, and when I explained it to one of our senior architects he said he did not like this approach; he actually knew it would involve changing the makefiles this way before I told him about it. I think he also recommended trying a build with -qnotempinc, and that one did not work for me either.

What happens is that this template class has static methods in it, and they use a class-local singleton to access data that is instantiated with the class when a concrete implementation calls the method.

With -qnotempinc and with -qtemplateregistry, the servers would fail to start up, giving an unresolved external for each template instantiation that was needed but could not be found. I tried all sorts of permutations, with each component making its own template registry per folder or with one big registry in the source root, but they all yielded the same results. What I realized is that I had another build where I had introduced -qmkshrobj, and I had thought it was only working for libraries but not for servers. I found out, though, that it was working, and this involves the least code changes of all: none! =) When I used tempinc I only got about 50% of the way through the makefiles and it had already taken me days to get there. Not only that, but the link times were getting ridiculously longer and longer, up to an hour per component by the end.

It is true that shared libraries complicate template instantiation, and you can see what I mean about how that is not a ready change here. I think they originally went with the shared memory model for two main reasons:

1) Link time on Solaris used to be astronomical. Even in its current state, we know compile times would be even longer now, and we cannot afford a 48-hour build on modern hardware when win32 build times are so short, usually 3 hours.

2) Ease of maintenance for delivering new shared libs.

I think another purpose, and maybe the most important one at least to me, is the possibility that the shared lib memory model should be faster on the AIX kernel. What I have also been curious about is this: if all of the processes are sharing the libraries, doesn't that mean there is contention, or does it generate a new copy? And if so, isn't managing the copies of the code expensive?

Thanks for your assistance,
Jeremy
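
For readers who want a picture of the code shape described above, here is a minimal sketch of a template class whose static methods go through a class-local static table. `PersistentFactory`, `Order` and `makeOrder` are hypothetical stand-ins written in C++11, not the actual classes from this build.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a persistent object factory template.
template <class T>
class PersistentFactory {
public:
    typedef std::unique_ptr<T> (*Creator)();
    typedef std::map<std::string, Creator> Table;

    // Static method: records how to build objects stored under `key`.
    static void registerCreator(const std::string& key, Creator creator) {
        table_[key] = creator;
    }

    // Static method: returns a null pointer when nothing is registered
    // under `key` in the table this code is bound to.
    static std::unique_ptr<T> create(const std::string& key) {
        auto it = table_.find(key);
        if (it == table_.end()) {
            return nullptr;
        }
        return it->second();
    }

private:
    static Table table_;  // class-local state, one per instantiation
};

template <class T>
typename PersistentFactory<T>::Table PersistentFactory<T>::table_;

struct Order { int id = 0; };  // hypothetical persistent type

inline std::unique_ptr<Order> makeOrder() {
    return std::unique_ptr<Order>(new Order());
}
```

One plausible reading of the runtime symptoms is that the code calling `create` ends up bound to a different copy of `table_` than the code that called `registerCreator`, or to no copy at all; that is an assumption for illustration, not a diagnosis of this build.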

5 reganjeremy commented Permalink

Sorry, one additional piece of information. The current AIX build times are about:

No optimization: 3 hours
O3 optimization, no qipa: 6 hours
O5 optimization: 30 hours

6 reganjeremy commented Permalink

Yes, the build does work with neither qtempinc nor qtemplateregistry. The template instantiations are only defined in one lib, but when I used tempinc it was putting the symbols to be exported for each template instantiation into the pfactimf.o file (the object file for the template class itself, in the tempinc directory).

Quoting Sean Perry: "I would recommend one registry per shared library and to not use a registry when building the static libraries. You can add the -qfuncsect option to all the compile commands to help eliminate the extra duplicate instantiations in the static libraries."

So you're saying that if you use templateregistry, you use only one per component? What I'm concerned about is that the examples, and the way tempinc works, lead me to believe that for templateregistry you would have to do the same thing: the template registry has the symbols defined in it, and then you link your executables against that lib so that they can find the symbols at runtime.

I am okay without funcsect, at least at first. The first time I tried this I used -qtemplateregistry together with -qtemprecompile, or what have you, because that seemed to be the "recommended" combination. You said "The -qtemplateregistry option can cause recompiles during the link step, but those are rarer and only happen when doing an incremental make." I am not sure what you mean by an incremental make here; I am doing an incremental build of components, and they sometimes need other components' template instantiations inside them.

In the end the executables are built and must be able to find everything. We are using the runtime-linking option, so I don't know about some problems until I try to boot a server. I can understand some motivation for moving to static libs, but the executables would be very large, I think over 100 MB, and link times are slow for that. I would also have to figure out all the third-party linking again, and the last few things I figured out were not much fun.

I am going to try to create another build using templateregistry and keep the current build as a backup of a working O5 build. Do you think, though, that this method of template instantiation will really have a measurable impact on performance over just using the best optimization?

7 Sean_Perry commented Permalink

I'm not sure what you mean by component. I think you mean a component is the output of a link step. If so, yes, use one registry for each component. Think of the registry as a database that keeps track of where a template has been instantiated. To avoid unresolved symbols, all of the .o files the registry knows about (i.e. .o files that contain instantiations) need to be included in the link step.

You don't need to use the -qtemprecompile option. It is on by default.

An incremental build is a build you do after changing a few files. Make will only recompile the source files that are needed to update the out-of-date .o files.

The -qtemplateregistry option will help compile time more than run time at any level of optimization. The higher levels of optimization will be able to discover most of the duplication that -qtemplateregistry would find. It really comes down to either avoiding the extra code up front or letting the compiler discover it.
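
The "registry as a database" idea can be pictured with a toy model. This is purely illustrative: the real registry file format and the compiler's behavior are not shown here, and `TemplateRegistry` and `missingFromLink` are hypothetical names. The sketch maps each instantiation to the object file that holds it and reports which of those object files are absent from a given link command.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: instantiation name -> object file that contains it.
using TemplateRegistry = std::map<std::string, std::string>;

// Returns the object files the registry refers to that are not among the
// link inputs. Each such file would leave its instantiations unresolved.
std::vector<std::string> missingFromLink(const TemplateRegistry& registry,
                                         const std::set<std::string>& linkInputs) {
    std::set<std::string> missing;  // set removes duplicates and sorts
    for (const auto& entry : registry) {
        if (linkInputs.count(entry.second) == 0) {
            missing.insert(entry.second);
        }
    }
    return std::vector<std::string>(missing.begin(), missing.end());
}
```

An empty result corresponds to the condition described above: every .o file the registry knows about is part of the link step.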

8 SiyuanZhang commented Permalink

The Chinese version of this blog can be found at:

https://www.ibm.com/developerworks/mydeveloperworks/blogs/12bb75c9-dfec-42f5-8b55-b669cc56ad76/entry/toc_e7_b3_bb_e5_88_97_e4_b9_8b__e8_ae_a9xl__e7_bc_96_e8_af_91_e5_99_a8_e5_b8_ae_e6_82_a8_e5_a4_84_e7_90_86toc_e8_a1_a8_e6_ba_a2_e5_87_ba5?lang=en