Portal Search on System I - Very very slow
I thought I'd try here first in the hope that I don't have to raise a PMR.
I've set up search across the WCM content on our Portal 18.104.22.168 System I environment, and when I kick it off it seems to take forever to gather any documents, while the processor load and paging are practically zero.
For example, the run I just finished was set for a maximum of 4 hours, and in that time it gathered 110 documents (out of about 10,000).
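To put those numbers in perspective, here is a rough back-of-the-envelope projection. It is only a sketch: the function name is made up for illustration, and it assumes the gather rate stays constant for the whole crawl, which may well not hold.

```python
def projected_crawl_hours(docs_gathered, hours_elapsed, total_docs):
    """Estimate total crawl time, assuming a constant gather rate."""
    rate_per_hour = docs_gathered / hours_elapsed
    return total_docs / rate_per_hour


# Figures from the run described above: 110 documents in 4 hours, ~10,000 total.
hours = projected_crawl_hours(110, 4, 10_000)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days")
```

At 27.5 documents per hour, the full set would take roughly 364 hours, or about 15 days.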
We do have errors in the logs as shown below:
11/27/09 3:21:06:478 CST 0000002f OutsideInConv E com.ibm.hrl.ws.util.OutsideInConverter convertInputStreamToByteWithTimeLimit: thread didn't fnish normaly...
11/27/09 3:21:06:481 CST 00000032 SystemOut O Stellent Conversion Error code:
11/27/09 3:21:06:481 CST 00000032 SystemOut O The process was interrupted from the Java side
11/27/09 3:26:06:517 CST 0000002f OutsideInConv E com.ibm.hrl.ws.util.OutsideInConverter convertInputStreamToByteWithTimeLimit: thread didn't fnish normaly...
11/27/09 3:26:06:519 CST 00000032 SystemOut O Stellent Conversion Error code:
11/27/09 3:26:06:519 CST 00000032 SystemOut O The process was interrupted from the Java side
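The two conversion errors are almost exactly five minutes apart, which might mean each stuck conversion waits for a fixed time limit before it is interrupted. That is only a guess from the timestamps. A small sketch like the one below could measure the gaps across a whole log. The function name and the marker string are illustrative assumptions, and it assumes one log entry per line.

```python
import re
from datetime import datetime

# Matches timestamps of the form "11/27/09 3:21:06:478" seen in the log above.
TIMESTAMP = re.compile(r"\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2}:\d{2}:\d{3}")


def timeout_gaps(log_text, marker="OutsideInConv E"):
    """Return the seconds between consecutive log lines containing marker."""
    times = []
    for line in log_text.splitlines():
        if marker not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            # %f accepts the three-digit millisecond field as a fraction.
            times.append(datetime.strptime(match.group(0), "%m/%d/%y %H:%M:%S:%f"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


sample = (
    "11/27/09 3:21:06:478 CST 0000002f OutsideInConv E "
    "com.ibm.hrl.ws.util.OutsideInConverter convertInputStreamToByteWithTimeLimit: "
    "thread didn't fnish normaly...\n"
    "11/27/09 3:21:06:481 CST 00000032 SystemOut O Stellent Conversion Error code:\n"
    "11/27/09 3:26:06:517 CST 0000002f OutsideInConv E "
    "com.ibm.hrl.ws.util.OutsideInConverter convertInputStreamToByteWithTimeLimit: "
    "thread didn't fnish normaly...\n"
)
print(timeout_gaps(sample))
```

For the two entries above, the gap comes out at about 300 seconds. If every gap in a longer log is close to that, the crawler is probably spending most of its time waiting on conversions that never finish.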
I cannot see any other errors. I will probably try again shortly with tracing set to a higher level.
Has anyone seen this before? If so, what was the solution?
If you are not running into any memory issues, then it does appear there may be some document conversion issues.
My first suggestion would be to take a thread dump (dmpjvm) of the portal server job, ideally a couple of them a few minutes apart during the crawling process.
You should then see some "stuck" threads that may lead us to the problem.
<property name="ODC_LOG" value="ODC_DEBUG"/>
Restart server after that.
It should generate more DCS logging, which may be helpful if there is an issue with a particular document being converted.
Yeah, a PMR is the best route. Certainly all the information I asked about will be needed for a PMR as well. It is clear that some aspect of document conversion is failing; where exactly is difficult to ascertain. I would think that if you supply all the information requested to the PMR, it should take no more than a few days to determine exactly where the problem lies and find a possible fix.
Often the delay is with getting the necessary data.
Actually the PMR is still ongoing (if you're from IBM, see PMR 87628,999,616).
We had the Christmas break which didn't help.
We're going through the process now, and it appears that crawling individual files is OK; after the first couple of hundred it appears to go faster too.
We have another issue in that there are 14,000+ documents (WCM HTML + file resources) but it only seems to crawl about 1,400 or so.
I'll keep you posted on this once we have solved it.
The crawling part is OK now. I believe the search collection I initially created was somehow faulty, and I've noticed that crawling is very sensitive to anything else going on at the same time.
Anyway, another PMR has been opened, as the crawl is no longer an issue (15,000 content items in just over 2 hours) but the indexing takes forever: just under 9 hours to crawl and index 15,000 content items.
Currently I'm trying to work out the average and maximum file sizes, since the bigger the file, the longer it takes to index.
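For anyone wanting to do the same, here is a minimal sketch of one way to get those figures. It assumes the file resources have been exported to a directory on disk, which is an assumption and not how WCM stores them. The function name is also just illustrative.

```python
import os
import sys


def file_size_stats(root):
    """Return (count, average_bytes, max_bytes, largest_path) for files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes.append((os.path.getsize(path), path))
    if not sizes:
        return 0, 0.0, 0, None
    largest_size, largest_path = max(sizes)
    average = sum(size for size, _path in sizes) / len(sizes)
    return len(sizes), average, largest_size, largest_path


if __name__ == "__main__":
    # Pass the export directory as the first argument; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    count, average, largest, path = file_size_stats(target)
    print(f"{count} files, average {average:.0f} bytes, largest {largest} bytes ({path})")
```

Knowing which file is the largest also gives an obvious first candidate to exclude from a test crawl and see whether the indexing time drops.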