Pinned topic: Crawler won't crawl
This topic has been locked. 7 replies. Latest post 2012-11-12T13:46:47Z by karelt.
When attempting to crawl public web sites, the jobs keep getting "Killed". Viewing the data files shows an illegal access error, even though the files are "open".
Updated on 2012-11-12T13:46:47Z by karelt
Re: Crawler won't crawl (2012-09-07T09:10:20Z, in response to SystemAdmin)
Hi ZachZ,
We are encountering the same problem. We first thought it had something to do with proxy settings, but, seeing the posts on this forum, there may be something wrong with our installation as well. Attached you can find our oozie.log.
Any feedback would be much appreciated.
Re: Crawler won't crawl (2012-10-18T18:56:27Z, in response to karelt)
Hi Karel,
Sorry for taking so long to get back to you.
Please log on to the Hadoop Map/Reduce Administration console (http://<namenode-host>:50030/jobtracker.jsp).
Check the values for:
Map Task Capacity
Reduce Task Capacity
1) If they are small (for example, 1), you can increase them by modifying the TaskTracker configuration and setting them to a larger number (for example, 6):
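As a sketch only, assuming a Hadoop 1.x cluster, where per-node task capacity is controlled by the TaskTracker properties below in mapred-site.xml (the values are illustrative; adjust the file location and numbers to your installation):

    <!-- mapred-site.xml: per-TaskTracker slot counts (illustrative values) -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>6</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>6</value>
    </property>

Map Task Capacity and Reduce Task Capacity on the JobTracker page are the totals of these per-node slot counts, so the console values should rise after the change takes effect.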
2) Next, you will need to run the following command to synchronize the configuration:
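The exact command depends on your installation; as a sketch, assuming an IBM BigInsights cluster where syncconf.sh pushes the updated Hadoop configuration out to all nodes:

    # Assumption: IBM BigInsights; $BIGINSIGHTS_HOME is the install root.
    # syncconf.sh distributes the updated Hadoop configuration to every node.
    $BIGINSIGHTS_HOME/bin/syncconf.sh hadoop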
3) Restart Hadoop.
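For example, assuming the stock Hadoop 1.x control scripts are available on the master node (a BigInsights install may provide its own start/stop scripts instead):

    # Stop and start the whole cluster so the new slot counts are picked up.
    $HADOOP_HOME/bin/stop-all.sh
    $HADOOP_HOME/bin/start-all.sh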
See if this fixes the issue with the web crawler.