Topic
  • 7 replies
  • Latest Post - ‏2012-11-12T13:46:47Z by karelt
mikencsu
mikencsu
7 Posts

Pinned topic Crawler won't crawl

‏2012-08-31T15:12:12Z |
When attempting crawls of public web sites, the jobs keep getting "Killed". Viewing the data files shows an illegal access error even though the files are "open".
Updated on 2012-11-12T13:46:47Z at 2012-11-12T13:46:47Z by karelt
  • SystemAdmin
    SystemAdmin
    603 Posts

    Re: Crawler won't crawl

    ‏2012-09-06T20:49:07Z  
    Hi,

    Please attach the logs which contain the error.
    The parameters used for the crawler app and the oozie log (/var/ibm/biginsights/oozie/oozie.log).

    Thank you,

    Zach
  • karelt
    karelt
    4 Posts

    Re: Crawler won't crawl

    ‏2012-09-07T09:10:20Z  
    Hi ZachZ,

    We are encountering the same problem. We first thought it had something to do with proxy settings, but seeing the posts on this forum, there might be something wrong with our installation as well. Attached you can find our oozie.log.
    Some feedback would be much appreciated.

    Kind Regards
    Karel

    Attachments

  • SystemAdmin
    SystemAdmin
    603 Posts

    Re: Crawler won't crawl

    ‏2012-09-18T17:08:53Z  
    Hi Karel,

    We are in the process of investigating the issue.
    Would you please provide the BigInsights version information?

    Thank you,

    Zach
  • karelt
    karelt
    4 Posts

    Re: Crawler won't crawl

    ‏2012-10-01T08:27:38Z  
    Hi Zach,

    Thank you for taking the time to investigate.
    Our version of BigInsights is 1.4.

    Kind Regards,
    Karel
  • SystemAdmin
    SystemAdmin
    603 Posts

    Re: Crawler won't crawl

    ‏2012-10-18T18:56:27Z  
    Hi Karel,

    Sorry for taking a long time to get back to you.

    Please log on to the Hadoop Map/Reduce Administration console (http://<namenode-host>:50030/jobtracker.jsp)

    Check the values for:

    Map Task Capacity
    Reduce Task Capacity

    1) If they are small (e.g., 1), you can increase them by editing the following file and setting a larger value (e.g., 6):

    /opt/ibm/biginsights/hdm/hadoop-conf-staging/mapred-site.xml
    <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>
    </property>

    <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>6</value>
    </property>
    2) Next, run the following command to synchronize the configuration:

    $BIGINSIGHTS_HOME/bin/syncconf.sh hadoop

    3) Re-start Hadoop.

    See if this fixes the issue with the web crawler.
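    The three steps above can be sketched as a shell session. The paths assume a default BigInsights install under /opt/ibm/biginsights; the stop.sh/start.sh commands at the end are an assumption about the standard control scripts and may differ on your system:

    ```shell
    # 1) Raise the task capacities in the staging config
    #    (set mapred.tasktracker.map.tasks.maximum and
    #     mapred.tasktracker.reduce.tasks.maximum to e.g. 6):
    vi /opt/ibm/biginsights/hdm/hadoop-conf-staging/mapred-site.xml

    # 2) Push the updated configuration out to the cluster:
    $BIGINSIGHTS_HOME/bin/syncconf.sh hadoop

    # 3) Restart Hadoop (assumed script names; adjust for your install):
    $BIGINSIGHTS_HOME/bin/stop.sh hadoop
    $BIGINSIGHTS_HOME/bin/start.sh hadoop
    ```

    After the restart, the Map Task Capacity and Reduce Task Capacity values on the JobTracker page should reflect the new maximums.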

    Thanks,

    Zach
  • karelt
    karelt
    4 Posts

    Re: Crawler won't crawl

    ‏2012-10-19T12:50:55Z  
    Thank you, we will test these settings.
  • karelt
    karelt
    4 Posts

    Re: Crawler won't crawl

    ‏2012-11-12T13:46:47Z  
    Now the map and reduce jobs are working and generating output. I guess the proxy problem is solved as well. Thank you for your help!
    Next step: making sense of this data.