Topic
7 replies. Latest post: 2012-11-12T13:46:47Z by karelt
mikencsu
6 Posts

Pinned topic: Crawler won't crawl

2012-08-31T15:12:12Z
When attempting crawls of public web sites, the jobs keep getting "Killed". Viewing the data files shows an illegal access error even though the files are "open".
Updated on 2012-11-12T13:46:47Z by karelt
  • SystemAdmin
    603 Posts

    Re: Crawler won't crawl

    2012-09-06T20:49:07Z, in response to mikencsu
    Hi,

    Please attach the logs that contain the error, the parameters used for the crawler app, and the oozie log (/var/ibm/biginsights/oozie/oozie.log).
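
    If it helps, something along these lines on the management node should gather the files to attach (the /tmp/crawler-logs directory here is just an example):

    # Collect the crawler-related logs in one place before attaching them
    mkdir -p /tmp/crawler-logs
    cp /var/ibm/biginsights/oozie/oozie.log /tmp/crawler-logs/
    # If the full log is too large to attach, keep only the most recent entries
    tail -n 1000 /var/ibm/biginsights/oozie/oozie.log > /tmp/crawler-logs/oozie-tail.log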

    Thank you,

    Zach
    • karelt
      4 Posts

      Re: Crawler won't crawl

      2012-09-07T09:10:20Z, in response to SystemAdmin
      Hi ZachZ,

      We are encountering the same problem. We first thought it had something to do with proxy settings, but seeing the posts on this forum, there might be something wrong with our installation as well. Attached you can find our oozie.log.
      Any feedback would be much appreciated.

      Kind Regards
      Karel


      • SystemAdmin
        603 Posts

        Re: Crawler won't crawl

        2012-09-18T17:08:53Z, in response to karelt
        Hi Karel,

        We are in the process of investigating the issue.
        Would you please provide the BigInsights version information?

        Thank you,

        Zach
        • karelt
          4 Posts

          Re: Crawler won't crawl

          2012-10-01T08:27:38Z, in response to SystemAdmin
          Hi Zach,

          Thank you for taking the time to investigate.
          Our version of BigInsights is 1.4.

          Kind Regards,
          Karel
          • SystemAdmin
            603 Posts

            Re: Crawler won't crawl

            2012-10-18T18:56:27Z, in response to karelt
            Hi Karel,

            Sorry for taking a long time to get back to you.

            Please log on to the Hadoop Map/Reduce Administration console (http://<namenode-host>:50030/jobtracker.jsp).

            Check the values for:

            Map Task Capacity
            Reduce Task Capacity

            1) If they are small (e.g. 1), you can increase them by editing the following file and setting them to a larger number (e.g. 6):

            /opt/ibm/biginsights/hdm/hadoop-conf-staging/mapred-site.xml

            <!-- Maximum number of map tasks each TaskTracker runs simultaneously -->
            <property>
            <name>mapred.tasktracker.map.tasks.maximum</name>
            <value>6</value>
            </property>

            <!-- Maximum number of reduce tasks each TaskTracker runs simultaneously -->
            <property>
            <name>mapred.tasktracker.reduce.tasks.maximum</name>
            <value>6</value>
            </property>

            2) Next, you will need to run the following command to synchronize the configuration:

            $BIGINSIGHTS_HOME/bin/syncconf.sh hadoop

            3) Re-start Hadoop.
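
            For step 3, assuming the standard BigInsights start/stop scripts are in place (the exact script names may differ slightly by release), the restart would look roughly like this:

            # Stop and start the Hadoop component so the new task capacities take effect
            $BIGINSIGHTS_HOME/bin/stop.sh hadoop
            $BIGINSIGHTS_HOME/bin/start.sh hadoop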

            See if this fixes the issue with the web crawler.

            Thanks,

            Zach
            • karelt
              4 Posts

              Re: Crawler won't crawl

              2012-10-19T12:50:55Z, in response to SystemAdmin
              Thank you, we will test these settings.
              • karelt
                4 Posts

                Re: Crawler won't crawl

                2012-11-12T13:46:47Z, in response to karelt
                Now the map and reduce jobs are working and generating output. I guess the proxy problem is solved now. Thank you for your help!
                Next step: making sense of this data.