Topic
11 replies Latest Post - ‏2013-09-13T10:50:53Z by NWSD_Francesco_Animali
NWSD_Francesco_Animali
8 Posts
ACCEPTED ANSWER

Pinned topic running workbooks after removing some columns doesn't work

‏2013-09-09T13:45:25Z |

I am running the BigInsights tutorial named "Analyzing big data BigSheets", where you import the Watson_news and Watson_blogs files and revise them by removing some of the columns you don't need (http://pic.dhe.ibm.com/infocenter/bigins/v2r1/topic/com.ibm.swg.im.infosphere.biginsights.tut.doc/doc/tut_Mod_BigSh.html).

After removing some 5 columns as per the tutorial, the workbook must be run in order to apply the changes to the rest of the data. When I press the Run button to run the revised workbook, the job runs for ages and never makes any progress.

I am investigating this but would appreciate any help, mostly to understand where all the log files are stored and which process is in charge of running the workbooks.

 

 

  • kvstumph
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    ‏2013-09-09T14:28:37Z  in response to NWSD_Francesco_Animali

    Hello,

    If a BigSheets job run does not appear to make any progress, the failure could have occurred at any of several stages, so different logs may need to be checked.  The first thing I would check is the Hadoop JobTracker logs available on port 50030 (i.e. if your root web console is http://<somedomain>:8080/data/html/index.html then the JobTracker can be accessed via http://<somedomain>:50030).  The JobTracker interface will be the fastest way to see whether or not your job has been scheduled or run; just search for a job bearing the name of your workbook.
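    For anyone checking from a shell instead of the browser, the same lookup can be sketched with curl. The host and workbook name below are placeholders, and jobtracker.jsp is the usual front page of the Hadoop 1.x JobTracker UI:

```shell
# Hypothetical host and workbook name -- replace with your own values.
JT_HOST="somedomain"
WORKBOOK="Watson_news"

# Fetch the JobTracker front page (port 50030) and look for a job
# bearing the workbook's name; fall back to a message if unreachable.
curl -s --max-time 5 "http://${JT_HOST}:50030/jobtracker.jsp" \
  | grep -i "$WORKBOOK" \
  || echo "no job named ${WORKBOOK} visible (or JobTracker unreachable)"
```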

    Aside from information from the Hadoop JobTracker you will also probably want to see the BigInsights web console logs, which log any events reported by the BigInsights web server.  Those logs are written to $BIGINSIGHTS_VAR/console/log, which is at /var/ibm/biginsights/console/log on a default install.  Inside that directory you want to find the log named console-web.log or any of the numbered versions of that log such as console-web.log.3.  Search through those to see if any error is written right around the time you clicked the run button.
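    A quick way to do that search from the shell (paths per the default install described above; falling back to the default location when $BIGINSIGHTS_VAR is not set):

```shell
# Console log directory: $BIGINSIGHTS_VAR/console/log, which is
# /var/ibm/biginsights/console/log on a default install.
LOG_DIR="${BIGINSIGHTS_VAR:-/var/ibm/biginsights}/console/log"

# Scan console-web.log and its rotated copies (console-web.log.1, ...)
# for errors; -H prints the file name, -n the line number.
grep -Hn "ERROR" "$LOG_DIR"/console-web.log* 2>/dev/null \
  || echo "no ERROR entries found (or $LOG_DIR does not exist)"
```

Then narrow down to entries timestamped around the moment the Run button was clicked.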

    It does not pertain to this problem, but another log location for future reference is $BIGINSIGHTS_HOME/sheets/logs (i.e. /var/ibm/biginsights/sheets/logs), which contains logs relevant to the BigSheets sampling that occurs while editing a workbook.  If you encounter an error while defining new workbooks in the "Build new workbook" context, then those errors will be written to one of the logs inside $BIGINSIGHTS_HOME/sheets/logs.

    Thanks,

    Kevin

    • NWSD_Francesco_Animali
      8 Posts

      Re: running workbooks after removing some columns doesn't work

      ‏2013-09-10T10:44:15Z  in response to kvstumph

      Thanks Kevin, this info is very useful.

      Unfortunately there is no error in the $BIGINSIGHTS_VAR/sheets/log files. I have reproduced the problem by creating several types of child workbook (following the tutorials, but also some of my own invention), and none of them run.

      They just sit there "running" without making any progress. I have checked the ulimit for the biadmin user: the nofile limit is 65k, which looks sufficient, as "lsof" shows a number of open files in the range of 9k.
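      For reference, those two checks were roughly the following (lsof undercounts without root, so treat the number as approximate; the values in the comments are simply what was observed here):

```shell
# Per-user open-file limit for the current shell (reported ~65k here).
NOFILE_LIMIT=$(ulimit -n)
echo "nofile limit: $NOFILE_LIMIT"

# Rough count of currently open files (reported ~9k here).
OPEN_FILES=$(lsof 2>/dev/null | wc -l)
echo "open files (approx): $OPEN_FILES"
```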

      Any other ideas on where I could look?

       

      thanks

      • NWSD_Francesco_Animali
        8 Posts

        Re: running workbooks after removing some columns doesn't work

        ‏2013-09-10T11:58:29Z  in response to NWSD_Francesco_Animali

        Hi Kevin, I have done more tries with a small input file. The file now has only thirty-something lines, but when I create a new child workbook it still doesn't work, so in the end it's not even a matter of re-organising the columns: the Run functionality in BigSheets doesn't work for me at all. This is a shame, as this virtual machine is provided to external people as well, and I wonder if it works for anyone.

        • kvstumph
          8 Posts

          Re: running workbooks after removing some columns doesn't work

          ‏2013-09-10T18:28:10Z  in response to NWSD_Francesco_Animali

          This is a hard one to debug remotely.  Two pieces of info that would help at this point are: 

          1) Get the status of the cluster.  There are two ways to do this.  First, use the web console to look at the "Cluster Status" tab.  For this issue I think the TaskTracker status would be the most interesting.  One possible cause of this issue is that the JobTracker is unable to schedule the job to an available TaskTracker for some reason.  Can you report whether any TaskTrackers are listed with Running status in the grid on the right of the UI when you click the "Map/Reduce" node in the left panel of the "Cluster Status" tab?

          There is another way to get status: run status.sh from the command line of the web console machine.  The script is at $BIGINSIGHTS_HOME/bin/status.sh (i.e. /opt/ibm/biginsights/bin/status.sh).  Its output will report the health of all the cluster services.
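          Sketched as a command (default install path assumed, per the parenthetical above, and guarded so it only runs where the script exists):

```shell
# status.sh reports the health of every cluster service.
BI_HOME="${BIGINSIGHTS_HOME:-/opt/ibm/biginsights}"

if [ -x "$BI_HOME/bin/status.sh" ]; then
    "$BI_HOME/bin/status.sh"
else
    echo "status.sh not found under $BI_HOME/bin -- adjust BIGINSIGHTS_HOME"
fi
```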

          2) Run a sample application, such as Word Count, from the Applications tab to ensure that any job can run at all.  If Word Count also fails, this at least narrows the issue conclusively down to the Hadoop layer.  In case any reader of the forum doesn't know how to run one of the sample applications, these are the steps to run the Word Count application:

          a) Go to the Applications tab in the web console.

          b) If the application hasn't yet been deployed, then click the "Manage" link in the upper left.  Search for "Word Count" in the search bar and only that application should be shown in the grid on the right.

          c) Click on the entry for "Word Count" in the grid, and you should see the "Deploy" button enabled.  Click Deploy and accept all defaults for the dialog that pops up.

          d) Now that the app is deployed, click the Run tab.  Again search for "Word Count" in the search box and click the Word Count icon to show the application properties.

          e) Give a new Execution Name for the run.  For the Input path, choose a directory which contains any number of child text files; for the purpose of this test, probably just add any small text file into a new directory and use that directory as input.  The Output path is any directory.  I think the last segment of the Output path need not exist prior to the run (i.e. if you select /user/biadmin/wcout/wcrun1, then /user/biadmin/wcout should exist, but the wcrun1 directory will be created during the run).

          f) Run the application.  If it's taking too long, check the JobTracker at port 50030 for details.
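          As an alternative to the UI steps above, the stock Hadoop wordcount example can be run straight from the shell to exercise the MapReduce layer by itself.  The examples-jar location below is an assumption (it varies between releases), and the input/output paths just follow the convention from step e):

```shell
# Hypothetical examples-jar location -- adjust for your release.
EXAMPLES_JAR=$(ls /opt/ibm/biginsights/IHC/hadoop-examples-*.jar 2>/dev/null | head -n 1)
IN=/user/biadmin/wcin            # HDFS directory containing small text files
OUT=/user/biadmin/wcout/wcrun1   # wcout must exist; wcrun1 is created by the job

if [ -n "$EXAMPLES_JAR" ] && command -v hadoop >/dev/null 2>&1; then
    hadoop jar "$EXAMPLES_JAR" wordcount "$IN" "$OUT"
else
    echo "hadoop or the examples jar not found -- run this on the cluster node"
fi
```

If it hangs the same way as the UI run, the JobTracker page on port 50030 should again show the stuck job.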

          • NWSD_Francesco_Animali
            8 Posts

            Re: running workbooks after removing some columns doesn't work

            ‏2013-09-11T11:04:16Z  in response to kvstumph

            Thanks for the reply!

            I have run the suggested test: a WordCount job on a directory which contains a sample text file.

            The MapReduce job is in state "running" but it doesn't progress.

            1- the status of the cluster services is all green.

            2- the MapReduce service is running one TaskTracker. This is what is shown on the Map/Reduce administration page:

            bivm Hadoop Map/Reduce Administration

            State: RUNNING
            Started: Wed Sep 11 10:30:35 BST 2013
            Version: 1.1.1, rf0025c9fd25730e3c1bfebceeeeb50d930b4fbaa
            Compiled: Fri Aug 9 17:06:14 PDT 2013 by jenkins
            Identifier: 201309111030
            SafeMode: OFF





            Cluster Summary (Heap Size is 23.85 MB/2.02 GB)

            Running Map Tasks: 2        Occupied Map Slots: 2        Map Task Capacity: 2
            Running Reduce Tasks: 0     Occupied Reduce Slots: 0     Reduce Task Capacity: 1
            Total Submissions: 6        Reserved Map Slots: 0        Avg. Tasks/Node: 3.00
            Nodes: 1                    Reserved Reduce Slots: 0     Blacklisted/Graylisted/Excluded Nodes: 0 / 0 / 0





            Scheduling Information

            Queue Name: default    State: running    Scheduling Information: N/A



            Running Jobs

            Jobid | Started | Priority | User | Name | Map progress (done/total) | Reduce progress (done/total)

            job_201309111030_0006   Wed Sep 11 10:38:07 BST 2013   NORMAL   biadmin
                oozie:launcher:T=map-reduce:W=map-reduce-wf:A=wordcount:ID=0000002-130911103312455-oozie-biad-W
                Map 0.00% (0/1) | Reduce 0.00% (0/0)

            job_201309111030_0005   Wed Sep 11 10:36:42 BST 2013   NORMAL   biadmin
                PigLatin:Downsampling.pig
                Map 0.00% (0/2) | Reduce 0.00% (0/0)

            job_201309111030_0004   Wed Sep 11 10:36:20 BST 2013   NORMAL   biadmin
                PigLatin:ClusterAggregation.pig
                Map 0.00% (0/1) | Reduce 0.00% (0/1)

            job_201309111030_0003   Wed Sep 11 10:36:01 BST 2013   NORMAL   biadmin
                oozie:launcher:T=pig:W=pig-wf:A=pig_1:ID=0000001-130911103312455-oozie-biad-W
                Map 0.00% (0/1) | Reduce 0.00% (0/0)

            job_201309111030_0001   Wed Sep 11 10:35:19 BST 2013   NORMAL   biadmin
                oozie:launcher:T=pig:W=pig-wf:A=pig_1:ID=0000000-130911103312455-oozie-biad-W
                Map 0.00% (0/1) | Reduce 0.00% (0/0)

            Completed Jobs

            Jobid | Started | Priority | User | Name | Map progress (done/total) | Reduce progress (done/total)

            job_201309111030_0002   Wed Sep 11 10:35:49 BST 2013   NORMAL   biadmin
                PigLatin:Downsampling.pig
                Map 100.00% (1/1) | Reduce 100.00% (1/1)

            Retired Jobs

            none

            The time now is already 11:39, so the job was submitted an hour ago, but it still shows no progress.

            This is the same behaviour as any other MapReduce job I tried to run in the previous days.

             

            From the userlogs directory in $BIGINSIGHTS_VAR/hadoop/logs/userlogs/job..., the content of the syslog file is the following:

            [root@bivm attempt_201309111030_0006_r_000001_0]# cat syslog
            2013-09-11 10:38:09,660 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
            2013-09-11 10:38:09,955 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
            2013-09-11 10:38:09,958 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4b554b55
            2013-09-11 10:38:10,025 INFO org.apache.hadoop.mapred.Task: Task:attempt_201309111030_0006_r_000001_0 is done. And is in the process of commiting
            2013-09-11 10:38:10,062 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201309111030_0006_r_000001_0' done.
            2013-09-11 10:38:10,087 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
            2013-09-11 10:38:10,109 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
            2013-09-11 10:38:10,110 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName biadmin for UID 1001 from the native implementation
             

             

            hope this helps... thanks,

            • kvstumph
              8 Posts

              Re: running workbooks after removing some columns doesn't work

              ‏2013-09-12T00:44:48Z  in response to NWSD_Francesco_Animali

              Hi Francesco,

              These jobs called Downsampling and ClusterAggregation are monitoring jobs.  It is highly likely that monitoring is using all the available task slots on the system.  Can you please stop monitoring?  To do so, go to the Cluster Status tab, select the Monitoring service, and click the "Stop" button to disable it.  After it's disabled, please retry a BigSheets job or the Word Count app run.
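              If the UI route is inconvenient, a command-line equivalent can be sketched with the service scripts that appear later in this thread.  That stop.sh accepts a per-service argument is an assumption inferred from the start.sh usage, so verify on your install first:

```shell
# Assumed command-line equivalent of Cluster Status > Monitoring > Stop.
BI_BIN=/opt/ibm/biginsights/bin

if [ -x "$BI_BIN/stop.sh" ]; then
    "$BI_BIN/stop.sh" monitoring
else
    echo "stop.sh not found under $BI_BIN -- use the web console instead"
fi
```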

              Thanks,

              Kevin

              • NWSD_Francesco_Animali
                8 Posts

                Re: running workbooks after removing some columns doesn't work

                ‏2013-09-12T10:13:11Z  in response to kvstumph

                Kevin,

                I have done more tests with monitoring switched off. The Word Count job is still stuck in the running state at 0%. Please see below the workflow log taken from Application Status > Workflow Summary > Workflow log. You can see that after about 6 minutes I issued a kill.
                 

                2013-09-12 10:24:13,379  INFO ActionStartXCommand:539 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] Start action [0000003-130912101653919-oozie-biad-W@wordcount] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
                2013-09-12 10:24:14,849  WARN MapReduceActionExecutor:542 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] credentials is null for the action
                2013-09-12 10:24:14,849  WARN MapReduceActionExecutor:542 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] Could not find credentials properties for: null
                2013-09-12 10:24:16,392  INFO MapReduceActionExecutor:539 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] checking action, external ID [job_201309121014_0004] status [RUNNING]
                2013-09-12 10:24:16,407  WARN ActionStartXCommand:542 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] [***0000003-130912101653919-oozie-biad-W@wordcount***]Action status=RUNNING
                2013-09-12 10:24:16,437  WARN ActionStartXCommand:542 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] [***0000003-130912101653919-oozie-biad-W@wordcount***]Action updated in DB!
                2013-09-12 10:30:39,887  INFO KillXCommand:539 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[-] STARTED WorkflowKillXCommand for jobId=0000003-130912101653919-oozie-biad-W
                2013-09-12 10:30:40,919  WARN CoordActionUpdateXCommand:542 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[-] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
                2013-09-12 10:30:40,919  INFO KillXCommand:539 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[-] ENDED WorkflowKillXCommand for jobId=0000003-130912101653919-oozie-biad-W
                2013-09-12 10:30:48,945  INFO CallbackServlet:539 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] callback for action [0000003-130912101653919-oozie-biad-W@wordcount]
                2013-09-12 10:30:48,972 ERROR CompletedActionXCommand:536 - USER[-] GROUP[-] TOKEN[] APP[-] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] XException, org.apache.oozie.command.CommandException: E0800: Action it is not running its in [KILLED] state, action [0000003-130912101653919-oozie-biad-W@wordcount]
                        at org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:85)
                        at org.apache.oozie.command.XCommand.call(XCommand.java:248)
                        at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
                        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
                        at java.lang.Thread.run(Thread.java:738)
                
                Updated on 2013-09-12T10:20:31Z at 2013-09-12T10:20:31Z by NWSD_Francesco_Animali
                • NWSD_Francesco_Animali
                  8 Posts

                  Re: running workbooks after removing some columns doesn't work

                  ‏2013-09-12T10:21:28Z  in response to NWSD_Francesco_Animali

                  sorry if the log appears to be all on one line; unfortunately I was unable to paste it correctly, although I inserted the carriage returns.

                  • NWSD_Francesco_Animali
                    8 Posts

                    Re: running workbooks after removing some columns doesn't work

                    ‏2013-09-12T15:42:31Z  in response to NWSD_Francesco_Animali

                    Hi Kevin,

                     

                    I have done more tests following the tutorials, and I can confirm that no MapReduce job can run on my installation of Hadoop.

                    I have not changed the installation in any way, and I wonder whether this virtual machine has ever worked... any idea how to check whether my user (biadmin) has all the permissions needed to run MapReduce jobs? :-(

                    • kvstumph
                      8 Posts

                      Re: running workbooks after removing some columns doesn't work

                      ‏2013-09-13T06:12:45Z  in response to NWSD_Francesco_Animali

                      It looked suspicious that there might be a credential problem, but apparently Oozie will report "credentials is null for the action" even for non-authentication issues.

                      A colleague suggested that you shut down the entire product on the VM and then restart only the specific services you need.  I think it's worth a shot.  Our theory is that although you shut down monitoring, this likely only stopped the service itself and didn't end the running Oozie monitoring jobs.  Shutting down and restarting Hadoop will definitively clear the TaskTracker slots if that is what is happening.

                      To clear all services on the cluster and start only those specific to bigsheets, please try this:

                      1) Stop the entire cluster.  On the command line, issue stop-all.sh (the script is at /opt/ibm/biginsights/bin/stop-all.sh).

                      2) Now start only these services: hadoop, derby and console.  (the command to start these is /opt/ibm/biginsights/bin/start.sh hadoop derby console).

                      3) With only these services started go back into the console and attempt to run a BigSheets child workbook.  I believe it will work now.

                      4) Assuming the job succeeds, you can now start the remaining services aside from monitoring using /opt/ibm/biginsights/bin/start.sh zookeeper hive hbase bigsql oozie orchestrator httpfs.  I believe that is the full list of services aside from monitoring that may be started.
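                      The four steps above, collected into one sketch (paths per the default install; guarded so it only acts where the scripts are present):

```shell
BI_BIN=/opt/ibm/biginsights/bin

if [ -x "$BI_BIN/stop-all.sh" ]; then
    "$BI_BIN/stop-all.sh"                      # 1) stop every service
    "$BI_BIN/start.sh" hadoop derby console    # 2) minimal set for BigSheets
    # 3) retry the BigSheets child workbook from the web console, then:
    "$BI_BIN/start.sh" zookeeper hive hbase bigsql oozie orchestrator httpfs   # 4)
else
    echo "BigInsights scripts not found under $BI_BIN"
fi
```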

                      • NWSD_Francesco_Animali
                        8 Posts

                        Re: running workbooks after removing some columns doesn't work

                        ‏2013-09-13T10:50:53Z  in response to kvstumph

                        bingo!

                        it was a resource problem.

                        In my many tests I had also increased the memory of the virtual machine from 4GB to almost 7GB, but the MapReduce jobs still did not complete. That probably introduced resource contention with the host OS on my laptop; in fact I had to stop Lotus Notes and Symphony to run the VM.

                        anyhow, I am happy to see that it was a "simple" problem. Thanks so much for your help!

                        f