Topic
  • 13 replies
  • Latest Post - 2014-08-22T01:10:10Z by Bart Guisset
NWSD_Francesco_Animali
8 Posts

Pinned topic running workbooks after removing some columns doesn't work

2013-09-09T13:45:25Z |

I am running the BigInsights tutorial "Analyzing big data with BigSheets", where you import the Watson_news and Watson_blogs files and revise them by removing some of the columns you don't need (http://pic.dhe.ibm.com/infocenter/bigins/v2r1/topic/com.ibm.swg.im.infosphere.biginsights.tut.doc/doc/tut_Mod_BigSh.html).

After removing five columns as per the tutorial, the workbook has to be run to apply the changes to the rest of the data. When I press the Run button to run the revised workbook, the run goes on for ages and never progresses.

I am investigating this but would appreciate any help, mostly to understand where all the log files are stored and which process is in charge of running the workbooks.

 

 

  • kvstumph
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    2013-09-09T14:28:37Z

    Hello,

    If a BigSheets job run does not appear to make any progress, the failure can occur at several different stages, so different logs may need to be checked.  The first thing I would check is the Hadoop JobTracker interface, available on port 50030 (i.e. if your root web console is http://<somedomain>:8080/data/html/index.html then the JobTracker can be accessed via http://<somedomain>:50030).  The JobTracker interface is the fastest way to see whether or not your job has been scheduled or run; just search for a job bearing the name of your workbook.

    Aside from the Hadoop JobTracker information, you will probably also want to check the BigInsights web console logs, which record any events reported by the BigInsights web server.  Those logs are written to $BIGINSIGHTS_VAR/console/log, which is /var/ibm/biginsights/console/log on a default install.  Inside that directory, look for the log named console-web.log or any of its numbered rotations, such as console-web.log.3.  Search through those to see if any error is written right around the time you clicked the Run button.
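
A quick way to scan those console logs for errors around the run time is a small script like the following (an illustrative sketch, not part of the product; the directory and file names are the defaults mentioned above):

```python
import glob
import os

# Default location from the post above: $BIGINSIGHTS_VAR/console/log
LOG_DIR = "/var/ibm/biginsights/console/log"

def error_lines(log_dir, needle="ERROR"):
    """Return (filename, line) pairs containing `needle` from
    console-web.log and its rotated copies (console-web.log.1, ...)."""
    hits = []
    for path in sorted(glob.glob(os.path.join(log_dir, "console-web.log*"))):
        with open(path, errors="replace") as fh:
            for line in fh:
                if needle in line:
                    hits.append((os.path.basename(path), line.rstrip()))
    return hits

if __name__ == "__main__":
    for fname, line in error_lines(LOG_DIR):
        print(fname, line)
```

Run it on the web console machine around the time the Run button was clicked, and cross-check the timestamps of any hits against the job submission time.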

    It does not pertain to this problem, but another log location for future reference is $BIGINSIGHTS_VAR/sheets/logs (i.e. /var/ibm/biginsights/sheets/logs), which contains logs relevant to the BigSheets sampling that occurs while editing a workbook.  If you encounter an error while defining new workbooks in the "Build new workbook" context, those errors will be written to one of the logs inside $BIGINSIGHTS_VAR/sheets/logs.

    Thanks,

    Kevin

  • NWSD_Francesco_Animali
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    2013-09-10T10:44:15Z
    • kvstumph
    • 2013-09-09T14:28:37Z


    Thanks Kevin, this info is very useful.

    Unfortunately there is no error in the $BIGINSIGHTS_VAR/sheets/log files. I have reproduced the problem by creating several types of child workbook (following the tutorials but also of my own design) and none of them run.

    They just sit there "running" without making any progress. I have checked the ulimit for the biadmin user and the nofiles limit is 65k, which looks sufficient, as "lsof" shows the number of open files is around 9k.

    Any other ideas where I could look?

     

    thanks

  • NWSD_Francesco_Animali
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    2013-09-10T11:58:29Z


    Hi Kevin, I have done more tests with a small input file. The file now has only thirty-odd lines, but when I create a new child workbook it still doesn't work; in the end it's not even a matter of re-organising the columns. The "Run" functionality in BigSheets simply doesn't work for me. This is a shame, as this virtual machine is provided to external people as well, and I wonder whether it works for anyone.

  • kvstumph
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    2013-09-10T18:28:10Z


    This is a hard one to debug remotely.  Two pieces of info that would help at this point are: 

    1) Get the status of the cluster.  There are two ways to do this.  You can first use the web console to look at the "Cluster Status" tab.  For this issue I think the TaskTracker status would be the most interesting.  One possible cause of this issue is that the JobTracker is unable to schedule the job to an available TaskTracker for some reason.  Can you report whether any TaskTrackers are listed with Running status in the grid on the right of the UI when you click the "Map/Reduce" node in the left panel of the "Cluster Status" tab?

    There is another way to get status: run status.sh from the command line of the web console machine.  The script is at $BIGINSIGHTS_HOME/bin/status.sh (i.e. /opt/ibm/biginsights/bin/status.sh).  The output of this command reports the health of all the cluster services.

    2) Run a sample application, such as Word Count, from the Applications tab to verify that any job can run at all.  If Word Count also fails, that conclusively narrows the issue down to the Hadoop layer.  In case any reader of the forum doesn't know how to run one of the sample applications, these are the steps to run the Word Count application:

    a) Go to the Applications tab in the web console.

    b) If the application hasn't yet been deployed, then click the "Manage" link in the upper left.  Search for "Word Count" in the search bar and only that application should be shown in the grid on the right.

    c) Click on the entry for "Word Count" in the grid, and you should see the "Deploy" button enabled.  Click Deploy and accept all defaults for the dialog that pops up.

    d) Now that the app is deployed, click the Run tab.  Again search for "Word Count" in the search box and click the Word Count icon to show the application properties.

    e) Give a new Execution Name for the run.  For the Input path, choose a directory which contains any number of child text files.  For the purpose of this test, probably just add any small text file into a new directory and use that directory as input.  The Output path can be any directory.  I think the last segment of the Output path need not exist prior to the run. (i.e. if you select /user/biadmin/wcout/wcrun1, then /user/biadmin/wcout should exist, but the wcrun1 directory will be created during the run.)

    f) Run the application.  If it's taking too long, check the JobTracker at port 50030 for details.
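
Step (e)'s rule about the Output path can be checked up front with a small helper (illustrative only; it mirrors the "parent must exist, last segment is created during the run" behaviour described above, shown against the local filesystem for simplicity, whereas on the cluster the same check would apply to the HDFS path):

```python
import os

def output_path_ok(path):
    """True if the parent directory of `path` already exists and `path`
    itself does not -- i.e. the last segment can be created by the job
    without clobbering anything."""
    parent = os.path.dirname(path.rstrip("/"))
    return os.path.isdir(parent) and not os.path.exists(path)
```

For example, with /user/biadmin/wcout already present, output_path_ok("/user/biadmin/wcout/wcrun1") would be True, while pointing at an existing directory would be False.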

  • NWSD_Francesco_Animali
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    2013-09-11T11:04:16Z
    • kvstumph
    • 2013-09-10T18:28:10Z


    Thanks for the reply!

    I have run the suggested test: a WordCount job on a directory which contains a sample text file.

    The MapReduce job is in state "running" but it doesn't progress.

    1- the status of the cluster services is all green.

    2- the MapReduce service is running one task tracker. This is what is shown when I view the TaskTracker page:

    bivm Hadoop Map/Reduce Administration

    State: RUNNING
    Started: Wed Sep 11 10:30:35 BST 2013
    Version: 1.1.1, rf0025c9fd25730e3c1bfebceeeeb50d930b4fbaa
    Compiled: Fri Aug 9 17:06:14 PDT 2013 by jenkins
    Identifier: 201309111030
    SafeMode: OFF





    Cluster Summary (Heap Size is 23.85 MB/2.02 GB)

    Running Map Tasks: 2     Running Reduce Tasks: 0    Total Submissions: 6     Nodes: 1
    Occupied Map Slots: 2    Occupied Reduce Slots: 0   Reserved Map Slots: 0    Reserved Reduce Slots: 0
    Map Task Capacity: 2     Reduce Task Capacity: 1    Avg. Tasks/Node: 3.00
    Blacklisted Nodes: 0     Graylisted Nodes: 0        Excluded Nodes: 0





    Scheduling Information

    Queue Name State Scheduling Information
    default running N/A



    Running Jobs

    job_201309111030_0006  started Wed Sep 11 10:38:07 BST 2013  NORMAL  biadmin
        oozie:launcher:T=map-reduce:W=map-reduce-wf:A=wordcount:ID=0000002-130911103312455-oozie-biad-W
        maps: 0/1 (0.00%)   reduces: 0/0 (0.00%)
    job_201309111030_0005  started Wed Sep 11 10:36:42 BST 2013  NORMAL  biadmin
        PigLatin:Downsampling.pig
        maps: 0/2 (0.00%)   reduces: 0/0 (0.00%)
    job_201309111030_0004  started Wed Sep 11 10:36:20 BST 2013  NORMAL  biadmin
        PigLatin:ClusterAggregation.pig
        maps: 0/1 (0.00%)   reduces: 0/1 (0.00%)
    job_201309111030_0003  started Wed Sep 11 10:36:01 BST 2013  NORMAL  biadmin
        oozie:launcher:T=pig:W=pig-wf:A=pig_1:ID=0000001-130911103312455-oozie-biad-W
        maps: 0/1 (0.00%)   reduces: 0/0 (0.00%)
    job_201309111030_0001  started Wed Sep 11 10:35:19 BST 2013  NORMAL  biadmin
        oozie:launcher:T=pig:W=pig-wf:A=pig_1:ID=0000000-130911103312455-oozie-biad-W
        maps: 0/1 (0.00%)   reduces: 0/0 (0.00%)

    Completed Jobs

    job_201309111030_0002  started Wed Sep 11 10:35:49 BST 2013  NORMAL  biadmin
        PigLatin:Downsampling.pig
        maps: 1/1 (100.00%)   reduces: 1/1 (100.00%)

    Retired Jobs

    none

    It is now 11:39, an hour since the job was submitted, and it still shows no progress.

    This is the same behaviour as every other MapReduce job I have tried to run over the previous days.
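
One detail worth noting in the Cluster Summary above: Occupied Map Slots (2) equals Map Task Capacity (2), so no map slot is free and newly submitted jobs sit in the queue at 0%. A small sketch of that arithmetic, pairing the summary's value row with its column names (the helper itself is illustrative, not a BigInsights tool):

```python
# Column names from the JobTracker "Cluster Summary" table.
HEADERS = [
    "Running Map Tasks", "Running Reduce Tasks", "Total Submissions", "Nodes",
    "Occupied Map Slots", "Occupied Reduce Slots", "Reserved Map Slots",
    "Reserved Reduce Slots", "Map Task Capacity", "Reduce Task Capacity",
    "Avg. Tasks/Node", "Blacklisted Nodes", "Graylisted Nodes", "Excluded Nodes",
]

def parse_summary(values_row):
    """Pair a Cluster Summary value row with its column names."""
    return dict(zip(HEADERS, values_row.split()))

# The values reported above.
summary = parse_summary("2 0 6 1 2 0 0 0 2 1 3.00 0 0 0")
free_map = int(summary["Map Task Capacity"]) - int(summary["Occupied Map Slots"])
print("free map slots:", free_map)  # 0 -> nothing else can be scheduled
```

With zero free map slots, any additional job (such as the Word Count run) waits until a running task releases a slot.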

     

    From the userlogs directory in $BIGINSIGHTS_VAR/hadoop/logs/userlogs/job....  the content of the syslog file is the following:

    [root@bivm attempt_201309111030_0006_r_000001_0]# cat syslog
    2013-09-11 10:38:09,660 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
    2013-09-11 10:38:09,955 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
    2013-09-11 10:38:09,958 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4b554b55
    2013-09-11 10:38:10,025 INFO org.apache.hadoop.mapred.Task: Task:attempt_201309111030_0006_r_000001_0 is done. And is in the process of commiting
    2013-09-11 10:38:10,062 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201309111030_0006_r_000001_0' done.
    2013-09-11 10:38:10,087 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
    2013-09-11 10:38:10,109 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
    2013-09-11 10:38:10,110 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName biadmin for UID 1001 from the native implementation
     

     

    hope this helps... thanks,

  • kvstumph
    kvstumph
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    2013-09-12T00:44:48Z


    Hi Francesco,

    These jobs called Downsampling and ClusterAggregation are monitoring jobs.  It is highly likely that monitoring is using all the available task slots on the system.  Can you please stop monitoring?  To stop it, go to the Cluster Status tab, select the Monitoring service, and click the "Stop" button to disable it.  After it's disabled, please retry a BigSheets job or the Word Count app run.

    Thanks,

    Kevin

  • NWSD_Francesco_Animali
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    2013-09-12T10:13:11Z
    • kvstumph
    • 2013-09-12T00:44:48Z


    Kevin,

    I have done more tests with monitoring switched off. The Word Count job is still stuck in the running state at 0%. Please see below the workflow log, taken from Application Status > Workflow Summary > Workflow log. You can see that after about 6 minutes I issued a kill.
     

    2013-09-12 10:24:13,379  INFO ActionStartXCommand:539 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] Start action [0000003-130912101653919-oozie-biad-W@wordcount] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
    2013-09-12 10:24:14,849  WARN MapReduceActionExecutor:542 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] credentials is null for the action
    2013-09-12 10:24:14,849  WARN MapReduceActionExecutor:542 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] Could not find credentials properties for: null
    2013-09-12 10:24:16,392  INFO MapReduceActionExecutor:539 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] checking action, external ID [job_201309121014_0004] status [RUNNING]
    2013-09-12 10:24:16,407  WARN ActionStartXCommand:542 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] [***0000003-130912101653919-oozie-biad-W@wordcount***]Action status=RUNNING
    2013-09-12 10:24:16,437  WARN ActionStartXCommand:542 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] [***0000003-130912101653919-oozie-biad-W@wordcount***]Action updated in DB!
    2013-09-12 10:30:39,887  INFO KillXCommand:539 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[-] STARTED WorkflowKillXCommand for jobId=0000003-130912101653919-oozie-biad-W
    2013-09-12 10:30:40,919  WARN CoordActionUpdateXCommand:542 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[-] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
    2013-09-12 10:30:40,919  INFO KillXCommand:539 - USER[biadmin] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000003-130912101653919-oozie-biad-W] ACTION[-] ENDED WorkflowKillXCommand for jobId=0000003-130912101653919-oozie-biad-W
    2013-09-12 10:30:48,945  INFO CallbackServlet:539 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] callback for action [0000003-130912101653919-oozie-biad-W@wordcount]
    2013-09-12 10:30:48,972 ERROR CompletedActionXCommand:536 - USER[-] GROUP[-] TOKEN[] APP[-] JOB[0000003-130912101653919-oozie-biad-W] ACTION[0000003-130912101653919-oozie-biad-W@wordcount] XException, org.apache.oozie.command.CommandException: E0800: Action it is not running its in [KILLED] state, action [0000003-130912101653919-oozie-biad-W@wordcount]
            at org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:85)
            at org.apache.oozie.command.XCommand.call(XCommand.java:248)
            at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
            at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
            at java.lang.Thread.run(Thread.java:738)
    
    Updated on 2013-09-12T10:20:31Z by NWSD_Francesco_Animali
  • NWSD_Francesco_Animali
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    2013-09-12T10:21:28Z


    Sorry if the log appears to be on one line; unfortunately I was unable to paste it correctly, although I did insert the carriage returns.

  • NWSD_Francesco_Animali
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    2013-09-12T15:42:31Z


    Hi Kevin,

     

    I have done more tests following the tutorials, and I can confirm that no MapReduce job can run on my installation of Hadoop.

    I have not changed the installation in any way, and I wonder whether this virtual machine has ever worked... any idea how to check whether my user (biadmin) has all the permissions needed to run MapReduce jobs? :-(

  • kvstumph
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    2013-09-13T06:12:45Z


    It looked suspicious that there might be a credential problem, but apparently Oozie reports "credentials is null for the action" even for non-authentication issues.

    A colleague suggested shutting down the entire product on the VM and then restarting only the specific services you need.  I think that's worth a shot.  Our theory is that although you shut down monitoring, this likely only stopped the service and didn't end the already-running Oozie monitoring jobs.  Shutting down and restarting Hadoop will definitively clear the TaskTracker slots, if that is what is happening.

    To clear all services on the cluster and start only those specific to BigSheets, please try this:

    1) Stop the entire cluster.  On command line issue:  stop-all.sh (the script is at /opt/ibm/biginsights/bin/stop-all.sh).

    2) Now start only these services: hadoop, derby and console.  (the command to start these is /opt/ibm/biginsights/bin/start.sh hadoop derby console).

    3) With only these services started go back into the console and attempt to run a BigSheets child workbook.  I believe it will work now.

    4) Assuming the job succeeds, now you can start the other services aside from monitoring using (/opt/ibm/biginsights/bin/start.sh zookeeper, hive, hbase, bigsql, oozie, orchestrator, httpfs).  I believe that is the full list of services aside from monitoring that may be started.
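
    The four steps above can be sketched as a single script; the paths assume a default install (override BI_HOME if yours differs), and step 3 remains a manual check in the console:

```shell
# Hedged sketch of the restart sequence above.
BI_HOME=${BI_HOME:-/opt/ibm/biginsights}
MINIMAL_SERVICES="hadoop derby console"
OTHER_SERVICES="zookeeper hive hbase bigsql oozie orchestrator httpfs"

if [ -x "$BI_HOME/bin/stop-all.sh" ]; then
    "$BI_HOME/bin/stop-all.sh"                 # 1) stop everything
    "$BI_HOME/bin/start.sh" $MINIMAL_SERVICES  # 2) minimal set for BigSheets
    # 3) manual step: run a child workbook from the console, then:
    # "$BI_HOME/bin/start.sh" $OTHER_SERVICES  # 4) everything except monitoring
else
    echo "BigInsights scripts not found under $BI_HOME/bin"
fi
```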

  • NWSD_Francesco_Animali
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    ‏2013-09-13T10:50:53Z  
    • kvstumph
    • ‏2013-09-13T06:12:45Z

    A credential problem looked suspicious at first, but apparently Oozie reports "credentials is null for the action" even for non-authentication issues.

    A colleague suggested shutting down the entire product on the VM and then restarting only the specific services you need.  I think it's worth a shot.  Our theory is that although you shut down monitoring, it's likely this only stopped the service but didn't end the running Oozie monitoring jobs.  Shutting down and restarting Hadoop will definitively clear the TaskTracker slots if that is what is happening.

    To clear all services on the cluster and start only those BigSheets needs, please try this:

    1) Stop the entire cluster.  On the command line, issue stop-all.sh (the script is at /opt/ibm/biginsights/bin/stop-all.sh).

    2) Now start only these services: hadoop, derby, and console (the command is /opt/ibm/biginsights/bin/start.sh hadoop derby console).

    3) With only these services started, go back into the console and attempt to run a BigSheets child workbook.  I believe it will work now.

    4) Assuming the job succeeds, you can now start the remaining services aside from monitoring (/opt/ibm/biginsights/bin/start.sh zookeeper hive hbase bigsql oozie orchestrator httpfs).  I believe that is the full list of services, aside from monitoring, that may be started.

    Bingo!

    It was a resource problem.

    In my many tests I had also increased the virtual machine's memory from 4 GB to almost 7 GB, but the MapReduce jobs still did not complete.  That probably introduced resource contention with the host OS on my laptop; in fact, I had to stop Lotus Notes and Symphony to run the VM.

    Anyhow, I am happy to see that it was a "simple" problem.  Thanks so much for your help!

    f

  • YahaSun
    YahaSun
    8 Posts

    Re: running workbooks after removing some columns doesn't work

    ‏2014-07-31T14:00:39Z  

    Bingo!

    It was a resource problem.

    In my many tests I had also increased the virtual machine's memory from 4 GB to almost 7 GB, but the MapReduce jobs still did not complete.  That probably introduced resource contention with the host OS on my laptop; in fact, I had to stop Lotus Notes and Symphony to run the VM.

    Anyhow, I am happy to see that it was a "simple" problem.  Thanks so much for your help!

    f

    I guess it's not only a resource problem.  I have the exact same issue with the latest iibi30 single-node VM.  Even with the virtual machine given 8 GB of memory, running a child workbook still makes no progress.

    Kevin's earlier post is very valuable.  We don't need to run start-all.sh; instead, starting hadoop, console, and catalog (in v3) is enough:

    $BIGINSIGHTS_HOME/bin/start.sh hadoop console catalog

    Then the new child workbook ran OK.
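
    To confirm which services actually came up after a selective start, the install ships a status script alongside start.sh (a sketch; the script name and fallback prefix are assumptions from a default v3 single-node install):

```shell
# Fall back to the default prefix when $BIGINSIGHTS_HOME is unset.
BI_BIN="${BIGINSIGHTS_HOME:-/opt/ibm/biginsights}/bin"
WANTED_SERVICES="hadoop console catalog"

if [ -x "$BI_BIN/status.sh" ]; then
    "$BI_BIN/status.sh" $WANTED_SERVICES   # report only the services we started
else
    echo "status.sh not found under $BI_BIN; check your install prefix"
fi
```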

    Updated on 2014-07-31T14:03:12Z at 2014-07-31T14:03:12Z by YahaSun
  • Bart Guisset
    Bart Guisset
    2 Posts

    Re: running workbooks after removing some columns doesn't work

    ‏2014-08-22T01:10:10Z  
    • YahaSun
    • ‏2014-07-31T14:00:39Z

    I guess it's not only a resource problem.  I have the exact same issue with the latest iibi30 single-node VM.  Even with the virtual machine given 8 GB of memory, running a child workbook still makes no progress.

    Kevin's earlier post is very valuable.  We don't need to run start-all.sh; instead, starting hadoop, console, and catalog (in v3) is enough:

    $BIGINSIGHTS_HOME/bin/start.sh hadoop console catalog

    Then the new child workbook ran OK.

    The problem is the scheduled applications taking up all the available slots: go to the Applications tab and disable the schedule for the ClusterAggregation and Downsampling applications, and check whether there are others.  Once you do this, all manually started jobs will run.  Next time you can continue to start BigInsights via the desktop icon as in the lab description.
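
    Before disabling the schedules, you can confirm that the monitoring jobs are the ones holding the slots by listing the running MapReduce jobs from the command line (a sketch using the Hadoop 1.x `hadoop job -list` CLI; matching on the application names above is an assumption, since the submitted job names may differ):

```shell
if command -v hadoop >/dev/null 2>&1; then
    # List currently running MapReduce jobs and look for the
    # scheduled monitoring applications by name.
    RUNNING_JOBS=$(hadoop job -list 2>/dev/null)
    echo "$RUNNING_JOBS" | grep -Ei 'ClusterAggregation|Downsampling' \
        && echo "scheduled monitoring jobs are occupying task slots" || true
else
    RUNNING_JOBS=""
    echo "hadoop CLI not on PATH; run this on the cluster node"
fi
```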

    Updated on 2014-08-22T01:12:09Z at 2014-08-22T01:12:09Z by Bart Guisset