Tivoli Workload Scheduler LoadLeveler
Added by hraval, last edited by lcham on Nov 04, 2011

Known Issues

Date Added: November 4, 2011

LoadLeveler startd daemon might abort during startup

Warning:
The LoadLeveler startd daemon might abort during startup if there are files for terminated jobs left in the execute directory.

Users Affected:
Any system installed with IV08342 running the startd daemon.

Fix:
Apply the APAR IV10161 emergency fix package, available from IBM service.


Date Added: October 18, 2011

LoadLeveler 3.5.1.12 schedd daemon might core dump

Warning:
The LoadLeveler schedd daemon might core dump on systems at the LoadLeveler 3.5.1.12 service level. LoadLeveler may not restart on this node after the schedd crashes.

Users Affected:
Any system installed with LoadLeveler 3.5.1.12 service level running the schedd daemon. Systems with checkpoint enabled may have a higher occurrence rate.

Fix:
Install the LoadLeveler 3.5.1.13 or later service level.
If this is not possible, apply the APAR IV03346 emergency fix package, available from IBM service.


Date Updated: February 22, 2010
Update: Added TWS LoadLeveler 4.1.0.3 to the affected versions.

Date Added: February 17, 2010

TWS LoadLeveler 3.5.1.4, 4.1.0.2 and 4.1.0.3: Jobs will not be started in a login shell

Warning:
In TWS LoadLeveler 3.5.1.4, 4.1.0.2 and 4.1.0.3, jobs will not be started in a login shell.
The environment in which the job runs may not be set as expected and the job may fail to run.

Users Affected:
All TWS LoadLeveler 3.5.1.4, 4.1.0.2 and 4.1.0.3 installations that need the proper environment set by a login shell for their jobs to run correctly.

Workaround:
Set the environment keyword in the job command file to COPY_ALL, for example:

#@ environment = COPY_ALL

EFIX:
For TWS LoadLeveler 3.5.1.4: Apply the APAR IZ70280 emergency fix package, available from IBM service.
For TWS LoadLeveler 4.1.0.2 and 4.1.0.3: Apply the APAR IZ70442 emergency fix package, available from IBM service.


Date Added: November 20, 2009

TWS LoadLeveler 3.5.1.3 Startd daemon core dumps when starting up in drain mode

Warning:
The LoadL_startd daemon will core dump if it is started in drain mode under TWS LoadLeveler version 3.5.1.3.

Users Affected:
All TWS LoadLeveler 3.5.1.3 installations that use the command "llctl start drained" to start up LoadLeveler.

Workaround:
Start TWS LoadLeveler in normal mode (do not specify the drained option).

EFIX:
Apply the APAR IZ64435 emergency fix package, available from IBM service.



Date Added: May 18, 2009

Coexistence issues with TWS LoadLeveler

Warning:
Jobs cannot run in a mixed cluster that combines TWS LoadLeveler 3.5.0.1 - 3.5.0.4 service levels with either TWS LoadLeveler 3.5.0.5 or TWS LoadLeveler 3.5.1.1.
The coexistence issue was introduced in TWS LoadLeveler 3.5.0.5 and also affects TWS LoadLeveler 3.5.1.1.

Users Affected:
Installations running mixed levels of TWS LoadLeveler 3.5.0.1 - 3.5.0.4 with either TWS LoadLeveler 3.5.0.5 or TWS LoadLeveler 3.5.1.1.

Workaround:
The coexistence problem introduced in TWS LoadLeveler 3.5.0.5 cannot be corrected.
The entire cluster must be migrated to either TWS LoadLeveler 3.5.0.5 or TWS LoadLeveler 3.5.1.1 at the same time.

There is no coexistence issue between TWS LoadLeveler 3.5.0.5 and TWS LoadLeveler 3.5.1.1.



Date Added: March 16, 2009

TWS LoadLeveler Service Update 3.5.0.4 for LINUX is available

Warning:
On Linux platforms with multiple CPUs, the seteuid function can malfunction.
When the LoadLeveler startd daemon encounters this failure, its effective user ID may be set incorrectly, in which case jobs can become stuck in ST state.

Users Affected:
All multiprocessor (or multicore) systems running Linux.

Workaround:
To clear jobs that are stuck in ST state, recycle the node on which the job is pending, using the command "llctl recycle".
Users need to include the keyword "#@ restart = yes" in their job command file so that pending jobs terminated as a result of the LoadLeveler recycle are restarted.
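
The user-side half of this workaround can be sketched as a job command file. This is only a minimal sketch, not part of the original notice: the job name, output file names, and script body are hypothetical placeholders, and only the restart keyword comes from the text above.

```shell
#!/bin/sh
# Hypothetical job command file. Only "restart = yes" comes from the
# workaround above; the job name, file names, and body are placeholders.
#@ job_name = restartable_job
#@ output   = $(job_name).$(jobid).out
#@ error    = $(job_name).$(jobid).err
#@ restart  = yes
#@ queue

# Placeholder for the real workload.
step_message="job step started"
echo "$step_message"
```

A file like this would normally be submitted with llsubmit. The file name is left to the user.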

EFIX:
APAR IZ46123, which works around the glibc issue, is available from IBM service.



Date Added: January 26, 2009

TWS LoadLeveler 3.5.0.2 causes migration and coexistence failures
An error was introduced in TWS LoadLeveler 3.5.0.2 that makes its job objects incompatible with the job objects used by all prior LoadLeveler maintenance levels and releases. LoadLeveler job objects are stored in the job spool and are transmitted among LoadLeveler processes. The incompatibility causes migration and coexistence failures, such as the inability to read from the job spool any job objects produced by earlier TWS LoadLeveler maintenance levels or releases.

Users Affected:
Systems with TWS LoadLeveler 3.5.0.2 installed

FIX:
Install TWS LoadLeveler 3.5.0.3.

TWS LoadLeveler 3.5.0.3 restores compatibility with maintenance levels and releases prior to LoadLeveler 3.5.0.2. LoadLeveler 3.5.0.2 will remain incompatible with prior maintenance levels and releases, and will also be incompatible with subsequent maintenance levels and releases.

Systems with TWS LoadLeveler 3.5.0.2 already installed must have an empty job queue before moving to any other LoadLeveler maintenance level or release; otherwise, the jobs in the job queue will be removed after the upgrade.



Date Added: November 21, 2008

Multiple problems in the negotiator in TWS LoadLeveler 3.4.3.5
-The negotiator frequently core dumps with signal 6 when user_priority is used in the job command file.
-llq shows an incorrect job state after the central manager restarts.
-Interactive jobs fail to run.

Users Affected:
Systems with TWS LoadLeveler 3.4.3.5 installed

EFIX:
APAR: IZ37213
DESCRIPTION:
The negotiator abort was due to the way the user_priority job command file keyword is implemented internally to honor the user's assignment of job priority.
Some cases were found where inconsistencies in internal data structures could occur.

APAR: IZ38238
DESCRIPTION:
After a central manager restarts, running jobs are displayed as IDLE even though they are actually running.

APAR: IZ38253
DESCRIPTION:
The central manager skips over a newly arrived interactive step, so the step does not run.

Efixes are available from IBM service for AIX and LINUX platforms.


Date Added: October 29, 2007

TWS LL - TWS LoadLeveler Service Update 3.4.2.1 for AIX 5L and Linux is available.

Notes:
-TWS LoadLeveler 3.4.2.1 is a mandatory service update to be installed with TWS LoadLeveler 3.4.2.0.
-The TWS LoadLeveler scheduling affinity support has been enhanced to utilize the performance benefits from SMT processor core topology available on SMT-capable IBM POWER5 or POWER6 processor-based systems. Jobs can request TWS LoadLeveler to schedule and attach CPUs for their tasks to processor cores in addition to MCMs. Tasks of jobs requesting MCM task affinity share the processors in an MCM with other tasks in the same job. In some instances, this may cause two tasks to run on the same core (in separate SMT threads). If the application intends to have each task running on a distinct core, then the "task_affinity=core" keyword should be added to the job's JCF.
-Additional information relating to this update can be found at http://www14.software.ibm.com/webapp/set2/sas/f/loadleveler
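
As an illustration of the task affinity note above, a job command file that asks for each task to run on a distinct core might look like the following. This is a sketch under stated assumptions: the job name, node and task counts, output file names, and script body are hypothetical, and the task_affinity line is written as the note gives it. Whether other keywords or cluster configuration are also required depends on the installation.

```shell
#!/bin/sh
# Hypothetical job command file. Only "task_affinity = core" comes from the
# note above; every other value is an illustrative placeholder.
#@ job_name       = core_affinity_job
#@ job_type       = parallel
#@ node           = 1
#@ tasks_per_node = 4
#@ task_affinity  = core
#@ output         = $(job_name).$(jobid).out
#@ error          = $(job_name).$(jobid).err
#@ queue

# Placeholder for the real parallel application launch.
affinity_message="tasks requested on distinct cores"
echo "$affinity_message"
```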


Date Added: August 20, 2007

TWS LL - An error was introduced in TWS LoadLeveler 3.4.1.2 which causes TWS LoadLeveler to lose reservation and fair share information following a recycle of TWS LoadLeveler nodes running the schedd daemon.

Users Affected:
All installations that use the reservation or fair share features of TWS LoadLeveler.

Issue:
The error causes reservation and fair share data to be unretrievable from the corresponding spool file when the TWS LoadLeveler schedd daemon is re-started.

Solution:
Apply the APAR IZ03334 efix.
Efix rpms for all Linux platforms supported by TWS LoadLeveler and an emergency package for AIX may be obtained by calling IBM service.



Date Added: June 20, 2007

TWS LL - An error was introduced in all 32-bit Linux ports of TWS LoadLeveler 3.4.1.1 which can cause TWS LoadLeveler jobs to fail.

Users Affected:
All installations that use 32-bit Linux ports of TWS LoadLeveler 3.4.1.1.

Issue:
The error causes user process limits to be set to uninitialized values, which can cause application failures.
When the uninitialized value is very small, the user application may terminate abnormally due to exceeding the small limit.

Solution:
Apply the APAR IZ00385 efix rpms, which are available from IBM service for the following platforms:

x86_redhat_4.0.0
x86_redhat_3.0.0
x86_sles_10.0.0
x86_sles_9.0.0


Date Added: April 04, 2007

The "smt" keyword defaults to "no" under LoadLeveler 3.4.0.0 and 3.4.0.1

Under LoadLeveler 3.4.0.0 and 3.4.0.1, the "smt" keyword defaults to "no".
Jobs may therefore run with degraded performance if SMT is enabled on machines in the LoadLeveler cluster.

For more information go to:
LL 3.4.0.1 defaults to "SMT=no"