How IBM Spectrum LSF Works with TotalView

IBM® Spectrum LSF is integrated with the Etnus TotalView® multiprocess debugger. You should already be familiar with using TotalView software and debugging parallel applications.

Debugging LSF jobs with TotalView

Etnus TotalView is a source-level and machine-level debugger for analyzing, debugging, and tuning multiprocessor or multithreaded programs. LSF works with TotalView in two ways:

  • Use LSF to start TotalView together with your job

  • Start TotalView separately, submit your job through LSF and attach the processes of your job to TotalView for debugging

Once your job is running and its processes are attached to TotalView, you can debug your program as you normally would.

Installing LSF for TotalView

lsfinstall installs the application-specific esub program esub.tvpoe for debugging POE jobs in TotalView. It behaves like esub.poe and runs the poejob script, but it also sets the appropriate TotalView options and environment variables for POE jobs.

lsfinstall also configures the hpc_ibm_tv queue in lsb.queues for debugging POE jobs. The queue is not rerunnable, does not allow interactive batch jobs (bsub -I), and specifies the following TERMINATE_WHEN action:

TERMINATE_WHEN=LOAD PREEMPT WINDOW

For MPICH-GM jobs, lsfinstall does the following to support TotalView with LSF:

  • Installs the application-specific esub program esub.tvmpich_gm for debugging MPICH-GM jobs in TotalView. It behaves like esub.mpich_gm, but it also sets the appropriate TotalView options and environment variables for MPICH-GM jobs, and sends the job to the hpc_linux_tv queue.

  • Configures the hpc_linux_tv queue in lsb.queues for debugging MPICH-GM jobs. The queue is not rerunnable, does not allow interactive batch jobs (bsub -I), and specifies the following TERMINATE_WHEN action:

    TERMINATE_WHEN=LOAD PREEMPT WINDOW

Environment variables for TotalView

On the submission host, make sure that:

  • The path to the TotalView binary is in your $PATH environment variable

  • $DISPLAY is set to console_name:0.0
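
The two conditions above can be checked on the submission host before you submit a job. The following is a minimal Python sketch and is not part of LSF or TotalView. The helper name check_totalview_env is hypothetical, and the sketch assumes that the TotalView executable is named totalview; adjust the binary argument if your installation differs.

```python
import os
import shutil


def check_totalview_env(env=None, binary="totalview"):
    """Return a list of problems found; an empty list means both checks passed.

    `binary` is an assumed executable name, not something LSF defines.
    """
    env = os.environ if env is None else env
    problems = []

    # shutil.which searches the given PATH string for an executable file.
    if shutil.which(binary, path=env.get("PATH", "")) is None:
        problems.append(f"{binary} not found on PATH")

    # DISPLAY is expected in the form console_name:0.0.
    display = env.get("DISPLAY", "")
    if not display:
        problems.append("DISPLAY is not set")
    elif ":" not in display:
        problems.append(f"DISPLAY={display!r} does not look like console_name:0.0")

    return problems


if __name__ == "__main__":
    for problem in check_totalview_env():
        print("WARNING:", problem)
```

The check only confirms that DISPLAY has the expected shape. It does not verify that the X display is reachable from the execution hosts.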

Setting TotalView preferences

Before running and debugging jobs with TotalView, you should set the following options in your $HOME/.preferences.tvd file:

  • dset ignore_control_c {false} to allow TotalView to respond to <CTRL-C>

  • dset ask_on_dlopen {false} to tell TotalView not to prompt about stopping processes that use the dlopen system call
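
If you prefer to script this step, the following minimal Python sketch appends whichever of the two dset lines above is missing from the preferences file. The function name ensure_totalview_prefs is hypothetical. The sketch matches whole lines only, so it does not detect an existing line that sets the same variable to a different value; review the file by hand in that case.

```python
from pathlib import Path

# The two settings named in this section, exactly as they appear in the file.
REQUIRED_PREFS = (
    "dset ignore_control_c {false}",
    "dset ask_on_dlopen {false}",
)


def ensure_totalview_prefs(prefs_path):
    """Append any required dset line that is missing; return the lines added."""
    path = Path(prefs_path)
    text = path.read_text() if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [pref for pref in REQUIRED_PREFS if pref not in present]

    if missing:
        # Append rather than rewrite, so existing preferences are left untouched.
        with path.open("a") as handle:
            if text and not text.endswith("\n"):
                handle.write("\n")
            for pref in missing:
                handle.write(pref + "\n")
    return missing


if __name__ == "__main__":
    added = ensure_totalview_prefs(Path.home() / ".preferences.tvd")
    print("Added lines:", added)
```

Running the script a second time adds nothing, because both lines are already present.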

Limitations

While you are debugging a running job in TotalView, LSF job control commands and scheduling policies do not behave as they normally do:

  • bchkpnt and bmig are not supported.

  • Default TotalView signal processing prevents bstop and bresume from suspending and resuming jobs, and bkill from terminating jobs.

  • brequeue causes TotalView to display all jobs in error status. Click Go and the jobs will rerun.

  • Load thresholds and host dispatch windows do not affect jobs running in TotalView.

  • Preemption is not visible to TotalView.

  • Rerunning jobs within TotalView is not supported.