Data collection

The following new features affect IBM Spectrum LSF data collection.

Enhanced energy accounting using Elasticsearch

This enhancement introduces the lsfbeat tool, which calls ipmitool to collect energy data from each host and sends the data to IBM Spectrum LSF Explorer (LSF Explorer). The bjobs and bhosts commands retrieve the energy data from LSF Explorer and display it. To use this feature, LSF Explorer must be deployed in your LSF cluster.

To enable the lsfbeat energy service, set LSF_ENABLE_BEAT_SERVICE="energy" in the lsf.conf file, then run the lsadmin limrestart all command to start the lsfbeat service. To query energy data with the bhosts and bjobs commands, set LSF_QUERY_ES_FUNCTIONS="energy" and LSF_QUERY_ES_SERVERS="ip:port" in the lsf.conf file.
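As a quick sanity check of the lsf.conf settings described above, the following Python sketch parses simple NAME=value lines and reports which energy accounting parameters are missing. It is a hypothetical helper, not part of LSF. It assumes that a parameter value may list several space-separated items, and it checks only the ip:port shape of LSF_QUERY_ES_SERVERS, not whether the server is reachable. The address 192.0.2.10:9200 is only an example.

```python
# Hypothetical helper (not part of LSF): checks the lsf.conf parameters that
# the energy accounting feature needs.

def parse_lsf_conf(text):
    """Parse simple NAME=value lines; skip blank lines and # comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. "energy" -> energy.
        params[name.strip()] = value.strip().strip("\"'")
    return params


def energy_config_problems(text):
    """Return a list of problems; an empty list means the settings look complete."""
    params = parse_lsf_conf(text)
    problems = []
    # Assumption: a value may list several space-separated items.
    for name in ("LSF_ENABLE_BEAT_SERVICE", "LSF_QUERY_ES_FUNCTIONS"):
        if "energy" not in params.get(name, "").split():
            problems.append(f"{name} must include 'energy'")
    servers = params.get("LSF_QUERY_ES_SERVERS", "").split()
    if not servers:
        problems.append("LSF_QUERY_ES_SERVERS is not set")
    for entry in servers:
        # Only the ip:port shape is checked, not reachability.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            problems.append(f"LSF_QUERY_ES_SERVERS entry {entry!r} is not ip:port")
    return problems


sample = """
LSF_ENABLE_BEAT_SERVICE="energy"
LSF_QUERY_ES_FUNCTIONS="energy"
LSF_QUERY_ES_SERVERS="192.0.2.10:9200"
"""
print(energy_config_problems(sample))  # []
```

After the check passes, the lsadmin limrestart all step from the text still applies; the sketch only inspects the file contents.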

Data provenance tools

LSF now has data provenance tools to trace files that are generated by LSF jobs.

You can use LSF data provenance tools to navigate your data and find where it came from and how it was generated. You can also use data provenance information to reproduce your results by running jobs with the same input and steps.
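To illustrate what tracing data back to its origin involves, here is a minimal Python sketch that walks upstream from an output file to the jobs and input files that produced it. The record format (PROVENANCE, with job and inputs keys), the file names, the job IDs, and the trace_lineage function are all hypothetical; they do not reflect how LSF actually stores provenance attributes.

```python
from collections import deque

# Hypothetical provenance records: each output file maps to the job that
# wrote it and the input files that job read. Illustrative only; this is
# not the format that LSF stores in extended attributes.
PROVENANCE = {
    "result.csv": {"job": "1003", "inputs": ["clean.csv", "params.json"]},
    "clean.csv": {"job": "1002", "inputs": ["raw.csv"]},
    "raw.csv": {"job": "1001", "inputs": []},
}


def trace_lineage(target, records):
    """Return (job, output, inputs) steps that led to target, nearest first."""
    steps, seen = [], set()
    queue = deque([target])
    while queue:  # breadth-first walk toward the original inputs
        path = queue.popleft()
        if path in seen or path not in records:
            continue  # already visited, or an original input with no record
        seen.add(path)
        rec = records[path]
        steps.append((rec["job"], path, rec["inputs"]))
        queue.extend(rec["inputs"])
    return steps


for job, out, ins in trace_lineage("result.csv", PROVENANCE):
    print(f"job {job} wrote {out} from {ins}")
```

The seen set keeps the walk from looping if two files ever reference each other, and files without a record (such as params.json here) are treated as original inputs.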

When you submit a job with the bsub command, enable data provenance in one of two ways: define LSB_DATA_PROVENANCE=Y as an environment variable (for example, set it in the submission environment or use bsub -env "LSB_DATA_PROVENANCE=Y"), or use the esub.dprov application (bsub -a 'dprov(file_path)'). Then use the tag.sh post-execution script (-Ep 'tag.sh') to mark the data provenance attributes on the output data files. You can also use the showhist.py script to generate a picture that shows the relationships between your data files.
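The two submission styles can be sketched in Python by building the bsub argument list and environment without running anything. The dprov_submission function, the dprov_arg parameter, and the ./myjob.sh job are hypothetical. The sketch passes dprov_arg through unchanged as the file_path argument of esub.dprov, adds -Ep tag.sh in both cases as the text describes, and relies on bsub copying the submission environment to the job, which is its default behavior.

```python
import os
import shlex


def dprov_submission(job_cmd, dprov_arg=None, post_exec="tag.sh"):
    """Return (argv, env) for a bsub submission with data provenance enabled.

    Hypothetical helper: nothing is submitted here.
    """
    argv = ["bsub"]
    env = dict(os.environ)  # bsub copies this environment to the job by default
    if dprov_arg is not None:
        # Style 2: the esub.dprov application, bsub -a 'dprov(file_path)'.
        argv += ["-a", f"dprov({dprov_arg})"]
    else:
        # Style 1: the LSB_DATA_PROVENANCE=Y environment variable.
        env["LSB_DATA_PROVENANCE"] = "Y"
    # Post-execution script that marks provenance attributes on output files.
    argv += ["-Ep", post_exec]
    argv += shlex.split(job_cmd)
    return argv, env


argv, env = dprov_submission("./myjob.sh input.dat")
print(shlex.join(argv))  # bsub -Ep tag.sh ./myjob.sh input.dat
# On an LSF submission host you could submit it (after import subprocess) with:
# subprocess.run(argv, env=env, check=True)
```

Building an argument list instead of a shell string avoids quoting mistakes: shlex.join quotes the dprov(...) argument automatically, matching the single quotes used in the bsub examples above.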

Data provenance requires IBM Spectrum Scale (GPFS) as the file system, because the tools rely on extended file attributes. Generating pictures with the showhist.py script also requires Graphviz, an open source graph visualization package.