Configuring Energy Data Collector for IBM Spectrum LSF Suite for HPC

The Energy Data Collector allows you to see the energy usage of exclusive jobs.

About this task

After you install IBM Spectrum LSF Suite for HPC, configure the Energy Data Collector.

Procedure

  1. Install ipmitool and bc on any hosts that do not have these tools installed. From the deployer, in the /opt/ibm/lsf_installer/playbook directory, run the following command:
    # ansible all -i lsf-inventory -m command -a "yum -y install ipmitool bc"
  2. Gather the IPMI product IDs from your hosts by running the following command:
    # ansible all -i lsf-inventory -m shell -a "ipmitool mc info |grep 'Product ID'"
  3. Determine if the hosts can gather any power information. Run the following command on all physical machines in the cluster:
    # ipmitool sensor |grep -i Watt
    If this command displays a power value, the host can gather energy data. No output means that the machine does not have any sensors to collect energy data.
  4. Gather the data from all machines in the cluster.
    For example, from the deployer, go to the /opt/ibm/lsf_installer/playbook directory and run the following command:
    # ansible all -i lsf-inventory -m shell -a "ipmitool mc info |grep 'Product ID'; ipmitool sensor |grep -i Watt"

    Note the sensor name and the product ID.

  5. Update the LSF beat configuration file in /opt/ibm/lsfsuite/lsf/conf/lsfbeats/energy/conf/demo.conf to make it aware of the product ID and sensor name.
    If you are not using a shared directory install, update the LSF beat configuration using the following command:
    # ansible all -i lsf-inventory -m lineinfile -a "dest=/opt/ibm/lsfsuite/lsf/conf/lsfbeats/energy/conf/demo.conf state=present line='PRODUCT ID \"SENSOR NAME\"'"
    Replace PRODUCT ID with the corresponding product ID and SENSOR NAME with the corresponding sensor name. Run this command once for each distinct combination of product ID and sensor name.
  6. Kill the LSF beat process on the machines.
  7. Test the configuration by submitting an exclusive job with the following command:
    # bsub -x -J "Test-Energy" {something}
    If the configuration is correct, this command shows the energy usage of the exclusive job.
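
Example

Steps 4 and 5 involve reading a product ID and a sensor name out of ipmitool output and combining them into one configuration line. The following is a minimal sketch of that parsing, not part of the product. The function names product_id, sensor_name, and conf_line are hypothetical helpers. The sketch assumes that ipmitool mc info prints a line of the form "Product ID : 1234 (0x04d2)", that ipmitool sensor prints pipe-delimited rows with the sensor name in the first column and the unit in the third, and that the decimal form of the product ID is the value the configuration file expects. Verify these assumptions against the output on your own hosts before using the result.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with LSF) that parse ipmitool output on stdin.

# Print the decimal product ID from `ipmitool mc info` output.
# Assumed input line: "Product ID                : 1234 (0x04d2)"
product_id() {
    awk -F: '/Product ID/ { split($2, a, " "); print a[1]; exit }'
}

# Print the name of the first sensor whose unit column mentions Watts,
# reading `ipmitool sensor` output (pipe-delimited columns).
sensor_name() {
    awk -F'|' 'tolower($3) ~ /watt/ { gsub(/^ +| +$/, "", $1); print $1; exit }'
}

# Build a line in the form used in step 5: PRODUCT ID "SENSOR NAME"
conf_line() {
    printf '%s "%s"\n' "$1" "$2"
}
```

On a host where ipmitool is installed, running conf_line "$(ipmitool mc info | product_id)" "$(ipmitool sensor | sensor_name)" would print a candidate line to pass to the lineinfile command in step 5. If a host reports more than one power sensor, this sketch picks only the first one, so check the full sensor list manually.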