Configuring LSF resource connector for Google Cloud Platform

Specify LSF resource connector configuration to enable Google Cloud Platform as a resource provider.

Procedure

  1. Update the Google Cloud Platform resource connector configuration.
    Change the parameters in the googleprov_config.json file to enable the resource connector to connect to Google Cloud Platform.
    GCLOUD_PROJECT_ID
    Google Cloud project ID. You can see your project ID when you click the list of all your projects on the Google Cloud Platform Console, or by using the gcloud command.
    GCLOUD_CREDENTIALS_FILE
    Google Cloud service account key. The service account must have read-write permission for Google Compute Engine and full control of Google Cloud Storage.

    This parameter is optional if the LSF management host is on Google Cloud, and the service account that is attached to the LSF management host instance has the required permission.

    GCLOUD_REGION
    Default region for bulk instances.

    Google Cloud Platform provides both a zonal bulk API endpoint (Instances.BulkInsert) and a regional bulk API endpoint (RegionInstances.BulkInsert). LSF resource connector automatically uses these bulk API endpoints to create Google Cloud instances.

    If the zone in which you want to create your instances is not important, configure LSF resource connector to call the regional bulk API endpoint by specifying a value for the GCLOUD_REGION parameter or by defining a region in the googleprov_templates.json file. The region that is defined in the googleprov_templates.json file overrides the region that is defined in the GCLOUD_REGION parameter. Google Cloud Platform automatically selects the zone in which to create your instances, considering the available hardware capacity in each zone.

    If you want to specify a zone in which to create your instances, define the zone in the googleprov_templates.json file, and LSF resource connector calls the zonal bulk API endpoint.
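
    The precedence between the zone, the template region, and GCLOUD_REGION can be pictured with the following sketch. It is an illustration of the rules described above, not LSF code: the choose_bulk_endpoint function and its argument order are hypothetical, and it assumes that a zone in the template takes priority when a template defines both a zone and a region.

```shell
#!/bin/sh
# Illustrative sketch only; choose_bulk_endpoint is a hypothetical helper.
#   $1 = zone from googleprov_templates.json (may be empty)
#   $2 = region from googleprov_templates.json (may be empty)
#   $3 = GCLOUD_REGION from googleprov_config.json (may be empty)
choose_bulk_endpoint() {
    template_zone=$1
    template_region=$2
    default_region=$3

    if [ -n "$template_zone" ]; then
        # A zone in the template selects the zonal endpoint (assumed to win
        # over any region setting).
        echo "Instances.BulkInsert zone=$template_zone"
    elif [ -n "$template_region" ]; then
        # A region in the template overrides GCLOUD_REGION.
        echo "RegionInstances.BulkInsert region=$template_region"
    elif [ -n "$default_region" ]; then
        # Fall back to the default region for bulk instances.
        echo "RegionInstances.BulkInsert region=$default_region"
    else
        echo "no zone or region configured" >&2
        return 1
    fi
}

choose_bulk_endpoint "us-central1-f" "" "us-east1"
choose_bulk_endpoint "" "us-west1" "us-east1"
choose_bulk_endpoint "" "" "us-east1"
```

    The three calls print the zonal endpoint for us-central1-f, the regional endpoint for us-west1 (the template region), and the regional endpoint for us-east1 (the GCLOUD_REGION default).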

  2. Create templates.
    Create at least one template in the googleprov_templates.json file. Make sure that your template accurately defines at least the following attributes:
    • imageId
    • vmType
    • zone

    The following template is a minimal example for hosts with 1 CPU:

    {
        "templates": [
            {   
                "templateId": "Template-VM-1",
                "maxNumber": 1,
                "attributes": {
                    "type": ["String", "X86_64"],
                    "ncores": ["Numeric", "1"],
                    "ncpus": ["Numeric", "1"],
                    "nthreads": ["Numeric", "1"],
                    "mem": ["Numeric", "600"],
                    "googlehost": ["Boolean", "1"]
                },
                "imageId": "serverimage",
                "vmType": "f1-micro",
                "zone": "us-central1-f",
                "instanceTags" : "team=dev8",
                "userData": "team=dev8"
            }
        ]
    }
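
    Before you deploy a template file, a quick check for the attributes listed above can catch omissions. The following is a minimal sketch under stated assumptions: missing_template_attrs is a hypothetical helper, and it performs a plain text search rather than parsing JSON, so it cannot tell a top-level "zone" from the "zone" entry inside "attributes".

```shell
#!/bin/sh
# Hypothetical helper: print the required attribute names (imageId, vmType,
# zone) that do not appear as property names anywhere in the given file.
missing_template_attrs() {
    file=$1
    missing=""
    for key in imageId vmType zone; do
        # Match the key as a quoted property name followed by a colon.
        if ! grep -q "\"$key\"[[:space:]]*:" "$file"; then
            missing="$missing $key"
        fi
    done
    # Print the missing names separated by spaces, without the leading space.
    echo "${missing# }"
}
```

    For example, running missing_template_attrs googleprov_templates.json prints an empty line when all three names are present.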
    

    Optionally, create templates that use other features by specifying additional attributes.

    1. Define GPU attributes to create templates for instances with GPU support.

      Specify the gpuType attribute to enable GPU support and specify other attributes to configure the GPU features.

      The following template defines the GPU-related attributes:

      {
          "templates": [
              {
                  "templateId": "TemplateGPU-VM-1",
                  "maxNumber": 100,
                  "attributes": {
                      "type": ["String", "X86_64"],
                      "ncores": ["Numeric", "1"],
                      "ncpus": ["Numeric", "1"],
                      "nthreads": ["Numeric", "2"],
                      "ngpus": ["Numeric", "1"],
                      "ngpus_physical": ["Numeric", "1"],
                      "gpuextend": ["String", "ngpus=1;nnumas=1;gbrand=Tesla;gmodel=K80;gmem=10240;nvlink=yes"],
                      "define_ncpus_threads": ["Boolean", "1"],
                      "mem": ["Numeric", "3840"],
                      "zone": ["String", "us_east1-d"],
                      "googlehost": ["Boolean", "1"]
                  },
                  "imageId": "lsf-gcloud-dynamic-vm",
                  "region": "us-east1",
                  "zone": "us-east1-d",
                  "vmType": "n1-standard-1",
                  "gpuType": "nvidia-tesla-k80",
                  "gpuNumber": "1",
                  "privateNetworkOnlyFlag": false,
                  "vpc": "lsf-vpc",
                  "priority": "18",
                  "subnetId": "lsf-vpc-us-east1",
                  "instanceTags" : "lsf-vpn-instance=gcloud-VM-1",
                  "userData": "zone=us-east1-d"
              }
          ]
      }
      

      Use the gpuextend attribute if you want to use factors such as GPU model, GPU memory, or NVLink support to select different templates. The gpuextend attribute consists of key-value pairs that are separated by semicolons.
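
      To make the key-value structure concrete, the following sketch splits a gpuextend string on semicolons and looks up one key. The gpuextend_value function is a hypothetical helper for illustration; it is not part of LSF.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from a gpuextend string.
#   $1 = gpuextend string, for example "ngpus=1;gmodel=K80"
#   $2 = key to look up
gpuextend_value() {
    # Put each key=value pair on its own line, then split each pair at "=".
    printf '%s\n' "$1" | tr ';' '\n' | while IFS='=' read -r key value; do
        if [ "$key" = "$2" ]; then
            printf '%s\n' "$value"
        fi
    done
}

# The gpuextend value from the template above.
gpuextend="ngpus=1;nnumas=1;gbrand=Tesla;gmodel=K80;gmem=10240;nvlink=yes"
gpuextend_value "$gpuextend" gmodel
gpuextend_value "$gpuextend" gmem
```

      With the sample string, the two calls print K80 and 10240.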

    2. Specify a baseline minimum CPU platform for your template.

      Use the minCpuPlatform attribute to specify the minimum CPU platform.

      If you decide to specify a minimum CPU platform, ensure that your zone contains multiple CPU platforms, and that the minimum CPU platform exists in the specified zone for this instance. If you specify a minimum CPU platform, your instance uses this CPU platform unless you stop the instance and change the CPU platform.

      For more information on minimum CPU platforms, see the Google Cloud documentation: https://cloud.google.com/compute/docs/instances/specify-min-cpu-platform

      The following template uses Intel Sandy Bridge as the minimum CPU platform:

      {
          "templates": [
              {
                  "templateId": "TemplateMinCPU-VM-1",
                  "maxNumber": 100,
                  "attributes": {
                      "type": ["String", "X86_64"],
                      "ncores": ["Numeric", "1"],
                      "ncpus": ["Numeric", "1"],
                      "nthreads": ["Numeric", "2"],
                      "define_ncpus_threads": ["Boolean", "1"],
                      "mem": ["Numeric", "3840"],
                      "zone": ["String", "us_east1-d"],
                      "googlehost": ["Boolean", "1"]
                  },
                  "imageId": "lsf-gcloud-dynamic-vm",
                  "region": "us-east1",
                  "zone": "us-east1-d",
                  "vmType": "n1-standard-1",
                  "minCpuPlatform": "Intel Sandy Bridge",
                  "privateNetworkOnlyFlag": false,
                  "vpc": "lsf-vpc",
                  "priority": "18",
                  "subnetId": "lsf-vpc-us-east1",
                  "instanceTags" : "lsf-vpn-instance=gcloud-VM-1",
                  "userData": "zone=us-east1-d"
              }
          ]
      }

      If different sets of jobs have different minimum CPU platform requirements, you can configure a string resource in different template IDs:

      {
          "templates": [
              {
                  "templateId": "gcloud-VM-1",
                  "maxNumber": 100,
                  "attributes": {
                      "type": ["String", "X86_64"],
                      "ncores": ["Numeric", "1"],
                      "ncpus": ["Numeric", "1"],
                      "nthreads": ["Numeric", "2"],
                      "define_ncpus_threads": ["Boolean", "1"],
                      "mem": ["Numeric", "3840"],
                      "zone": ["String", "us_east1-d"],
                      "googlehost": ["Boolean", "1"],
                      "minCpuPlatform": ["String", "Intel_Sandy_Bridge"]
                  },
                  "imageId": "lsf-gcloud-dynamic-vm",
                  "region": "us-east1",
                  "zone": "us-east1-d",
                  "vmType": "n1-standard-1",
                  "minCpuPlatform": "Intel Sandy Bridge",
                  "privateNetworkOnlyFlag": false,
                  "vpc": "lsf-vpc",
                  "priority": "18",
                  "subnetId": "lsf-vpc-us-east1",
                  "instanceTags" : "lsf-vpn-instance=gcloud-VM-1",
                  "userData": "minCpuPlatform=Intel_Sandy_Bridge"
              },
              {
                  "templateId": "gcloud-VM-2",
                  "maxNumber": 50,
                  "attributes": {
                      "type": ["String", "X86_64"],
                      "ncores": ["Numeric", "1"],
                      "ncpus": ["Numeric", "1"],
                      "nthreads": ["Numeric", "2"],
                      "define_ncpus_threads": ["Boolean", "1"],
                      "mem": ["Numeric", "3840"],
                      "zone": ["String", "us_east1-d"],
                      "googlehost": ["Boolean", "1"],
                      "minCpuPlatform": ["String", "Intel_Cascade_Lake"]
                  },
                  "imageId": "lsf-gcloud-dynamic-vm",
                  "region": "us-east1",
                  "zone": "us-east1-d",
                  "vmType": "n1-standard-1",
                  "minCpuPlatform": "Intel Cascade Lake",
                  "privateNetworkOnlyFlag": false,
                  "vpc": "lsf-vpc",
                  "priority": "18",
                  "subnetId": "lsf-vpc-us-east1",
                  "instanceTags" : "lsf-vpn-instance=gcloud-VM-1",
                  "userData": "minCpuPlatform=Intel_Cascade_Lake"
              }
          ]
      }

      In the lsf.shared file, configure the minCpuPlatform String resource:

      Begin Resource
      RESOURCENAME  TYPE    INTERVAL INCREASING  DESCRIPTION        # Keywords
      ...
      minCpuPlatform String ()       ()          (minCpuPlatform )
      ...
      End Resource

      The content of the userData attribute is exported as environment variables. Add the following lines to the user_data.sh script in the <LSF_TOP>/<LSF_VERSION>/resource_connector/google/scripts/ directory to add the minCpuPlatform String resource to the newly created instance:

      if [ -n "${minCpuPlatform}" ]; then
          sed -i "s/\(LSF_LOCAL_RESOURCES=.*\)\"/\1 [resourcemap ${minCpuPlatform}*minCpuPlatform]\"/" $LSF_CONF_FILE
          echo "Updated LSF_LOCAL_RESOURCES in $LSF_CONF_FILE successfully to add [resourcemap ${minCpuPlatform}*minCpuPlatform]" >> $logfile
      else
          echo "minCpuPlatform does not exist in the environment variable" >> $logfile
      fi
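
      To see what the sed substitution does, the following sketch applies the same expression to a single sample line instead of editing lsf.conf in place. The sample LSF_LOCAL_RESOURCES value and the variable names before and after are assumptions for illustration; your lsf.conf file might contain different resources.

```shell
#!/bin/sh
# Sample value; in user_data.sh this variable comes from the userData attribute.
minCpuPlatform=Intel_Sandy_Bridge

# Hypothetical lsf.conf line before the edit.
before='LSF_LOCAL_RESOURCES="[resource googlehost]"'

# Same substitution as in user_data.sh, applied to a stream rather than a file:
# capture everything up to the closing quotation mark, then append the
# resourcemap entry before that quotation mark.
after=$(printf '%s\n' "$before" | sed "s/\(LSF_LOCAL_RESOURCES=.*\)\"/\1 [resourcemap ${minCpuPlatform}*minCpuPlatform]\"/")

printf 'before: %s\n' "$before"
printf 'after:  %s\n' "$after"
```

      With the sample line, the result is LSF_LOCAL_RESOURCES="[resource googlehost] [resourcemap Intel_Sandy_Bridge*minCpuPlatform]".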

      You can submit jobs that select different templates based on the minCpuPlatform.

      The following job submission creates or reuses an instance from the gcloud-VM-1 template, which has Intel Sandy Bridge as the minimum CPU platform:

      bsub -R "select[minCpuPlatform==Intel_Sandy_Bridge]" myjob

      The following job submission creates or reuses an instance from the gcloud-VM-2 template, which has Intel Cascade Lake as the minimum CPU platform:

      bsub -R "select[minCpuPlatform==Intel_Cascade_Lake]" myjob

      Both of the following job submissions create or reuse an instance from either the gcloud-VM-1 or gcloud-VM-2 templates, since both of these templates satisfy the requirements:

      bsub -R "select[minCpuPlatform==Intel_Sandy_Bridge || minCpuPlatform==Intel_Cascade_Lake]" myjob
      bsub myjob

      You can also define minCpuPlatform as a Numeric resource, use different values to represent different CPU platforms, and then submit jobs based on this numeric requirement. For example, if you use the following numbers for the different CPU platforms:
      • Intel Skylake: 1
      • Intel Ivy Bridge: 2
      • Intel Haswell: 3
      • Intel Broadwell: 4
      • Intel Cascade Lake: 5
      • Intel Sandy Bridge: 6

      The following job submission creates or reuses an instance that uses the Intel Cascade Lake or Intel Sandy Bridge CPU platforms:

      bsub -R "select[minCpuPlatform > 4]" myjob
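
      The numeric comparison can be checked against the example mapping with a short sketch. The platform_code and matches_threshold functions are hypothetical helpers that only mirror the list above; LSF itself evaluates the select string.

```shell
#!/bin/sh
# Numeric codes copied from the example list above.
platform_code() {
    case "$1" in
        "Intel Skylake")      echo 1 ;;
        "Intel Ivy Bridge")   echo 2 ;;
        "Intel Haswell")      echo 3 ;;
        "Intel Broadwell")    echo 4 ;;
        "Intel Cascade Lake") echo 5 ;;
        "Intel Sandy Bridge") echo 6 ;;
        *) return 1 ;;
    esac
}

# Succeeds when the platform's code is greater than the threshold,
# which mirrors select[minCpuPlatform > threshold].
matches_threshold() {
    code=$(platform_code "$1") || return 1
    [ "$code" -gt "$2" ]
}

for p in "Intel Skylake" "Intel Broadwell" "Intel Cascade Lake" "Intel Sandy Bridge"; do
    if matches_threshold "$p" 4; then
        echo "$p matches"
    fi
done
```

      With a threshold of 4, only Intel Cascade Lake and Intel Sandy Bridge match.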

    3. Specify a launch instance template.

      Specify the launchTemplateId attribute to enable launch instance templates. You must create the specified instance template in Google Cloud before you use it. When you use a launch instance template, you can define all of the instance's properties in the instance template when you create it, and then specify only the zone or region in the googleprov_templates.json file. If an attribute is specified in both places, the value in the googleprov_templates.json file overrides the value in the instance template. For more information on the override behavior of instance templates, see the Google Cloud documentation: https://cloud.google.com/compute/docs/instances/create-vm-from-instance-template#creating_a_vm_instance_from_an_instance_template_with_overrides.

      Note: To export environment variables to the instances, you still must specify the userData attribute in the googleprov_templates.json file. To add labels to the instances, you still must specify the instanceTags attribute in the googleprov_templates.json file.

      For more information on launch instance templates, see the Google Cloud documentation: https://cloud.google.com/compute/docs/instance-templates.

      LSF supports the following Google Cloud instance features only in launch instance templates:

      Local SSDs
      LSF supports attached local SSDs through launch instance templates, but does not include an interface in the googleprov_templates.json file. In Google Cloud, attach the local SSD to the launch instance template in Disks > Add new disk by selecting Local SSD scratch disk in the Type field.

      You must mount SSDs before using them. LSF includes example code that illustrates how to mount multiple local SSDs in one logical volume in the <LSF_TOP>/<LSF_VERSION>/resource_connector/google/scripts/example_user_data.sh file.

      For more information on local SSDs, see the Google Cloud documentation: https://cloud.google.com/compute/docs/disks/local-ssd.

      Preemptible VM instances
      Preemptible VM instances are instances that run at a lower cost than standard instances, with most of the same features as standard instances.

      LSF supports preemptible VM instances through launch instance templates, but does not include an interface in the googleprov_templates.json file. In Google Cloud, set the Preemptibility field to On when creating the launch instance template to enable preemptible VM instances.

      When a VM instance is being preempted, the instance transitions into TERMINATED status and LSF automatically requeues the job that is running on the instance. LSF then deletes the preempted instance.

      For more information on preemptible VM instances, see the Google Cloud documentation: https://cloud.google.com/compute/docs/instances/preemptible.

      The following template specifies a launch instance template:

      {
          "templates": [
              {
                  "templateId": "gcloud-VM-1",
                  "maxNumber": 100,
                  "attributes": {
                      "type": ["String", "X86_64"],
                      "ncores": ["Numeric", "1"],
                      "ncpus": ["Numeric", "1"],
                      "nthreads": ["Numeric", "2"],
                      "define_ncpus_threads": ["Boolean", "1"],
                      "mem": ["Numeric", "3840"],
                      "zone": ["String", "us_east1-d"],
                      "googlehost": ["Boolean", "1"]
                  },
                  "zone": "us-east1-d",
                  "launchTemplateId": "template-f1-micro"
              }
          ]
      }

  3. Optional: Update the LSF_TOP parameter in the user_data.sh script so that the script can run the lsf_daemons start command, which starts the LSF daemons on the instance. This script runs during the Google Cloud Platform instance startup.

    Edit the user_data.sh file in the <LSF_TOP>/<LSF_VERSION>/resource_connector/google/scripts/ directory.

    For example:
        #!/bin/sh
        
        logfile=/tmp/user_data.log
        echo START `date '+%Y-%m-%d %H:%M:%S'` >> $logfile
        
        #
        # Export user data, which is defined with the "userData" attribute 
        # in the template
        #
        %EXPORT_USER_DATA%
        
        #
        # Add your customization script here
        #
        
        #
        # Source LSF environment at the VM host
        #
        LSF_TOP=/usr/share/lsf
        LSF_CONF_FILE=$LSF_TOP/conf/lsf.conf
        . $LSF_TOP/conf/profile.lsf
        env >> $logfile
        
        # 
        # Support rc_account resource to enable RC_ACCOUNT policy  
        # Add additional local resources if needed 
        #
        if [ -n "${rc_account}" ]; then
            sed -i "s/\(LSF_LOCAL_RESOURCES=.*\)\"/\1  [resourcemap ${rc_account}*rc_account]\"/" $LSF_CONF_FILE
            echo "update LSF_LOCAL_RESOURCES lsf.conf successfully, add [resourcemap ${rc_account}*rc_account]" >> $logfile
        fi
        
        #
        # Start LSF Daemons 
        #
        $LSF_SERVERDIR/lsf_daemons start
        
        echo END AT `date '+%Y-%m-%d %H:%M:%S'` >> $logfile
    
    Note: You can add any extra code that needs to run when the Google Cloud Platform instance launches.