Follow these steps to configure Microsoft Azure so that the LSF resource connector
can create instances and make allocation requests on behalf of LSF. The
instances launched from Microsoft Azure join the LSF cluster. If instances
become idle, the LSF resource connector terminates them.
Before you begin
The Microsoft Azure provider requires IBM® Spectrum
LSF Version 10.1, Fix Pack 2, or later.
Before using the resource connector with the Microsoft Azure provider, you must apply the latest
LSF Fix Pack, manually move the required configuration files to the appropriate directory under
LSF_TOP/conf/resource_connector/<provider_name>/conf,
and change the ownership of those new files and directories to the cluster administrator.
For more information about applying Fix Packs to LSF
resource connector, see Use the LSF patch installer to update resource connector.
- You must have root access to the LSF management
host.
- The LSF
management host and
the compute nodes (Microsoft Azure instances) must be able to communicate with each other.
- You must be able to restart the LSF
cluster.
- You must be familiar with and have the ability to perform Microsoft Azure administrative
operations.
- The virtual network to be used by Microsoft Azure virtual instances must be configured so that
they can communicate with LSF
hosts.
Note: This is not required if you are running a simple test application.
- You must have the ability to configure the DNS or NIS server and use short host names on
Microsoft Azure.
The default DNS server on Microsoft Azure does not work for LSF
because the Microsoft Azure instance's full host name, with domain name, usually exceeds 60
characters, which is too long for LSF.
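The host name limit described above can be checked before you build an image. The following is a minimal sketch that assumes the 60-character limit stated in this topic. The function names fqdn_exceeds_limit and short_host_name are hypothetical helpers for illustration, not LSF commands.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LSF): compare a host name against the
# 60-character limit mentioned in this topic.

# Succeeds (exit status 0) when the given name is longer than the limit.
# The limit defaults to 60 and can be overridden with a second argument.
fqdn_exceeds_limit() {
    name="$1"
    limit="${2:-60}"
    [ "${#name}" -gt "$limit" ]
}

# Prints the short host name, that is, everything before the first dot.
short_host_name() {
    printf '%s\n' "${1%%.*}"
}
```

For example, on an instance you might run fqdn_exceeds_limit "$(hostname -f)" and, if it succeeds, configure DNS or NIS so that the name printed by short_host_name resolves instead.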
About this task
LSF
resource connector has been tested on the following systems:
- Linux x86 Kernel 2.6, glibc 2.11 RHEL 6.x
- Linux x86 Kernel 3.10, glibc 2.17 RHEL 7.x
- Linux x86 Kernel 3.10, glibc 2.17 CentOS 7.x
- LSF
10.1.0 Fix Pack 2, or
later
In the following steps, you must perform all operations as the Microsoft Azure administrator
unless otherwise stated.
For full details on installing LSF, see
Installing IBM Spectrum
LSF on UNIX and
Linux.
To allow user-submitted jobs to run on an instance, the submitting user account must
exist on the instance or LSF user account mapping must be configured. For more
information about user groups and user account mapping, see Managing Users and User
Groups and Between-Host User Account Mapping.
Procedure
-
Create the Azure Java SDK authentication file.
Any application that runs on Microsoft Azure must be registered as an Azure application and must
be assigned roles to access resources such as images and networks. Create an Azure authentication
file to grant access to the LSF resource connector Azure plug-in, and register the application as
an Azure application. You can set up access control for the LSF
resource connector under multiple subscriptions:
- To generate a role for the authentication key file manually:
- Edit the Microsoft Azure custom_role.json file to include a custom role that
precisely controls access rights for the LSF resource connector. List multiple
subscriptions under the AssignableScopes section so that the LSF
resource connector can work under those subscriptions. Here is an example
custom_role.json file, with a custom role named LSF Resource Connector and two
subscriptions under AssignableScopes:
{
"Name": "LSF Resource Connector",
"IsCustom": true,
"Description": "LSF resource connector for Azure, access/create/delete VM RG storage network",
"Actions": [
"Microsoft.Storage/*",
"Microsoft.Network/*",
"Microsoft.Compute/*",
"Microsoft.Authorization/*/read",
"Microsoft.Resources/subscriptions/resourceGroups/*",
"Microsoft.Resources/deployments/*",
"Microsoft.Insights/alertRules/*",
"Microsoft.Insights/diagnosticSettings/*",
"Microsoft.Support/*",
"Microsoft.ResourceHealth/availabilityStatuses/read"
],
"NotActions": [
],
"AssignableScopes": [
"/subscriptions/1db8ceea-a921-4395-9586-6fc87945f8d7",
"/subscriptions/5db8ceea-a921-4395-9586-6fc87945f8d9"
]
}
- Create the custom role in Microsoft Azure, register the LSF
resource connector with the Azure application, and assign the custom role to your multiple
subscriptions.
Use the following example as a guide. It creates the LSF Resource Connector
custom role, registers the LSF resource connector with the Azure application
called MyAzureApp, and assigns the LSF Resource Connector custom role to two
subscriptions:
$ az role definition create --role-definition custom_role.json
$ az ad sp create-for-rbac -o json -n "MyAzureApp" --role "LSF Resource Connector" \
    --scopes "/subscriptions/1db8ceea-a921-4395-9586-6fc87945f8d7" \
    "/subscriptions/5db8ceea-a921-4395-9586-6fc87945f8d9"
- To generate a role for the authentication key file automatically:
- Verify that you have Azure Active Directory administrator permissions and that Azure CLI
2.0 is installed.
- Log in to the Azure CLI and run the authgen.py script to create the
authentication file:
curl -L https://raw.githubusercontent.com/Azure/azure-libraries-for-java/master/tools/authgen.py | python > my.azureauth
The new authentication file that the LSF resource connector supports is a Java properties file
and contains the following information:
subscription=########-####-####-####-############
client=########-####-####-####-############
tenant=########-####-####-####-############
key=XXXXXXXXXXXXXXXX
managementURI=https\://management.core.windows.net/
baseURL=https\://management.azure.com/
authURL=https\://login.windows.net/
graphURL=https\://graph.windows.net/
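Before you point the resource connector at the authentication file, it can help to confirm that the file defines every key shown in the listing above. The following is a minimal sketch. The function name check_azure_auth_file is a hypothetical helper, not part of LSF or the Azure SDK, and it checks only that each key is present with a non-empty value, not that the values are valid.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF or the Azure SDK): verify that an
# authentication properties file defines every key shown in this topic.
# Returns 0 when all keys are present, 1 otherwise.
check_azure_auth_file() {
    file="$1"
    missing=0
    for key in subscription client tenant key managementURI baseURL authURL graphURL; do
        # A key counts as present when a line starts with "<key>=" and
        # is followed by at least one character.
        if ! grep -q "^${key}=." "$file"; then
            echo "missing or empty: ${key}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A possible use is check_azure_auth_file my.azureauth && chmod 600 my.azureauth. Restricting the file permissions is a suggestion rather than a documented requirement, on the grounds that the file contains a secret key.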
-
Create a key pair.
Azure supports public-key cryptography to secure the login information for an instance. A Linux
instance has no password; you use a key pair to securely log in to your instance. You specify the
SSH public key when you launch your instance, then use the private key when you log in using
SSH.
Provide the content of the public key file to LSF to
launch Azure instances. Configure the file path of the public key in the
azureprov_templates.json file.
-
Use the ssh-keygen command to create an SSH key pair.
-
Provide the content of the public key file to LSF to
launch Azure instances. Specify the file path of the public key in the
azureprov_templates.json configuration file.
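The key pair step can be sketched as follows. This is an illustration under stated assumptions: the function name make_lsf_keypair, the key file name lsf_azure_key, the comment string, and the choice of a 4096-bit RSA key with an empty passphrase are not taken from this topic. Adjust them to your site's security policy.

```shell
#!/bin/sh
# Hypothetical helper: generate an RSA key pair for LSF-launched instances.
# The directory is supplied by the caller; the file name is an assumption.
make_lsf_keypair() {
    key_dir="$1"
    key_file="${key_dir}/lsf_azure_key"

    # Refuse to overwrite an existing key rather than prompting.
    if [ -e "$key_file" ]; then
        echo "key already exists: $key_file" >&2
        return 1
    fi

    mkdir -p "$key_dir" || return 1
    chmod 700 "$key_dir" || return 1

    # -q: quiet; -t rsa -b 4096: key type and size; -N "": empty passphrase;
    # -C: comment stored in the public key; -f: output file for the private key.
    ssh-keygen -q -t rsa -b 4096 -N "" -C "lsf-resource-connector" -f "$key_file" || return 1

    # Print the public key path, which is the path to reference from
    # azureprov_templates.json.
    printf '%s\n' "${key_file}.pub"
}
```

Calling make_lsf_keypair "$HOME/.ssh/lsf" prints the path of the generated public key file. The private key stays in the same directory without the .pub suffix.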
-
Create a resource group for LSF on the
Azure Portal or by using the Azure CLI.
-
Create an Azure Virtual Network (VNet) and subnets in the Azure resource group for LSF.
An Azure Virtual Network (VNet) is a virtual network that is dedicated to an Azure account.
For more details on Azure Virtual Networks, refer to the Microsoft Azure documentation: https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-overview
A subnet is a range of IP addresses in your VNet. You can launch Azure resources into a subnet
that you select. Specify the name of the subnet (for example, subnet) in the
azureprov_templates.json file that is used to launch the instance.
Create the VNet and subnets in the Azure resource group that you created for LSF.
-
Create an LSF
network security group in the Azure resource group for LSF.
A network security group (NSG) contains a list of security rules that allow or deny network
traffic to resources that are connected to VNets. Add customized rules to open all LSF
listening ports to the security group that launches instances. The ports must match those from the
existing LSF
cluster.
The following are the default port number values:
- LSF_LIM_PORT=7869 (TCP and UDP)
- LSF_RES_PORT=6878 (TCP)
- LSB_SBD_PORT=6882 (TCP)
You can also accept all traffic from the LSF management host by adding the IP address of the
LSF management host, or you can accept all traffic across the VNet.
Note: If you allow the traffic only from the default ports, some NIOS commands might not work since
the ports are configured for them dynamically and are different from the LSF
ports.
Add the network security group to the Azure resource group that you created for LSF.
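The security group step above might look like the following with the Azure CLI. This is a sketch, not a definitive configuration. The resource group name, security group name, rule names, and priorities are placeholders, and the rules open only the default LSF ports listed in this topic to traffic from within the virtual network. The run helper prints each command by default so that you can review it first.

```shell
#!/bin/sh
# Placeholder names (assumptions, not values from an existing setup).
RG="${RG:-lsf-rg}"
NSG="${NSG:-lsf-nsg}"

# Print each command by default; set DRY_RUN=0 to run it for real.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

create_lsf_nsg() {
    run az network nsg create --resource-group "$RG" --name "$NSG"

    # TCP: default LSF_LIM_PORT, LSF_RES_PORT, and LSB_SBD_PORT values.
    run az network nsg rule create --resource-group "$RG" --nsg-name "$NSG" \
        --name lsf-tcp --priority 100 --direction Inbound --access Allow \
        --protocol Tcp --source-address-prefixes VirtualNetwork \
        --destination-port-ranges 7869 6878 6882

    # UDP: default LSF_LIM_PORT value only.
    run az network nsg rule create --resource-group "$RG" --nsg-name "$NSG" \
        --name lsf-udp --priority 110 --direction Inbound --access Allow \
        --protocol Udp --source-address-prefixes VirtualNetwork \
        --destination-port-ranges 7869
}
```

Calling create_lsf_nsg prints the three az commands. Calling it with DRY_RUN=0 set in the environment runs them, which requires a logged-in Azure CLI. If your cluster uses non-default ports, replace the port numbers with the values from your lsf.conf file.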
-
Create an instance and install the LSF management
host.
The LSF
cluster administrator must manually launch an Azure instance and install LSF on the
management host.
Enabling dynamic hosts is sufficient for this management host configuration. If you are
installing a new LSF management host, set ENABLE_DYNAMIC_HOSTS="Y" in the
install.config file. If you have an existing LSF management
host, manually configure the LSF_DYNAMIC_HOST_WAIT_TIME parameter in the
lsf.conf file and the LSF_HOST_ADDR_RANGE parameter in the
lsf.cluster.clustername file.
Tip: You do not have to configure the resource connector parameters at this time. You
can configure these parameters after you successfully build the LSF cloud image.
For example, specify the following parameters in the install.config
file:
LSF_TOP="/opt/lsf"
LSF_ADMINS="lsfadmin"
LSF_CLUSTER_NAME="cluster1"
LSF_MASTER_LIST="lsfmanagement1 lsfmanagement2"
LSF_ENTITLEMENT_FILE="/root/platform_lsf_std_entitlement.dat"
LSF_TARDIR="/root/"
ENABLE_DYNAMIC_HOSTS="Y"
LSF_DYNAMIC_HOST_WAIT_TIME="2"
-
Build the LSF cloud
image.
To create an Azure instance image for an LSF cloud
compute host, manually launch an Azure instance and install LSF on
that instance.
-
Create the Azure instance and choose the managed disk option when you launch the instance.
-
Use the ssh command to log in to the Azure instance and use the Azure Java
SDK authentication file that you created.
-
Copy the LSF
packages to the Azure instance:
- lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z
- lsf10.1_linux2.6-glibc2.3-x86_64-442293.tar.Z (LSF 10.1,
Fix Pack 2)
- lsf10.1_lsfinstall.tar.Z
- lsf_std_entitlement.dat
-
Check that the ed.x86_64 software is installed.
If this package is not installed, use the yum install command to install
it.
yum install ed
-
Install LSF as a
server host on the Azure instance.
For example, edit the server.config file with
the installation options you need:
LSF_TOP="/opt/lsf"
LSF_ADMINS="lsfadmin"
LSF_TARDIR="/opt/install/"
LSF_LICENSE="/opt/install/lsf_std_entitlement.dat"
LSF_SERVER_HOSTS="management1 management2"
LSF_LOCAL_RESOURCES="[resource azurehost]"
LSF_LIM_PORT="7869"
LSF_GET_CONF=lim
Run the ./lsfinstall -s -f server.config command to
install the LSF server
host.
After installation, make sure that the
LSF_TOP/conf/lsf.conf file contains the
azurehost resource, for example:
LSF_GET_CONF=lim
LSF_CONFDIR=/opt/lsf/conf
LSF_LIM_PORT=7869
LSF_SERVER_HOSTS="management.myserver.com"
LSF_VERSION=10.1
LSF_LOCAL_RESOURCES="[resource azurehost]"
LSF_TOP=/opt/lsf/
LSF_LOGDIR=/opt/lsf/log
LSF_LOG_MASK=LOG_WARNING
LSF_ENABLE_EGO=N
LSB_ENABLE_HPC_ALLOCATION=Y
LSF_EGO_DAEMON_CONTROL=N
- LSF_LIM_PORT
- The port number must be the same as the one that is defined on the LSF management host.
- LSF_GET_CONF
- Update the LSF configuration to synchronize the cluster configuration with the management host.
- LSF_LOCAL_RESOURCES
- The new resource name azurehost is used by LSF to
identify Azure instances. Use the bhosts -a command to see instances that are
used by LSF.
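The lsf.conf check described above can be scripted. The following is a minimal sketch. The function name has_azurehost_resource is a hypothetical helper, and it looks only for an uncommented LSF_LOCAL_RESOURCES line that names the azurehost resource.

```shell
#!/bin/sh
# Hypothetical helper: succeed when an lsf.conf file declares the azurehost
# resource in an uncommented LSF_LOCAL_RESOURCES line.
has_azurehost_resource() {
    # Lines that start with "#" do not match because the pattern is
    # anchored to optional leading whitespace followed by the parameter name.
    grep -Eq '^[[:space:]]*LSF_LOCAL_RESOURCES=.*\[resource[[:space:]]+azurehost\]' "$1"
}
```

For example, with the installation path used in this topic, has_azurehost_resource /opt/lsf/conf/lsf.conf && echo "azurehost is set" reports whether the parameter is active. The same check is useful just before you capture the image, because the parameter must be enabled at that point.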
- Optional:
If required, add the management host name to the /etc/hosts file on the Azure instance,
or configure the DNS or NIS client on the instance if DNS or NIS is used.
You must use the short host name when specifying the management host. The host names of the new
instances that the LSF resource connector creates on Azure follow the
"host-a-b-c-d" format, where "a-b-c-d"
corresponds to the instance's IPv4 address (a.b.c.d). Add these entries
to your custom DNS/NIS server or /etc/hosts file.
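The "host-a-b-c-d" naming rule can be turned into /etc/hosts entries with a few lines of shell. The following is a minimal sketch. The function names are hypothetical, and the range helper assumes that all addresses share the same first three octets.

```shell
#!/bin/sh
# Convert an IPv4 address (a.b.c.d) to the host-a-b-c-d name format.
ip_to_instance_name() {
    printf 'host-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Print one /etc/hosts style line: "<address> <short host name>".
hosts_entry() {
    printf '%s %s\n' "$1" "$(ip_to_instance_name "$1")"
}

# Print entries for a range of final octets under one three-octet prefix.
# Arguments: prefix (for example 10.0.0), first octet, last octet.
hosts_entries_for_range() {
    prefix="$1"
    i="$2"
    last="$3"
    while [ "$i" -le "$last" ]; do
        hosts_entry "${prefix}.${i}"
        i=$((i + 1))
    done
}
```

For example, hosts_entries_for_range 10.0.0 4 6 prints entries for 10.0.0.4 through 10.0.0.6, which you can review and then append to /etc/hosts or load into your DNS or NIS server.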
-
Manually start the LSF
daemons on the instance and make sure that the instance can join the LSF
cluster as a dynamic host.
Note: Disable LSF_LOCAL_RESOURCES="[resource azurehost]" in the
lsf.conf file before your test, because LSF omits hosts that have LSF
resource connector flagged resources (such as azurehost) unless the LSF
resource connector created them dynamically. You must enable this parameter again
before creating the image.
If the instance cannot join the LSF
cluster as a dynamic host, check the VPN, firewall, or security group settings. Check whether the
management host can ping the instance (and whether the instance can ping the management host) using its
private IP address. If the management host can ping the instance using its IP address but not using the
host name, configure the host name resolution properly.
-
Shut down the LSF
daemons and log out from the instance.
-
Log in to the Azure instance.
-
Deprovision and shut down the instance.
sudo waagent -deprovision+user -force && halt
-
Capture the image.
Run the following Azure commands:
az vm deallocate -g "resource_group_name" -n "instance_name"
az vm generalize -g "resource_group_name" -n "instance_name"
az image create -n "image_name" -g "resource_group_name" --os-type Linux --source "instance_name"
The name of the image (imageId) is required in the
azureprov_templates.json file so that LSF can
decide when borrowing happens and which image to launch instances from.
For more details on capturing an image, refer to the Microsoft Azure documentation: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/capture-image
-
Add the image to the Azure resource group that you created for LSF.
After you capture the image, the original Azure instance can no longer be started. If you need
to re-create the image, create a new instance from the image that you created, or build a new
LSF cloud image.
-
Install the LSF
resource connector for Azure.
-
Log in to the LSF management
host as root.
-
Install Java, Version 1.8, or later.
-
Source the LSF
environment.
- For csh or tcsh: source LSF_TOP/conf/cshrc.lsf
- For sh, ksh, or bash: . LSF_TOP/conf/profile.lsf
-
Copy the LSF
resource connector package to the LSF management
host and extract the package.
-
From the extracted package directory, copy the contents of the
LSF_VERSION/resource_connector/azure directory to the
LSF_TOP/LSF_VERSION/resource_connector directory.
-
Create the azure subdirectory in the
LSF_TOP/conf/resource_connector directory.
-
From the extracted package directory, copy the contents of the
LSF_VERSION/resource_connector/azure/conf directory to the
LSF_TOP/conf/resource_connector/azure directory.
-
Edit the
LSF_TOP/conf/resource_connector/hostProviders.json file and
add an entry for the Azure provider.
-
Replace the ebrokerd executable file in the LSF_SERVERDIR
directory with the ebrokerd executable in the extracted package.
-
Run the badmin mbdrestart command to apply the changes.
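The file copy substeps of the installation can be sketched as a small shell function. This is an illustration only. The function name install_azure_connector_files is hypothetical, the three arguments are supplied by the caller, and the function follows the directory names given in the steps literally. It does not edit hostProviders.json, replace ebrokerd, or restart mbatchd.

```shell
#!/bin/sh
# Hypothetical helper (not an LSF-provided script).
# Arguments: extracted package directory, LSF_TOP, LSF_VERSION.
install_azure_connector_files() {
    pkg_dir="$1"
    lsf_top="$2"
    lsf_version="$3"
    src="${pkg_dir}/${lsf_version}/resource_connector/azure"

    if [ ! -d "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi

    # Contents of LSF_VERSION/resource_connector/azure go to
    # LSF_TOP/LSF_VERSION/resource_connector.
    mkdir -p "${lsf_top}/${lsf_version}/resource_connector" || return 1
    cp -R "${src}/." "${lsf_top}/${lsf_version}/resource_connector/" || return 1

    # Contents of LSF_VERSION/resource_connector/azure/conf go to
    # LSF_TOP/conf/resource_connector/azure (created if it is missing).
    mkdir -p "${lsf_top}/conf/resource_connector/azure" || return 1
    cp -R "${src}/conf/." "${lsf_top}/conf/resource_connector/azure/" || return 1
}
```

For example, install_azure_connector_files /tmp/rc_package /opt/lsf 10.1 copies the files for the installation paths used in this topic, where /tmp/rc_package is a placeholder for your extracted package directory. Afterward, change the ownership of the new files and directories to the cluster administrator, as noted in the prerequisites.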