Use AWS spot instances

Use spot instances to bid on spare Amazon EC2 computing capacity. Since spot instances are often available at a discount compared to the pricing of On-Demand instances, you can significantly reduce the cost of running your applications, grow your application’s compute capacity and throughput for the same budget, and enable new types of cloud computing applications.

Spot instances typically cost 50-90% less than On-Demand instances, so the same budget buys roughly 2-10 times the compute capacity.
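The capacity range follows directly from the discount range: at a discount of d percent, the same budget buys 100 / (100 - d) times the capacity. The following is a minimal sketch of that arithmetic; the function name capacity_multiplier is hypothetical and is not part of LSF or AWS tooling.

```shell
#!/bin/bash
# capacity_multiplier DISCOUNT_PERCENT
# Hypothetical helper: prints how many times more capacity the same
# budget buys when instances cost DISCOUNT_PERCENT less.
capacity_multiplier() {
    awk -v d="$1" 'BEGIN { printf "%.1f\n", 100 / (100 - d) }'
}

# Walk through the low end, a midpoint, and the high end of the range.
for discount in 50 75 90; do
    echo "${discount}% discount -> $(capacity_multiplier "$discount")x capacity"
done
```

Actual savings depend on the spot price at the time the instances run, so treat these numbers as bounds rather than guarantees.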

Spot instances are supported on any Linux x86 system that is supported by LSF.

Spot Instances have some restrictions, including instance types and fleet limitations. For more information, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-limits.html

Requesting Spot instances

Submit the job that requires spot instance pricing with the pricing==spot resource requirement in the bsub command:
bsub -R "awshost && pricing==spot" myjob
The pricing resource must be configured in the lsf.shared file:
Begin Resource
RESOURCENAME   TYPE   INTERVAL  INCREASING  DESCRIPTION
...
pricing        String   ()       ()         (Pricing option: spot/ondemand)
...
End Resource
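
Before submitting jobs, it can help to confirm that lsf.shared really declares the resource. The following sketch scans a file for a String resource named pricing inside the Resource section. The helper name has_pricing_resource is hypothetical, and the example runs against a temporary copy of the excerpt above rather than a live lsf.shared file.

```shell
#!/bin/bash
# has_pricing_resource FILE
# Hypothetical helper: succeeds if FILE declares a String resource named
# "pricing" between "Begin Resource" and "End Resource".
has_pricing_resource() {
    awk '
        $1 == "Begin" && $2 == "Resource" { in_section = 1; next }
        $1 == "End"   && $2 == "Resource" { in_section = 0 }
        in_section && $1 == "pricing" && $2 == "String" { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example against a throwaway copy of the excerpt shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Begin Resource
RESOURCENAME   TYPE   INTERVAL  INCREASING  DESCRIPTION
pricing        String   ()       ()         (Pricing option: spot/ondemand)
End Resource
EOF

if has_pricing_resource "$sample"; then
    echo "pricing resource is defined"
else
    echo "pricing resource is missing"
fi
rm -f "$sample"
```

To check a real cluster, pass the path of your own lsf.shared file instead of the sample.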

Spot instances are reclaimed when the spot price goes higher than the current bid price.

You can also configure an AWS template to use spot instances.
awsprov_templates.json:
{
    "templateId": "aws-spotvm-demo",
    "maxNumber": 2,
    "attributes": {
        …
        "awshost": ["Boolean", "1"],
        "pricing": ["String", "spot"]
    },
    ...
    ...
    ...
    "userData": "pricing=spot"
},
Edit the user_data.sh script to use the spot instance pricing resource:

#!/bin/bash
echo START >> /var/log/user-data.log 2>&1
# run hostsetup
...
if [ -n "${pricing}" ]; then
    # Append the pricing resource map to the existing LSF_LOCAL_RESOURCES value
    sed -i "s/\(LSF_LOCAL_RESOURCES=.*\)\"/\1 [resourcemap ${pricing}*pricing]\"/" "$LSF_CONF_FILE"
    echo "update LSF_LOCAL_RESOURCES lsf.conf successfully, add [resourcemap ${pricing}*pricing]" >> "$logfile"
fi
...

The user_data.sh script is located in the <LSF_TOP>/<LSF_VERSION>/resource_connector/aws/scripts directory.
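
To see what the sed command in user_data.sh does, the following sketch applies the same substitution to a single sample line instead of editing lsf.conf in place. The starting value of LSF_LOCAL_RESOURCES is an assumption made for illustration; the line in your lsf.conf may list other resources.

```shell
#!/bin/bash
# Set by hand here; the template's userData value ("pricing=spot")
# presumably supplies it on a real instance.
pricing=spot

# Assumed starting line, for illustration only.
before='LSF_LOCAL_RESOURCES="[resource awshost]"'

# Same substitution as in user_data.sh, applied to stdin rather than
# to the file: capture everything up to the closing quote, then
# re-emit it with the resource map appended inside the quotes.
after=$(printf '%s\n' "$before" |
    sed "s/\(LSF_LOCAL_RESOURCES=.*\)\"/\1 [resourcemap ${pricing}*pricing]\"/")

echo "before: $before"
echo "after:  $after"
```

The substitution keeps the existing resources and appends the resource map before the closing quotation mark, so the host reports the string resource pricing with the value spot.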

Security requirements for spot instances

Logging and troubleshooting

To increase traceability, set the LogLevel parameter in the awsprov_config.json file to TRACE. At this log level, each method entry is logged with its parameter values, and each method exit is logged with its return value (if one exists).

The following troubleshooting messages are created when the log level is configured as DEBUG. For troubleshooting purposes, every state change on a Spot instance request is logged with a predefined format:
Spot Fleet Request ID - Spot Instance Request ID - Spot Instance Machine ID: State update message

Limitations and known issues

  • The Spot Instance Termination Notice is not accurate if the system clock is not synchronized between the management host and the compute host. System clock synchronization is required for reclaim to work.

    The following AWS topic explains this issue: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html
  • If a request remains pending for 60 minutes, resource connector assumes that the request is lost. It ignores the request, and LSF recalculates the demand. On the AWS side, however, the Spot instance request remains pending and is not closed.
  • LSF periodically checks for hosts that are scheduled to be reclaimed and requeues their jobs within the 2-minute termination notice. However, AWS might not honor the 2-minute notice, in which case machines are terminated without warning. For more information, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html#spot-instance-termination-notices
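
The clock synchronization requirement in the first item can be illustrated with simple timestamp arithmetic. The following sketch compares the time remaining before termination as seen by a host with a correct clock and by a host whose clock runs fast. The timestamps are made up, the function name seconds_between is hypothetical, GNU date is assumed, and this is not a description of how LSF itself computes the window.

```shell
#!/bin/bash
# seconds_between START END
# Hypothetical helper: prints END minus START in whole seconds.
# Both arguments are UTC timestamps such as 2015-01-05T18:02:00Z.
# Assumes GNU date, which supports the -d option.
seconds_between() {
    local start end
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( end - start ))
}

# Illustrative values only: a termination time two minutes ahead,
# read on one host with a correct clock and one that is 45 seconds fast.
termination_time="2015-01-05T18:02:00Z"
true_now="2015-01-05T18:00:00Z"
skewed_now="2015-01-05T18:00:45Z"

echo "window with a correct clock: $(seconds_between "$true_now" "$termination_time")s"
echo "window with a fast clock:    $(seconds_between "$skewed_now" "$termination_time")s"
```

A host with a skewed clock misjudges how much of the termination notice is left, which is why the clocks on the management host and the compute hosts need to agree.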