Resource planning for compute-intensive workloads like electronic design automation (EDA) has always been a challenge that requires compromise.

The middle ground between the cost of compute resources and the cost of delayed decisions is hard to find. With the advent of cloud bursting, a new flexibility has arrived to break this standoff. Now, when your data center capacity is strained, your existing IBM Spectrum LSF cluster can be extended to the IBM Cloud, where virtually unlimited resources are available and you only pay for what you use.

By adding cloud bursting to your LSF Cluster, you can choose how much capacity to employ to suit your business needs. When time is money, the cloud is ready. When demand is low, the meter stops and the cloud will wait.

In this blog post, we describe how to build a proof-of-concept cloud-bursting environment for EDA workloads from three ingredients: an IBM Spectrum LSF cluster running in an existing on-premises data center, IBM-provided automation scripts and documentation, and, most importantly, IBM Cloud.

Start with a pre-existing, on-premises cluster

The cluster we use is located in a Yorktown, NY lab. This cluster is minimal and consists of two nodes—a master and a worker. A production cluster or a next-stage proof of concept might contain hundreds of nodes and thousands of cores. For the purposes of this introductory proof of concept, the size of the on-premises cluster is unimportant. Our main interest is seeing how work can be shifted to cloud resources.

Use our existing automation to build the cloud cluster

The on-premises cluster can be either an existing test or production cluster or possibly a minimal cluster you have put together for the purposes of exploring cloud bursting. The next step is building the cloud portion of the multicluster. Again, size is probably not important at this point, so a minimal cluster is a good place to start. Our reference cloud cluster has three nodes—one master and two workers.

If you set out to build a cloud cluster (the cloud half of a multicluster) from scratch, there is a long list of provisioning and configuration tasks, including the following (and probably more):

  • Installing the required software packages on your deployer
  • Provisioning a VPC
  • Provisioning the virtual instances (master(s) and workers)
  • Provisioning a DNS service
  • Configuring and connecting a VPN that connects the on-premises cluster to the cloud
  • Installing and configuring IBM Spectrum LSF

Luckily, IBM Cloud has a first-class offering that can be used through the catalog UI, API and CLI interfaces. The offering leverages Terraform for automation of cloud resources and IBM Cloud Schematics for API and CLI interfaces.

Not only does this automation make the initial setup of a proof-of-concept cluster straightforward, it is the underlying toolset that will deliver the fast provisioning and teardown of resources that define cloud bursting.

In addition, a tutorial provides step-by-step instructions for using the offering to create the cluster.

Bring in the workload

As proof of concept, we have chosen two common EDA packages to run on our multicluster: Optical Proximity Correction (OPC) and Design Rule Checking (DRC). Depending on your EDA vendor and the packages you intend to run, you will likely encounter specific challenges in bringing up your workload on a multicluster that are beyond the scope of this blog entry. Hopefully, this general discussion of how we ran our workload and some of the challenges we overcame will help you in building your cloud-bursting proof of concept.

Run your workload on the cloud cluster

Before you can run your workload, you will need to prepare the cloud cluster. You can install the software for your EDA workload on the cloud cluster, use data dependencies to bring over the needed software as part of the bstage in step, or combine the two approaches.

You will need to give the cluster access to the license service. See the Certificates and License Management section for more information.

Depending on your workload characteristics, you may need to ensure that jobs are sent to a particular node for processing. This can be accomplished with the -R (resource requirement) option of the bsub command. This was useful for our workload because the initial deployment of a job is very resource-intensive: work is first divided into tiles (subtasks) and then distributed to workers.
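As a rough illustration of steering a job with a resource requirement string, the following Python sketch assembles a bsub command line without running it. The script name run_opc.sh and the memory threshold are hypothetical, and the units of mem depend on how your cluster is configured, so treat the requirement string as a placeholder rather than a recommendation.

```python
import shlex


def build_bsub_command(job_script, resource_req, queue=None):
    """Return the argument list for a bsub submission with a resource requirement."""
    argv = ["bsub"]
    if queue is not None:
        # -q selects the queue; the queue name is site-specific.
        argv += ["-q", queue]
    # -R passes a resource requirement string to the LSF scheduler.
    argv += ["-R", resource_req, job_script]
    return argv


# Hypothetical requirement: a host with plenty of free memory,
# with all of the job's slots placed on a single host.
cmd = build_bsub_command("./run_opc.sh", "select[mem>65536] span[hosts=1]")

# Print a copy-and-paste friendly form of the command instead of executing it.
print(shlex.join(cmd))
```

Building the argument list separately from executing it makes it easy to review the exact submission before handing it to subprocess.run on a host where LSF is installed.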

Data management

Moving EDA tasks to the cloud requires careful attention to data management for several reasons:

  1. It is likely that the two clusters will not share a single filesystem.
  2. The connection between the on-premises and cloud clusters will, to a varying extent, be bandwidth-limited.
  3. Depending on your terms of service, minimizing data movement on and off the cloud can reduce cost.
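To get a feel for the second point, a back-of-the-envelope estimate of transfer time can help decide which data is worth moving. The sketch below is only arithmetic; the 50 GB data size, the 1 Gbps link and the 80% efficiency factor are made-up numbers, not measurements from our environment.

```python
def transfer_seconds(size_gb, link_mbps, efficiency=0.8):
    """Estimate the seconds needed to move size_gb gigabytes over a link.

    link_mbps is the nominal link speed in megabits per second and
    efficiency is the assumed fraction of it that is actually usable.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, efficiency in (0, 1]")
    size_megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return size_megabits / (link_mbps * efficiency)


# Hypothetical case: 50 GB of input data over a 1 Gbps VPN connection.
seconds = transfer_seconds(50, 1000)
print(f"Estimated transfer time: {seconds / 60:.1f} minutes")
```

If the estimate is large compared with the expected job run time, it may be worth installing static data (such as tool binaries) on the cloud cluster once rather than staging it with every job.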

The Spectrum LSF data manager should have been installed and configured on both the on-premises and cloud clusters as part of the Deployment step of the automated setup process. The following were some of the key points in configuring data management for our workload:

  • There is an additional setup step that, for security reasons, requires manual intervention. Each user will need to log in to the cloud master, obtain their SSH public key, and add that key to the authorized keys of the on-premises master.
  • When a job is submitted, the user will need to declare its data dependencies with the -data option of the LSF bsub command.
  • The user’s LSF jobs will need to make the input data available for processing with the bstage in command and make the job output available for post-processing with the bstage out command. This can be as simple as wrapping the existing run script in bstage in and bstage out commands to transfer all required data. The staged data can include binaries or scripts as well, as long as they are executed after bstage in.
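The last two points can be sketched as a small Python helper that generates a wrapper script and the matching bsub arguments. This is a minimal sketch under stated assumptions: the file paths and the run_drc.sh script are hypothetical, and the exact argument forms accepted by -data, bstage in and bstage out should be checked against the documentation for your LSF version.

```python
def build_wrapper(input_files, run_command, output_files):
    """Return shell script text that stages inputs, runs the tool, and stages outputs."""
    lines = ["#!/bin/sh", "set -e"]  # stop at the first failing step
    for path in input_files:
        # Copy each declared input from the staging area into the job's directory.
        lines.append(f'bstage in -src "{path}"')
    lines.append(run_command)
    for path in output_files:
        # Hand each result file back to the data manager for transfer.
        lines.append(f'bstage out -src "{path}"')
    return "\n".join(lines) + "\n"


def build_submit_argv(input_files, wrapper_path):
    """Return bsub arguments that declare the same inputs as data dependencies."""
    # Assumption: several files are passed to -data as one space-separated string.
    return ["bsub", "-data", " ".join(input_files), wrapper_path]


# Hypothetical inputs and outputs for a DRC run.
inputs = ["/proj/design/top.gds", "/proj/rules/drc.deck"]
script = build_wrapper(inputs, "./run_drc.sh top.gds drc.deck", ["drc_results.rpt"])
argv = build_submit_argv(inputs, "./drc_wrapper.sh")

print(script)
print(argv)
```

Generating both pieces from the same list of inputs keeps the files named in -data and the files requested by bstage in from drifting apart.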

Certificates and license management

Licensing an EDA workload for a multicluster that spans on-premises and cloud environments is a fairly new and still-developing area of license management. For our proof-of-concept workload, we used a FlexLM floating license tied to a cloud server in the IBM Cloud London data center. By configuring the transit gateway to span regions, we were able to run our workload in the Dallas data center (where the cloud portion of our multicluster resides), with licensing provided by a license server in the London data center. This scenario is, of course, particular to our workload vendor and licensing terms; we describe it only to illustrate how features of the IBM Cloud VPC can be employed to assist in license management.
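On the job side, FlexLM-based tools commonly locate a floating license server through the LM_LICENSE_FILE environment variable, written as port@host. The sketch below shows one way to prepare such an environment in Python. The host name is hypothetical, the port is only a typical FlexLM value, and many EDA vendors use their own variable name instead, so check your vendor's documentation.

```python
import os


def license_env(server_host, port=27000, base_env=None):
    """Return a copy of the environment with LM_LICENSE_FILE set to port@host."""
    env = dict(os.environ if base_env is None else base_env)
    # FlexLM clients read the license server location in port@host form.
    env["LM_LICENSE_FILE"] = f"{port}@{server_host}"
    return env


# Hypothetical license server reachable from the cloud cluster's network.
env = license_env("license-lon.example.internal")
print(env["LM_LICENSE_FILE"])
```

The returned dictionary can be passed as the env argument of subprocess.run when launching the tool, which avoids changing the environment of the calling process.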

Monitor the work

Since the multicluster consists of two cooperating clusters, once a job is sent to the cloud cluster, and until it completes, the on-premises cluster’s job and queue monitoring commands will have limited information about the job’s status. There may be times when you would like to see detailed status information while the job is in progress. This can be accomplished by logging in to the cloud cluster’s console and running the monitoring commands there.

Happy cloud bursting!

Beyond the EDA workload setup described in this blog, most of what is needed to start cloud bursting your workload is handled by the automation scripts. Together, the two should provide much of what you need to set up your own proof-of-concept EDA environment that makes use of IBM Cloud. We hope that you’ll give it a try!
