September 30, 2021 By Vidyasagar Machupalli 5 min read

Learn how to set up a GPU-enabled virtual server instance (VSI) on a Virtual Private Cloud (VPC) and deploy RAPIDS using IBM Schematics. 

The GPU-enabled family of profiles provides on-demand, cost-effective access to NVIDIA GPUs. GPUs help to accelerate the processing time required for compute-intensive workloads, such as artificial intelligence (AI), machine learning, inferencing and more. To use the GPUs, you need the appropriate toolchain — such as CUDA (an acronym for Compute Unified Device Architecture) — ready. 

Let’s start with a simple question. 

What is a GPU and how is it different from CPU?

Graphics processing units (GPUs) are becoming more prevalent as accelerated computing is rapidly changing the modern enterprise. They support new applications that are resource-demanding and provide new capabilities to gain valuable insights from customer data.

Let’s quickly go over why you could use GPUs in the first place. By operating in a cloud environment, GPUs work in conjunction with a server’s CPU to accelerate application and processing performance. The CPU offloads compute-intensive portions of the application to the GPU, which processes large blocks of data at one time (rather than sequentially) to boost overall performance in a server environment.

The following video covers the basics about GPUs, including the differences between a GPU and CPU, the top GPU applications (including industry examples) and why it’s beneficial to use GPUs on cloud infrastructure:

 

 

What is RAPIDS?

The RAPIDS suite of open-source software libraries and APIs gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. RAPIDS utilizes NVIDIA CUDA primitives for low-level compute optimization and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

Here are the features of RAPIDS as described on the NVIDIA developer website:

  • Hassle-free integration: Accelerate your Python data-science toolchain with minimal code changes and no new tools to learn.
  • Top model accuracy: Increase machine-learning model accuracy by iterating on models faster and deploying them more frequently.
  • Reduced training time: Drastically improve your productivity with near-interactive data science.
  • Open source: Customizable, extensible, interoperable — the open-source software is supported by NVIDIA and built on Apache Arrow.

How to provision a GPU-enabled VSI

You will provision a virtual private cloud (VPC), a subnet, a security group with inbound and outbound rules and a virtual service instance (VSI) with a GPU-enabled profile using Terraform scripts with IBM Cloud Schematics. The RAPIDS, along with Docker, are configured via the VSI’s userdata field. 

VPC uses cloud-init technology to configure virtual server instances. The User Data field on the new virtual server for VPC page allows users to put in custom configuration options by using cloud-init. Cloud-init supports several formats for configuration data, including yaml in a cloud-config file.

To understand more about Terraform and IBM Schematics, check this blog post: “Provision Multiple Instances in a VPC Using Schematics.” In short, you can run any Terraform script just by simply pointing to the Git repository with the scripts.

Note: You will need an IBM Cloud account with permissions to provision VPC resources. Check the getting started with VPC documentation.

  1. Navigate to Schematics Workspaces on IBM Cloud and click on Create workspace.
  2. Under the Specify Template section, provide https://github.com/IBM-Cloud/vpc-tutorials/tree/master/vpc-gpu-vsi under GitHub or GitLab repository URL
  3. Select terraform_v0.14 as the Terraform version and click Next.
  4. Provide the workspace name — vpc-gpu-rapids — and choose a resource group and location.
  5. Click Next and then click Create.
  6. You should see the Terraform variables section. Fill in the variables as per your requirement by clicking the action menu next to each of the variables:

    IBM Schematics 

  7. Scroll to the top of the page to Generate (terraform plan) and Apply (terraform apply) the changes.
  8. Click Apply plan and check the progress under the Log. (Generate plan is optional.)

In the output of the log, you should see the IP_ADDRESS assigned to the VSI. Save the address for future reference. 

Test the setup

Open a terminal. To test the setup, you need to SSH into the VSI with the below command:

ssh -i ~/.ssh/<SSH_PRIVATE_KEY> root@<IP_ADDRESS>

A working setup can be tested by running a base CUDA container:

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

The output should look similar to this: 

NVIDIA GPU info

To check the NVIDIA driver version, run the below command:

cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module  470.57.02  Tue Jul 13 16:14:05 UTC 2021

GCC version:  gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

If you don’t see the expected output, check the troubleshooting section of the post. 

How to use RAPIDS

The RAPIDS notebooks are one of the ways to explore the capabilities. These Jupyter notebooks provide examples of how to use RAPIDS. These notebooks are designed to be self-contained with the runtime version of the RAPIDS Docker Container.

  1. Run the below commands to pull a Docker image and run a container to explore RAPIDS: 
    docker pull rapidsai/rapidsai:cuda11.0-runtime-ubuntu20.04
    docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 rapidsai/rapidsai:cuda11.0-runtime-ubuntu20.04
  2. Launch a browser and enter your <IP_ADDRESS>:8888.
  3. You should see the Jupyter notebook:

    RAPIDS notebook landing page

  4. On the left pane, click on notebooks > cuml > tools and then launch the notebook. This notebook provides a simple and unified means of benchmarking single GPU cuML algorithms against their skLearn counterparts with the cuml.benchmark package in RAPIDS cuML.
  5. As shown in the image above, click on the icon to launch the GPU dashboards. These dashboards provide a real-time visualization of NVIDIA GPU metrics in interactive Jupyter environments.
  6. Click on GPU Resources and Machine Resources to launch the respective dashboards. 
  7. Run each code cell in the Jupyter notebook until you see the Nearest Neighbors – Brute Force cell.
  8. Run the cell and check the metrics on the GPU resources and Machine Resources dashboard to understand the GPU utilization metrics and CPU metrics, respectively:

    GPU utilization dashboard

Below the cell, you can check the CPU vs. GPU benchmark metrics.

Troubleshooting

To check the cloud-init logs, run the following command:

cat /var/log/cloud-init-output.log

If the install is configured successfully, you should see a message similar to this:

Cloud-init v. 21.2-3-g899bfaa9-0ubuntu2~20.04.1 running 'modules:final' at Thu, 23 Sep 2021 08:51:11 +0000. Up 36.08 seconds.

Cloud-init v. 21.2-3-g899bfaa9-0ubuntu2~20.04.1 finished at Thu, 23 Sep 2021 08:59:30 +0000. Datasource DataSourceNoCloud [seed=/dev/vdb][dsmode=net].  Up 534.16 seconds

Related information

If you have any queries, feel free to reach out to me on Twitter or on LinkedIn

Was this article helpful?
YesNo

More from Cloud

Enhance your data security posture with a no-code approach to application-level encryption

4 min read - Data is the lifeblood of every organization. As your organization’s data footprint expands across the clouds and between your own business lines to drive value, it is essential to secure data at all stages of the cloud adoption and throughout the data lifecycle. While there are different mechanisms available to encrypt data throughout its lifecycle (in transit, at rest and in use), application-level encryption (ALE) provides an additional layer of protection by encrypting data at its source. ALE can enhance…

Attention new clients: exciting financial incentives for VMware Cloud Foundation on IBM Cloud

4 min read - New client specials: Get up to 50% off when you commit to a 1- or 3-year term contract on new VCF-as-a-Service offerings, plus an additional value of up to USD 200K in credits through 30 June 2025 when you migrate your VMware workloads to IBM Cloud®.1 Low starting prices: On-demand VCF-as-a-Service deployments begin under USD 200 per month.2 The IBM Cloud benefit: See the potential for a 201%3 return on investment (ROI) over 3 years with reduced downtime, cost and…

The history of the central processing unit (CPU)

10 min read - The central processing unit (CPU) is the computer’s brain. It handles the assignment and processing of tasks, in addition to functions that make a computer run. There’s no way to overstate the importance of the CPU to computing. Virtually all computer systems contain, at the least, some type of basic CPU. Regardless of whether they’re used in personal computers (PCs), laptops, tablets, smartphones or even in supercomputers whose output is so strong it must be measured in floating-point operations per…

IBM Newsletters

Get our newsletters and topic updates that deliver the latest thought leadership and insights on emerging trends.
Subscribe now More newsletters