By Chandan Chopra
Many organizations have started to build AI infrastructure using IBM Power Systems servers that leverage NVIDIA GPUs. Enterprises often focus on building AI solutions with high availability, automated orchestration and the like, which can add to the cost of the solution. Educational institutions and research organizations, however, often look for solutions that give them more flexibility to utilize the underlying resources optimally for their machine learning and deep learning (ML/DL) workloads, at much lower cost. Researchers may need to run parallel DL training jobs using different AI runtimes. Professors may need to allocate and deallocate AI runtimes to multiple students for AI assignments. This is where “community edition” software might be the best option. The components discussed in the remainder of this article can help you build a cost-effective AI solution.
Building ML/DL models with open-source AI frameworks using IBM Watson Machine Learning (Community Edition)
If you use open source AI frameworks such as Keras, PyTorch, TensorFlow or Caffe, Watson Machine Learning Community Edition (WML CE) packages these and many other popular deep learning frameworks, supporting libraries and tools as easily installable software. (Refer to the complete list of packages that are part of WML CE.) WML CE not only provides the latest versions of these frameworks and an accelerated machine learning library, but also optimizes them to use the underlying hardware capabilities (with or without GPUs) for faster deep learning training.
WML CE is pre-bundled at no charge with IBM Power Systems accelerated computing servers built with GPUs. WML CE runs on Linux, on both Ubuntu and Red Hat Enterprise Linux.
Deploying WML CE as Docker containers
While WML CE can be set up in standalone mode using an online conda repository, it can also be installed as Docker containers using images available on Docker Hub.
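As a rough sketch, the two installation paths look like the following (the conda channel URL, environment name and image tag here are examples; check IBM's documentation and the Docker Hub repository for the current values):

```shell
# Option 1: standalone install into a conda environment
# (channel URL and package name are illustrative)
conda create -n wmlce python=3.6
conda activate wmlce
conda install -c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda powerai

# Option 2: pull a pre-built WML CE image from Docker Hub
# (tag is an example; browse the repository for available tags)
docker pull ibmcom/powerai:1.6.2-all-ubuntu18.04-py3
```

The conda route suits a single shared host; the Docker route is the basis for the multi-user setup described next.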
How does the WML CE container environment help in a multi-user environment?
- Isolates the runtime WML CE instance for each user or student: This provides easier management and better performance compared to virtual machines.
- Limits the number of GPUs per container: Manages resources based on workload requirements.
- Shares a GPU among more than one container: Although Docker doesn’t allow specifying GPU shares or access priority for multiple containers, sharing still helps multiple students with minor assignments that aren’t GPU-intensive.
- Enables Python 2 and Python 3 environments: Different WML CE Docker images are available for Python 2 and Python 3, and students can choose based on their code compatibility.
- Customizes Docker images with required Python packages and Jupyter notebooks: Customized containers can be committed to a new Docker image and reused later.
- Retains data, notebooks and code that users create in their containers: Docker containers can use internal disks or external storage to save user data. Data access profiles for multiple students can be saved on storage and accessed by their respective containers during labs or assignments.
- Allows access to containers using SSH: If students are new to or not comfortable with Docker commands, an SSH interface can be enabled for their containers.
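Several of the points above map directly to ordinary Docker commands. A minimal sketch for one student, assuming an example image tag, host paths and port numbers (the `--gpus` flag requires Docker 19.03 or later; older setups use the nvidia-docker wrapper instead):

```shell
# Run one isolated WML CE container per student, pinned to a single GPU;
# a host directory is mounted so notebooks and data survive the container
docker run -d --gpus '"device=0"' --name student01 \
  -v /data/student01:/workspace \
  -p 2201:22 -p 8801:8888 \
  ibmcom/powerai:1.6.2-all-ubuntu18.04-py3

# Commit a customized container (extra Python packages, notebooks)
# to a new image that can be reused for the next lab
docker commit student01 wmlce-class:lab1
```

Publishing port 22 enables the SSH access mentioned above, and port 8888 is the conventional Jupyter notebook port.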
Docker (community edition) and the NVIDIA Docker plugin are available at no charge for supported operating systems. WML CE Docker images can be freely downloaded from Docker Hub.
Bringing container orchestration using Kubernetes-based IBM Cloud Private (Community Edition)
Container orchestration simplifies container management. Kubernetes is a popular open source container orchestration software, and IBM Cloud Private is built on top of it. IBM Cloud Private Community Edition (ICP-CE) brings a lot of value with container orchestration. A WML CE helm chart available in the ICP-CE catalog provides a simplified way to deploy WML CE containers. If an ICP cluster has more than one node, ICP can schedule container pods across nodes based on the policies defined in ICP.
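Assuming the WML CE chart from the ICP-CE catalog, a deployment from the helm CLI might look like this (the chart repository, chart name, release name and value keys are illustrative; check your ICP catalog for the actual chart):

```shell
# Deploy a WML CE release from the ICP catalog chart, requesting one GPU
# (ICP's Helm v2 tooling uses --name and --tls)
helm install --name wmlce-student01 \
  --set resources.gpu=1 \
  ibm-charts/ibm-powerai --tls
```

The same deployment can be done from the ICP GUI without any command line, as described below.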
How does ICP-CE container orchestration help administration?
- Deploy: System administrators may not be conversant with Docker commands. ICP provides an easy-to-use GUI and helm charts to quickly deploy WML-CE containers. More containers for new users can be added using the ICP-CE scaling feature. See my quick demo for WML-CE deployment with ICP.
- Monitor: The dashboard provides a cluster-wide view of available and allocated CPU and GPU resources.
- Orchestrate: Manual container management can become tedious. ICP automatically restores containers in the rare event of a container or node failure.
- Upgrade: Quickly upgrade multiple containers across the cluster to a newer version of WML-CE. ICP-CE performs a canary upgrade across the cluster, and you can perform a version rollback to restore the previous containers. See my quick demo for WML-CE upgrade and rollback with ICP.
- Operate: Quickly launch your Jupyter notebooks and run parallel environments across multiple containers. See my quick demo for using the TensorFlow and Caffe frameworks in parallel in WML-CE.
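Because ICP-CE is built on Kubernetes, the scale, upgrade and rollback operations above can also be driven from kubectl. A sketch, using a hypothetical deployment and container name from a WML CE helm release:

```shell
# Scale out: add containers for new users
kubectl scale deployment wmlce-student --replicas=4

# Upgrade: roll a newer WML CE image across the cluster
# (the container name "powerai" and tag are examples)
kubectl set image deployment/wmlce-student \
  powerai=ibmcom/powerai:1.7.0-all-ubuntu18.04-py3

# Rollback: restore the previous version if the upgrade misbehaves
kubectl rollout undo deployment/wmlce-student
```

Kubernetes performs these image updates as rolling upgrades, which is what makes the canary-style upgrade and rollback in ICP-CE possible.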
ICP-CE is available to download at no charge. IBM provides public Slack channel support for ICP-CE users.
The community edition software described in this post lacks the high availability and support typically expected of an enterprise solution, but it works well for organizations that want to leverage extreme AI performance with flexibility using IBM Power Systems in a cost-effective way.
If you are an enterprise, educational or research organization looking to build an on-premises environment for ML/DL requirements, IBM Systems Lab Services can help you in your AI journey. Contact us today.