Getting started with PyTorch
Find information about getting started with PyTorch.
This release of WML CE includes PyTorch 1.1.0, with support for IBM® Distributed Deep Learning (DDL) and Large Model Support (LMS).
PyTorch examples
The PyTorch package includes a set of examples. A script is provided to copy the sample content into a specified directory:
pytorch-install-samples <somedir>
PyTorch and DDL
WML CE Distributed Deep Learning is directly integrated into PyTorch in the form of a ddl backend in PyTorch's communication package, torch.distributed.
Find more information at Integration with deep learning frameworks.
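As a minimal sketch of selecting the backend, the helper below passes the backend name ddl to init_process_group from torch.distributed. The helper name init_ddl and the env:// init method are assumptions made for illustration, and the script is assumed to be started by a DDL-aware launcher; see the linked integration page for the supported procedure.

```python
def init_ddl(dist_module, init_method="env://"):
    """Initialize the default process group with the ddl backend.

    dist_module is expected to be torch.distributed from WML CE PyTorch.
    The env:// init method is an assumption, not taken from this page.
    """
    dist_module.init_process_group(backend="ddl", init_method=init_method)


# Typical call site (requires WML CE PyTorch and a DDL-aware launcher):
#   import torch.distributed as dist
#   init_ddl(dist)
```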
PyTorch cpp_extensions tests
The cpp_extensions tests that are run with pytorch-test require NVCC and a C++ compiler with C++11 ABI tagging (similar to g++ version 7). However, these packages are not listed as dependencies of the pytorch conda packages. To use these tests, you must install the cudatoolkit-dev conda package. You also need g++ version 7 installed and either named by the CXX environment variable or reachable as the c++ command (for example, through a symlink). One way to get the correct compiler is to use conda to install version 7 of gxx_linux-ppc64le or gxx_linux-64, depending on your architecture. If you do not install cudatoolkit-dev and set up a C++ compiler, pytorch-test prints an informational message that the cpp_extensions tests are not being run, and the tests are skipped.
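Before running pytorch-test, the prerequisites described above can be checked with a short script. This is a sketch only: the function name is hypothetical, it looks for nvcc and a compiler on PATH, and it does not verify the compiler version or its C++11 ABI tagging.

```python
import os
import shutil


def missing_cpp_extension_prereqs(environ=None, which=shutil.which):
    """Return a list describing prerequisites that were not found."""
    environ = os.environ if environ is None else environ
    missing = []

    # NVCC is provided by the cudatoolkit-dev conda package.
    if which("nvcc") is None:
        missing.append("nvcc (install the cudatoolkit-dev conda package)")

    # The compiler is taken from CXX if set, otherwise the c++ command.
    compiler = environ.get("CXX") or "c++"
    if which(compiler) is None:
        missing.append("C++ compiler ({!r} not found)".format(compiler))

    return missing


if __name__ == "__main__":
    for item in missing_cpp_extension_prereqs():
        print("missing:", item)
```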
Large Model Support (LMS)
Large Model Support is a feature provided in WML CE PyTorch that allows the successful training of deep learning models that would otherwise exhaust GPU memory and abort with “out-of-memory” errors. LMS manages this oversubscription of GPU memory by temporarily swapping tensors to host memory when they are not needed.
One or more elements of a deep learning model can lead to GPU memory exhaustion. These include:
- Model depth and complexity
- Base data size (for example, high-resolution images)
- Batch size
Traditionally, the solution to this problem has been to modify the model until it fits in GPU memory. This approach, however, can negatively impact accuracy – especially if concessions are made by reducing data fidelity or model complexity.
With LMS, deep learning models can scale significantly beyond what was previously possible and, ultimately, generate more accurate results.
LMS usage
A PyTorch program enables Large Model Support by calling
torch.cuda.set_enabled_lms(True)
prior to model creation.
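Because torch.cuda.set_enabled_lms is a WML CE extension, a script that must also run on other PyTorch builds can guard the call. The following is a minimal sketch; the helper name enable_lms is hypothetical.

```python
try:
    import torch  # the LMS calls are assumed to exist only in WML CE PyTorch
except ImportError:
    torch = None


def enable_lms(cuda_module):
    """Enable LMS if cuda_module provides set_enabled_lms.

    Returns True if LMS was switched on, False if the call is unavailable.
    """
    setter = getattr(cuda_module, "set_enabled_lms", None)
    if setter is None:
        return False
    setter(True)
    return True


if __name__ == "__main__" and torch is not None:
    # Call this before the model is created.
    print("LMS enabled:", enable_lms(torch.cuda))
```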
In addition, a pair of tunables is provided to control how GPU memory used for tensors is managed under LMS.
torch.cuda.set_limit_lms(limit)
Defines the soft limit in bytes on GPU memory allocated for tensors (default: 0).
By default, LMS favors GPU memory reuse (moving inactive tensors to host memory) over new allocations. This effectively minimizes GPU memory consumption.
However, when a limit is defined, the algorithm favors allocation of GPU memory up to the limit prior to swapping any tensors out to host memory. This allows the user to control the amount of GPU memory consumed when using LMS.
Tuning this limit to optimize GPU memory utilization, therefore, can reduce data transfers and improve performance. Since the ideal tuning for any given scenario may differ, it is considered a best practice to determine the value experimentally, arriving at the largest value that does not result in an out-of-memory error.
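One way to organize the experimental search is a binary search over candidate limits. The sketch below is not part of WML CE: the function name and the fits callback are hypothetical, and it assumes that if a given limit runs out of memory, every larger limit does too.

```python
def largest_working_limit(fits, low, high, step):
    """Return the largest limit in [low, high] for which fits(limit) is True.

    Candidates are low, low + step, low + 2 * step, and so on.
    fits(limit) is a user-supplied trial, for example a short training run
    that returns False when it hits an out-of-memory error.
    Returns None if even the lowest candidate fails.
    """
    if not fits(low):
        return None
    lo_k, hi_k = 0, (high - low) // step  # lo_k is always a known-good index
    while lo_k < hi_k:
        mid_k = (lo_k + hi_k + 1) // 2  # upper middle, so the loop terminates
        if fits(low + mid_k * step):
            lo_k = mid_k
        else:
            hi_k = mid_k - 1
    return low + lo_k * step
```

In practice, fits would call torch.cuda.set_limit_lms(limit), run a few training iterations, and report whether they completed. Running each trial in a fresh process keeps one failed attempt from affecting the next.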
torch.cuda.set_size_lms(size)
Defines the minimum tensor size in bytes that is eligible for LMS swapping (default: 1 MB).
Any tensor smaller than this value is exempt from LMS reuse and persists in GPU memory.
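To get a feel for which tensors fall under the threshold, a tensor's size in bytes can be estimated from its shape and element size. This is an illustrative calculation only: it reads the 1 MB default as 1024 * 1024 bytes, which is an assumption, and it ignores any rounding the GPU memory allocator may apply.

```python
import math

ASSUMED_DEFAULT_MIN_SIZE = 1024 * 1024  # "1 MB" read as 2**20 bytes (assumption)


def tensor_nbytes(shape, element_size):
    """Estimate tensor size in bytes from its shape and bytes per element."""
    return math.prod(shape) * element_size


def eligible_for_swapping(shape, element_size, min_size=ASSUMED_DEFAULT_MIN_SIZE):
    """True if the tensor is at least min_size bytes, so LMS may swap it."""
    return tensor_nbytes(shape, element_size) >= min_size


# A float32 convolution weight of shape (64, 3, 7, 7) is well under the threshold.
small = eligible_for_swapping((64, 3, 7, 7), 4)
# A float32 activation of shape (256, 64, 56, 56) is far above it.
large = eligible_for_swapping((256, 64, 56, 56), 4)
```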
LMS example
The PyTorch imagenet example provides a simple illustration of Large Model Support in action. ResNet-152 is a deep residual network that requires a significant amount of GPU memory.
On a system with a single 16 GB GPU, without LMS enabled, a training attempt with the default batch size of 256 will fail with insufficient GPU memory:
python main.py -a resnet152 -b 256 [imagenet-folder with train and val folders]
=> creating model 'resnet152'
[...]
RuntimeError: CUDA error: out of memory
After enabling LMS, the training proceeds without issue:
git diff
--- a/imagenet/main.py
+++ b/imagenet/main.py
@@ -90,6 +90,7 @@ def main():
                                 world_size=args.world_size)

     # create model
+    torch.cuda.set_enabled_lms(True)
     if args.pretrained:
         print("=> using pre-trained model '{}'".format(args.arch))
         model = models.__dict__[args.arch](pretrained=True)
python main.py -a resnet152 -b 256 [imagenet-folder with train and val folders]
=> creating model 'resnet152'
Epoch: [0][0/5005] [...]
Epoch: [0][10/5005] [...]
Epoch: [0][20/5005] [...]
Epoch: [0][30/5005] [...]
Epoch: [0][40/5005] [...]
Epoch: [0][50/5005] [...]
Epoch: [0][60/5005] [...]
[...]
WML CE PyTorch API Extensions for LMS
Large Model Support extends the torch.cuda package to provide the following control and tuning interfaces.
torch.cuda.set_enabled_lms(enable)
- Enable/disable Large Model Support.
Parameters: enable (bool): desired LMS setting.
torch.cuda.get_enabled_lms()
- Returns a bool indicating whether Large Model Support is currently enabled.
torch.cuda.set_limit_lms(limit)
- Sets the allocation limit (in bytes) for LMS.
Parameters: limit (int): soft limit on GPU memory allocated for tensors.
torch.cuda.get_limit_lms()
- Returns the allocation limit (in bytes) for LMS.
torch.cuda.set_size_lms(size)
- Sets the minimum tensor size (in bytes) that is eligible for LMS swapping.
Parameters: size (int): any tensor smaller than this value is exempt from LMS reuse and persists in GPU memory.
torch.cuda.get_size_lms()
- Returns the minimum tensor size (in bytes) that is eligible for LMS swapping.
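The get and set pairs make it possible to apply LMS settings temporarily and restore the previous values afterward. The context manager below is a sketch built only on the six calls listed above; its name is hypothetical, and how LMS behaves when toggled after model creation is not covered here, so treat it as a convenience for experiments rather than a recommended pattern.

```python
from contextlib import contextmanager


@contextmanager
def lms_settings(cuda_module, enabled=True, limit=None, size=None):
    """Apply LMS settings for the duration of a with block.

    cuda_module is expected to be torch.cuda from WML CE PyTorch.
    limit and size are left unchanged when None.
    """
    # Remember the current values so they can be restored on exit.
    previous = (
        cuda_module.get_enabled_lms(),
        cuda_module.get_limit_lms(),
        cuda_module.get_size_lms(),
    )
    cuda_module.set_enabled_lms(enabled)
    if limit is not None:
        cuda_module.set_limit_lms(limit)
    if size is not None:
        cuda_module.set_size_lms(size)
    try:
        yield
    finally:
        cuda_module.set_enabled_lms(previous[0])
        cuda_module.set_limit_lms(previous[1])
        cuda_module.set_size_lms(previous[2])
```

A call such as `with lms_settings(torch.cuda, limit=8 * 1024 ** 3):` would wrap model creation and training for one experiment.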
More information
The PyTorch home page offers tutorials, a getting started guide, and other information.
Additional tutorials and examples are available from the community: