Get started with PyTorch

Discover how this native Python package redesigns and implements Torch in Python


PyTorch is an open source Python package released under the modified Berkeley Software Distribution license. Facebook, the Idiap Research Institute, New York University (NYU), and NEC Labs America hold the copyrights for PyTorch. Although Python is the language of choice for data science, PyTorch is a relative newcomer to the deep learning arena.

In this article, you can find an overview of the PyTorch system and learn more about the following topics:

  • Background of PyTorch
  • Benefits of using PyTorch
  • Typical PyTorch applications
  • Supported platforms
  • Hardware acceleration
  • Installation considerations and options

Overview of PyTorch

Neural network algorithms typically compute peaks or troughs of a loss function, with most using a gradient descent function to do so. In Torch, PyTorch's predecessor, the Torch Autograd package, contributed by Twitter, computes the gradient functions. Torch Autograd is based on Python Autograd.

Python packages such as Autograd and Chainer both use a technique known as tape-based auto-differentiation to calculate gradients. As such, those packages heavily influenced and inspired the design of PyTorch.

Tape-based auto-differentiation works just like a tape recorder in that it records operations performed, and then replays them to compute gradients—a method also known as reverse-mode auto-differentiation . PyTorch Autograd has one of the fastest implementations of this function.

Using this feature, PyTorch users can tweak their neural networks in an arbitrary manner without overhead or lag penalties. As a result, unlike in most well-known frameworks, PyTorch users can dynamically build graphs, with the framework's speed and flexibility facilitating the research and development of new deep learning algorithms.

Some of this performance comes from the modular design used in the PyTorch core. PyTorch implements most of the tensor and neural network back ends for CPU and graphical processing unit (GPU) as separate and lean C-based modules, with integrated math acceleration libraries to boost speed.

PyTorch integrates seamlessly with Python and uses the Imperative coding style by design. Moreover, the design keeps the extensibility that made its Lua-based predecessor popular. Users can program by using C/C++ with an extension application programming interface (API) based on the C Foreign Function Interface (cFFI) for Python.


Before diving into what PyTorch can do and the benefits it brings, let's get a bit of background.

What is Torch?

Torch is a modular, open source library for machine learning and scientific computing. Researchers at NYU first developed Torch for academic research. The library's use of the LuaJIT compiler improves performance, and C-based NVIDIA CUDA extensions enable Torch to take advantage of GPU acceleration.

Many developers use Torch as a GPU-capable NumPy alternative; others use it to develop deep learning algorithms. Torch has gained prominence because of its use by Facebook and Twitter. The Google DeepMind artificial intelligence (AI) project started out using Torch, and then switched to TensorFlow.

What are Lua and LuaJIT?

Lua is a lightweight scripting language that supports multiple programming models; it has its origins in application extensibility. Lua is compact and written in C, which makes it able to run on constrained embedded platforms. The Pontifical Catholic University of Rio de Janeiro in Brazil first introduced Lua in 1993.

LuaJIT is a Just-in-time (JIT) compiler with platform-specific optimizations to improve the performance of Lua. It also extends and enhances the C application programming interface (API) of standard Lua.

What is PyTorch?

Two PyTorch variants exist. Originally, Hugh Perkins developed pytorch as a Python wrapper for the LuaJIT-based Torch framework. The PyTorch variant this article discusses, however, is a completely new development. Unlike the older variant, PyTorch no longer uses the Lua language and LuaJIT. Instead, it is a native Python package.

PyTorch redesigns and implements Torch in Python while sharing the same core C libraries for the back-end code. PyTorch developers tuned this back-end code to run Python efficiently. They also kept the GPU-based hardware acceleration as well as the extensibility features that made Lua-based Torch popular with researchers.

Benefits of PyTorch

PyTorch has many benefits. Let's look at some of the major items.

Dynamic computational graphs

Most deep learning frameworks that use computational graphs generate and analyze graphs before runtime. In contrast, PyTorch builds graphs during runtime by using reverse-mode auto-differentiation. Therefore, arbitrary changes to a model do not add runtime lag or overhead to rebuild the model.

PyTorch has one of the fastest implementations of reverse-mode auto-differentiation. Apart from being easier to debug, dynamic graphs allow PyTorch to handle variable-length inputs and outputs, which is especially useful in natural language processing (NLP) for text and speech.

Lean back end

Rather than using a single back end, PyTorch uses separate back ends for CPU and GPU and for distinct functional features. For example, the tensor back end for CPU is TH, while the tensor back end for GPU is THC. Similarly, the neural network back ends are THNN and THCUNN for CPU and GPU, respectively. Individual back ends result in lean code tightly focused for a specific task running on a specific class of processor with high memory efficiency. Use of separate back ends makes it easier to deploy PyTorch on constrained systems, such as those used in embedded applications.

Python-first approach

Although it is a derivative of Torch, PyTorch is a native Python package by design. It does not function as a Python language binding but rather as an integral part of Python. PyTorch builds all its functionality as Python classes. Hence, PyTorch code can seamlessly integrate with Python functions and other Python packages.

Imperative programming style

Because a direct change to the program state triggers computation, code execution isn't deferred and produces simple code, avoiding a lot of asynchronous executions that could cloud how the code executes. When manipulating data structures, this style is intuitive and easy to debug.

Highly extensible

Users can program using C/C++ by using an extension API based on cFFI for Python and compiled for CPU or with CUDA for GPU operation. This feature allows the extension of PyTorch for new and experimental uses and makes it attractive for research use. For example, the PyTorch audio extension allows the loading of audio files.

Typical PyTorch applications

Torch and PyTorch share the same back-end code, and there's often a lot of confusion between Lua-based Torch and PyTorch in the literature. As a result, it's difficult to distinguish between the two unless you look at the timeline.

For example, the Google DeepMind AI project used Torch before switching to TensorFlow. Because the switch happened before the advent of PyTorch, one cannot consider it an example of a PyTorch application. Twitter was a Torch contributor and now uses TensorFlow and PyTorch to fine-tune its ranking algorithms on timelines.

Both Torch and PyTorch have seen heavy use at Facebook to research NLP for text and speech. Facebook has released many open source PyTorch projects, including ones related to:

  • Chat bots
  • Machine translation
  • Text search
  • Text to speech
  • Image and video classification

The PyTorch Torchvision package gives users access to model architectures and pretrained models of popular image classification models such as AlexNet, VGG, and ResNet.

Because of its flexible, extensible, modular design, PyTorch doesn't limit you to specific models or applications. You can use PyTorch as a NumPy replacement or to implement machine learning and deep learning algorithms. For more information about applications and contributed models, see the PyTorch Examples page at

Platforms that support PyTorch

PyTorch requires a 64-bit processor, operating system, and Python distribution to work correctly. To access a supported GPU, PyTorch depends on other software such as CUDA. At the time of publication, the latest PyTorch version was 0.2.0_4. Table 1 shows the availability of prebuilt PyTorch binaries and GPU support for this version.

Operating systems that support PyTorch

Operating systemPython 2.7Python 3.5Python 3.6
64-bit Linux®YesYesYes
CPU and GPU support?BothBothBoth
Mac OS XYesYesYes
CPU and GPU support?CPU onlyCPU onlyCPU only
64-bit Windows®NoNoNo
CPU and GPU support?NoNoNo


  • Do not use the main Anaconda channel for installation. Instead, use the channel from PyTorch maintainer soumith to ensure support for later versions of CUDA and properly optimized CPU and GPU back ends as well as support for Mac OS X.
  • You can enable CUDA GPU support on Mac OS X by building PyTorch from source.
  • PyTorch does not officially support Windows. Windows support on Python 3.x is community created and experimental. CPU support is available, but GPU support is partial because the build only partially supports CUDA.

For more information, please see the README file at

Building PyTorch from source

PyTorch builds use CMake for build management. To build PyTorch from source and ensure a high-quality Basic Linear Algebra Subprograms (BLAS) library (the Intel® Math Kernel Library, or MKL), the package maintainers recommend using the Anaconda Python distribution. This distribution also ensures a controlled compiler version regardless of the operating system you use.

For details, refer to

PyTorch on PPC64le architectures

PyTorch doesn't officially support PPC64le architectures such as IBM® POWER8®, but you can use a Linux build recipe to build PyTorch from source. These builds are still experimental and don't pass all tests, especially with CUDA enabled. In addition, the math kernel and linear algebra libraries aren't fully optimized.

How does PyTorch use hardware acceleration?

Instead of a single back end, PyTorch uses several separate back-end modules for CPU and GPU and for different tasks. This approach means that each module is lean, focused, and highly optimized for the task. The modules rely heavily on underlying math and linear algebra libraries (MKL and BLAS) for their performance.


Because Anaconda Python tunes MKL for Intel architectures, PyTorch performs best on Intel CPUs. Intel supplies the MKL-Deep Neural Network primitives for its CPU-centric high-performance computing (HPC) architectures, such as the Intel Xeon® and Xeon Phi™ families of processors.

PyTorch could also use OpenBLAS. On IBM PowerPC® machines, IBM makes available the Math Acceleration Subsystem (MASS) for Linux. You can build OpenBLAS with MASS support. IBM also recently announced the integration of Anaconda on its HPC platform based on the IBM Power Systems™ S822LC platform. Therefore, PyTorch would potentially perform well on IBM systems, as well.


PyTorch supports only NVIDIA GPU cards. On the GPU, PyTorch uses NVIDIA CUDA Deep Neural Network (CuDNN) library, a GPU-accelerated library meant for deep learning algorithms. Open Computing Language (OpenCL) support is not on the PyTorch road map, although the Lua-based Torch had limited support for the language. There are no community efforts to port to OpenCL, either.

PyTorch is still trailing behind on the CUDA development curve. PyTorch uses CUDA version 8, while many other deep learning libraries already use CUDA version 9. Earlier PyTorch releases are based on CUDA 7 and 7.5. As such, PyTorch users cannot take advantage of the latest NVIDIA graphics cards. Building from source with CUDA 9 has so far produced mixed results and inconsistent behaviors.

Installation considerations

In general, PyTorch runs on any platform that supports a 64-bit Python development environment. That's enough to train and test most simple examples and tutorials. However, most experts agree that for research or professional development, an HPC platform is strongly recommended.

Processor and memory requirements

Deep learning algorithms are compute intensive and need at least a fast, multicore CPU with vector extensions. In addition, one or more high-end CUDA-capable GPUs is the norm for a deep learning environment.

Processes in PyTorch communicate with each other by using buffers in shared memory, and so allocated memory must be adequate for this purpose. Hence, most experts also recommend having large CPU and GPU RAM because memory transfers are expensive in terms of both performance and energy usage. A larger RAM avoids these operations.

Virtual machine options

Virtual machines (VMs) for deep learning are currently best suited to CPU-centric hardware with multiple cores. Because the host operating system controls the physical GPU, GPU acceleration is challenging to implement on VMs. That said, two main methods exist for doing so:

  • GPU pass-through:
    • GPU pass-through works only on Type-1 hypervisors such as Citrix Xen, VMware ESXi, the Kernel Virtual Machine, and IBM Power.
    • Pass-through has overheads that can vary based on specific combinations of CPU, chip set, hypervisor, and operating system; in general, overhead is significantly less for the latest generations of hardware.
    • A given hypervisor–operating system combination supports specific NVIDIA GPU cards only.
  • GPU virtualization:
    • All major GPU vendors—NVIDIA GRID, AMD MxGPU, and Intel Graphics Virtualization Technology –g (GVT -g)—support GPU virtualization.
    • The latest versions support OpenCL on specific newer GPU cards. (No OpenCL support is available for PyTorch).
    • The latest version of GRID supports CUDA and OpenCL for specific newer GPU cards.

Docker installation options

Running PyTorch in a Docker container or Kubernetes cluster has many advantages. The PyTorch source includes a Dockerfile that in turn includes CUDA support. Docker Hub also hosts a prebuilt runtime Docker image. The main advantage of using Docker is that PyTorch models can access and run on physical GPU cores (devices). In addition, NVIDIA has a tool for Docker Engine called nvidia-docker that can run a PyTorch Docker image on the GPU.

Cloud installation options

Finally, PyTorch offers several cloud installation options:

  • Google Cloud Platform. Google offers machine instances with access to one, four, or eight NVIDIA GPU devices in specific regions. It is also possible to run PyTorch on containerized GPU-backed Jupyter notebooks.
  • Amazon Web Services (AWS). Amazon offers an AWS Deep Learning Amazon Machine Image (AMI) with optional NVIDIA GPU support that can run on a variety of Amazon Elastic Compute Cloud instances. PyTorch, Caffe2, and other deep learning frameworks are preinstalled. AMIs can support up to 64 CPU cores and up to 8 NVIDIA GPUs (K80).
  • Microsoft® Azure® Cloud. You can set PyTorch up on the Microsoft Data Science Virtual Machine family of Azure machine instances either for CPU only or including up to four K80 GPUs.
  • IBM Cloud® data science and data management. IBM offers a Python environment with Jupyter notebooks and Spark. You can install PyTorch by using a Python3 kernel; first ensure that conda is up to date, and then install PyTorch by using the following command:
    conda install pytorch torchvision -c soumith
  • IBM Cloud Kubernetes cluster. Kubernetes clusters on IBM Cloud can run PyTorch Docker images.

Downloadable resources

Related topics


Sign in or register to add and subscribe to comments.

Zone=Cognitive computing, Open source
ArticleTitle=Get started with PyTorch