Recent years have brought a tremendous proliferation of hardware accelerators and computation devices, such as GPUs and FPGAs, to address the ever-increasing need for computational power. Deep learning, which requires processing large volumes of data through computationally intensive neural networks, has both driven, and been enabled by, the development of advanced computational hardware. In parallel, several tensor-computation packages specialized for deep learning have been developed to leverage the compute capabilities of these new devices. However, tensor computations are not restricted to deep learning; they arise in many other fields, including:
- Scientific computing
- Numerical optimization
- Image and signal processing
- General machine learning
- Data science
The tensor operations currently implemented in deep-learning packages still fall short in one or more respects. For instance, an implementation may lack modularity, or it may support only a limited number of data types, with complex numbers often missing. Internally, there may be little or no flexibility in the memory layout of tensors, and the operations used to manipulate tensors may not be available for stand-alone use.
Introducing the Ocean Tensor Package
Given the need for a comprehensive general-purpose tensor package, I developed the Ocean Tensor Package. The Ocean Tensor Package has a modular design that makes it easy to add new functionality, provide support for new and emerging device types, and install packages on a per-need basis. Moreover, the layered implementation makes it possible for users to access functions ranging from low to high level. In particular, the Ocean Tensor Package consists of three layers:
- The Solid foundation library, which provides low-level functions that are independent of the higher-level tensor representation;
- The Ocean tensor library, which implements the tensor and module infrastructure and provides the high-level tensor APIs; and
- A Python interface that provides user-friendly access to all tensor functions, as well as interoperability with existing packages.
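To make the separation between the layers more concrete, here is a deliberately simplified, one-dimensional Python sketch. All names in it (`strided_add`, `Tensor`, `add`) are hypothetical and are not part of Solid or Ocean; the sketch only illustrates the idea of a low-level kernel that works on raw buffers and strides, a tensor layer that owns the metadata, and a thin user-facing function on top.

```python
from array import array

# Layer 1 (hypothetical, Solid-like): a kernel that sees only raw buffers,
# an element count, offsets, and strides. It knows nothing about tensor objects.
def strided_add(n, x, x_off, x_stride, y, y_off, y_stride, out, out_off, out_stride):
    for i in range(n):
        out[out_off + i * out_stride] = x[x_off + i * x_stride] + y[y_off + i * y_stride]

# Layer 2 (hypothetical, Ocean-like): a tensor object that owns the metadata
# (offset, stride, size) and dispatches to the low-level kernel.
class Tensor:
    def __init__(self, data, offset=0, stride=1, size=None):
        self.data = data
        self.offset = offset
        self.stride = stride
        # For strided views the caller must pass the logical size explicitly.
        self.size = len(data) if size is None else size

    def add(self, other):
        if self.size != other.size:
            raise ValueError("size mismatch")
        out = Tensor(array("d", [0.0] * self.size))
        strided_add(self.size, self.data, self.offset, self.stride,
                    other.data, other.offset, other.stride,
                    out.data, out.offset, out.stride)
        return out

    def tolist(self):
        return [self.data[self.offset + i * self.stride] for i in range(self.size)]

# Layer 3 (hypothetical, Python-interface-like): a convenience function that
# accepts plain Python sequences and wraps them into tensors.
def add(a, b):
    ta = a if isinstance(a, Tensor) else Tensor(array("d", a))
    tb = b if isinstance(b, Tensor) else Tensor(array("d", b))
    return ta.add(tb)
```

For example, `add(Tensor(array("d", [1, 2, 3, 4, 5, 6]), stride=2, size=3), [10, 20, 30]).tolist()` adds to the strided view `[1, 3, 5]` and returns `[11.0, 23.0, 35.0]`. Because the kernel takes only buffers, offsets, and strides, it could be reused by any tensor representation, which is the motivation for keeping the lowest layer independent.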
The Ocean Tensor Package provides support for various integer, floating-point, and complex data types and supports non-aligned and byteswapped memory layouts. It supports automatic conversion between data types and devices, as well as dimension broadcasting, and can be configured to provide low-level control over all operations. On the GPU, high levels of asynchronicity are enabled by consistent usage of streams and the availability of special intermediate tensors.
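Ocean's own Python API is not reproduced here. As a rough analogy only, the following NumPy sketch shows what byteswapped and non-aligned storage, automatic type promotion, dimension broadcasting, and complex data types look like in practice. Using NumPy is an assumption made for illustration; Ocean's conversion and broadcasting rules may differ in detail.

```python
import sys
import numpy as np

# Pick the byte order that is NOT native on this machine, so the storage is byteswapped.
non_native_f8 = ">f8" if sys.byteorder == "little" else "<f8"
a = np.array([[1.0], [2.0], [3.0]], dtype=non_native_f8)  # shape (3, 1), byteswapped

# A single-precision complex row vector of shape (1, 2).
b = np.array([[1 + 1j, 2 - 1j]], dtype=np.complex64)

# Broadcasting (3, 1) with (1, 2) gives shape (3, 2); float64 and complex64
# are promoted to complex128, and the result uses native byte order.
c = a * b

# A float64 buffer read at a 1-byte offset: the data is typically not 8-byte aligned,
# yet the values are still accessible.
raw = bytes(1) + np.array([1.5, 2.5]).tobytes()
u = np.frombuffer(raw, dtype=np.float64, offset=1)
```

Here `c[2, 1]` equals `3 * (2 - 1j) = 6 - 3j`, computed from a byteswapped input without any explicit conversion by the user.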
As an example of the flexible, but well-defined, use of different devices and data types, consider an implementation of the modified Gram-Schmidt algorithm for QR factorization in which the byteswapped double-precision Q matrix is updated in place on the CPU, while the single-precision R matrix is maintained on a GPU device.
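As a rough, CPU-only stand-in written with NumPy rather than Ocean (an assumption; Ocean's API is not shown here), modified Gram-Schmidt with these mixed data types can be sketched as follows. Q is kept in double precision with non-native byte order and orthogonalized in place, while R is accumulated in single precision. NumPy cannot place R on a GPU, so that part of the setup is only indicated in a comment, and the function name `mgs_qr` is hypothetical.

```python
import sys
import numpy as np

# Non-native byte order for Q, so its storage is byteswapped on this host.
NON_NATIVE_F8 = ">f8" if sys.byteorder == "little" else "<f8"

def mgs_qr(A):
    """Modified Gram-Schmidt QR of an m-by-n matrix A (m >= n, full column rank)."""
    m, n = A.shape
    # Q: byteswapped double precision, initialized from A and then updated in place.
    Q = np.array(A, dtype=NON_NATIVE_F8)
    # R: single precision (in the Ocean setup described above, R would live on a GPU).
    R = np.zeros((n, n), dtype=np.float32)
    for k in range(n):
        # Normalize column k; elementwise ufuncs handle the non-native byte order.
        norm = np.sqrt(np.sum(Q[:, k] * Q[:, k]))
        R[k, k] = norm
        Q[:, k] /= norm
        # Modified GS: immediately remove the q_k component from every later column.
        for j in range(k + 1, n):
            r = np.sum(Q[:, k] * Q[:, j])  # computed in double precision
            R[k, j] = r                    # stored in single precision
            Q[:, j] -= r * Q[:, k]
    return Q, R
```

Since the projections are computed and applied in double precision, Q stays orthogonal to double-precision accuracy, and only the stored entries of R are rounded to single precision.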
The Ocean Tensor Package runs on various platforms, including macOS and Linux, on Intel and Power machines, with or without GPU devices, and is available as open-source software at https://github.com/ibm/ocean-tensor-package. A preprint of the accompanying paper can be found at https://arxiv.org/abs/1810.08723.