Recent years have brought a tremendous proliferation of hardware acceleration and computation devices such as GPUs and FPGAs to address the ever-increasing need for computational power. Deep learning, which requires the processing of large volumes of data through computationally intensive neural networks, has both been enabled by and driven the development of advanced computational hardware. In conjunction, several tensor-computation packages specialized towards deep learning have been developed to leverage the compute capabilities of these advanced new devices. However, tensor computations are not restricted to deep learning and arise in various other fields, including:
 Scientific computing
 Numerical optimization
 Image and signal processing
 General machine learning
 Data science
Current implementations of tensor operations, present in all deep-learning packages, still fall short in one or more respects. For instance, an implementation may lack modularity, or support only a limited number of data types, with complex numbers often missing. Internally, there may be little or no flexibility in the memory layout of tensors, and the operations used to manipulate tensors may not be available for standalone use.
Introducing the Ocean Tensor Package
Given the need for a comprehensive general-purpose tensor package, I developed the Ocean Tensor Package. The Ocean Tensor Package has a modular design that makes it easy to add new functionality, provide support for new and emerging device types, and install packages on a per-need basis. Moreover, the layered implementation makes it possible for users to access functions ranging from low to high level. In particular, the Ocean Tensor Package consists of three layers:
 The Solid foundation library, which provides low-level functions that are independent of the higher-level tensor representation;
 The Ocean tensor library, which implements the tensor and module infrastructure and provides the high-level tensor APIs; and
 A Python interface that provides user-friendly access to all tensor functions, as well as interoperability with existing packages.
The Ocean Tensor Package provides support for various integer, floating-point, and complex data types and supports non-aligned and byte-swapped memory layouts. It supports automatic conversion between data types and devices, as well as dimension broadcasting, and can be configured to provide low-level control over all operations. On the GPU, high levels of asynchronicity are enabled by consistent usage of streams and the availability of special intermediate tensors.
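To make byte-swapped layouts, automatic type conversion, and dimension broadcasting more concrete, here is a minimal sketch in plain NumPy. It is only an analogy: none of these calls are Ocean functions, all variable names are made up for illustration, and Ocean's own conversion rules may differ from NumPy's.

```python
import numpy as np

# A native double-precision vector and a byte-swapped copy of it.
# "S" asks NumPy for the opposite of the machine's native byte order.
native = np.arange(4, dtype=np.float64)
swapped = native.astype(native.dtype.newbyteorder("S"))

# Same logical values, different bytes in memory.
same_values = np.array_equal(native, swapped)
different_bytes = native.tobytes() != swapped.tobytes()

# Broadcasting with automatic type conversion:
# a (3, 1) int16 column plus a (2,) float32 row gives a (3, 2) float32 grid.
column = np.array([[1], [2], [3]], dtype=np.int16)
row = np.array([0.5, 1.5], dtype=np.float32)
grid = column + row

# Mixing a complex single-precision scalar array with the byte-swapped
# doubles promotes the result to native complex double precision.
scale = np.array([1 + 2j], dtype=np.complex64)
scaled = scale * swapped
```

The point of the sketch is that the caller never converts byte order or data type by hand; the library resolves both before applying the operation.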
Illustrative example
As an example of the flexible, but well-defined, usage of different devices and data types, consider the implementation below of the modified Gram-Schmidt algorithm for QR factorization, in which the byte-swapped double-precision Q matrix is updated in-place on the CPU, and the single-precision R matrix is maintained on a GPU device.
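For readers who want to see the algorithm itself, the following is a rough stand-in written in plain NumPy. It is not Ocean code and uses no Ocean API: the function name `modified_gram_schmidt` is hypothetical, both arrays live in host memory (there is no GPU involved), and only the data-type mix mirrors the description above, with a byte-swapped double-precision Q updated in place and a single-precision R.

```python
import numpy as np


def modified_gram_schmidt(Q, R):
    """Overwrite Q (m x n, full column rank) with orthonormal columns and
    fill the upper triangle of R (n x n) so that the original matrix is Q @ R."""
    n = Q.shape[1]
    for i in range(n):
        # Normalize column i; its norm becomes the diagonal entry of R.
        norm = float(np.sqrt(np.sum(Q[:, i] ** 2)))
        R[i, i] = norm
        Q[:, i] /= norm
        # Remove the component along column i from every later column.
        for j in range(i + 1, n):
            r = float(np.sum(Q[:, i] * Q[:, j]))
            R[i, j] = r
            Q[:, j] -= r * Q[:, i]
    return Q, R


# Example data (an assumption for illustration): a random 6 x 4 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Q starts as a byte-swapped double-precision copy of A; R is single precision.
swapped_f8 = np.dtype(np.float64).newbyteorder("S")
Q = A.astype(swapped_f8)
R = np.zeros((4, 4), dtype=np.float32)
modified_gram_schmidt(Q, R)

# Check the factorization in native double precision.
Q_native = Q.astype(np.float64)
orthogonality_error = np.abs(Q_native.T @ Q_native - np.eye(4)).max()
reconstruction_error = np.abs(Q_native @ R.astype(np.float64) - A).max()
```

Because R is stored in single precision, the reconstruction error is limited by float32 rounding, while the orthogonality of Q is limited only by double-precision rounding.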
Availability
The Ocean Tensor Package runs on various platforms, including macOS and Linux, on Intel and Power machines, with or without GPU devices, and is available as open-source software at https://github.com/ibm/oceantensorpackage. A preprint of the accompanying paper can be found at https://arxiv.org/abs/1810.08723.
Sample implementation




