
Ocean Tensor Package: General-Purpose Software for Tensor Computation

Recent years have brought a tremendous proliferation of hardware accelerators and computation devices, such as GPUs and FPGAs, to address the ever-increasing need for computational power. Deep learning, which requires processing large volumes of data through computationally intensive neural networks, has both been enabled by advanced computational hardware and driven its development. Alongside this, several tensor-computation packages specialized for deep learning have been developed to leverage the compute capabilities of these new devices. However, tensor computations are not restricted to deep learning; they arise in many other fields, including:

  • Scientific computing
  • Numerical optimization
  • Image and signal processing
  • General machine learning
  • Data science

Current implementations of tensor operations, present in all deep-learning packages, still fall short in one or more respects. For instance, an implementation may lack modularity or support only a limited number of data types, often omitting complex numbers. Internally, there may be little or no flexibility in the memory layout of tensors, and the operations used to manipulate tensors may not be available for stand-alone use.

Figure: Current implementations of tensor operations still fall short in one or more respects.

Introducing the Ocean Tensor Package

Given the need for a comprehensive, general-purpose tensor package, I developed the Ocean Tensor Package. Its modular design makes it easy to add new functionality, support new and emerging device types, and install packages as needed. Moreover, its layered implementation gives users access to functions ranging from low level to high level. In particular, the Ocean Tensor Package consists of three layers:

  1. The Solid foundation library, which provides low-level functions that are independent of the higher-level tensor representation;
  2. The Ocean tensor library, which implements the tensor and module infrastructure and provides the high-level tensor APIs; and
  3. A Python interface that provides user-friendly access to all tensor functions, as well as interoperability with existing packages.
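To illustrate what it means for low-level functions to be independent of the higher-level tensor representation, here is a minimal plain-Python sketch of two such layers. The names `solid_scale` and `Tensor` are hypothetical and are not part of the actual Solid or Ocean APIs; the sketch only assumes that a low-level kernel operates on a flat buffer described by an element count, an offset, and a stride, while the tensor layer translates row and column requests into those parameters.

```python
# Hypothetical two-layer sketch; not the real Solid/Ocean API.
from array import array


def solid_scale(n, data, offset, stride, alpha):
    """Low-level kernel: scale n strided elements of a flat buffer in place.

    It knows nothing about tensors, only a buffer, an offset, and a stride.
    """
    for k in range(n):
        data[offset + k * stride] *= alpha


class Tensor:
    """High-level wrapper: a 2-D column-major view onto a flat buffer."""

    def __init__(self, data, rows, cols):
        self.data, self.rows, self.cols = data, rows, cols

    def scale_column(self, j, alpha):
        # Column j starts at j * rows and its elements are contiguous (stride 1).
        solid_scale(self.rows, self.data, j * self.rows, 1, alpha)

    def scale_row(self, i, alpha):
        # Row i starts at offset i and consecutive elements are `rows` apart.
        solid_scale(self.cols, self.data, i, self.rows, alpha)

    def __getitem__(self, idx):
        i, j = idx
        return self.data[i + j * self.rows]


# A 2x3 tensor whose columns are [0, 1], [2, 3], [4, 5].
T = Tensor(array("d", range(6)), rows=2, cols=3)
T.scale_column(1, 10.0)
T.scale_row(0, -1.0)
```

With this split, supporting a new device or data type would only require new low-level kernels, while the tensor layer stays unchanged; this is one possible reading of the modular design described above, not a description of Ocean's internals.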

Figure: The Ocean Tensor Package consists of three layers.

The Ocean Tensor Package provides support for various integer, floating-point, and complex data types and supports non-aligned and byteswapped memory layouts. It supports automatic conversion between data types and devices, as well as dimension broadcasting, and can be configured to provide low-level control over all operations. On the GPU, high levels of asynchronicity are enabled by the consistent use of streams and the availability of special intermediate tensors.
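As a rough illustration of what a byteswapped layout means, the following sketch uses only Python's standard `struct` module (not Ocean) to store double-precision values with the byte order of each element reversed. It shows that the values are recovered only when the buffer is read with the opposite byte order; how Ocean handles such tensors internally is not shown here.

```python
import struct
import sys

values = [1.5, -2.25, 3.0]
count = len(values)

# Native layout: each 8-byte double stored in the machine's own byte order.
native = struct.pack(f"={count}d", *values)

# Byteswapped layout: reverse the 8 bytes of every element.
swapped = b"".join(native[k:k + 8][::-1] for k in range(0, len(native), 8))

# Interpreting the swapped bytes in native order yields different numbers...
naive = list(struct.unpack(f"={count}d", swapped))

# ...whereas reading them in the opposite byte order recovers the originals.
opposite = ">" if sys.byteorder == "little" else "<"
recovered = list(struct.unpack(f"{opposite}{count}d", swapped))

print(naive)
print(recovered)
```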

Illustrative example

As an example of the flexible but well-defined use of different devices and data types, consider the implementation below of the modified Gram-Schmidt algorithm for QR factorization. The byteswapped double-precision Q matrix is updated in place on the CPU, while the single-precision R matrix is maintained on a GPU device.
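For readers without Ocean installed, the same modified Gram-Schmidt steps can be sketched in plain Python. This is not Ocean code: matrices are stored as lists of columns, Q is kept in double precision, and each entry of R is rounded to single precision with the standard `struct` module to mimic the single-precision R tensor. The helper names `to_single` and `mgs_qr` are illustrative, and building the 5×5 example matrix column by column is an assumption about how the example's `reshape` fills the matrix, chosen because it reproduces R[0,0] = sqrt(31) ≈ 5.56776 from the output.

```python
import math
import struct


def to_single(x):
    # Round a Python float (double precision) to the nearest single-precision value.
    return struct.unpack("f", struct.pack("f", x))[0]


def mgs_qr(A):
    """Modified Gram-Schmidt QR of a square matrix given as a list of columns.

    Returns Q as a list of orthonormal columns (double precision) and R as a
    list of rows whose entries are rounded to single precision.
    """
    n = len(A)
    Q = [col[:] for col in A]          # work on a copy of the columns
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Normalize column i and record its norm on the diagonal of R.
        r = math.sqrt(sum(x * x for x in Q[i]))
        Q[i] = [x / r for x in Q[i]]
        R[i][i] = to_single(r)
        for j in range(i + 1, n):
            # Project the already-updated column j onto q_i and subtract it;
            # using the updated column is what makes this the modified variant.
            r = sum(a * b for a, b in zip(Q[i], Q[j]))
            Q[j] = [a - r * b for a, b in zip(Q[j], Q[i])]
            R[i][j] = to_single(r)
    return Q, R


n = 5
# Assumed column-major fill: column j holds 5*j, ..., 5*j + 4, and one is
# added to every diagonal entry to make the matrix full rank.
A = [[float(5 * j + i) for i in range(n)] for j in range(n)]
for j in range(n):
    A[j][j] += 1.0

Q, R = mgs_qr(A)

# Frobenius norms of Q^T Q - I and Q R - A (Q[k] is column k of Q).
orth = math.sqrt(sum(
    (sum(a * b for a, b in zip(Q[k], Q[l])) - (k == l)) ** 2
    for k in range(n) for l in range(n)))
fact = math.sqrt(sum(
    (sum(Q[k][i] * R[k][j] for k in range(n)) - A[j][i]) ** 2
    for i in range(n) for j in range(n)))
print(R[0][0], orth, fact)
```

The printed values can be compared with the Ocean output in the sample: the orthogonality error should be at double-precision level, while the factorization error should be on the order of 1e-6 because R is stored in single precision.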

Availability

The Ocean Tensor Package runs on various platforms, including macOS and Linux, on Intel and IBM Power machines, with or without GPU devices, and is available as open-source software at https://github.com/ibm/ocean-tensor-package. A preprint of the accompanying paper can be found at https://arxiv.org/abs/1810.08723.

Sample implementation

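Basic implementation

def InplaceQR(Q,R) :
   # Modified Gram-Schmidt: Q is overwritten in place by the orthonormal
   # factor, and the upper-triangular factor is written into R.
   n = Q.size[1]
   for i in range(n) :
      # Normalize column i (q refers to Q[:,i], so Q is updated in place)
      q = Q[:,i]
      r = ocean.sqrt(q.T * q)
      q /= r
      R[i,i] = r
      # Remove the component along q from each remaining column
      for j in range(i+1,n) :
         r = q.T * Q[:,j]
         Q[:,j] -= q * r
         R[i,j] = r

Vectorized implementation

def InplaceQR(Q,R) :
   n = Q.size[1]
   for i in range(n) :
      q = Q[:,i]
      r = ocean.sqrt(q.T * q)
      q /= r
      R[i,i] = r
      if (i+1 < n) :
         # Project all remaining columns onto q at once and apply a
         # rank-one update instead of looping over j
         r = Q[:,i+1:].T * q
         Q[:,i+1:] -= q * r.T
         R[i,i+1:] = r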
Example usage

import ocean

# Create an example matrix A with one added to the diagonal
# entries to make it full rank.
A = ocean.arange(25, ocean.double).reshape(5,5)
d = A.diag(); d += 1;

# As an example, create a byte-swapped copy Q on the cpu and a
# single-precision result tensor R on gpu[0].
Q = A.clone(); Q.byteswap()
R = ocean.zeros(A.size, ocean.float, ocean.gpu[0])

# Call the in-place QR factorization code (see code above)
InplaceQR(Q,R)

# Display matrices, verify orthogonality, and check factorization
print(Q); print(R)
print(ocean.norm(Q.T * Q - ocean.eye(Q.size[0])))
print(ocean.norm(Q*R - A))
Output

# Matrix Q
(:,:)
    0.17961   0.41037   0.58318   0.51237   0.44353
    0.17961   0.77910  -0.59087  -0.07841   0.07392
    0.35921   0.26763   0.51389  -0.66920  -0.29569
    0.53882  -0.05947  -0.06544   0.50915  -0.66530
    0.71842  -0.38658  -0.20594  -0.15575   0.51745


# Matrix R
(:,:)
    5.56776   15.44606   25.50395   35.56185   45.61975
    0.00000    5.42396    9.96772   14.69584   19.42397
    0.00000    0.00000    2.27881    2.87354    3.90708
    0.00000    0.00000    0.00000    1.76914    1.69502
    0.00000    0.00000    0.00000    0.00000    1.55236


# Orthogonality and factorization using single-precision R
8.81018e-15
2.1845e-6
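The sizes of the two residuals can be explained with a short error estimate. It assumes that `ocean.norm` returns the Frobenius norm (the 2-norm of all entries) and that each entry of R is rounded once, to nearest, when stored in single precision, while Q and the intermediate values of r stay in double precision. Before the diagonal is incremented, entry (i, j) of A equals i + 5j, so its diagonal entries are 6j.

```latex
\begin{aligned}
\hat{R} &= R + \Delta R, \qquad |\Delta R_{ij}| \le u\,|R_{ij}|, \qquad u = 2^{-24} \approx 5.96\times 10^{-8},\\
Q\hat{R} - A &\approx Q\,\Delta R \quad\Rightarrow\quad \|Q\hat{R} - A\|_F \approx \|\Delta R\|_F \le u\,\|R\|_F = u\,\|A\|_F,\\
\|A\|_F^2 &= \sum_{k=0}^{24} k^2 + \sum_{j=0}^{4}\bigl((6j+1)^2 - (6j)^2\bigr) = 4900 + 125 = 5025,\\
\|Q\hat{R} - A\|_F &\lesssim 5.96\times 10^{-8} \cdot \sqrt{5025} \approx 5.96\times 10^{-8} \cdot 70.89 \approx 4.2\times 10^{-6}.
\end{aligned}
```

The second line uses that QR = A holds to double precision and that multiplying by the numerically orthogonal Q leaves the Frobenius norm unchanged, so the norm of R equals the norm of A. The observed 2.1845e-6 lies below this bound, and the orthogonality residual of 8.81018e-15 is at the level expected for a Q computed entirely in double precision.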

Ewout van Den Berg

IBM Research