Getting started with Caffe2 and ONNX

This release of PowerAI includes a Technology Preview of Caffe2 and ONNX.

Caffe2 is a companion to PyTorch. PyTorch is great for experimentation and rapid development, while Caffe2 is aimed at production environments.

ONNX (Open Neural Network Exchange) provides support for moving models between those frameworks.

PowerAI support for Caffe2 is included in the PyTorch package. It's set up and activated along with PyTorch, as shown below.

ONNX is packaged as a conda package and will be installed automatically during the install_dependencies step:

$ conda create -y -n my-py3-env python=3.6
...

$ source activate my-py3-env

(my-py3-env) $ /opt/DL/pytorch/bin/install_dependencies -y
...

(my-py3-env) $ source /opt/DL/pytorch/bin/pytorch-activate

(my-py3-env) $ python
Python 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 19:29:21)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import caffe2
>>>
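
The interactive check above can also be scripted. The following is a minimal sketch, not part of PowerAI: the helper names check_modules and missing_modules are hypothetical, and it assumes the ONNX conda package is importable under the module name onnx. It only tests whether each top-level module can be located in the active environment; it does not import or exercise them.

```python
import importlib.util


def check_modules(names=("torch", "caffe2", "onnx")):
    """Map each top-level module name to True if it can be located."""
    # find_spec returns None when a top-level module is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}


def missing_modules(names=("torch", "caffe2", "onnx")):
    """List the module names that could not be located."""
    return [name for name, found in check_modules(names).items() if not found]
```

Calling missing_modules() after sourcing pytorch-activate should return an empty list if the setup steps above completed; any names it returns indicate which package to revisit.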

Though Caffe2 shares activation with PyTorch, its self-tests are in a separate caffe2-test script:

(my-py3-env) $ caffe2-test
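
With the environment activated, moving a PyTorch model toward Caffe2 starts with an ONNX export. The sketch below is an illustration only, under stated assumptions: the function name export_to_onnx and the default file name model.onnx are hypothetical, and the only framework calls used are torch.onnx.export and the eval method of a torch.nn.Module. It has not been validated against this Technology Preview.

```python
def export_to_onnx(model, example_input, path="model.onnx"):
    """Export a PyTorch model to an ONNX file and return the file path."""
    # Imported lazily so this file can be loaded before pytorch-activate is sourced.
    import torch

    # Put layers such as dropout and batch norm into inference behaviour.
    model.eval()
    # torch.onnx.export traces the model by running it once on example_input
    # and writes the resulting graph to path.
    torch.onnx.export(model, example_input, path)
    return path
```

It would be called with an instantiated torch.nn.Module and a tensor of the input shape that the model expects, for example export_to_onnx(model, torch.randn(1, 3, 224, 224)) for a typical image classifier.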

More information

Information about Caffe2 and ONNX, and tutorials for moving PyTorch models to Caffe2 using ONNX, can be found at:

Known issues

Caffe2 and ONNX are Technology Previews (in Caffe2's case, a preview of a community development preview). Some features may be disabled or still a "work in progress". The PowerAI team is aware of the following issues:

  • The torch.onnx tutorial "End-to-end AlexNet from PyTorch to Caffe2" can fail with an error similar to:
    RuntimeError: [enforce fail at reshape_op.h:110] total_size == size.  92160 vs 2761669743764137984. Argument shape does not agree with the input data.

    The second size value may differ, but it will be a large positive or negative number. The example fails often but not always, so the workaround is to run it again until it succeeds.

  • The ONNX tutorial "Verify the Correctness of Exported Model and Compare the Performance" fails. A workaround is to save the tutorial as a Python script and make the following changes:
    • Replace:
      init_net, predict_net = Caffe2Backend.onnx_graph_to_caffe2_net(onnx_model.graph, device="CPU")

      with:

      init_net, predict_net = Caffe2Backend.onnx_graph_to_caffe2_net(onnx_model, device="CPU")
    • Replace:
      caffe2_time = benchmark_caffe2_model(init_net, predict_net)

      with:

      ws.CreateNet(predict_net)
      result = ws.BenchmarkNet(predict_net.name, 3, 10, True)
      caffe2_time = result[0]
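    • Replace the following, so that the workspace returned by c2_native_run_net is kept as ws for the benchmark calls in the previous step: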
      _, caffe2_results = c2_native_run_net(init_net, predict_net, caffe2_inputs)

      with:

      ws, caffe2_results = c2_native_run_net(init_net, predict_net, caffe2_inputs)
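
The "try again" workaround for the first issue can be automated if the tutorial is saved as a script. The following is a minimal sketch, not a PowerAI utility: the helper name run_with_retries and its parameters are hypothetical, and it assumes the failing step is wrapped in a zero-argument function that raises RuntimeError on the intermittent failure, as in the error shown above.

```python
import time


def run_with_retries(fn, attempts=5, delay_seconds=0.0):
    """Call fn() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except RuntimeError as err:
            # Only RuntimeError is retried; other exceptions propagate at once.
            last_error = err
            print(f"attempt {attempt} of {attempts} failed: {err}")
            time.sleep(delay_seconds)
    # Every attempt failed, so surface the most recent error.
    raise last_error
```

Retrying only on RuntimeError keeps unrelated problems, such as a missing file or an import error, from being masked by the loop.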

The PowerAI team is working with the upstream communities to address these issues.