Getting started with Caffe2 and ONNX

Caffe2 is a companion to PyTorch. PyTorch is great for experimentation and rapid development, while Caffe2 is aimed at production environments. ONNX (Open Neural Network Exchange) provides support for moving models between those frameworks.
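
For example, a PyTorch model can be exported to an ONNX file with torch.onnx.export and then run through Caffe2's ONNX backend. The following is a minimal sketch; the tiny linear model and the model.onnx file name are placeholders:

import torch
import torch.nn as nn
import onnx
import caffe2.python.onnx.backend as caffe2_backend

# Placeholder model; substitute a real trained network.
model = nn.Linear(4, 2)
model.eval()

# Export to ONNX by tracing the model with a dummy input.
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx")

# Load the ONNX file and run it with the Caffe2 backend.
onnx_model = onnx.load("model.onnx")
rep = caffe2_backend.prepare(onnx_model, device="CPU")
outputs = rep.run(dummy_input.numpy())
print(outputs[0])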

WML CE support for Caffe2 and ONNX is included in the PyTorch package that is installed with WML CE. Both are set up and activated along with PyTorch.

You can validate the installation of PyTorch, Caffe2, and ONNX by running the following commands:

(my-py3-env) $ python
Python 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 19:29:21)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import caffe2
>>> import onnx
>>>
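
Beyond confirming that the imports succeed, you can optionally check the installed versions and exercise the Caffe2 workspace. The following sketch is illustrative; the blob name is arbitrary:

import numpy as np
import torch
import onnx
from caffe2.python import workspace

print("PyTorch:", torch.__version__)
print("ONNX:", onnx.__version__)

# Feed a small tensor into the Caffe2 workspace and read it back.
workspace.FeedBlob("x", np.random.rand(2, 3).astype(np.float32))
print("Caffe2 blob shape:", workspace.FetchBlob("x").shape)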

Although Caffe2 is activated along with PyTorch, its self-tests run from a separate caffe2-test script:

(my-py3-env) $ caffe2-test

More information

Information about Caffe2 and ONNX, along with tutorials for moving PyTorch models to Caffe2 using ONNX, is available on the Caffe2 and ONNX project websites.

Caffe2 NUMA support and Docker

Docker host configurations may not allow certain NUMA-related operations by default, for example changing memory policy or binding memory allocations to specific NUMA nodes.

The caffe2-test script includes some NUMA tests, and so may fail when run in a container on a NUMA-capable host. Test failures will mention enforce fail at numa.cc, mbind(...), or Could not move memory to a NUMA node. For example:

    def _Blob_feed(blob, arg, device_option=None):
        if device_option is not None:
            device_option = StringifyProto(device_option)
>       return blob._feed(arg, device_option)
E       RuntimeError: [enforce fail at numa.cc:85] mbind(...) == 0. Could not move memory to a NUMA node

caffe2-test may abort with a core dump, like this:
../../tmp/tmp.HF2Nai4yuZ/caffe2/python/checkpoint_test.py::TestCheckpoint::test_download_group_simple 
/opt/anaconda3/envs/pytorch/bin/caffe2-test: line 90: 1735 Aborted (core dumped) 
python -m pytest --disable-pytest-warnings -v --durations=20 ${DESELECTED_TESTS} 
"${TESTCOPY}/caffe2/python"

In this case, you might need to run PYTEST_ADDOPTS=-x caffe2-test so that pytest stops at the first failure and reports it before the run aborts.

NUMA operation is disabled by default in Caffe2, so this limitation should not affect customer applications unless they explicitly enable NUMA support using the caffe2_cpu_numa_enabled flag.
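
An application that does enable NUMA support would typically pass that flag when initializing the Caffe2 global workspace. A minimal sketch, assuming the standard workspace.GlobalInit call:

from caffe2.python import workspace

# Explicitly enable NUMA support; only do this on hosts (or containers)
# where NUMA operations such as mbind() are permitted.
workspace.GlobalInit(["caffe2", "--caffe2_cpu_numa_enabled=1"])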

If the Docker hosting arrangement permits, NUMA operations can be enabled by starting containers using the --privileged or --cap-add=SYS_NICE options on the docker run command.

Known issues

The WML CE team is aware of the following issue:

  • The ONNX Tutorial Verify the Correctness of Exported Model and Compare the Performance fails. A workaround is to save the tutorial as a Python script and make the following changes (a consolidated sketch follows this list):
    • Replace:
      init_net, predict_net = Caffe2Backend.onnx_graph_to_caffe2_net(onnx_model.graph, device="CPU")

      with:

      init_net, predict_net = Caffe2Backend.onnx_graph_to_caffe2_net(onnx_model, device="CPU")
    • Replace:
      caffe2_time = benchmark_caffe2_model(init_net, predict_net)

      with:

      ws.CreateNet(predict_net)
      result = ws.BenchmarkNet(predict_net.name, 3, 10, True)
      caffe2_time = result[0]
    • Replace:
      _, caffe2_results = c2_native_run_net(init_net, predict_net, caffe2_inputs)

      with:

      ws, caffe2_results = c2_native_run_net(init_net, predict_net, caffe2_inputs)
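
Taken together, the modified benchmarking portion of the tutorial script looks roughly like the sketch below. The import paths, file name, and input are assumptions based on the tutorial, and the statements are ordered so that the workspace returned by c2_native_run_net exists before it is used for benchmarking:

import numpy as np
import onnx
from caffe2.python.onnx.backend import Caffe2Backend
from caffe2.python.onnx.helper import c2_native_run_net  # import path assumed

onnx_model = onnx.load("model.onnx")  # placeholder for the tutorial's exported model file

# Pass the full ModelProto (not onnx_model.graph) to the converter.
init_net, predict_net = Caffe2Backend.onnx_graph_to_caffe2_net(onnx_model, device="CPU")

# Placeholder input; the tutorial builds this from the model's example input.
caffe2_inputs = [np.random.rand(1, 4).astype(np.float32)]

# Keep the workspace returned by c2_native_run_net instead of discarding it.
ws, caffe2_results = c2_native_run_net(init_net, predict_net, caffe2_inputs)

# Benchmark on that workspace: 3 warm-up runs, 10 timed runs.
ws.CreateNet(predict_net)
result = ws.BenchmarkNet(predict_net.name, 3, 10, True)
caffe2_time = result[0]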

The WML CE team is working with the upstream communities to address this issue.