Getting started with Caffe2 and ONNX
Caffe2 is a companion to PyTorch. PyTorch is great for experimentation and rapid development, while Caffe2 is aimed at production environments. ONNX (Open Neural Network Exchange) provides support for moving models between those frameworks.
WML CE support for Caffe2 and ONNX is included in the PyTorch package that is installed with WML CE. Both are set up and activated along with PyTorch.
You can validate the installation of PyTorch, Caffe2, and ONNX by running the following commands:
(my-py3-env) $ python
Python 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 19:29:21)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import caffe2
>>> import onnx
>>>
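As a lighter-weight alternative to the interactive session above, the same check can be scripted. The sketch below is an illustration rather than part of WML CE: it attempts each import and reports the package version when one is exposed, instead of stopping at the first failure.

```python
import importlib

# Modules installed with the WML CE PyTorch package; each should
# import cleanly if the environment was activated correctly.
for name in ("torch", "caffe2", "onnx"):
    try:
        module = importlib.import_module(name)
        version = getattr(module, "__version__", "unknown")
        print(f"{name}: OK (version {version})")
    except ImportError:
        print(f"{name}: NOT installed")
```

Running the script inside the activated environment should print an "OK" line for all three packages; any "NOT installed" line points at the package whose setup needs attention.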
Though Caffe2 shares activation with PyTorch, the self-tests are in a separate caffe2-test script:
(my-py3-env) $ caffe2-test
More information
Information about Caffe2 and ONNX, and tutorials for moving PyTorch models to Caffe2 using ONNX, can be found on the Caffe2 and ONNX project websites.
Caffe2 NUMA support and Docker
Docker host configurations may not allow certain NUMA-related operations by default, for example changing memory policy or binding memory allocations to specific NUMA nodes.
The caffe2-test script includes some NUMA tests, and so may fail when run in a container on a NUMA-capable host. Test failures will mention enforce fail at numa.cc, mbind(...), or Could not move memory to a NUMA node. For example:
def _Blob_feed(blob, arg, device_option=None):
if device_option is not None:
device_option = StringifyProto(device_option)
> return blob._feed(arg, device_option)
E RuntimeError: [enforce fail at numa.cc:85] mbind(...) == 0. Could not move memory to a NUMA node
../../tmp/tmp.HF2Nai4yuZ/caffe2/python/checkpoint_test.py::TestCheckpoint::test_download_group_simple
/opt/anaconda3/envs/pytorch/bin/caffe2-test: line 90: 1735 Aborted (core dumped)
python -m pytest --disable-pytest-warnings -v --durations=20 ${DESELECTED_TESTS}
"${TESTCOPY}/caffe2/python"
In this case, you might need to run PYTEST_ADDOPTS=-x caffe2-test so that caffe2-test stops at the first failure and reports the issue rather than aborting partway through the run.
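PYTEST_ADDOPTS is an ordinary environment variable that pytest reads at startup, so it only has to be present in the environment that the caffe2-test process inherits. The hypothetical sketch below demonstrates that mechanism with a plain child Python process standing in for caffe2-test, which may not be on PATH outside a WML CE environment:

```python
import os
import subprocess
import sys

# PYTEST_ADDOPTS=-x tells pytest to stop after the first failure.
# caffe2-test invokes pytest internally, so setting the variable in
# the parent environment is enough. A child Python process stands in
# for caffe2-test here and simply echoes what it sees.
env = dict(os.environ, PYTEST_ADDOPTS="-x")
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('PYTEST_ADDOPTS', ''))"],
    env=env, capture_output=True, text=True,
)
print(child.stdout.strip())  # prints "-x"
```

The same propagation happens when the variable is set on the shell command line, as in PYTEST_ADDOPTS=-x caffe2-test.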
NUMA operation is disabled by default in Caffe2, so this limitation should not affect customer applications unless they explicitly enable NUMA support using the caffe2_cpu_numa_enabled flag.
If the Docker hosting arrangement permits, NUMA operations can be enabled by starting containers using the --privileged or --cap-add=SYS_NICE options on the docker run command.
Known issues
The WML CE team is aware of the following issue:
- The ONNX tutorial "Verify the Correctness of Exported Model and Compare the Performance" fails. A workaround is to save the tutorial as a Python script and make the following changes:
- Replace:
init_net, predict_net = Caffe2Backend.onnx_graph_to_caffe2_net(onnx_model.graph, device="CPU")
with:
init_net, predict_net = Caffe2Backend.onnx_graph_to_caffe2_net(onnx_model, device="CPU")
- Replace:
caffe2_time = benchmark_caffe2_model(init_net, predict_net)
with:
ws.CreateNet(predict_net)
result = ws.BenchmarkNet(predict_net.name, 3, 10, True)
caffe2_time = result[0]
- Replace:
_, caffe2_results = c2_native_run_net(init_net, predict_net, caffe2_inputs)
with:
ws, caffe2_results = c2_native_run_net(init_net, predict_net, caffe2_inputs)
The WML CE team is working with the upstream communities to address this issue.