IBM Support

Why does DLI training stop and report a save issue?

Question & Answer


Question

When I run DLI training, the job stops and reports the following error:
Caused by op 'save/SaveV2', defined at:
  File "/dli_shared_fs/models/TensorFlow/flower-vgg-du2/flower-vgg-du2-20190314151008/main.py", line 351, in <module>
    tf.app.run()
  File "/opt/DL/tensorflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/dli_shared_fs/models/TensorFlow/flower-vgg-du2/flower-vgg-du2-20190314151008/main.py", line 348, in main
    train()
  File "/dli_shared_fs/models/TensorFlow/flower-vgg-du2/flower-vgg-du2-20190314151008/main.py", line 314, in train
    saver = tf.train.Saver(var_list=variables_to_restore_all, max_to_keep=2)
  File "/opt/DL/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/opt/DL/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/opt/DL/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/opt/DL/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 792, in _build_internal
    save_tensor = self._AddSaveOps(filename_tensor, saveables)
  File "/opt/DL/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 284, in _AddSaveOps
    save = self.save_op(filename_tensor, saveables)
  File "/opt/DL/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 202, in save_op
    tensors)
  File "/opt/DL/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1690, in save_v2
    shape_and_slices=shape_and_slices, tensors=tensors, name=name)
  File "/opt/DL/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/opt/DL/tensorflow/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/opt/DL/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/opt/DL/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()
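
The "Caused by op ..., defined at:" block only records where the save op was created (the tf.train.Saver call at main.py line 314). It does not say why the save failed, and the excerpt above does not include the underlying error message. As a first diagnostic, one might confirm that the checkpoint directory can accept a write at all. The sketch below is an assumption, not the documented resolution: the helper name check_checkpoint_dir and the 1 GiB free-space threshold are hypothetical, and a filesystem problem is only one possible cause.

```python
import os
import shutil
import tempfile


def check_checkpoint_dir(path, min_free_bytes=1 << 30):
    """Return a list of problems that would block a checkpoint write to `path`.

    Hypothetical helper for illustration; the 1 GiB default is an arbitrary
    assumption, not a value taken from DLI or TensorFlow.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append("not a directory: " + path)
        return problems

    # Create and delete a real file: permission bits alone can be
    # misleading on shared or network filesystems.
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append("cannot write to {}: {}".format(path, exc))

    # Compare the free space on the filesystem holding `path` to the threshold.
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append("only {} bytes free on {}".format(free, path))
    return problems
```

Run it as the same user that executes the training job, passing the directory that the training script hands to the saver when it writes checkpoints (that directory is not shown in the traceback). An empty list means this particular check found nothing; the actual error line printed before the "Caused by op" block remains the more reliable clue.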

Product: IBM Spectrum Conductor
Platform: Linux
Version: All Versions

Document Information

Modified date: 31 July 2019

UID: ibm10880465