booting from volume failed because failed to copy image to volume and traceback errors returned

Issue description

The deployment failed with errors showed on UI like:

Failed to copy image to volume: Chained Exception #1 Traceback (most recent call last): Traceback (most recent call last):
          File "/usr/lib/python3.6/site-packages/cinder/volume/driver.py", line 513, in _detach_volume
            ignore_errors=ignore_errors)
          File "/usr/lib/python3.6/site-packages/os_brick/utils.py", line 137, in trace_logging_wrapper
            return f(*args, **kwargs)
          File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 359, in inner
            return f(*args, **kwargs)
          File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/fibre_channel.py", line 342, in disconnect_volume
            wwn = self._linuxscsi.get_scsi_wwn(path)
          File "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line 193, in get_scsi_wwn
            root_helper=self._root_helper)
          File "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in _execute
            result = self.__execute(*args, **kwargs)
          File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 169, in execute
            return execute_root(*cmd, **kwargs)
          File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap
            return self.channel.remote_call(name, args, kwargs)
          File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in remote_call
            raise exc_type(*result[2])
        oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
        Command: /lib/udev/scsi_id --page 0x83 --whitelisted /dev/disk/by-path/ccw-0.0.1c03-zfcp-0x5005076306105388:0x4001401b00000000
        Exit code: 1
        Stdout: ''

And the error in the log of /var/log/cinder/volume-<host name of cinder volume>.log (for example, /var/log/cinder/volume-v7k60.log.log) is like:

2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server Chained Exception #1
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server  Traceback (most recent call last):
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server    File "/usr/lib/python3.6/site-packages/cinder/volume/driver.py", line 513, in _detach_volume
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server      ignore_errors=ignore_errors)
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server    File "/usr/lib/python3.6/site-packages/os_brick/utils.py", line 137, in trace_logging_wrapper
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server      return f(*args, **kwargs)
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server    File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 359, in inner
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server      return f(*args, **kwargs)
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server    File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/fibre_channel.py", line 342, in disconnect_volume
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server      wwn = self._linuxscsi.get_scsi_wwn(path)
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server    File "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line 193, in get_scsi_wwn
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server      root_helper=self._root_helper)
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server    File "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in _execute
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server      result = self.__execute(*args, **kwargs)
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server    File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 169, in execute
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server      return execute_root(*cmd, **kwargs)
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server    File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server      return self.channel.remote_call(name, args, kwargs)
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server    File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in remote_call
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server      raise exc_type(*result[2])
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server  oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server  Command: /lib/udev/scsi_id --page 0x83 --whitelisted /dev/disk/by-path/ccw-0.0.1c03-zfcp-0x5005076306105388:0x4001401b00000000
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server  Exit code: 1
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server  Stdout: ''
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server  Stderr: ''
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server
2021-01-12 22:20:33.935 772409 ERROR oslo_messaging.rpc.server

This is because the scsi_id command might fail to query a SCSI device and generate a unique value for it, occasionally.

Resolution

  1. First, cleaning up the leftovers on the agent node of the storage provider if any residual devices left. Refer to Clean up leftovers of multipath for more details.

  2. When cleanup is done, you can retry the deploy.