Hangs and delays in a shared file system environment

When an XCF communication delay occurs, the zFS hang detector issues a message. For example:
  • If the other system never received the XCF message, zFS issues message IOEZ00591I.
  • If the other system received the XCF message but is making no progress on it, or zFS cannot determine its status, zFS issues message IOEZ00547I.
  • If the other system received the XCF message but progress is very slow or the request is long running, zFS issues message IOEZ00661I.
  • If the other system processed the XCF message and sent a response back, but zFS did not receive the response, zFS issues message IOEZ00592I.
In these cases, zFS does not issue a system dump. Use the information in the message to identify the systems that are not responding, and then determine the status of those systems. Messages on the other systems might also indicate the real problem. (Typically, each system issues its own messages when there is a problem.)

Each XCF message has a timeout. Wait to see whether a request timing out resolves the hang. A request that times out fails.

zFS also limits how long remote requests can take by supplying a timeout value to XCF (approximately 10 to 15 minutes). XCF monitors the request and, if the request takes longer than the timeout value, indicates to zFS that the request timed out. In this case, zFS issues message IOEZ00658E or IOEZ00659E and fails the request. The message includes an aggregate name if the timeout can be associated with an aggregate. Use the information in the message to identify the system that is not responding, and then determine the status of that system. You might also see zFS hang detector messages, and the operation might not have run on the target system.
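
When you review a captured system log, it can help to count how often each of the messages described in this topic appears. The following Python sketch is illustrative only and is not part of zFS. The function name summarize_zfs_xcf_messages, the dictionary ZFS_XCF_MESSAGES, and the assumption that the log is available as plain text lines are all hypothetical. The one-line meanings are paraphrased from this topic and are not the actual message text.

```python
import re
from collections import Counter

# Message IDs named in this topic, with meanings paraphrased from it.
# The dictionary is an illustrative aid, not the real message text.
ZFS_XCF_MESSAGES = {
    "IOEZ00591I": "other system never received the XCF message",
    "IOEZ00547I": "message received, but no progress or status unknown",
    "IOEZ00661I": "message received, but progress is slow or long running",
    "IOEZ00592I": "response sent by other system, but not received by zFS",
    "IOEZ00658E": "remote request timed out; zFS failed the request",
    "IOEZ00659E": "remote request timed out; zFS failed the request",
}

# zFS message IDs have the form IOEZ + five digits + a severity letter.
_MSG_ID = re.compile(r"\bIOEZ\d{5}[A-Z]\b")


def summarize_zfs_xcf_messages(log_lines):
    """Count occurrences of the XCF-related zFS messages in log_lines.

    log_lines is any iterable of strings (hypothetical plain-text log).
    Message IDs that are not listed in ZFS_XCF_MESSAGES are ignored.
    """
    counts = Counter()
    for line in log_lines:
        for msg_id in _MSG_ID.findall(line):
            if msg_id in ZFS_XCF_MESSAGES:
                counts[msg_id] += 1
    return counts


if __name__ == "__main__":
    # Placeholder lines: only the message IDs are real; the rest is filler.
    sample = [
        "IOEZ00591I <message text>",
        "IOEZ00658E <message text>",
        "IOEZ00591I <message text>",
    ]
    for msg_id, count in sorted(summarize_zfs_xcf_messages(sample).items()):
        print(msg_id, count, ZFS_XCF_MESSAGES[msg_id])
```

A summary like this only shows which conditions were reported. You still need the system names in the messages themselves to determine which systems to investigate.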