Topic
1 reply Latest Post - ‏2009-08-31T13:37:18Z by SystemAdmin
walterchen_austria
10 Posts
ACCEPTED ANSWER

Pinned topic vio 2.1: crash client lpar : misbehaved virt scsi client

‏2009-08-30T10:48:32Z |
Hello!

On our customer's 9117-MMA (p570), a production client LPAR crashed a few days ago. We opened a PMR, but they said that the VIO server behaved normally.
The situation is as follows:

We have two VIO servers (2.1, fix 20.1) that map LUNs from two DS3400s to the client LPARs: vio1 maps from DS1 and vio2 from DS2. Mirroring is performed by the client LPAR (redhat 2.6.18-181).
On DS1, a disk failed in a SATA array (6 disks, 5+1 RAID 5, disk size 1 TB). According to the error log on vio1, this failure coincided with the following entry:
"LABEL: CLIENT_FAILURE
IDENTIFIER: C972F43B

Date/Time: Thu Aug 13 00:10:00 GMT+02:00 2009
Sequence Number: 266
Machine Id: 00C85B8D4C00
Node Id: learn-570-vio1
Class: S
Type: TEMP
WPAR: Global
Resource Name: vhost3

Description
Misbehaved Virtual SCSI Client

Probable Causes
Bad IU, or SRP Violation

Failure Causes
Bad IU, or SRP Violation

Recommended Actions
Remove Virtual SCSI Client, then Configure the same instance

Detail Data
ADDITIONAL INFORMATION
module: trans_event rc: 00000000FFFFFFD8 location: 00000502
data: 2 2 0 0 0
"
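The rc field in the entry above is printed as a 64-bit hex value, which makes it hard to see that the low 32 bits look like a small negative number. A minimal sketch for decoding it follows; the function name decode_rc32 is made up for illustration, and the log does not say how the VIOS driver defines this code, so this only converts the number and does not explain it.

```shell
#!/bin/sh
# Interpret the low 32 bits of a hex string as a signed 32-bit integer.
# decode_rc32 is a hypothetical helper name, not a VIOS or AIX command.
decode_rc32() {
    hex=$1
    # Keep only the low 32 bits of the value.
    low=$(( 0x$hex & 0xFFFFFFFF ))
    # Values with the top bit set are negative in two's complement.
    if [ "$low" -ge 2147483648 ]; then
        low=$(( low - 4294967296 ))
    fi
    printf '%s\n' "$low"
}

# rc value copied from the error log entry.
decode_rc32 00000000FFFFFFD8
```

For the rc in this log the result is -40.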

####

The client LPAR hung after this message (I/O hang); we could not log in any more, neither via TCP nor via the virtual console from the HMC. We had to reboot the client. The mirror is broken, so we will fix the problem with vhost3 tomorrow.

We do not have IBM Linux software support, so IBM closed the PMR again.

Does anybody have an idea why this failure can occur?
########
health_check interval: we use MPIO, not RDAC.
IBM software support said we should change the health check interval on the VIO side from 80 to 0!? That would deactivate the health check.
######
queue_depth:
Could it be that the DS3400 SATA array is too slow? We set the queue_depth of the VIOS disks to 16. Is this too high?
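
To compare the two settings discussed above across the backing disks, something like the following could be used. It is only a sketch under assumptions: it assumes a root shell on the VIOS (via oem_setup_env), that the disks expose the usual AIX MPIO attributes queue_depth and hcheck_interval, and the disk names hdisk4 and hdisk5 are placeholders. The helper only prints the lsattr commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Print (do not run) the lsattr commands that would show queue_depth and
# hcheck_interval for each given disk. print_attr_checks is a hypothetical
# helper; the hdisk names passed to it are placeholders.
print_attr_checks() {
    for disk in "$@"; do
        for attr in queue_depth hcheck_interval; do
            printf 'lsattr -El %s -a %s\n' "$disk" "$attr"
        done
    done
}

# Example: review the commands for two placeholder disks.
print_attr_checks hdisk4 hdisk5
```

Once the output looks right, it can be piped to sh on the VIOS.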

Did someone already have a similar case?
I only found an old case, http://ozlabs.org/pipermail/linuxppc64-dev/2004-November/002699.html , where they say: "The symptom is that some large I/Os will fail the adapter (putting it offline)."

best regards and thank you for any help!

walter
Updated on 2009-08-31T13:37:18Z by SystemAdmin
  • SystemAdmin
    706 Posts
    ACCEPTED ANSWER

    Re: vio 2.1: crash client lpar : misbehaved virt scsi client

    ‏2009-08-31T13:37:18Z  in response to walterchen_austria
    Can you post the output of:

    > uname -a
    AND
    > cat /etc/redhat-release

    The kernel version you quoted (redhat 2.6.18-181) doesn't look familiar. Thanks.

    Regards,
    --Robert