Topic
  • 4 replies
  • Latest Post - ‏2013-10-06T22:40:14Z by Frank Fegert
Frank Fegert
8 Posts

Disk UUID query fails after LPM (Debian)

2013-09-15T19:57:06Z

Hello all,

After getting https://www.ibm.com/developerworks/community/forums/html/topic?id=9108f310-0899-4394-b3f1-d82e3506d630 and http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700444 out of the way, I seem to be running into more (minor) issues with Linux (Debian), IBM Power and LPM ;-)

For internal inventory/correlation/CMDB purposes we gather several pieces of information from inside our LPARs. One of them is the UUID of each disk that is presented to the LPAR's OS. In this scenario the disks are FC-based LUNs, usually from an SVC, backed by dual VIOS. The UUID is determined via the "scsiinfo -s <dev>" command, which works fine until an LPM operation has been performed on the LPAR. E.g.:

# Start at P550 (8204-E8A; S/N: 06BE0F4)
root@host:~# cat /sys/class/scsi_host/host*/vhost_*
U8204.E8A.06BE0F4-V45-C2-T1
vhost0
U8204.E8A.06BE0F4-V45-C3-T1
vhost0

root@host:~# /sbin/scsiinfo -s /dev/sda
Serial Number '332136005076801918127980000000000023B04214503IBMfcp'
root@host:~# /sbin/scsiinfo -s /dev/sdb
Serial Number '332136005076801918127980000000000023B04214503IBMfcp'

# LPM to P730 (8231-E2D; S/N: 06AB35T)
root@host:~# cat /sys/class/scsi_host/host*/vhost_*
U8231.E2D.06AB35T-V45-C2-T1
vhost0
U8231.E2D.06AB35T-V45-C3-T1
vhost0

root@host:~# /sbin/scsiinfo -s /dev/sda
Serial Number ' '
root@host:~# /sbin/scsiinfo -s /dev/sdb
Serial Number ' '

# LPM back to P550 (8204-E8A; S/N: 06BE0F4)
root@host:~# cat /sys/class/scsi_host/host*/vhost_*
U8204.E8A.06BE0F4-V45-C2-T1
vhost0
U8204.E8A.06BE0F4-V45-C3-T1
vhost0

root@host:~# /sbin/scsiinfo -s /dev/sda
Serial Number ' '
root@host:~# /sbin/scsiinfo -s /dev/sdb
Serial Number ' '

root@host:~# reboot
root@host:~# cat /sys/class/scsi_host/host*/vhost_*
U8204.E8A.06BE0F4-V45-C2-T1
vhost0
U8204.E8A.06BE0F4-V45-C3-T1
vhost0

root@host:~# /sbin/scsiinfo -s /dev/sda
Serial Number '332136005076801918127980000000000023B04214503IBMfcp'
root@host:~# /sbin/scsiinfo -s /dev/sdb
Serial Number '332136005076801918127980000000000023B04214503IBMfcp'
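
For the inventory side, the frame the LPAR currently runs on can be read off the location codes in the output above. A minimal sketch, assuming the codes always have the shape U<type>.<model>.<serial>-V...-C...-T... seen here; the helper name frame_from_loc is made up for illustration and is not part of any package:

```shell
#!/bin/sh
# frame_from_loc is a hypothetical helper, not part of any package.
# It assumes location codes shaped like the ones in the output above:
#   U<type>.<model>.<serial>-V<...>-C<...>-T<...>
# and prints "<type>-<model> <serial>". Any other input prints nothing.
frame_from_loc() {
    printf '%s\n' "$1" |
        sed -n 's/^U\([^.]*\)\.\([^.]*\)\.\([^-]*\)-V.*$/\1-\2 \3/p'
}
```

Called with U8204.E8A.06BE0F4-V45-C2-T1 this prints "8204-E8A 06BE0F4", so comparing the value from two inventory runs would show whether the LPAR has been moved in between.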

Similarly, if I restart a VIOS after the LPAR has been moved via LPM, the previously empty scsiinfo output returns to the normal, expected behaviour for the disk served by that VIOS:

# LPM to P550 (8204-E8A; S/N: 06BE0F4), scsiinfo shows empty output like above.
# Now reboot VIO server #1 (U8204.E8A.06BE0F4-V45-C2-T1, /sys/class/scsi_host/host0)
root@host:~# /sbin/scsiinfo -s /dev/sda
Serial Number '332136005076801918127980000000000023B04214503IBMfcp'
root@host:~# /sbin/scsiinfo -s /dev/sdb
Serial Number ' '
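
Since the failure shows up as a blank serial rather than as an error, an inventory script could at least detect it instead of recording an empty UUID. A minimal sketch, assuming the single-line "Serial Number '...'" format shown above; both helper names are hypothetical:

```shell
#!/bin/sh
# Both helpers are hypothetical names used for illustration only.

# Pull the quoted value out of a line like:  Serial Number '...'
serial_from_line() {
    printf '%s\n' "$1" | sed -n "s/^Serial Number '\(.*\)'\$/\1/p"
}

# Succeed only if the serial contains something other than whitespace.
serial_is_usable() {
    [ -n "$(printf '%s' "$1" | tr -d '[:space:]')" ]
}

# Usage (needs a real device):
#   out=$(/sbin/scsiinfo -s /dev/sda)
#   serial_is_usable "$(serial_from_line "$out")" || echo "blank serial: /dev/sda" >&2
```

This only flags the condition; it does not repair the stale state.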

A manually issued rescan on both SCSI host buses or a SCSI inquiry does not seem to be enough to get things back on track. The kernel version is 2.6.39 with Debian patches and https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/scsi/ibmvscsi/ibmvscsi.c?id=201aed678482f247aa96bd8fcd9e960fefd82d59 applied. Taking a wild guess, I'd say this issue is also rooted somewhere in the ibmvscsi driver. Does anyone around here know which kernel structures are not properly updated after a successful LPM, and why?

 

Thanks & best regards,
 
Frank Fegert
Updated on 2013-09-15T19:59:00Z by Frank Fegert
  • Frank Fegert
    8 Posts

    Re: Disk UUID query fails after LPM (Debian)

    ‏2013-09-26T07:34:30Z  

    Hello all,

    I looked at the code of "scsiinfo" and did some more testing with "sg3-utils" instead of "scsiinfo" to determine whether the problem is with the tools or possibly a driver/kernel issue:

    ~# dmesg (after LPM)
    ...
    [41786.122829] calling ibm,suspend-me on cpu 5
    [41786.541319] ibmvscsi 30000002: Re-enabling adapter!
    [41786.541345] ibmvscsi 30000003: Re-enabling adapter!
    [41786.541931] EPOW <0x6240040000000b8 0x0 0x0>
    [41786.542142] RTAS: event: 20, Type: EPOW, Severity: 1
    [41786.718236] property parse failed in parse_next_property at line 230
    [41788.721810] ibmvscsi 30000003: partner initialization complete
    [41788.721817] ibmvscsi 30000002: partner initialization complete
    [41788.721954] ibmvscsi 30000003: host srp version: 16.a, host partition vios2-p550-222 (2), OS 3, max io 262144
    [41788.721960] ibmvscsi 30000002: host srp version: 16.a, host partition vios1-p550-222 (1), OS 3, max io 262144
    [41788.722073] ibmvscsi 30000002: Client reserve enabled
    [41788.722076] ibmvscsi 30000003: Client reserve enabled
    [41788.722086] ibmvscsi 30000003: sent SRP login
    [41788.722088] ibmvscsi 30000002: sent SRP login
    [41788.722164] ibmvscsi 30000002: SRP_LOGIN succeeded
    [41788.722194] ibmvscsi 30000003: SRP_LOGIN succeeded
    [41788.902533] property parse failed in parse_next_property at line 230

    ~# /usr/bin/sginfo -s /dev/sda
    Serial Number ' '

    ~# /usr/bin/sginfo -s /dev/sdb
    Serial Number ' '

    So this is not restricted to "scsiinfo". Strangely, getting the serial number from the VPD seems to work, though:

    ~# /usr/bin/sg_vpd -p di /dev/sda  | egrep "      0x"
          0x6005076801918127980000000000023b
    ~# /usr/bin/sg_vpd -p di /dev/sdb  | egrep "      0x"
          0x6005076801918127980000000000023b

    I'll have to look at the code of "sg_vpd" to see what it does differently than "sginfo" or "scsiinfo".
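
    Until the root cause is found, the VPD route might serve as a stopgap for the inventory. A minimal sketch, assuming only the bare "0x..." line layout shown above and nothing else about the sg_vpd output format; the helper name naa_from_vpd is hypothetical:

```shell
#!/bin/sh
# naa_from_vpd is a hypothetical helper. It reads "sg_vpd -p di" output on
# stdin and prints every designator that appears as a bare "0x<hex>" line,
# lowercased and without the 0x prefix. It assumes the line layout shown in
# the output above and nothing else about sg_vpd's format.
naa_from_vpd() {
    sed -n 's/^[[:space:]]*0x\([0-9A-Fa-f][0-9A-Fa-f]*\)[[:space:]]*$/\1/p' |
        tr 'ABCDEF' 'abcdef'
}

# Usage (needs a real device):
#   /usr/bin/sg_vpd -p di /dev/sda | naa_from_vpd
```

    Matching on the whole line keeps the filter independent of the exact indentation that the egrep pattern above relies on.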

    The "sg3-utils" also provide "sg_reset" to issue various SCSI resets. Trying a device reset did not work out:

    ~# /sbin/scsiinfo -s /dev/sda
    Serial Number ' '

    ~# sg_reset -d /dev/sda
    sg_reset: starting device reset
    sg_reset: completed device reset

    ~# /sbin/scsiinfo -s /dev/sda
    Serial Number ' '

    ~# dmesg (after device reset)
    [50787.824288] sd 0:0:1:0: resetting device. lun 0x8100000000000000

    Trying a bus reset instead did the trick, though only for the device it was issued on:

    ~# /usr/bin/sg_reset -b /dev/sdb
    sg_reset: starting bus reset
    sg_reset: completed bus reset

    ~# /sbin/scsiinfo -s /dev/sdb
    Serial Number '332136005076801918127980000000000023B04214503IBMfcp'

    ~# /sbin/scsiinfo -s /dev/sda
    Serial Number ' '

    ~# dmesg (after bus reset)
    [50164.410799] ibmvscsi 30000003: Resetting connection due to error recovery
    [50173.353799] ibmvscsi 30000003: SRP_VERSION: 16.a
    [50173.353909] ibmvscsi 30000003: partner initialization complete
    [50173.353975] ibmvscsi 30000003: host srp version: 16.a, host partition vios2-p550-222 (2), OS 3, max io 262144
    [50173.354065] ibmvscsi 30000003: Client reserve enabled
    [50173.354075] ibmvscsi 30000003: sent SRP login
    [50173.354127] ibmvscsi 30000003: SRP_LOGIN succeeded

    Judging from the dmesg output it looks the same as the messages produced during/after a regular LPM. Hmmm, going to dig deeper ... ;-)

    Best regards,

    Frank Fegert

    Updated on 2013-09-26T07:35:14Z by Frank Fegert
  • Bill_Buros
    166 Posts

    Re: Disk UUID query fails after LPM (Debian)

    ‏2013-09-26T22:14:56Z  

    Thanks for the update. We are watching and interested. :-)

  • thib
    18 Posts

    Re: Disk UUID query fails after LPM (Debian)

    ‏2013-10-01T20:14:47Z  

    Hello Frank,

    Do you use LVM on Debian? We are having trouble installing it!

    Any advice would be greatly appreciated!

     

    Thanks,

    Thibaud

  • Frank Fegert
    8 Posts

    Re: Disk UUID query fails after LPM (Debian)

    ‏2013-10-06T22:40:14Z  

    Hello Thibaud,

    sorry for the delayed reply!

    Do you mean LVM as in Logical Volume Manager? If so, yes, I am using LVM with Debian on IBM Power LPARs. It's a bit tricky to get the partitioning right. Depending on your disk configuration, installation process and/or media, you might also be running into the same issues as I was (see: http://www.bityard.org/blog/2013/02/17/debian_wheezy_on_ibm_power). Basically, I ran the debian-installer ISO from a VIOS-backed VTOPT, selected "expert64" from the boot menu, loaded the additional "multipath" installer component (because of the dual VIOS configuration) and did a manual disk configuration, which looks like this:

    # cfdisk -P s /dev/sda
    Partition Table for /dev/sda

                   First       Last
     # Type       Sector      Sector   Offset    Length   Filesystem Type (ID) Flag
    -- ------- ----------- ----------- ------ ----------- -------------------- ----
       Pri/Log           0        2047      0#       2048 Free Space           None
     1 Primary        2048       16383      0       14336 PPC PReP Boot (41)   Boot
     2 Primary       16384     3921919      0     3905536 Linux (83)           None
     3 Primary     3921920    75495423      0    71573504 Linux LVM (8E)       None
       Pri/Log    75495424    75497471      0        2048 Free Space           None

    So, the PReP partition (#1) and the root-FS (#2) sit outside the LVM partition (#3) and its volume groups.

    What issues are you running into, exactly?

    Best regards,

    Frank Fegert