Topic
  • 7 replies
  • Latest Post - 2014-07-11T12:33:01Z by chriscanto
The_Doctor
39 Posts

Pinned topic CLUSTERED v7000 - degraded host message

2013-04-19T12:49:47Z

I've seen the degraded host message many times when my HBAs are not logged in to the v7000.... no problem.

However, I have the following set up that seems 100% correct to me & I still receive the degraded host message.

  • v7000 - clustered - 2 controllers - 4 canisters
    • 16 total HBA ports - 8 HBA ports on Fabric A - 8 HBA ports on Fabric B
    • ports 1 & 4 on each can are on Fabric A
    • ports 2 & 3 on each can are on Fabric B
  • AIX Client LPAR running current SDDPCM 2.6.3.2 device driver
    • 4 HBA adapters - all NPIV, i.e. 4 vfc Client Adapters
    • the 4 HBAs are mapped thru 2 x VIO Servers.... traditional 2 HBAs thru each VIO Server 
  • SAN Zoning
    • vfc hba # 1 zoned on Fabric A to Ctlr A Can 1 Port 1
    • vfc hba # 1 zoned on Fabric A to Ctlr B Can 1 Port 1
    • vfc hba # 2 zoned on Fabric B to Ctlr A Can 1 Port 2
    • vfc hba # 2 zoned on Fabric B to Ctlr B Can 1 Port 2
    • vfc hba # 3 zoned on Fabric B to Ctlr A Can 2 Port 3
    • vfc hba # 3 zoned on Fabric B to Ctlr B Can 2 Port 3 
    • vfc hba # 4 zoned on Fabric A to Ctlr A Can 2 Port 4
    • vfc hba # 4 zoned on Fabric A to Ctlr B Can 2 Port 4
  • for a total of 8 paths to my vDisk (which is SDDPCM's limit)
    • vDisk is accessible by I/O Group 0 AND I/O Group 1 but is cached by I/O Group 1 at the present time
  • from AIX - 8 paths are seen by SDDPCM - no problem so far, everything appears correct (modeled in the sketch after this list)
    • I would anticipate that most (all?) I/O would traverse only the 4 paths zoned to Ctlr B (I/O Group 1), and
    • should I ever force the vDisk over to I/O Group 0, I would expect most (all?) I/O would traverse only the 4 paths zoned to Ctlr A
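
To sanity-check the layout, here's a minimal Python sketch (purely illustrative; the names are mine, not SDDPCM or v7000 output) that models the zoning above and confirms the 8-path, 4-per-I/O-Group spread:

    # Illustrative model only: Ctlr A = I/O Group 0, Ctlr B = I/O Group 1.
    zoning = {
        # vfc hba -> [(io_group, canister, v7000_port, fabric), ...]
        "vfc1": [(0, 1, 1, "A"), (1, 1, 1, "A")],
        "vfc2": [(0, 1, 2, "B"), (1, 1, 2, "B")],
        "vfc3": [(0, 2, 3, "B"), (1, 2, 3, "B")],
        "vfc4": [(0, 2, 4, "A"), (1, 2, 4, "A")],
    }

    paths = [(hba,) + t for hba, tgts in zoning.items() for t in tgts]
    assert len(paths) == 8          # SDDPCM's path limit at the time

    by_io_group = {}
    for hba, io_grp, can, port, fabric in paths:
        by_io_group.setdefault(io_grp, []).append((hba, can, port))

    for io_grp in sorted(by_io_group):
        print(f"I/O Group {io_grp}: {len(by_io_group[io_grp])} paths")
    # -> I/O Group 0: 4 paths / I/O Group 1: 4 paths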

BUT this configuration gives me the dreaded "host degraded" message. 

I don't buy into the "degraded" message being correct in this instance.  (And please don't ask me to zone ONLY to IO Group 1 (or ONLY to IO Group 0) without stating some very specific reasons why.  That defeats the idea behind a CLUSTERED v7000, doesn't it?)  That said, any constructive comments are always welcome.

Bottom line..... I'm struggling with the age-old question >>>>> functioning as designed, or defect?  Or is my thinking screwed up?

Updated on 2013-04-19T14:38:20Z by The_Doctor
  • chriscanto
    291 Posts

    Re: CLUSTERED v7000 - degraded host message

    2013-04-19T16:18:06Z

    If I'm reading that right, I believe V7000 is reporting the host status as degraded because each host port is only logging in to one node canister in each control enclosure.  For failover in the event of a node shutdown/reboot, it's expected that you'd want that host port to also have a login to the partner canister in the same control enclosure.

    Because of the other paths you've presented the volume via, the host actually does have multipathing available to cope with the loss of an individual node canister, but SVC/V7000 doesn't know that because it doesn't know which WWPNs really make up your physical host; my understanding is that it can only assess connectivity health on a per-WWPN level.
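
    To illustrate what I mean by a per-WWPN assessment, here's a rough sketch of my reading of it (NOT the actual V7000 algorithm):

        # Rough sketch, not real V7000 code: a host WWPN looks degraded if,
        # in any control enclosure it reaches, the partner canister is missing.
        def wwpn_degraded(logins):
            """logins: set of (io_group, canister) pairs for one host WWPN."""
            io_groups = {io_grp for io_grp, _ in logins}
            return any((io_grp, can) not in logins
                       for io_grp in io_groups for can in (1, 2))

        # vfc hba #1 above: one canister in each control enclosure -> degraded.
        print(wwpn_degraded({(0, 1), (1, 1)}))   # True
        # Same WWPN zoned to both canisters of one I/O group -> healthy.
        print(wwpn_degraded({(1, 1), (1, 2)}))   # False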

    On the topic of "clustered" V7000, I'd say the main benefit is being able to scale out and yet keep a single management interface and shared storage pools.  The main benefit of being able to access a volume via additional "access I/O groups" is non-disruptive volume migration.  If the volumes involved are using internal storage (i.e. using drives in a control enclosure, or expansion enclosures connected to that control enclosure), I'm not sure there's much/any benefit in allowing a host to access a volume via multiple control enclosures for "business as usual" running.  I'd be interested to hear what you're looking for from this aspect of your config.

  • The_Doctor
    39 Posts

    Re: CLUSTERED v7000 - degraded host message

    2013-04-20T03:48:01Z

    If I'm reading that right, I believe V7000 is reporting the host status as degraded because each host port is only logging in to one node canister in each control enclosure.  For failover in the event of a node shutdown/reboot, it's expected that you'd want that host port to also have a login to the partner canister in the same control enclosure.

    Because of the other paths you've presented the volume via, the host actually does have multipathing available to cope with the loss of an individual node canister, but SVC/V7000 doesn't know that because it doesn't know which WWPNs really make up your physical host; my understanding is that it can only assess connectivity health on a per-WWPN level.

    On the topic of "clustered" V7000, I'd say the main benefit is being able to scale out and yet keep a single management interface and shared storage pools.  The main benefit of being able to access a volume via additional "access I/O groups" is non-disruptive volume migration.  If the volumes involved are using internal storage (i.e. using drives in a control enclosure, or expansion enclosures connected to that control enclosure), I'm not sure there's much/any benefit in allowing a host to access a volume via multiple control enclosures for "business as usual" running.  I'd be interested to hear what you're looking for from this aspect of your config.

    First, thx for your reply & your comments.  I'll try to respond in reverse order.

    The "vision" I have for "clustered" is the non-dispruptive movement of a vDisk from IO Group 0 to IO Group 1 or vice versa..... BUT without SAN Zoning changes being needed & without any human intervention at the AIX level. 

    The scenario I have is 200+ AIX Client LPARs and 600+ vDisks that will (initially) be manually allocated to 1 of 2 IO Groups.  Over time, I anticipate a need to re-shuffle the deck & manually move vDisks between the 2 IO Groups for performance or space-allocation reasons.  Again, if this can be done without SAN Zoning changes & without manual intervention at the AIX level, then that is a good thing.
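
    For what it's worth, a hypothetical wrapper for that kind of move (assuming the cluster firmware level supports the 'svctask movevdisk' CLI command and that ssh key access is set up; the Python names are mine):

        # Hypothetical sketch: move a volume to another I/O Group with no SAN
        # Zoning or AIX-side changes.  Assumes 'svctask movevdisk' exists at
        # your firmware level and ssh keys are configured for the cluster.
        import subprocess

        def move_vdisk(cluster, vdisk, target_iogrp):
            subprocess.run(
                ["ssh", f"admin@{cluster}", "svctask", "movevdisk",
                 "-iogrp", str(target_iogrp), vdisk],
                check=True)   # raises CalledProcessError if the CLI refuses

        # move_vdisk("v7000-cluster", "lpar042_vd01", 0)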

    I welcome your comments should this "vision" be out of line.

    You indicated ->

    I believe V7000 is reporting the host status as degraded because each host port is only logging in to one node canister in each control enclosure. .............. my understanding is that it can only assess connectivity health on a per-WWPN level.

    I suspect your assessment is 100% correct.  I think if I change my approach & re-zone the WWPNs such that a WWPN is logged into BOTH canisters within the SAME Ctrl Enclosure....... this would eliminate the degraded host message.  Of course, it's not quite the same multipathing-wise..... the new SAN Zoning method would use 2 host HBA ports to each Ctrl Enclosure vs. my previous plan of using 4 host HBA ports to each Ctrl Enclosure.  This new method leaves the 2 HBAs going to the opposite IO Group effectively idle.  Not sure if I like this.  Anyway, I guess I'll have to live with it until someone comes up with something better......  OR maybe I'll just live with the degraded message.  
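
    Spelled out, my reading of that alternative zoning looks like this (illustrative names only):

        # Illustrative: each vfc now logs in to BOTH canisters of ONE
        # control enclosure, so no WWPN looks degraded any more.
        rezoned = {
            "vfc1": [(1, 1, "A"), (1, 2, "A")],   # I/O Group 1, both canisters
            "vfc2": [(1, 1, "B"), (1, 2, "B")],
            "vfc3": [(0, 1, "B"), (0, 2, "B")],   # I/O Group 0, both canisters
            "vfc4": [(0, 1, "A"), (0, 2, "A")],
        }

        for hba, logins in rezoned.items():
            pairs = {(grp, can) for grp, can, _fabric in logins}
            grps = {grp for grp, _ in pairs}
            ok = all((grp, can) in pairs for grp in grps for can in (1, 2))
            print(hba, "healthy" if ok else "degraded")
        # All four print "healthy", but while the vDisk is cached by I/O
        # Group 1, vfc3 and vfc4 carry essentially no I/O.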

    You further indicated ->

    but SVC/V7000 doesn't know that because it doesn't know which WWPNs really make up your physical host,

    It'd be nice if this wasn't the case.  The Host Definition in the v7000 has a list of all the WWPNs, so I think the v7000 has the necessary information to make that determination, but maybe I'm missing something.  Regardless, if the function isn't there, it isn't there. So I understand what you're saying.

    Thanks for educating me on all this.

    Updated on 2013-04-21T02:59:52Z by The_Doctor
  • E3FQ_Stephen_Gillanders
    1 Post

    Re: CLUSTERED v7000 - degraded host message

    2013-08-09T08:24:45Z

    Suggestion: vfcs typically have 2 x WWPNs to enable LPAR Mobility if they are AIX.  Normally you prepare the infrastructure by defining both WWPNs as part of the host aliases in both the SAN and the SVC.

    This automatically gives you a 'degraded' host, because one of the WWPNs is normally offline on the vfc until you need it during LPAR Mobility.
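
    A toy illustration of the effect (the names are made up):

        # Toy illustration: each vfc client adapter carries an active and a
        # standby WWPN for LPM, and the standby one stays logged out until a
        # migration actually runs.
        defined = {f"vfc{n}_{role}" for n in range(1, 5)
                   for role in ("active", "standby")}
        logged_in = {w for w in defined if w.endswith("_active")}

        print(sorted(defined - logged_in))        # the 4 standby WWPNs
        print("degraded:", logged_in != defined)  # True -> always 'degraded'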

  • The_Doctor
    39 Posts

    Re: CLUSTERED v7000 - degraded host message

    2013-08-09T11:47:42Z

    Suggestion: vfcs typically have 2 x WWPNs to enable LPAR Mobility if they are AIX.  Normally you prepare the infrastructure by defining both WWPNs as part of the host aliases in both the SAN and the SVC.

    This automatically gives you a 'degraded' host, because one of the WWPNs is normally offline on the vfc until you need it during LPAR Mobility.

    Of course.......  adding the 2nd WWPN for Live Partition Mobility (LPM) adds to the "degraded host message".  Just another WWPN not logged into the fabric or v7k.  Please re-read post #1 (closely); the first sentence describes this scenario >>>

    I've seen the degraded host message many times when my HBAs are not logged in to the v7000.... no problem.

    The purpose of my original post was to generate dialogue / ideas on how to eliminate the "bogus" degraded host message, despite having everything (allegedly) configured correctly.

    I felt it was rather meaningless having a degraded host message for 100% of my 200+ LPARs.  A "false positive" on every LPAR (which is what I have) doesn't help one to identify problems.

    Updated on 2013-08-14T13:13:15Z by The_Doctor
  • bpon
    1 Post

    Re: CLUSTERED v7000 - degraded host message

    2014-07-10T17:15:52Z

    Of course.......  adding the 2nd WWPN for Live Partition Mobility (LPM) adds to the "degraded host message".  Just another WWPN not logged into the fabric or v7k.  Please re-read post #1 (closely); the first sentence describes this scenario >>>

    I've seen the degraded host message many times when my HBAs are not logged in to the v7000.... no problem.

    The purpose of my original post was to generate dialogue / ideas on how to eliminate the "bogus" degraded host message, despite having everything (allegedly) configured correctly.

    I felt it was rather meaningless having a degraded host message for 100% of my 200+ LPARs.  A "false positive" on every LPAR (which is what I have) doesn't help one to identify problems.

    I am having the same issue with the "degraded" host message for hosts that have a 2nd standby WWPN.  How did you resolve the issue??

  • The_Doctor
    39 Posts

    Re: CLUSTERED v7000 - degraded host message

    2014-07-11T11:48:47Z
    • bpon
    • 2014-07-10T17:15:52Z

    I am having the same issue with the "degraded" host message for hosts that have a 2nd standby WWPN.  How did you resolve the issue??

    How did you resolve the issue??

    What makes you think it's resolved? :)  With SDDPCM supporting up to 16 paths now, it's probably going in the other direction..... i.e. worse. :)

    AFAIK, there's no plan to address it.... but what do I know?  Nothing.  

    For example, it would be nice if the product supported something like the following (rough sketch in code after the list):

    1. Host has >= 8 logged-in paths: don't turn anything on.  Leave it GREEN.
    2. Host has 4-7 logged-in paths, but not ALL paths are logged in: turn the GUI --> YELLOW, to indicate maybe degraded.
    3. Host has <= 3 logged-in paths, but not ALL paths are logged in: leave the GUI --> RED, as it is today.
    4. Or make the # of paths user-selectable, if the above values don't cut it.
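
    Something like this rough sketch (illustrative only):

        # Rough sketch of the traffic-light scheme above.
        def host_status(logged_in, defined, green_floor=8):
            # green_floor makes the threshold user-selectable (item 4)
            if logged_in == defined or logged_in >= green_floor:
                return "GREEN"
            if logged_in >= 4:
                return "YELLOW"   # maybe degraded
            return "RED"          # as the GUI behaves today

        print(host_status(logged_in=4, defined=8))    # YELLOW, not today's RED
        print(host_status(logged_in=8, defined=16))   # GREEN: 8 paths is plenty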

    Anyway, I'm not privy to any v7000 plans either way.

    In our case, we've learned to ignore the 200+ false positives in the GUI.  a.k.a. we sucked it up and moved on.

    We do monitor our paths from the AIX Host side to cover things.
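
    Roughly along these lines (a sketch assuming the default three-column 'lspath' output of status, device, parent):

        # Sketch of an AIX-side path check.  Assumes default 'lspath' output:
        # "<status> <disk> <parent>" per line, e.g. "Enabled hdisk4 fscsi0".
        import subprocess
        from collections import Counter, defaultdict

        out = subprocess.run(["lspath"], capture_output=True,
                             text=True, check=True)
        per_disk = defaultdict(Counter)
        for line in out.stdout.splitlines():
            status, disk, _parent = line.split()[:3]
            per_disk[disk][status] += 1

        for disk, counts in sorted(per_disk.items()):
            if counts["Enabled"] < sum(counts.values()):
                print(f"{disk}: {dict(counts)}  <-- not all paths Enabled")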

  • chriscanto
    291 Posts

    Re: CLUSTERED v7000 - degraded host message

    2014-07-11T12:33:01Z

    How did you resolve the issue??

    What makes you think it's resolved? :)  With SDDPCM supporting up to 16 paths now, it's probably going in the other direction..... i.e. worse. :)

    AFAIK, there's no plan to address it.... but what do I know?  Nothing.  

    For example, it would be nice if the product supported something like:

    1. Host has >= 8 logged-in paths: don't turn anything on.  Leave it GREEN.
    2. Host has 4-7 logged-in paths, but not ALL paths are logged in: turn the GUI --> YELLOW, to indicate maybe degraded.
    3. Host has <= 3 logged-in paths, but not ALL paths are logged in: leave the GUI --> RED, as it is today.
    4. Or make the # of paths user-selectable, if the above values don't cut it.

    Anyway, I'm not privy to any v7000 plans either way.

    In our case, we've learned to ignore the 200+ false positives in the GUI.  a.k.a. we sucked it up and moved on.

    We do monitor our paths from the AIX Host side to cover things.

    You could submit a Request For Enhancement (RFE) for this type of feature to be considered, via the developerWorks website:

    www.ibm.com/developerworks/rfe/tivoli

    It's possible/likely there is already a request for this sort of thing that you could add your vote to....