Pinned topic: ds3400 behind V7000: No path failover
(6 replies; latest post 2013-09-18T09:19:32Z by 0FU1_Christian_Reichhoff)

Pmanatee, 2013-06-06T08:55:33Z

Hello,

We have two DS3400s behind a V7000.

When I map LUNs on either of the two DS3400s to the host "V7000" that I created there, I cannot change the preferred controller (owner) of these LUNs.

Other LUNs, mapped to "real" hosts such as Windows Server 2008 (clustered or not) and Linux, can still be moved from controller A to B and back. The LUNs mapped to the V7000, however, cannot be moved from one controller to the other, regardless of whether they currently sit on A or B: both controllers serve them, but ownership cannot be changed.

So there is no redundancy in our SAN for any V7000 pool that contains drives from a DS3400, firmware upgrades on either DS3400 are impossible, and so on.

IBM Support has been involved for a week, but has not been able to help so far.

We checked the zoning and the settings on the V7000 and the DS3400s, and they found nothing wrong.

Three questions they could not answer are still on my mind:

1. The IBM Information Center describes one setting for the DS3400 that nobody could explain, neither what it means nor how to configure it:

"Perform the following steps for the worldwide node name (WWNN) option:

  1. Set the system so that both controllers have the same WWNN."

2. Accordingly, I defined the V7000 on the DS3400 as ONE host with all four FC ports. Would it be better to define TWO hosts, one for each V7000 node, and cluster them? Or is this the answer to question 1: one host with BOTH controllers?

3. I did the same with a V3700. For the V3700, the V7000 reports two external controllers; for a DS3400 it always reports only one. I assume this is "works as designed", isn't it?

Any ideas are appreciated. Thanks, Chris

  • The_Doctor

    Re: ds3400 behind V7000: No path failover

    2013-06-06T12:42:13Z

    Sorry, I have no answers for you. I've never had a DS3400 behind a V7000.

    SAN 101 question(s) first. I assume you have the traditional two fabrics? DS3xxx Ctrl A in both Fabric A and Fabric B? DS3xxx Ctrl B in both fabrics as well? HBAs in each V7000 canister evenly split between the two fabrics? I'll assume this is the way you're set up.

    It seems you're able to do a lot of testing, so I'll ask: what happens during this test?

    • on Fabric A, disable the SAN port going to DS3xxx Ctrl B
    • on Fabric B, disable the SAN port going to DS3xxx Ctrl B
    • do the DS3xxx LUNs on Ctrl B switch over to Ctrl A?
    • if you're up for it, reset, then repeat the test using Ctrl A: do the LUNs switch over to Ctrl B?

    I don't know whether the answer will provide insight into the problem or not. Good luck.

    Updated on 2013-06-06T12:51:28Z by The_Doctor
  • Pmanatee

    Re: ds3400 behind V7000: No path failover

    2013-06-06T13:16:55Z

    Hi Doc,

    Yes, the zoning is as you described:

    The ports of DS3400 controller A and V7000 node1 are named "A1" and "A2"; those of controller B / node2 are named "B1" and "B2".

    The A1 and B1 FC cables go into switch "1", A2 and B2 into switch "2", and they are zoned there in separate groups, e.g. on switch "1": "V7000_A1 & B1 with DS3400_A1 & B1".

    But I'm in a production environment and have to be very careful.

    I ASSUME that in the first test you suggest, the LUNs on the DS3400 should NOT switch controllers, only paths. Am I right?

    Instead, the V7000 should notice a "reduced login" from the DS3400 and, if the DS3400 LUN was seen via Fabric A, the MDisk should move from node1 to node2.

    Correct?

  • The_Doctor

    Re: ds3400 behind V7000: No path failover

    2013-06-06T13:59:24Z
    In reply to Pmanatee (2013-06-06T13:16:55Z)

    I'm not familiar with the DS3400 specifically. DS4xxx, DS35xx, V7000 and others yes, but not the DS3400. If the DS3400 looks and behaves more or less like a DS4xxx or DS35xx, then your assumptions are NOT correct.

    > I ASSUME that in the first test you suggest, the LUNs on the DS3400 should NOT switch controllers, only paths. Am I right?

    I don't think so. I assume that:

    • the DS3400 has 2 HBAs on Ctrl A, 1 cabled to Fabric A and 1 cabled to Fabric B.
    • the DS3400 has 2 HBAs on Ctrl B, 1 cabled to Fabric A and 1 cabled to Fabric B.
    • is this correct?

    The intent of my first suggested test was to disable ALL paths going to DS3400 Ctrl B, which might force the DS3400 to move any LUNs owned by Ctrl B over to Ctrl A (or not). The answer might have helped you and IBM Support narrow down the problem. After all, the problem as you stated it was that you didn't think you had redundancy.

    But since you now say this is a production environment, running "high risk" intrusive tests just to see what might happen is not recommended.

    > Instead, the V7000 should notice a "reduced login" from the DS3400 and, if the DS3400 LUN was seen via Fabric A, the MDisk should move from node1 to node2.

    > Correct?

    Again, I don't think so. I don't see a reason why the V7000 would move the disk from Canister 1 to Canister 2. Canister 1 and Canister 2 should be zoned to BOTH DS3400 Ctrl A and DS3400 Ctrl B, and you confirmed they are zoned that way.

    Updated on 2013-06-06T14:11:09Z by The_Doctor
  • Pmanatee

    Re: ds3400 behind V7000: No path failover

    2013-06-07T07:50:21Z

    Thanks for the reply.

    OK, I misunderstood your test scenario: I thought of testing the connections one after the other, not disabling both connections to one controller at once. And you are right that the V7000 should not switch nodes.

    But as you already noted, I cannot just try this out: all the LUNs mapped to the V7000 are striped there in one pool. If just one LUN on controller B does NOT switch to controller A, everything breaks and a lot of servers lose their storage. :-(

    IBM Support stated this morning that there is a bug in the current V7000 firmware and that this is the cause of my situation. I disagree, because no MDisk is in a degraded state. Either that, or they have not verified the solution they mentioned.

  • Pmanatee

    Re: ds3400 behind V7000: No path failover

    2013-06-13T08:15:37Z
    In reply to Pmanatee (2013-06-07T07:50:21Z)

    I found a small difference between my configuration and the IBM Redbooks:

    My LUNs on the DS3400 are assigned to the single host "V7000" (with all four FC ports defined).

    In the documentation, they use a "Host Group" containing the SVC or V7000 and assign the LUNs to that group.

    Is it safe to change that on a running system? I have already created the host group and the host "V7000" is in it, but I hesitate to move the mapping (keeping the same LUN numbers) from the single host to its group in an environment under heavy I/O load.

    And one more question for my understanding: they always talk about a "storage partition". I know what a "volume", a "logical drive" and a "LUN" are, but is "storage partition" just another term for assigning a logical drive to a host group?

    Edit: I found the following in the DS3500 manual:

    "The term storage partitioning is somewhat misleading because it actually represents a host or a group of hosts and the logical drives they access."

    So since a single host forms a "storage partition" just as a "host group" does, there seems to be no need to change the mapping?

    Updated on 2013-06-13T08:22:31Z by Pmanatee
  • 0FU1_Christian_Reichhoff

    Re: ds3400 behind V7000: No path failover

    2013-09-18T09:19:32Z
    In reply to Pmanatee (2013-06-13T08:15:37Z)

    Not solved, but there is a workaround:

    IBM stated that this bug is fixed in the latest 6.x and 7.x firmware, but it is not, and they can still reproduce it in their own labs.

    Instead, IBM Support has now changed its position from admitting a bug to denying it: "Works as designed".

    The only way to keep a DS3xxx/DS4xxx from going into a failure state is to increase the failover alert delay on the DS3xxx/DS4xxx,

    e.g. "set storageSubsystem failoverAlertDelay=20;"

    AND, immediately after changing the default (preferred) controller on the DS, to run "detectmdisk" on the V7000/SVC.
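    The workaround above can be sketched as a small Python helper that only assembles the command lines in the right order and leaves running them to the operator. This is a sketch under assumptions, not IBM's procedure: the controller and V7000 addresses, the "superuser" login and the logical drive name are hypothetical placeholders; "SMcli <ctrlA> <ctrlB> -c "...;"" is the DS Storage Manager command-line form, "set logicalDrive [...] owner=a|b" is assumed to be the script command for changing the preferred controller, and the delay is assumed to be in minutes with a valid range of 0 to 60. Check the syntax against the CLI reference for your firmware before running anything in production.

    ```python
    """Assemble (but do not run) the command sequence for the failover workaround.

    All addresses, logins and drive names are hypothetical placeholders.
    """
    import shlex
    from typing import List

    # Assumed valid range for failoverAlertDelay, in minutes.
    MIN_DELAY_MINUTES = 0
    MAX_DELAY_MINUTES = 60


    def ds_script(ctrl_a: str, ctrl_b: str, command: str) -> List[str]:
        """Build an SMcli argv that sends one script command to both DS controllers."""
        if not command.endswith(";"):
            command += ";"  # SMcli script commands end with a semicolon
        return ["SMcli", ctrl_a, ctrl_b, "-c", command]


    def workaround_commands(
        ctrl_a: str,
        ctrl_b: str,
        logical_drive: str,
        new_owner: str,
        v7000_mgmt: str,
        delay_minutes: int = 20,
    ) -> List[List[str]]:
        """Return the steps in order: raise the alert delay, move the drive, rescan."""
        if not MIN_DELAY_MINUTES <= delay_minutes <= MAX_DELAY_MINUTES:
            raise ValueError(
                f"delay must be {MIN_DELAY_MINUTES}..{MAX_DELAY_MINUTES} minutes"
            )
        owner = new_owner.lower()
        if owner not in ("a", "b"):
            raise ValueError("new_owner must be 'a' or 'b'")
        return [
            # Step 1: raise the failover alert delay on the DS3xxx/DS4xxx.
            ds_script(ctrl_a, ctrl_b,
                      f"set storageSubsystem failoverAlertDelay={delay_minutes}"),
            # Step 2: change the preferred (default) controller of one logical drive
            # (assumed command syntax).
            ds_script(ctrl_a, ctrl_b,
                      f'set logicalDrive ["{logical_drive}"] owner={owner}'),
            # Step 3: immediately afterwards, rescan MDisks on the V7000/SVC via SSH.
            ["ssh", f"superuser@{v7000_mgmt}", "detectmdisk"],
        ]


    if __name__ == "__main__":
        # Print shell-quoted command lines for review; nothing is executed.
        for argv in workaround_commands(
            "192.0.2.10", "192.0.2.11", "V7000_LUN01", "B", "192.0.2.50"
        ):
            print(shlex.join(argv))
    ```

    Printing the commands instead of executing them keeps a human in the loop, which matters here because step 3 has to follow step 2 immediately on a system under production load.
    
    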