rate
9 Posts

Pinned topic V7000 hot spare failover issue

‏2014-08-04T09:03:48Z |

Hi,

We have a V7000 running with 2 expansion enclosures. We are running firmware version 6.4.0.4. Each enclosure has two 8-disk RAID-5 arrays and a single 7-disk RAID-5 array. The last disk in each enclosure is a hotspare.

The following has happened twice for us, so I just wanted to see if anyone has experienced the same, or if anyone knows that I am doing something wrong.

1) A disk fails in one of the enclosures.

2) The hotspare kicks in.

3) When the hotspare is fully operational, we open the management gui and see the disk error.

4) We start the recommended Fix Procedure wizard, and the wizard instructs us to replace the faulty drive, which we do.

5) The new drive becomes A PART OF THE RAID instead of a new hotspare.

6) The old hotspare does not go back to being a hotspare. It stays as a member of the affected RAID, and the RAID is now a 9-disk RAID instead of an 8-disk RAID.

 

Now, to my knowledge it isn't possible to shrink a RAID on the V7000, so we are slowly running out of hotspares: every time we have a faulty drive, the RAID expands after the disk replacement and we lose a hotspare. Luckily we configured 1 hotspare in each enclosure, so we still have one left, but what is going on here?

My guess is one of the following:

1) The Fix Procedure must not be run when using hotspares. Replacement drives should instead be marked as hotspares manually.

2) This is a bug in the firmware.

Does anyone have any input?

/Rasmus

  • AndersLorensen
    156 Posts
    ACCEPTED ANSWER

    Re: V7000 hot spare failover issue

    ‏2014-08-04T18:41:45Z  

    Everything up till 6) is acting as it should.

    As for your explanation 1 - This is not the case. The second you mark the replaced drive as a hotspare, the copy-back function will kick in.

     

    In 6) the copy-back function is copying the data from the hotspare to the replaced drive. This can take a long time, depending on load and disk size etc.

    Last time I replaced a drive, a 600 GB 10K 2½" SAS disk, the copy-back took about 24 hours, and that was on a very lightly loaded system.

    If you are running with 3 TB NL-SAS disks, I wouldn't be surprised if it took upwards of a week. Have you waited "long enough"?
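
    As a rough back-of-the-envelope check on those numbers, here is a minimal sketch that scales the one observed data point (600 GB in about 24 hours) linearly with drive capacity. The linear scaling and the function name are assumptions made for illustration only; actual copy-back time also depends on load, drive type, and firmware, so treat the result as a lower-bound guess rather than a prediction.

```python
# Rough linear estimate of copy-back time, scaled from a single observation.
# Assumption: copy-back time grows in proportion to drive capacity under
# similar load. Slower drives (e.g. NL-SAS) or a busy system will take longer.

def estimate_copyback_hours(capacity_gb, ref_capacity_gb=600.0, ref_hours=24.0):
    """Scale the observed 600 GB / 24 h data point to another capacity."""
    return ref_hours * (capacity_gb / ref_capacity_gb)


hours = estimate_copyback_hours(3000)  # a 3 TB drive
print(f"{hours:.0f} hours (~{hours / 24:.0f} days)")  # 120 hours (~5 days)
```

    Five days is only the capacity-scaled figure; with slower NL-SAS drives and real host load on top, a week or more would not be surprising.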

     

    The firmware you are running is really, really old. You should consider upgrading it. If you call IBM support, it's the first thing they'll ask you to do anyway.

     

    /Anders

  • OBRAD
    1 Post

    Re: V7000 hot spare failover issue

    ‏2014-08-04T14:13:00Z  

    Hi Rasmus,

    The best way is to open a service ticket with your local IBM support.

    Collect a snap from the storage system using option 3 (standard logs plus the most recent statesave from each node).

    It would also be good to state in the issue description when this behavior first occurred.

    Regards,
    Minya

  • rate
    9 Posts

    Re: V7000 hot spare failover issue

    ‏2014-08-05T07:55:25Z  

    Hi Anders,

    Thanks for your reply. The copy-back had finished. The failover started on Saturday, and I replaced the faulty drive on Monday. We are running 10K 600 GB SAS disks.

    We are planning a firmware update later this month, and after that I will delete and recreate the MDisks that now consist of 9 disks. I guess we'll see what happens when the first drive fails after that.

    /Rasmus