
Comments (48)

16 orbist commented

Matthieu, sorry for such a long delay. The 1GB extent on SVC is because you are already getting the wide-stripe benefit from the V7000 underneath it, so adding more striping doesn't gain much. The main thing is to make sure you don't use the same extent size at both levels; that can lead to situations where the double stripe ends up with all the data on one array.
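To illustrate the double-stripe interaction described here, below is a minimal sketch of a simplified two-level round-robin striping model: an SVC volume striped across a few mdisks, each mdisk being a V7000 volume striped across a few arrays. The mdisk/array counts, the extent sizes, and the assumption that every mdisk starts its extent allocation at array 0 are all hypothetical; real allocation is more involved. It just shows which backend array(s) each SVC extent's data touches, so matching and differing extent sizes can be compared.

    # Hypothetical sketch of two-level round-robin striping (not the real
    # SVC/V7000 allocation code).  Assumes every mdisk allocates its extents
    # starting at array 0.
    def arrays_touched(svc_extent_idx, svc_extent_size, v7k_extent_size,
                       n_mdisks=4, n_arrays=4):
        """Return the set of backend arrays holding one SVC extent's data."""
        # Offset of this SVC extent within its mdisk (simple round-robin layout).
        mdisk_offset = (svc_extent_idx // n_mdisks) * svc_extent_size
        touched = set()
        for off in range(0, svc_extent_size, v7k_extent_size):
            v7k_extent_idx = (mdisk_offset + off) // v7k_extent_size
            touched.add(v7k_extent_idx % n_arrays)
        return touched

    MiB = 1 << 20
    GiB = 1 << 30

    for svc_ext, v7k_ext, label in [(1 * GiB, 1 * GiB, "same size, 1GiB/1GiB"),
                                    (1 * GiB, 256 * MiB, "1GiB SVC over 256MiB V7000")]:
        layout = [sorted(arrays_touched(i, svc_ext, v7k_ext)) for i in range(8)]
        print(f"{label}: SVC extents 0-7 land on arrays {layout}")

Under these assumptions, with matching extent sizes each run of consecutive SVC extents sits entirely on a single backend array, while differing sizes spread every SVC extent across all the arrays.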

17 David90 commented

Hi Barry,

Thanks for the article. In part 4 do you plan to talk about the grain size setting on vdisks? As we noticed different behaviour in terms of IOPS / backend response time when tuning the grain size on SEVs, we would be very interested in your point of view / experience / recommendation on that point.

Thanks a lot.

Kind regards,
--
david

18 Montecito commented

Hi Barry,

If I am creating an Easy Tier solution, is it best to split the SSDs into two RAID 5 arrays of 3 drives each, or to create one RAID 5 array of 6 drives?

I have been told that splitting the array gives better performance, as the SSD arrays are then assigned evenly to both storage nodes. Is that true?

19 orbist commented

Hi David, yes I must get part 2 up first! But I can discuss grain size considerations when we get there.

20 orbist commented

Montecito,

You need to be careful with terminology / affinity here. An array is an I/O group entity, so both nodes can access the same array. Each drive is dual-ported, so each node has equal bandwidth and connectivity to each drive.

Arrays within a node are assigned to cores. So if you only create one array, you are only using one core (on each node). Create 4 arrays and you are using 4 cores on each node (8 in total).

For Easy Tier, you will be able to push higher IOPS from the arrays if you have enough SSD. In your case, with 6 SSDs (I presume), a single core will be able to sustain the IOPS those drives can provide. Assuming you then have other HDD arrays, they will use the other cores in the system.

It's a trade-off between the required IOPS (and the capability of the SSDs you have) and the number of drives you are willing to lose to RAID redundancy (parity).

21 Montecito commented

Barry,

Thanks a ton for the response. I have learnt a lot from your blogs. I request you to please start with part 2 asap. I've been waiting for it for a long time. I bet it will be very interesting. Please do cover Easy Tier in it as well.

22 tabin commented

Hi Barry,

Let's assume this situation:
- a host wants to write to SVC/V7000 with I/O = 256KB
- SVC/V7000 always tries to write to physical disks with I/O = 32KB
- we have 4 mdisks in the mdiskgrp (4 RAIDs in V7000 or 4 LUNs in SVC) with 128MB extents
- each RAID/LUN is an array of 5 disks in RAID 5 (4+P) with a 128KB stripe unit on each disk
- we have one striped vdisk from this mdiskgrp mapped to one host

And now:

When the host writes one 256KB I/O to the vdisk, it goes down to the SVC/V7000's cache. It will be placed in only one extent in the mdiskgrp, thus on only one mdisk, thus on only one array or LUN. The SVC/V7000 cache algorithm always wants to write to physical disks with an I/O of maximum size 32KB, so it will split the 256KB into 8x 32KB I/Os and write them down to the array. The array has 4 data disks and the stripe unit size is 128KB, so the array stripe size is 4x 128KB = 512KB. So how will those 8x 32KB I/Os be handled? Will it write 2 blocks to every disk as a full-stripe write, leaving the stripe units half empty? If a stripe unit is not fully filled, will it be usable for another I/O? Or will it write 4 blocks to the first disk and 4 blocks to the second disk, not making a full-stripe write and leaving 2 disks untouched, but filling the first two stripe units?
And if the extent size is 128MB, will SVC/V7000 have to make 512 of these I/Os (256KB x 512 = 128MB) before writing to the next extent, or not? Does an extent have to be filled before going to another extent, as with a sequential vdisk?

Correct me if I'm wrong, I am just curious how this works in great detail :)

23 orbist commented

Hi tabin,

At 6.1 we modified SVC to read and write up to 256KB per I/O to mdisks.
For V7000 it will read up to 256KB and write up to a full stride, i.e. at most 15x 256KB = 3.75MB.

SVC has no knowledge of what makes up a managed disk, so it will just read or write 256KB chunks. The storage controller itself (say it's sequential I/O) has the ability to do whatever it normally does when any normal host reads or writes sequentially.

Although it's one extent, it's one extent on an mdisk, which is in itself an array of disks.

With V7000 we know the stripe size, i.e. how many disks there are, so we can build full-stride writes in cache before destaging a single large write.

For reads we will still just send 256KB; this could span multiple strips in a stripe if it's misaligned, so the RAID layer will split it into the individual I/Os needed from each disk.
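As a rough illustration of that last point, below is a small sketch (a hypothetical helper, not the actual V7000 RAID code) that splits a possibly misaligned read into the per-strip pieces a RAID 5 layer would need. The 256KB strip size and 15+P geometry are taken from the numbers above; parity rotation is ignored, so the "drive" index is really the data-strip position within the stripe.

    def split_raid5_read(offset_kb, length_kb, strip_kb=256, data_drives=15):
        """Split a logical read into (data_strip, offset_on_strip_kb, length_kb)
        pieces for a RAID 5 array with `data_drives` data strips per stripe.
        Simplified model: parity rotation is ignored for reads."""
        pieces = []
        pos = offset_kb
        end = offset_kb + length_kb
        while pos < end:
            stripe = pos // (strip_kb * data_drives)           # which full stripe
            strip_in_stripe = (pos // strip_kb) % data_drives  # which data strip
            within_strip = pos % strip_kb                      # offset inside strip
            chunk = min(strip_kb - within_strip, end - pos)    # stop at strip edge
            pieces.append((strip_in_stripe, stripe * strip_kb + within_strip, chunk))
            pos += chunk
        return pieces

    # A 256KB read starting 64KB into a strip spans two data strips:
    print(split_raid5_read(offset_kb=64, length_kb=256))
    # -> [(0, 64, 192), (1, 0, 64)]

    # Full stride for a 15+P array with 256KB strips:
    print(f"full stride = {15 * 256} KB")   # 3840 KB, i.e. ~3.75MB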

24 orbist commented

Part 2 finally posted: https://www.ibm.com/developerworks/mydeveloperworks/blogs/storagevirtualization/entry/configuring_ibm_storwize_v7000_and_svc_for_optimal_performance_part_26?lang=en

25 CarlosMZN commented

Hi Barry,

I have a question.

Can using a stripe unit of 256KB saturate the V7000's back end more easily?

Imagine the following situation:
- RAID 5
- 100% writes
- 100% random (so I suppose no full-stripe writes are possible)
- small blocks (for example 8KB)
- 3,000 host IOPS

The 3,000 host IOPS with the RAID 5 penalty (I suppose 4: 2 reads and 2 writes) will produce 12,000 IOPS at the back end.

12,000 IOPS x 256KB (stripe unit) = 3,000 MB/s of back-end throughput.

3,000 MB/s = approx 3 GB/s = approx 24 Gbps.

I suppose the V7000 maximum back-end bandwidth is: 2 nodes x 2 adapters x 6 Gbps SAS = 24 Gbps.

So the maximum random write IOPS would be approx 3,000 (even using SSD).

I think I'm wrong, but could you show me where?

26 orbist commented

Carlos,

Your mistake is thinking we have to write 256KB every time. If you only write 8KB, then we only need to merge the 8KB into the existing strip/stripe, so you end up with 4x 8KB I/Os.

So it's still the IOPS limit of the disks you will hit, not a MB/s limit of the controllers.
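A quick back-of-the-envelope comparison of the two calculations, using only the illustrative numbers from the exchange above:

    # Illustrative numbers from the comment above, not measured figures.
    host_iops = 3000          # 8KB random host writes
    raid5_penalty = 4         # read old data + old parity, write new data + new parity
    backend_iops = host_iops * raid5_penalty           # 12,000 back-end I/Os per second

    io_kb_assumed = 256       # if every back-end I/O were a full 256KB strip unit
    io_kb_actual = 8          # only the changed 8KB is merged into the strip

    mbps = lambda iops, kb: iops * kb / 1024            # MB/s
    print(f"256KB per back-end I/O: {mbps(backend_iops, io_kb_assumed):.0f} MB/s")  # ~3000 MB/s
    print(f"8KB per back-end I/O:   {mbps(backend_iops, io_kb_actual):.0f} MB/s")   # ~94 MB/s
    # With 8KB back-end I/Os the disks run out of IOPS long before the SAS
    # back end runs out of bandwidth.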

27 kikeemm commented

Hello,

Thank you for your post. I have a question regarding performance on V7000. We are experiencing the same performance when writing small random blocks (100% random 4K writes) on different RAID configurations and sizes:

    Test  Config                     IOPS   MBps
    1     SSD, 1 mdisk, 5+P (R5)     3860   15
    2     SAS, 2 mdisks, 7+P (R5)    3650   14
    3     SSD, 2 mdisks, 2+P (R5)    3830   15
    4     SSD, 3 mdisks, d+P (R1)    3866   15

Have you encountered this issue before?

Thank you again and regards,

Miguel

28 PekkaPanula commented

Great article.

I have a couple of questions.

We have a new expansion enclosure with 24x 10k 600GB SAS drives.
What's the best practice for splitting it into arrays?
What we have now is: 2x RAID 6 arrays of 12 disks each.

We are trying to figure out a couple of things. Spare disks: it seems spare disks are per enclosure? I have 2 spares in the control enclosure, but the array properties show that the expansion enclosure arrays are without spare protection.

So: if I make a 12-disk R6 and an 11-disk R6 and join them as a pool, do I get penalties?

Also: you say that 4 arrays is the optimum, so what do you recommend we do to get a good balance between capacity and performance?

Workload-wise, we run XenServers with the V7000 as their storage, so workloads are fairly mixed.

29 rate commented

As I understand it (and I might be wrong), the LUNs are placed on top of the mdisk groups, so that all hard drives will be utilized when performing reads, for example. Is that completely wrong?
And if it isn't wrong, then how does one select the proper stripe size?

30 rate commented

To add a little more info to my last post:

We will be running VMware ESX 5.1 on top of the V7000. Now, VMFS operates in 1024K chunks, so with an 8+1P RAID 5 I guess the stripe size should be 128K.

That leads me to two questions:

1) Why is the default IBM recommendation 8 disks in RAID 5 when it will never add up to anything (stripe-wise) due to the parity disk?

2) When the volumes are "floating" in your storage pool, how is the stripe size even relevant, as one would never know which disks are being read from?

Hope this makes sense, and that you'll give an answer - we are about to place an order on this baby :)
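For what it's worth, a short version of the arithmetic in the question above (assuming the 1024KB VMFS allocation unit and an 8+1P RAID 5, i.e. 8 data strips per stripe, as stated there):

    vmfs_block_kb = 1024      # VMFS allocation unit assumed in the question
    data_disks = 8            # 8+1P RAID 5: 8 data strips + 1 parity per stripe
    strip_kb = vmfs_block_kb // data_disks
    print(f"strip size for one VMFS block per full stripe: {strip_kb} KB")  # 128 KB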