
Comments (48)

1 kopper commented Permalink

Interesting article. I will be working on a V7000-SVC deployment over the coming days, so I'm looking forward to the other articles.

Thanks a lot

2 AngeloBernasconiBlog commented Permalink

Question:
If "With V7000, always configure at least 4 arrays, and if possible, multiples of 4," can we assume it is also a best practice to have at least 4 MDisks, or a multiple of 4 MDisks, on an SVC?

3 orbist commented Permalink

Angelo, this is less important on SVC, as SVC does not perform the RAID functions, i.e. there is no XOR or mirroring overhead, and so less CPU is required.

In levels < 6.3 there is a small advantage to having > 4 mdisks per controller, as this guarantees all 4 FC ports are being used. In levels >= 6.3 we now round-robin I/O for IBM and high-end storage systems, so even with 1 mdisk you will be using all 4 FC ports per node.
There are other considerations from a config perspective that apply to SVC, which I will cover in the next few posts (vdisks etc.)

4 Montecito commented Permalink

Very nice read. However, I do have a question.

I know that, especially in the case of V7000, the array width should be the same for all the mdisks in a storage pool.

However, if we utilise Easy Tier and add SSD drives that do not reach the optimal number of 8 [RAID 5], will it cause any performance issue?

Or, because SSDs perform far better than SAS drives, would a 5-drive SSD array compared to an 8-drive SAS array compensate for it?

SSD drives are costly, and putting all our eggs in one basket doesn't seem right.

5 orbist commented Permalink

Montecito,

The reason for *recommending* arrays of the same width in a given storage pool is the volume striping, i.e. every volume created in a pool is striped across all member arrays. Thus, if you have arrays with smaller numbers of component disks, they may pull down the performance of the pool as a whole to that of the lowest performing array.

In general, unless you are pushing all your arrays to the maximum, you probably won't see much of a difference if some arrays have one or so fewer component disks.

As for Easy Tier, it's a different story. Because Easy Tier determines which extents to put onto the SSD, it doesn't affect the rest of the HDD arrays, i.e. they all still have the same performance capability.

The SSDs are several orders of magnitude better than the HDDs with respect to IOPs. So even a single 1+1 RAID-1 array will give you a significant boost in IOPs and reduction in latency when used with Easy Tier. Obviously RAID-5 will give you greater capacity value for your SSD, but remember the penalties above: you will essentially always be doing mixed read/write workloads to RAID-5.
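The "slowest array caps the pool" effect described here can be sketched numerically. The per-array IOPS figures below are illustrative assumptions for the sake of the example, not V7000 measurements:

```python
def striped_pool_iops(array_iops):
    """Max pool IOPS under perfectly even striping across member arrays.

    Every striped volume spreads I/O evenly over all arrays, so the pool
    saturates when its weakest array saturates.
    """
    return len(array_iops) * min(array_iops)

# Four wide arrays (~1200 IOPS each, assumed) plus one narrower
# array (~900 IOPS, assumed): the whole pool is held to 5 x 900.
print(striped_pool_iops([1200, 1200, 1200, 1200, 900]))  # 4500
print(striped_pool_iops([1200, 1200, 1200, 1200]))       # 4800
```

This is why one narrow array costs more than its own shortfall: it drags every other member down with it under even striping.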

6 VahricMuhtaryan commented Permalink

We have newly purchased a V7000, and it's nice to discuss and talk with people like you. Thanks for the blogs.

We already have EMC and have tested 3PAR, and I have some questions (we have also been SVC and DS 500 users for a long time).

3PAR and EMC push that for low latency and performance you should use 3+1 RAID-5, and for optimal needs use 7+1 as you say. In V7000 I couldn't find anything other than the defaults. Why? Is it because IBM does not believe in 3+1, or is there another story? (Our systems are mostly random read and write intensive: mail and cloud.)

We mostly have mail servers on Windows 2003/2008, and volumes are formatted with the default 4KB. Can I configure RAID-5 with a 4KB stripe size? Also, in the GUI (from the Redbook) I did not see any stripe option to select.

Via the V7000 GUI or TPC, can we see whether we are doing full-stripe-size writes to V7000 arrays?

If we need to create a number of arrays equal to the number of cores: AMD processors have more cores than Intel, so we could create more arrays with the same number of cores or a multiple of it. If your words hold linearly, and in future Storwize uses 6 cores, do we have to create the same number of arrays, or a multiple of 6?

What do you think about traditional RAID versus 3PAR's 'chunklet-based RAID'? It is starting to seem logical: no need to waste disks as spares (yes, space is wasted again, but no spare sits idle waiting to get into action), and the flexible configuration (creating mixed RAID groups using the same disks) sounds good. What do you think?

Thank you

7 MatthieuBonnard commented Permalink

Great article Barry,
Just a question for me:
We have a V7000 with 12 NL-SAS 3TB drives configured with RAID-6.
Just one "Storage vMotion" (VMware) operation targeting this array has a very big impact on the read latency of all the LUNs on the same array.
On the other hand, a big read operation on the same RAID-6 array doesn't have the same impact on the other LUNs of the array.

Is this really normal behaviour for an NL-SAS RAID-6?
Is one RAID-6 with just 11 disks not enough?
Why does a massive write operation have more impact than a massive read operation?
If I create an SVC pool with 4 of these RAID-6 arrays, do I multiply the performance (IOPS, latency?) by 4?

With a 23 x SAS RAID-5 we never observe this behaviour.
I'm very surprised by the performance difference between SAS and NL-SAS (or between RAID-5 & RAID-6?)

8 orbist commented Permalink

Matthieu,

So with it being a RAID-6 array, any write will potentially be causing the 6x overhead on the disks, as above. As you have a 9+P+Q (11 disks?), you would want to do vMotion I/O of 9x256KB to guarantee full strides, and thus only have 1 write per disk for each 2304KB...

Are the source and destination volumes on the same V7000? i.e. if you use vMotion with VAAI (XCOPY) then you can offload the clone to the V7000.

Either way, when it comes down to it you probably have 11x 100 IOPs, so 1100 IOPs at the disk level. I also benchmark the 2TB NL-SAS at about 125MB/s when doing sequential writes.

It's almost a catch-22: you want RAID-6 because you have NL-SAS drives, for protection, but NL-SAS gives 100 ops vs 300 ops for 10K SAS - and RAID-6 adds 6x overhead on writes...

It sounds like you are on the cusp, i.e. RAID-5 could just cope with the vMotion where RAID-6 overloads the disks. One option would be to use the volume throttle feature (CLI only), where you can specify a peak limit on either IOPS or MB/s for each volume - set this temporarily during vMotion. However, we recently noted that throttling + VAAI doesn't work, i.e. because we don't see the I/O at the upper layers, we can't throttle it. So if you use this option (i.e. throttling), be sure to turn off VAAI.
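A back-of-envelope sketch of the figures quoted above (11 spindles at ~100 IOPs each, and the worst-case 6x RAID-6 small-write penalty, both as stated in the comment):

```python
# All constants are the figures quoted in the comment above,
# not measured values.
DISKS = 11                 # 9+P+Q NL-SAS RAID-6 array
DISK_IOPS = 100            # rough NL-SAS capability per spindle
RAID6_WRITE_PENALTY = 6    # worst-case back-end I/Os per small host write

backend_iops = DISKS * DISK_IOPS                    # raw spindle budget
host_write_iops = backend_iops // RAID6_WRITE_PENALTY

print(backend_iops, host_write_iops)  # 1100 183
```

Under ~200 sustainable small random writes per second, it is easy to see why a Storage vMotion stream overwhelms the array and drives up read latency for everything else on it.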

9 orbist commented Permalink

Vahric,

A few points here:

1. You can configure any array size you want, down to a 2+P IIRC, but you need to use the command line interface for that. The GUI will use the defaults, or I think you can use the "configure storage" button on the "internal storage" page and select the number of component disks. Otherwise, in the CLI: svctask mkarray -level raid5 -drives 0:1:2 -strip 128

2. You can go down to 128KB as a strip size, but again only in the CLI.

3. I think EMC recommend 3+P due to an internal 768KB optimisation. I never got to the bottom of what's magical about 768KB in their code, but it seems to be a magic number. Hence 3x256KB + parity means full stride writes for 768KB incoming I/O.

4. It's very rare for RAID to be striped at smaller than, say, 32KB, so 4KB writes will always incur a penalty: we need to read-modify-write 4KB in the old, new and parity blocks.

5. When we go to 6 cores in V7000 (SVC CG8 nodes already ship with 6-core CPUs), yes, optimally you would want 6+ arrays for all the cores to be in use. AMD do have more cores per die, but tend to be a generation or so behind on PCIe and memory support, and there are details like cache snooping etc. that can be a nightmare. For now, the SVC based products will be sticking with Intel.

6. IBM has "chunklet"-style RAID with the XIV products, and certainly as we move towards 4 and 5TB physical drives, traditional RAID starts to look creaky at the edges - with rebuild times of several days.

10 MatthieuBonnard commented Permalink

Thanks a lot Barry,
I just need your point of view:

First, my theory:
My "11 x NL-SAS RAID-6" is capable of 1100 IOPS. So a single host with a queue depth of 32 and an average response time of 20ms (on NL-SAS?) is probably capable of outmatching the raw capacity of my array (during a Storage vMotion operation, for example). Just one host is able to kill the entire array.
I suppose that's the reason I can't observe the same behaviour on my "24 x SAS RAID-5", where a single host can't saturate the entire array.

Then, the solution?
So, if I plan to create a pool with 11 x RAID-5 of 4 x NL-SAS (4 x 3.5" enclosures) for "Enclosure Loss Protection", I protect my array from a "single host massive operation" by keeping a little capacity to serve I/O for other hosts.

Do you think like me?
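The saturation theory here can be checked with Little's law (sustained IOPS = outstanding I/Os / average response time), using the queue depth and latency quoted in the comment:

```python
def sustained_iops(queue_depth, response_time_s):
    """Little's law: throughput a host sustains with a fixed queue depth."""
    return queue_depth / response_time_s

# Figures from the comment: queue depth 32, ~20ms average response time.
offered = sustained_iops(32, 0.020)
array_capacity = 11 * 100   # 11 NL-SAS spindles at ~100 IOPS each

print(offered, offered > array_capacity)  # 1600.0 True
```

One host keeping 32 I/Os in flight offers roughly 1600 IOPS, comfortably above the ~1100 IOPS the array can deliver, which supports the "one host can kill the array" intuition.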

11 VahricMuhtaryan commented Permalink

Thanks Barry,

From IBM's point of view, what is the better RAID-5 option, 7+1 or 3+1? 3PAR say that with 3+1 you achieve better performance than RAID-10. Is there any test or case study about this on the IBM site?

Until now I had never calculated or imagined that stripe size and application block sizes could cause problems, because we have a very mixed environment: Windows servers default to 4KB, vSphere 5 uses new 8KB blocks, and Oracle customers can set other block sizes. So I believe storage vendors must have an option or function to consolidate writes and reads so they become aligned (or close to aligned), don't they? Or is there any way to monitor what we are doing on the V7000 and how to optimise the alignment?

In the SPC tests I saw that IBM put V7000s behind the SVC to gain maximum IOPS. I wonder which is better: put the V7000 behind the SVC, or put the SVC and V7000 on the same layer? I believe more controllers and more memory are better than putting the V7000 behind the SVC, or not?

Thanks

12 orbist commented Permalink

Matthieu,

As you say, a single host should easily be capable of saturating the R6 NL-SAS array.

If you create a single pool with 11 arrays, you will in theory still be using all 11 arrays if you create a normal "striped" volume, as each array will be contributing to the volume. If you want to protect other workloads, the only way would be to create multiple pools; then you will only impact volumes in the same pool.

Note also "Enclosure Loss Protection": this was a concept introduced by FC-AL (DS) systems, where a single internal loop within an enclosure could take down all enclosures.

The V7000 expansion enclosures have no single point of failure. Everything is dual redundant; the only single common component is the midplane, and we carefully designed it to have no active components.

You create strings, or strands, of enclosures, down through each enclosure. You don't have the same complex contra-rotating cabling that was needed with DS4/5 (i.e. FC-AL).

13 orbist commented Permalink

Vahric,

Because the strip size is 256KB by default (128KB if manually changed via the CLI mkarray command), you will not see any gain from having 3+P over 7+P UNLESS you are doing large sequential host write operations.

3+P with a strip size of 256KB would need a 768KB host write to enable a full stride.
8+P with a strip size of 128KB would need a 1MB host write for a full stride, etc.

For random I/O smaller than the strip size, there is no performance advantage to configuring smaller numbers of component disks in an array. The same overhead is still needed, and in fact, if anything, larger numbers of disks may help reduce latency on a busy array (as you have more disks providing the I/O and possibly a lower chance of clashing with I/O from another random workload).

If you have 4KB and 8KB OS block sizes, and a general mixed workload, then really the array size is going to be dictated more by how many disks you have, how many spares you want to leave, and what divides as evenly as possible into a number like 4 or 8 etc.

We don't currently support SVC and V7000 being clustered together into the same cluster. A V7000 can be a remote copy target cluster for an SVC (and vice versa), and a V7000 can be storage behind an SVC. They cannot be peer nodes in the same cluster.

With SVC, you want high performing, high reliability storage at a reasonable cost. Therefore the V7000 is an IDEAL storage system to put behind SVC. The additional caching in the two layers is like an L1/L2 CPU cache, i.e. you get the most common data hits in the top level (SVC), allowing the V7000 to potentially cache additional data that wouldn't otherwise have stayed in cache.
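The full-stride arithmetic above is simply data disks multiplied by strip size; a tiny sketch using the two configurations from the comment:

```python
def full_stride_kb(data_disks, strip_kb):
    """Host write size needed for a full-stride write
    (one write per member disk, no read-modify-write)."""
    return data_disks * strip_kb

print(full_stride_kb(3, 256))  # 768  -> 3+P with the default 256KB strip
print(full_stride_kb(8, 128))  # 1024 -> 8+P with a 128KB strip (= 1MB)
```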

14 VahricMuhtaryan commented Permalink

Hi,

From your words I understand that the penalty always happens, independently of 3+1 or 7+1. And yes, the big issue is that we do not have a sequential load; to reduce the load we could create bigger arrays like 15+1 or 31+1 - I don't know what the limit is on the V7000.

Then, instead of worrying about RAID-5 and the number of disks in the array, would using RAID-10 with the V7000 be the better way? I got feedback from another vendor that if you use RAID-10 instead of RAID-5, you can perhaps use fewer disks than with RAID-5.

Regards
VM

15 MatthieuBonnard commented Permalink

Thank you Barry,
We are in the case of multiple V7000s behind an SVC cluster.
I just looked in the SVC Performance Guidelines and saw that you recommend configuring an extent size of 1GB on the V7000 pools presented to the SVC.
Why this configuration? Is it really a best practice?
Matthieu.

16 orbist commented Permalink

Matthieu, sorry for such a long delay. The 1GB extent on SVC is because you are already getting the wide stripe benefit from the V7000 underneath it - so adding more striping doesn't gain much. The main thing is to make sure you don't use the same extent size at both levels; this could lead to situations where the double stripe ends up with all data on one array etc.

17 David90 commented Permalink

Hi Barry

Thanks for the article... In part 4 do you plan to speak about the grainsize setting on vdisks? As we noticed different behaviour in terms of IOPS / backend response time when tuning grainsize on SEV, we would be very interested in your point of view / experience / recommendation on that point.

Thanks a lot

Kind regards
--
david

18 Montecito commented Permalink

Hi Barry,

If I am creating an Easy Tier solution, is it best to split the arrays, e.g. 2 RAID-5 arrays of 3 drives each, or to create one array of 6 drives in RAID-5?

I have been told that splitting the array gives better performance, as the SSD arrays are then assigned evenly to both storage nodes. Is that true?

19 orbist commented Permalink

Hi David, yes, I must get part 2 up first! But I can discuss grain size considerations when we get there.

20 orbist commented Permalink

Montecito,

You need to be careful with terminology / affinity here. As an array is an I/O Group entity, both nodes can access the same array. Each drive is dual ported, and so each node has equal bandwidth and connectivity to each drive.

Arrays within a node are assigned to cores. So if you only create one array, you are only using one core (on each node). Create 4 arrays, and you are using 4 cores on each node (8 total).

For Easy Tier, you will be able to push higher IOPs from the arrays if you have enough SSD. In your case, with 6 SSDs (I presume), a single core will be able to sustain the IOPs those drives can provide. Assuming you then have other HDD arrays, they will use up the other cores in the system.

It's a trade-off between the required IOPs (and the capability of the SSD drives you have) and the number of drives you are willing to lose to RAID redundancy (parity).

21 Montecito commented Permalink

Barry,

Thanks a ton for the response. I have learnt a lot from your blogs. I request you to please start with part 2 ASAP. I've been waiting for it for a long time. I bet it will be very interesting. Please do cover Easy Tier in it as well.

22 tabin commented Permalink

Hi Barry

Let's assume this situation:
- a host wants to write to SVC/V7000 with I/O = 256KB
- SVC/V7000 always tries to write to physical disks with I/O = 32KB
- we have 4 mdisks in the mdiskgrp (4 RAID arrays in V7000, or 4 LUNs in SVC) with a 128MB extent size
- each RAID array/LUN is an array of 5 disks in RAID-5 (4+P) with a 128KB stripe unit on each disk
- we have one striped vdisk from this mdiskgrp mapped to one host

And now:

When the host writes one 256KB I/O to the vdisk, this goes down to the SVC/V7000's cache. It will be placed in only one extent in the mdiskgrp, thus on only one mdisk, thus on only one array or LUN. The SVC/V7000 cache algorithm always wants to write to physical disks an I/O of maximum size 32KB, so it will split this 256KB into 8x32KB I/Os and write them down to the array. The array has 4 data disks and the stripe unit size is 128KB, so the array stripe size is 4x128KB = 512KB. How will those 8x32KB I/Os be handled? Will it write 2 blocks to every disk with a full stripe write, leaving the stripe units half empty? If a stripe unit is not fully filled, will it be usable for another I/O? Or will it write 4 blocks to the first disk and 4 blocks to the second disk, not making a full stripe write and leaving 2 disks empty, but making the first two stripe units full?
And if the extent size is 128MB, will SVC/V7000 have to make 512 of these I/Os (256KB x 512 = 128MB) before writing to the next extent, or not? Does an extent have to be filled before going to another extent, as with a sequential vdisk?

Correct me if I'm wrong, I am just curious how this works in great detail :)

23 orbist commented Permalink

Hi tabin.

At 6.1 we modified SVC to read and write up to 256KB I/Os to mdisks.
V7000 will read up to 256KB and write up to a full stride, i.e. at most 15x 256KB = 3.75MB.

SVC has no knowledge of what makes up a managed disk, so it will just read or write 256KB chunks. The storage controller itself (say it's sequential I/O) has the ability to do whatever it normally does when any normal host reads or writes sequentially.

Although it's one extent, it's one extent on an mdisk, which is in itself an array of disks.

With V7000 we know the stripe size, i.e. how many disks there are, so we can build full stride writes in cache before destaging a single large write.

For reads we will still just send 256KB; this could span multiple strips in a stripe if it's misaligned, so the RAID layer will split it into the individual I/Os needed from each disk.
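The destage limits described above, sketched with the numbers from the comment (256KB strip, at most 15 data strips per full stride):

```python
# Figures as quoted in the reply above, for 6.1-level code.
STRIP_KB = 256

max_svc_io_kb = STRIP_KB              # SVC reads/writes at most one strip
max_v7000_destage_kb = 15 * STRIP_KB  # V7000 can destage a full stride

print(max_svc_io_kb, max_v7000_destage_kb, max_v7000_destage_kb / 1024)
# 256 3840 3.75
```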

24 orbist commented Permalink

Part 2 finally posted : https://www.ibm.com/developerworks/mydeveloperworks/blogs/storagevirtualization/entry/configuring_ibm_storwize_v7000_and_svc_for_optimal_performance_part_26?lang=en

25 CarlosMZN commented Permalink

Hi Barry

I have a question.

Can using a stripe unit of 256KB saturate the V7000's back-end more easily?

Imagine the following situation:
- RAID-5
- 100% write
- 100% random (so I suppose no full stripe writes are possible)
- small blocks (for example 8KB)
- 3,000 host IOPS

The 3,000 host IOPS with the RAID-5 penalty (I suppose 4: 2 reads and 2 writes) will produce 12,000 IOPS at the back-end.

12,000 IOPS x 256KB (stripe unit) = 3,000 MB/s throughput at the back-end.

3,000 MB/s = approx. 3 GB/s = approx. 24 Gbps.

I suppose the V7000's maximum back-end bandwidth is: 2 nodes x 2 adapters x 6 Gbps SAS = 24 Gbps.

So the maximum random write IOPS would be approx. 3,000 (even using SSD).

I think I'm wrong, but could you show me where?

26 orbist commented Permalink

Carlos,

Your mistake is thinking we have to write 256KB every time. If you only write 8KB, then we only need to merge the 8KB into the existing strip/stripe. So you end up with 4x 8KB I/Os.

So it's still the IOPs limit of the disks you will hit, not an MB/s limit of the controllers.
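Redoing Carlos's arithmetic with the merge behaviour Barry describes: each of the 4 read-modify-write back-end I/Os (the classic RAID-5 small-write model) is only the 8KB host-write size, not a whole 256KB strip:

```python
# Figures from the question above: 3000 host IOPS of 8KB random writes,
# with the 4x RAID-5 small-write penalty (read old data + old parity,
# write new data + new parity).
host_iops, io_kb, rmw_ios = 3000, 8, 4

backend_iops = host_iops * rmw_ios          # back-end I/O count
backend_mb_s = backend_iops * io_kb / 1024  # back-end bandwidth, MB/s

print(backend_iops, backend_mb_s)  # 12000 93.75
```

So the back-end sees ~94 MB/s, nowhere near a 3 GB/s SAS bottleneck; the limit is the spindles' IOPs, exactly as the reply says.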

27 kikeemm commented Permalink

Hello,

Thank you for your post. I have a question regarding performance on the V7000. We are experiencing the same performance when writing small random blocks on different RAID configurations and sizes:

test  config
1     SSD, 1 mdisk, 5+P (R5)
2     SAS, 2 mdisks, 7+P (R5)
3     SSD, 2 mdisks, 2+P (R5)
4     SSD, 3 mdisks, d+P (R1)

Small random 100% writes (4K):
test   1      2      3      4
IOPS   3860   3650   3830   3866
MBps   15     14     15     15

Have you encountered this issue before?

Thank you again and regards,

Miguel

28 PekkaPanula commented Permalink

Great article.

I have a couple of questions.

We have a new expansion enclosure with 24 x 10K 600GB SAS drives.
What's the best practice to split it into arrays?
What we have now is: 2 x RAID-6 arrays, 12 disks each.

We are trying to figure out a couple of things. Spare disks: it seems spare disks are per enclosure? I have 2 spares in the control enclosure, but the array properties show that the expansion enclosure arrays are without spare protection.

So: if I do a 12-disk R6 and then an 11-disk R6 and join them in a pool, do I get penalties?

Also: you say that 4 arrays is the optimum, so what do you recommend we should do to get a good balance between capacity and performance?

Workload-wise, we run XenServers with the V7000 as their storage, so workloads are fairly mixed.

29 rate commented Permalink

As I understand it (and I might be wrong), the LUNs are placed on top of the MDisk groups, so that all hard drives will be utilised when performing reads, for example. Is that completely wrong?
And if it isn't wrong, then how does one select the proper stripe size?

30 rate commented Permalink

To add a little more info to my last post:

We will be running VMware ESX 5.1 on top of the V7000. Now, VMFS operates in 1024K chunks, so with an 8+1P RAID-5 I guess the stripe size should be 128K.

That leads me to two questions:

1) Why is the default IBM recommendation 8 disks in RAID-5 when it will never add up to anything (stripe-wise) due to the parity disk?

2) When the volumes are "floating" in your storage pool, how is stripe size even relevant, as one would never know which disks are being read from?

Hope this makes sense, and that you'll give an answer - we are about to place an order on this baby :)