
Comments (28)

1 localhost commented Trackback

This comment will surely surprise you:

CONGRATULATIONS!

This is an extremely important achievement of the sort I personally envisioned back when we started on our path to flash. And there will undoubtedly be several more milestones as solid-state persistent storage technologies make their way into commercial storage solutions. You and the Almaden team are to be commended.

But this really isn't an arms race, or even an inter-vendor battle. Practical use cases must align the cost of the technology with both the performance and capacity requirements. I suspect there's lots more work for us all to do before we have just the right balance of RAM, NAND and spinning rust that will be required for mass-market appeal.

One note about my presentation, though - "EFD" doesn't stand for EMC flash drive - the "E" is for "Enterprise." True, many of our field folk have taken to referring to the STEC drives as EMC flash drives, but that's more because EMC is still the only place you can buy them in an array.

Oh - and those mixed workload issues you not-so-subtly hinted at... there are ways to mitigate those with a little bit of old-fashioned innovation and some integration between the drive and the array microcode. You guys might not have figured that out yet.

But honestly, congrats on the accomplishment, and thanks for joining in the efforts to make flash a commercial reality!

2 localhost commented Trackback

I thought that the SVC 4.x code was rated for something under 300,000 IO/s with like 8 nodes? How did you get past that?

3 localhost commented Permalink

BarryB,

THANKS! Look out for more news soon. And thanks for the correction re EFD.

OSSG,

As with all things performance, it is never that simple.

The 8-node SPC-1 cluster did achieve just under 300K SPC-1 IOPS. However, SPC-1 is closer to an 8K 40/60 workload.

In my internal benchmarking, a 70/30 4K all-miss workload will achieve around 120K IOPS per node pair. This is with the cache enabled, so writes are being mirrored. If we run a pure read-miss workload, then we get just over 200K IOPS per node pair.

As the FusionIO cards give excellent response time, we could disable the cache in SVC for these tests. This reduces the work each node has to do, and the traffic on the fabric, as there is no write mirroring in progress. This pushes the 70/30 4K number to almost that of a read-miss workload - assuming the backend storage can cope, which in this case it could.

The SVC cluster is running a subtly modified version of the SVC code which removes some of the configuration limits. As I've said before, the cluster code is designed for >64-node clusters, but our official GA test and support statement is for up to 8 nodes...
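To put rough numbers on how those per-node-pair figures scale, here is a back-of-envelope sketch. The near-linear scaling across node pairs is my assumption for illustration, not a published SVC characteristic:

    # Back-of-envelope scaling of the per-node-pair figures quoted above.
    # Near-linear scaling across node pairs is assumed for illustration only.
    iops_70_30_4k_cached = 120_000   # 70/30 4K all-miss, cache enabled, per node pair
    iops_read_miss       = 200_000   # pure 4K read miss, per node pair

    for node_pairs in (4, 6, 7):     # 8, 12 and 14 nodes
        print(f"{node_pairs * 2:2d} nodes: "
              f"~{node_pairs * iops_70_30_4k_cached:,} IOPS (70/30, cached), "
              f"~{node_pairs * iops_read_miss:,} IOPS (read miss)")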

4 localhost commented Trackback

Which begs the question: if SVC is essentially in complete passthrough mode, where's the value add? Wouldn't a company be able to just buy the same number of SSDs and attach them directly to hosts to get the same performance?

5 localhost commented Permalink

So this particular config had only SSD behind it, but that's not going to be a real-life config for some years. You forget all the benefits that SVC brings: you can attach normal HDD-based controllers too and migrate hot / not-hot data between HDD and SSD and back; you can FlashCopy, Mirror, Thin Provision etc.; and you can still use the cache for the HDD products, since we provide per-vdisk control of caching. Not to mention the single point of management and provisioning from Tier 0 through Tier 3, depending on the needs of the application / host.
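As a rough illustration of that per-vdisk control - the command syntax here is from memory, and the vdisk / mdisk group names are invented for the example:

    svctask chvdisk -cache none FLASH_VD01                   (turn the cache off for a flash-backed vdisk)
    svctask chvdisk -cache readwrite DB_VD01                  (keep it on for an HDD-backed vdisk)
    svctask migratevdisk -mdiskgrp SSD_GRP -vdisk DB_VD01     (move a hot vdisk onto the SSD-backed group)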

6 localhost commented Permalink

PS. One other thing we had to spend a lot of time tuning was the optimal data rate / queue depth to be maintained at the flash devices themselves. You want to keep the flash busy enough to get the best out of the available channels, while not overloading it and causing potential issues with the algorithms performing garbage collection, wear leveling etc. This work has been done, and the SVC backend queuing algorithms have been configured to sustain workloads within the optimal ranges (as we do for all storage controllers we support). So SVC is performing this work for you, and you don't need to tune each host in turn for the workload required - it's handled by SVC.
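For readers wondering what that kind of tuning looks like in principle, here is a toy sketch of a latency-driven queue-depth governor. It is purely illustrative - the thresholds and the approach are my assumptions, not the actual SVC backend algorithm:

    # Illustrative only - not the actual SVC backend queuing algorithm.
    # Keep the flash device busy enough to use its channels, but back off
    # before garbage collection / wear levelling latency spikes bite.
    def next_queue_depth(current_depth, observed_latency_us,
                         target_latency_us=500, min_depth=4, max_depth=64):
        """Nudge the outstanding-I/O limit toward the latency target."""
        if observed_latency_us > target_latency_us:
            return max(min_depth, current_depth - 2)   # back off
        return min(max_depth, current_depth + 1)       # probe upward

    depth = 32
    for latency_us in (350, 420, 480, 650, 700, 520, 400):
        depth = next_queue_depth(depth, latency_us)
        print(f"observed {latency_us:3d}us -> queue depth {depth}")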

7 localhost commented Permalink

Yeah, that's cool of course. But how are you going to replace the PCI flash card if it fails? And, by the way, you haven't implemented any RAID on those cards?

8 localhost commented Permalink

True, the low-level card does not support RAID; however, SVC 4.3.0 introduced Virtual Disk Mirroring, so you can mirror across two flash controllers, which not only protects against controller failure, but also allows you to replace a card should it fail - while maintaining online access.
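For anyone wanting to try this, the shape of it is roughly as follows - syntax from memory, and the mdisk group / vdisk names are invented for the example:

    svctask mkvdisk -mdiskgrp FLASH_GRP_A:FLASH_GRP_B -copies 2 -size 100 -unit gb -iogrp 0 -name VD01
        (create a vdisk with one copy in each flash-backed mdisk group)
    svctask addvdiskcopy -mdiskgrp FLASH_GRP_B VD01
        (add a second copy to an existing single-copy vdisk)
    svctask rmvdiskcopy -copy 1 VD01
        (drop the copy on a failed card before replacing it - access stays online via copy 0)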

9 localhost commented Permalink

I have been to the DS5000 announcement today - a great future is ahead of us! The solid state disk story will now receive a boost now that the first results are in...

Can't wait to be able to test it myself !!!

greetings

ps: SVC is not only able to mirror the VDisks, you can also do RAID 0 with it. With these two functions you can do a sort of RAID 1+0 across the SSD disks.

10 localhost commented Trackback

Barry, thanks for your comment correcting my error in saying this was an SPC-1 benchmark. My bad.

11 localhost commented Permalink

Congratulations - obviously this is what Mr Legg was hinting at when he dropped in to see me recently while I was quizzing him over flash disks. However, do you not feel that VDM for flash controllers might be a bit of overkill, i.e. is there any intention to support different RAID levels at the SVC level? Obviously at that point you've pretty much built a completely abstracted disk controller, and that begs a number of questions!

12 localhost commented Trackback

OK, not to pile on, but since the issue was raised by the OSSG:

You've demonstrated that you can get over 1M IOPS with some number "N" greater than 8 specially-tuned SVC nodes operating in pass-through mode with very low latencies (albeit with no RAID protection for the flash devices, it seems).

What is the impact on IOPS and latencies if you are using all the "value-add" features you say justify using the SVC instead of JBOD: migrations, FlashCopy, Mirror, thin provisioning, RAID protection, etc.?

Don't get me wrong - it is very interesting to know how fast you can go without any of the features turned on. The real question, however, is how fast you can go in a more realistic operating situation.

13 localhost commented Permalink

MartinG,

Obviously I'm not at liberty to confirm or deny any future plans, thoughts, ideas, concepts or such on a public forum such as this; next time Steve is with you, ask for the roadmap details.

BarryB,

So maybe I wasn't clear about the 'tuning' side - we do the same 'tuning' for every storage controller - including DMX, CX, DS4K, DS8K etc - to ensure we get the best out of the storage controller. So from that point of view we did nothing new.

Passthrough is a bit of a strong term: the cache was disabled and is working in write-through mode, but all the code stack is still there and I/O is processed through the system as per normal - striping, virtualizing etc. This is SVC code as installed at any customer today.

Advanced functions will depend on what the source and target storage are, so a 700MB/s-capable FusionIO source can obviously read a lot quicker than a 100MB/s-capable HDD target can write, whereas a migrate from flash to flash would be able to sustain much higher rates. As with any additional workload (as is generated by advanced functions), the backend has to be able to sustain the combined throughput - application I/O and function I/O - so I would expect an increase in response time and a drop in top-end throughput unless you added more backend capability. (I'm sure this is the case with your EFDs in a DMX too.) As the SVC node hardware used was not running even close to saturation point, there are plenty of MIPs left to ensure that 1M at similar response times would still be possible - given adequate backend flash capability.

PS There are a few nice side effects of flash when using SEV (as I'm sure you are aware), especially for a fine-grained solution such as ours, which can seriously reduce any performance impact - even when the vdisks are provisioned from traditional HDD.
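To make the "combined throughput" point concrete, here is a very rough sizing sketch. The 700MB/s and 100MB/s figures are the ones above; the assumed application load is invented for the example:

    # Rough sizing sketch, not an SVC planning tool: the backend must sustain
    # application I/O plus the extra I/O generated by an advanced function
    # (here, a migration from flash to HDD).
    flash_source_mb_s = 700   # FusionIO source capability (from the comment)
    hdd_target_mb_s   = 100   # HDD target capability (from the comment)
    app_load_mb_s     = 300   # assumed foreground application load on the source

    # The migration drains at the rate of the slower end once the app has its share.
    migrate_rate = min(flash_source_mb_s - app_load_mb_s, hdd_target_mb_s)
    print(f"Sustainable migration rate: ~{migrate_rate} MB/s "
          f"while the application keeps ~{app_load_mb_s} MB/s of the source")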

14 localhost commented Trackback

Thanks for the explanation. But I'm still thinking that when you're running FlashCopy, Mirroring, or mirroring with cache enabled, the CPUs must have to handle additional memory-to-memory copies and redirections that they don't in "pass-through" mode. Thus, it's not enough to add more flash drives; the load on the processors is increased by the "features", right?

Am I misunderstanding this?

And yes indeed, the real beauty of flash drives is that they can support a much higher access density than HDD can. Thin Provisioning can leverage this to deliver capacity efficiency without sacrificing performance (a true challenge on hard drives). And with flash it's no longer necessary to stripe database index tables across dozens of drives... it truly makes performance tuning a whole new ball game.
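For readers unfamiliar with the term, access density is simply IOPS per usable gigabyte. The numbers below are round illustrative assumptions, not vendor specifications:

    # Access density = IOPS per usable gigabyte. Illustrative figures only.
    def access_density(iops, usable_gb):
        """IOPS available per usable gigabyte of capacity."""
        return iops / usable_gb

    hdd_15k = access_density(iops=180, usable_gb=300)      # ~0.6 IOPS/GB
    flash   = access_density(iops=50_000, usable_gb=160)   # ~300+ IOPS/GB

    print(f"15K HDD: ~{hdd_15k:.1f} IOPS/GB")
    print(f"Flash  : ~{flash:.0f} IOPS/GB")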

15 localhost commented Permalink

So by doing cache mirroring between nodes, there are no extra memory accesses on a given node; it's simply the same buffer that is submitted onto the fabric (we can have multiple references to the same memory block). There is obviously additional code to run when doing FlashCopy or Mirroring, but when you enable the cache, this hides the additional latency of copy-on-write operations, or of doing two writes to the backend in the mirroring case.

So yes, there is a longer code path through the node when you run advanced functions, and yes, that requires more CPU processing, but as I stated above we still had plenty of MIPs free for such processing, and doing 4K-style I/O we won't hit any bandwidth limits internally. Remember, one of the key benefits of using fast-moving Intel planar technology is that we can ride the technology curve. So 1.33GHz FSB, DDR2, PCIe etc etc - these SMP multi-cored boxes may be "thin" in the sense of 1U, but bandwidth within a single node is not an issue.
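A toy illustration of the "multiple references to the same memory block" idea - in Python, and nothing to do with how SVC is actually implemented internally: both a cache entry and a fabric send queue can refer to the same bytes without a memory-to-memory copy:

    # Toy zero-copy illustration: two views of one buffer, no data copied.
    write_buffer = bytearray(b"host write payload " * 256)

    cache_entry   = memoryview(write_buffer)   # a reference, not a copy
    fabric_packet = memoryview(write_buffer)   # a second reference, still no copy

    assert cache_entry.obj is fabric_packet.obj   # same underlying memory block
    print(f"one {len(write_buffer)}-byte buffer, two zero-copy references")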

16 localhost commented Trackback

Gentlemen,

This has been a heady and informative debate. Keep it up!

I filed a short precis of it at

http://www.eetimes.com/news/latest/showArticle.jhtml;?articleID=210300295

Rick

17 localhost commented Permalink

Thx Rick.

FusionIO have today released a press release covering the work we have been doing together:

http://www.fusionio.com/PDFs/Pressrelease_IBM_Fusion.pdf

18 localhost commented Trackback

Yes, it was the Fusion-IO release that turned my attention to this page.

One question: You say you prefer a custom approach to SSDs on servers. Did you make any modifications to the Fusion-IO cards beyond the use of your virtualization software?

19 localhost commented Permalink

Sorry for the delay - looks like the latest upgrades to the blog software have resulted in it not working with Firefox... (being investigated)

So I had to run (expletive deleted) to add this...

The FusionIO ioDrive is unmodified.

There is more debate over on BarryB's blog, and some clarification of the 'points' he's making.

20 localhost commented Permalink

Barry,

Very interesting results.

Could you clarify how many FusionIO drive cards were used for this test?

Also, what was the usable capacity of the "Virtual Disks" that the 1 million IOPS was operated over?