
Comments (6)

1 localhost commented Permalink

Very helpful - thanks!

I would be interested in knowing more about what coalescing (if any) SVC does to try to make life easier on the back-end controllers, or whether any difference has been noted.

Also interested in the cache impacts of inter-cluster links, specifically what happens when they go away or have problems.

Stuff to be discussed in future perhaps?

2 localhost commented Permalink

Thank you for the redpaper.

Now I'm wondering whether TPC knows anything about this cache partitioning. We can produce many reports about the cache of the SVC, its nodes and I/O groups, but I can't find anything about partitions. Are there any metrics to look at?

3 localhost commented Trackback

Very interesting and useful - thanks.

I'm not sure where you've got the impression that Hitachi doesn't have cache partitioning, or that it's not advised for use with external storage? Maybe I've misunderstood what you're trying to say, but I've heard exactly the opposite from end-users - that the "best practice" is to put at least as much cache in the partition that external storage is assigned to as the external storage itself has.

4 localhost commented Permalink

Sharbu,

SVC will coalesce any writes that occur within tracks in the cache (if they are contiguous). So, for example, a (strange) but sometimes seen 512-byte sequential workload would result in a 32K track being destaged as a single write to the controller. If, however, you have two non-contiguous pages in a single track, we still have to do two 4K writes to the controller.

Inter-cluster latency is important, as SVC uses the fabric to mirror writes between partner nodes. The TPC "node to local node latency" statistic is one to keep an eye on; normal values should be less than 0.5ms. This is one of the main reasons we recommend you don't have ISLs between nodes in an I/O group. All too often, if you do and the fabric is overloaded, head-of-line blocking (HOLB) and other congestion issues can impact write performance. I could go on at length about this one, and maybe it's worthy of a post of its own - especially when we inherit an EMC-designed core-edge-edge design. It may work well for EMC direct-attached boxes, but you need to rethink not only your storage provisioning but also your SAN design when you implement an SVC cluster.

TMasteen,

At the moment we don't have any specific partitioning stats in TPC. There are a lot of internal stats we keep, and the next step is to export them from the cluster internals to the XML and hence to TPC. If you want to check whether you are hitting partition limiting (in a 4.2.1+ SVC implementation) - other than seeing general poor performance from vdisks in one mdisk group - look for the "delayed write cache" statistics incrementing.

BarryB,

Hopefully not useful in a bad way ;) As for HDS, I stand corrected - I'd been told they didn't support cache above external storage. Maybe this has changed since I was given that info; I will look into it.
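(To make the track-level coalescing described above concrete, here is a minimal Python sketch. It assumes 32 KiB tracks and 4 KiB pages and simply groups a track's dirty pages into contiguous runs; the helper and its names are illustrative only, not SVC internals.)

# Illustrative sketch only (not SVC code): group a cache track's dirty
# pages into contiguous runs, each of which can be destaged as a single
# write to the back-end controller.
# Assumption: 32 KiB tracks and 4 KiB pages, i.e. 8 pages per track.

TRACK_SIZE = 32 * 1024
PAGE_SIZE = 4 * 1024
PAGES_PER_TRACK = TRACK_SIZE // PAGE_SIZE


def coalesce_track(dirty_pages):
    """Return destage writes as (byte_offset, byte_length) tuples."""
    writes = []
    run = []
    for page in sorted(set(dirty_pages)):
        if run and page == run[-1] + 1:
            run.append(page)          # extend the contiguous run
        else:
            if run:
                writes.append((run[0] * PAGE_SIZE, len(run) * PAGE_SIZE))
            run = [page]              # start a new run
    if run:
        writes.append((run[0] * PAGE_SIZE, len(run) * PAGE_SIZE))
    return writes


# A fully dirty track (e.g. filled by 512-byte sequential writes) destages
# as one 32 KiB write:
print(coalesce_track(range(PAGES_PER_TRACK)))  # [(0, 32768)]

# Two non-contiguous 4 KiB pages still need two separate 4 KiB writes:
print(coalesce_track([1, 5]))                  # [(4096, 4096), (20480, 4096)]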

5 localhost commented Permalink

I didn't make myself clear on the inter-cluster links question. What I meant to type was "does the cache partitioning have an impact on replicated (MM or GM) workloads?" Same letters, different order.

With the replication layer sitting above the cache, I am scratching my head trying to figure out the impacts of either the source partition or the target partition reaching its capacity limits.

6 localhost commented Permalink

Sharbu,

OK, sorry - wrong end of the stick!

So if either mdisk group in either cluster is overloaded, then yes, you could impact replication performance. However, I'd hope that the workload targeted at these business-critical mdisk groups would be well within tolerance, and it's only in the event of a severe problem at the storage level (in which case the GM link tolerance would have fired) that performance would be impacted - as it would have been before cache partitioning - but the impact would be limited to vdisks using the faulty controller.