
Comments (72)

61 BAKerns commented Permalink

Open question regarding the latest 6.4 code on the V7000. I cannot find information on hot-sparing recommendations, nor on how many drives can/should be in a storage pool. Last summer (2012) I upgraded to the 6.4 code on a DS3524; since all the drives were the same, I made one big storage pool. I had 4 enclosures, so I designated 2 hot spares.

Should I use the same metric of 1 hot spare for every 2 enclosures?

What is the recommended number of drives in one storage pool?

I'm going to be purchasing a V7000 with 68-ish 600GB and 105-ish 900GB disks.
Thanks for any and all help,
Brett

62 sas234 commented Permalink

Hi Barry,

Does that mean a small extent size is not suitable for the current Easy Tier logic? Hmm, it's quite difficult to figure out the best extent size... :(
Suppose we make the extent size bigger: that is a disadvantage for virtualization, and not all the data the system needs will be moved to the SSD tier.
With a small extent, if we could make the time frame for the dpa logic shorter than 24 hours, there would be more movement from the HDD tier to the SSD tier and vice versa, and that would be a disadvantage given the nature of SSDs.
Does that mean the 256MB extent size (the IBM default) is suitable for almost all systems?
OMG... I have to scratch my V7000... :( :( :(

63 sas234 commented Permalink

Hi Barry,

Please ignore my previous comment. I just checked my latest dpa file, and the result: 2 days ago we got our "first" hot extents, 0.3 GB on the log vdisk and 3 GB on the data vdisk!
Uhuiiiiiiiiiiiiiiii...
Thank you.

64 Robby-TX5 commented Permalink

Hi Barry,

We are developing a small performance monitoring tool for the V7000 and SVC. We collect the perf data via the SMI-S CIMOM, and now we see this little effect:

V7000 READIO = 784 (diff of two consecutive readio values / interval length)
V7000 READHITIO = 1636 (diff of two consecutive readhitio values / interval length)

So we have an unbelievable 208% cache hit ratio.

Zhu Yang (IBM China) gave us the description of the cache values, which is:

ibmtssvc_storagevolumestatistics
READHITIO = The cumulative count of all read cache hits (Reads from Cache), which represents the number of track reads received from components above that have been treated by cache as a total hit on prestage or non-prestage data.

So please could you explain what the hell a track is, or how to work with READHITIO?

Many thanks in advance,

Bjoern

65 orbist commented Permalink

BAKerns,

The general rule is to have at least 1 spare of each drive speed/size in each SAS chain. That is, there are 2 SAS chains (ports) where you attach enclosures, so on each chain you should have a spare of each type.

Therefore, for a system using both chains: 2 spares of each drive speed/size.
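The spare rule above can be sketched as a small calculation. This is a minimal illustration, assuming both SAS chains are populated; the function name and the two-class drive mix (taken from the question) are just for the example.

```python
# Sketch of the hot-spare rule: at least one spare of each
# drive speed/size class on each SAS chain.
def spares_needed(sas_chains, drive_classes):
    """One hot spare per drive class per SAS chain."""
    return sas_chains * len(drive_classes)

# Two populated SAS chains, two drive classes (600GB and 900GB 10k):
print(spares_needed(2, ["600GB/10k", "900GB/10k"]))  # -> 4
```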

66 orbist commented Permalink

Robby,

A "track" is a 32KB contiguous LBA range in the cache. So if, for example, you have an I/O size of 4KB and happen to hit all 32KB of that track, it's going to register 8 track hits...

IIRC there should also be an I/O cache hit measurement, which is the number of host-based I/Os that end up as cache hits; I forget the weird naming for all these stats.
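One plausible way to reconcile the 208% figure from the question, given the explanation above that READHITIO counts 32KB track hits rather than host I/Os, is to normalize by the number of tracks an average host read can touch. This is only a rough sketch of that interpretation, not the exact firmware accounting; the 64KB average read size is a hypothetical input.

```python
import math

TRACK_SIZE_KB = 32  # a cache "track" is a 32KB contiguous LBA range

def approx_hit_ratio(read_io_rate, read_hit_track_rate, avg_io_kb):
    """Estimate a host-level cache hit ratio from track-based hit counts.

    Assumes each host read of avg_io_kb touches up to
    ceil(avg_io_kb / 32) tracks, each of which can register a hit,
    so the track-hit rate is divided back down to an I/O-hit rate.
    """
    tracks_per_io = max(1, math.ceil(avg_io_kb / TRACK_SIZE_KB))
    hit_ios = read_hit_track_rate / tracks_per_io
    return min(1.0, hit_ios / read_io_rate)

# Figures from the comment: 784 read IO/s against 1636 track hits/s.
# With a hypothetical 64KB average read (2 tracks per I/O), the
# "208%" collapses to a plausible near-total hit rate.
print(approx_hit_ratio(784, 1636, 64))
```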

67 Kareem_in_Toronto commented Permalink

Hi, we have a large multi-cluster (7 clusters) SVC installation. On our older/first eight-node production cluster at our primary data center we have about 500 hosts and 5000 vdisks, and we are having performance issues due to the utilization of the cluster and the fabric, and with older 2/4 Gb hosts.

Additionally, on this older, heavily used cluster we often get a badly behaving host: for example, a SQL or Oracle job that suddenly starts requesting a sustained, cache-hostile read workload of 400-1000MB/sec. When we see this we get complaints from other applications that their jobs are running at about 1/4 their normal speed. We engage the application teams, who then engage their vendors, but this process to get a fix (if we get one at all) takes weeks, if not more than a month, to develop, test, and deploy through the dev/uat/prod cycle.

Can we use vdisk throttling to limit the performance impact the host is causing the rest of our production environment? We have been told by Hursley SVC developers that this isn't recommended, so I'm wondering what the designed use for vdisk throttling is, and how to better reduce the performance impact.

68 orbist commented Permalink

Not sure why someone in dev would say that's not a good idea; that's exactly what the throttle function is for. Leave the IOPS throttle alone, but for those "offenders" that maybe run a batch job etc. that consumes huge bandwidth (compared to normal), set a 200MB/s limit or such.

That's why the function is there. Email me if you want a follow-up discussion: barry.whyte (at) uk.ibm.com

69 Kareem_in_Toronto commented Permalink

Thank you for the help regarding vdisk throttling. We have used it a few times now to control a few high-throughput hosts with suspected bad queries. From my digging, the concern around vdisk throttling was that we might exhaust the internal I/O queue on SVC, which is 10,000 per node, but support looked at our livedumps and found that the max we hit was 6,000 on one node.

I have another question regarding HBA queue depth and how to calculate what is appropriate for a given environment, as there doesn't seem to be a definitive guide. The defaults we use for Intel are generally 32 per LUN, and for Unix it is 256 per LUN, except when using Veritas, where it is set to 20. For SQL, for example, I've seen that Microsoft recommends increasing it from 32 up to 128 or 256 (I'm assuming they assume the db is on 1 LUN). For Oracle I've seen them recommend that you divide 256 by the number of LUNs presented to the host.

From the IBM SVC host attachment guide, "Homogeneous queue depth calculation in Fibre Channel hosts", the formula is q = (n * 7000) / (v * p * c), where:

• q = the queue depth per device path
• n = the number of nodes in the system
• v = the number of volumes configured in the system
• p = the number of paths per volume per host
• c = the number of hosts that can concurrently access each volume (typically 1)

When I plug in the variables for our environment, assuming all hosts are somewhat busy, I get:
q = (8 * 7000) / (4831 * 4 * 1) = 2.89

I checked some of our Intel SQL clusters and they usually have 8-16 SQL instances, with LUN counts ranging from 10 to 155. For Oracle farms the LUN count is either low, in the 2-10 range, or high, in the 50-130 range.

Is the default queue depth too high in our environment, and what should we expect to see as a result? Currently we do see high latency from applications, especially when the environment is busy. All the research I've done points to the queue depth probably being too high and adding to the latency numbers that the applications see.

Thanks in advance for your help,
Kareem
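The arithmetic above can be sketched directly from the host attachment guide formula. This is just the calculation with the values from the comment, not a tuning recommendation; the function name is arbitrary.

```python
def queue_depth_per_path(nodes, volumes, paths_per_volume, hosts_per_volume=1):
    """Homogeneous queue-depth formula from the SVC host attachment guide:
    q = (n * 7000) / (v * p * c)."""
    return (nodes * 7000) / (volumes * paths_per_volume * hosts_per_volume)

# The environment above: 8 nodes, 4831 volumes, 4 paths, c = 1.
q = queue_depth_per_path(8, 4831, 4)
print(round(q, 2))  # -> 2.9, i.e. the 2.89 from the comment
```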

70 orbist commented Permalink

Kareem,

The queue depth at host calculation assumes that all volumes are being accessed equally heavily. It also assumes an even load across all nodes in the cluster, which is rarely true!

You could do some more targeted calculations by working out per-I/O-group numbers: count the vdisks in that I/O group, and reduce the number of nodes down to 2. This may then vary the "average" per system. You may also want to cut the number of vdisks down to the proportion that are constantly active/busy.

In general, queue depths of around 20 should be sufficient, both to drive a decent I/O queue into the system and to not cause too much queueing.

At the end of the day, if the disk systems can't cope with the concurrent I/O load being requested of them, then you have to queue somewhere. In some ways it can be better to queue in SVC, so keep a reasonable host queue depth; otherwise you just end up queueing in the host instead. As long as you aren't going over the 10,000 concurrent I/Os per node, SVC will happily queue I/O requests, and they are then held down in the storage system rather than being held off in the host.

If you have access to TPC data or the XML stats info, look for queue times on the mdisks themselves. This will show whether we are submitting as many I/Os as we can to the backend disks and having to queue at the backend of SVC waiting for the disks to complete outstanding I/O; that would suggest more disks, or more storage controllers themselves, would actually help reduce latency.
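The per-I/O-group refinement suggested above can be sketched the same way as the guide's formula, scoped to one I/O group (2 nodes) with only the busy fraction of vdisks counted. The vdisk count and busy fraction here are hypothetical inputs for illustration.

```python
def per_iogrp_queue_depth(vdisks_in_iogrp, paths_per_volume,
                          busy_fraction=1.0, hosts_per_volume=1):
    """Same formula as the host attachment guide, but scoped to one
    I/O group (2 nodes), optionally counting only the fraction of
    vdisks that are constantly active/busy."""
    active = max(1, int(vdisks_in_iogrp * busy_fraction))
    return (2 * 7000) / (active * paths_per_volume * hosts_per_volume)

# Hypothetical: 1200 vdisks in the I/O group, only half constantly busy.
print(round(per_iogrp_queue_depth(1200, 4, busy_fraction=0.5), 1))  # -> 5.8
```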

71 sas234 commented Permalink

Hi Barry,
With a V7000 striped vdisk configuration, do we still need to spread the LV across all the LUNs (AIX command mklv -t jfs2 -e x -y) for performance?
Thank you

72 pgr commented Permalink

Hi Barry, a question in relation to migrating from a storage system that will no longer be used afterwards. In this scenario, are we bound to the support matrix? For example: migration from DataCore to SVC with a V3700. Thanks, Patrik