Anthony's Blog: Using System Storage - An Aussie Storage Blog
I think this picture speaks for itself: Three XIVs. Three cities. Three way iSCSI.
If you're a user of XIV, or you're considering purchasing one, then there is one tool that you will truly love. It's called XIVTop. The XIVTop application comes packaged with the XIV GUI and is one of the handiest add-ons I have ever seen. It lets you monitor your XIV in real time, seeing exactly how much IO or throughput is being achieved and at what response time (in milliseconds). You can immediately answer questions like:
The ability to get this information in real time is what makes XIVTop so invaluable.
So in the tradition of always pushing my boundaries, I thought I would create a narrated video about XIVTop. What I discovered is just how terribly hard doing narrated videos is: You need to write a script... you need to stick to the script... you need to not fluff any words.... you need to speak slowly and clearly and not start talking in a strange accent. I had trouble with all of these, so I made take after take after take after take, until I was heartily sick of the process. I now have a much greater respect for newsreaders and film actors. This narration stuff is hard!
So please check out my final take. It's still far from perfect, but all feedback is very welcome. The only other thing that is quite strange is YouTube's choice of videos to watch after mine. It's worth watching just to see the list. I think the term "performance" confuses the algorithm.
I recently created a post about the XIV Host Attachment Kit (amusingly called the HAK). IBM has released an update to the HAK, taking us from version 1.5 to version 1.6. The updated versions, along with release notes and installation instructions can be found at the following links:
What's changed, you ask? Great question! Checking the Release Notes for each operating system (which can be found in the links above), I found some common improvements to the HAK for every OS:
There are also several other fixes which are mostly common between operating systems. Given that a major part of the HAK is a set of Python scripts such as xiv_attach, xiv_devlist and xiv_diag, and given that the output and behavior of these scripts are very similar for each OS, this is not surprising.
I installed the new version 1.6 HAK onto my 64-bit Windows 2008 server and found another pleasant surprise: When I ran the xiv_attach command it detected that my Qlogic driver was downlevel. In this example it detected I am running a Qlogic QLE2462 on driver version 9.18.25 and suggested I should instead run driver version 9.19.25.
I then tried out the xiv_devlist command, displaying volume sizes in both decimal (GB) and binary (GiB). Note the syntax I used to get the GiB output: xiv_devlist -u GiB
Finally I offloaded the output of the xiv_devlist command to a CSV file. Again please note the syntax as you may find it useful:
xiv_devlist -t csv -f devlist.csv -u GiB
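If you want to post-process that CSV, a few lines of Python will do it. Note the column names in the sample below are made up for illustration - check the headers in your own devlist.csv before relying on them:

```python
import csv
import io

# Hypothetical sample of xiv_devlist CSV output; real column names may differ.
sample = """Device,Size,Paths,Vol Name,XIV ID
\\\\.\\PHYSICALDRIVE0,16.0GiB,2/2,W2K8X64-H02_BOOT,6000081
\\\\.\\PHYSICALDRIVE1,32.0GiB,2/2,W2K8X64-H02_DATA,6000081
"""

def volumes_by_xiv(csv_text):
    """Group volume names by the XIV serial number that provides them."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        result.setdefault(row["XIV ID"], []).append(row["Vol Name"])
    return result

print(volumes_by_xiv(sample))
```

Handy if you manage hosts that see volumes from more than one XIV and want a quick inventory per machine.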
You do not need to worry about which version of firmware your XIV is running. The release notes confirm HAK version 1.6 will work with XIV firmware 10.1.0, 10.2.0, 10.2.2, 10.2.4 and 10.2.4a, which should cover pretty well every machine in the world.
One final note: Under Known Limitations the release notes state that you should not map a LUN0 volume. This simply means leaving LUN0 disabled (which is the default). In the example below I start mapping volumes from LUN1 and have NOT clicked to enable mapping of volumes to LUN0. This should be the norm.
There was a time when 32 bits was considered a lot. A hell of a lot.
With 32 bits, you can create a hexadecimal number as big as 0xFFFFFFFE (presuming we reserve one value). Use that as a block address, with 512 bytes per block, and you can address nearly 2.2 trillion bytes. Sounds like a lot, right?
Well... actually... no... that's 2 TiB, which most people would refer to as 2 Terabytes. Mmm.. Suddenly I am less impressed (still wouldn't mind that as a bank account though).
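You can check the arithmetic yourself: 2^32 blocks of 512 bytes works out to exactly 2 TiB, or about 2.2 decimal TB.

```python
# Why a 32-bit LBA tops out at "2 TB": 2**32 blocks of 512 bytes each
# (give or take the reserved value).
BLOCK = 512
max_bytes = (2**32) * BLOCK

tib = max_bytes / 2**40   # binary terabytes (TiB)
tb = max_bytes / 10**12   # decimal terabytes (TB)

print(f"{max_bytes} bytes = {tib:.0f} TiB = {tb:.2f} TB")
```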
Now there are plenty of running systems that still cannot work with a disk that is larger than 2 TiB. One of the more common is ESX. I am presuming this limitation is going to disappear, so storage subsystems need to be ready to create volumes that are larger than 2 TiB.
The good news is that with the May 2011 announcements, IBM is removing the last 2 TiB sizing limitations from its current storage products. There appears to have been some confusion in the past, so I thought I would go through and be clear where each product is at:
DS3000
Firmware version 07.35.41.00 added support to create volumes larger than 2 TB. The maximum volume size is limited only by the size of the largest array you can create. This capability has been available for some time and hopefully you are already on a much higher release.
DS4000 and DS5000
Firmware version 07.10.22.00 added support to create volumes larger than 2 TB. The maximum volume size is limited only by the size of the largest array you can create. This capability has been available for some time and hopefully you are already on a much higher release.
DS8700 and DS8800
The DS8700 and DS8800 will support the creation of volumes larger than 2 TB once a code release in the 6.1 family has been installed. With this release you will be able to create a volume up to 16 TiB in size. The announcement letter for this capability is here.
The volume size on an XIV is limited only by the soft limit of the pool you are creating the volume in. This allows the possibility of a 161 TB volume.
SVC and Storwize V7000
These two products have two separate concepts:
SVC and Storwize V7000 Volumes (VDisks).
Prior to release 5.1 of the SVC firmware, the largest volume or VDisk that you could create using an SVC was 2 TiB in size. With the 5.1 release this was raised to 256 TiB, as announced here. When the Storwize V7000 was announced (with the 6.1 release) it also inherited the ability to create 256 TiB volumes.
Storwize V7000 Internal Managed Disks (Array MDisks).
Because the Storwize V7000 has its own internal disks, it can create RAID arrays. Each RAID array becomes one MDisk. This means the largest MDisk we can create is limited only by the size of the largest disk (currently 2 TB), times the width of the largest array (16 disks). This means we can make arrays of over 18 TiB in size (using a 12-disk RAID6 array with 2 TB disks). Thus internally the Storwize V7000 supports giant MDisks. We can also present these giant MDisks to an SVC running 6.1 code and the SVC will be able to work with them.
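You can sanity-check that 18 TiB figure with a quick calculation, assuming a "2 TB" drive means 2,000,000,000,000 decimal bytes and that RAID6 costs two drives' worth of capacity in parity:

```python
# Rough usable capacity of an internal Array MDisk built from 2 TB drives.
DRIVE_BYTES = 2 * 10**12  # a "2 TB" drive in bytes (decimal marketing size)

def raid6_usable_tib(num_drives, drive_bytes=DRIVE_BYTES):
    """Usable TiB of a RAID6 array: all drives minus two for parity."""
    data_drives = num_drives - 2
    return data_drives * drive_bytes / 2**40

print(f"{raid6_usable_tib(12):.1f} TiB")  # the 12-drive RAID6 example
```

A 12-drive RAID6 gives ten data drives' worth of space, which is just over 18 TiB - comfortably past the old 2 TiB limit.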
SVC and Storwize V7000 External Managed Disks.
When presenting a volume to the SVC or Storwize V7000 to be virtualized into a pool (a managed disk group), we need to ensure two things. Firstly, you need to be on firmware version 6.2, as confirmed here for SVC and here for Storwize V7000. Secondly, the controller presenting the volume has to be approved to present a volume greater than 2 TiB. From an architectural point of view, MDisks can be up to 1 PB in size, as confirmed here, where it says:
I recommend you go to the supported hardware matrix and confirm whether your controller is approved. The links for Storwize V7000 6.2 are here and for SVC here. As of this writing, the list has still not been updated, but I am reliably informed it will include the DS3000, DS4000, DS5000, DS8700 and DS8800. It will not initially include XIV, which will come later. Please also note the following:
What to do in the meantime?
If you're currently using an SVC, or external MDisks with a Storwize V7000, then you need to work within the 2 TiB MDisk limit (except for a Storwize V7000 behind SVC). The recommendation is a single volume per array for performance reasons (so the disk heads don't have to keep jumping all over the disk to support consecutive extents on different parts of the disk). This can require careful planning. For instance, using 7+P RAID5 arrays of 450 GB drives makes an array that is over 3 TB. What to do in this example?
The answer is that where possible, create single volume arrays using 4+P or larger. If the disk size precludes that, then create multiple volumes per array and preferably split these volumes across different pools (MDisk groups).
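To work out how many volumes a given array needs, here is a quick sketch. The helper names are my own, and drive sizes are taken at their decimal marketing value:

```python
import math

TWO_TIB = 2 * 2**40  # the MDisk limit in bytes

def raid5_array_bytes(data_drives, drive_gb):
    """Usable bytes of an 'N+P' RAID5 array: the N data drives only."""
    return data_drives * drive_gb * 10**9

def volumes_needed(array_bytes, limit=TWO_TIB):
    """Minimum number of equal volumes to stay under the 2 TiB limit."""
    return math.ceil(array_bytes / limit)

size = raid5_array_bytes(7, 450)   # the 7+P RAID5 of 450 GB drives
print(size / 10**12, "TB ->", volumes_needed(size), "volumes")
```

The 7+P array of 450 GB drives comes to 3.15 TB and needs two volumes, while a 4+P array of the same drives fits in a single sub-2-TiB volume, which is exactly why the smaller array width keeps things simple.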
Anything else to consider?
Well first up, will your Operating System support giant volumes? Googling produces so much old material that it becomes hard to nail down exact limits. For Microsoft, read this article here. For AIX check out this link. For ESX, check out this link.
Second of course is the consideration of size. File systems that utilize the space of giant volumes could potentially lead to giant timing issues. How long will it take to backup, defragment, index or restore a giant file system based on a giant volume (the restore part in particular)? Outside the scientific, video or geo-physics departments, are giant volumes becoming popular? Are they being held back by practical realities or plain fear? Would love to hear your experiences in the real world.
And a big thank you to Dennis Skinner, Chris Canto and Alexis Giral for their help with this post.
As you would expect, the IBM XIV supports a very wide range of Host Operating Systems. Even better, for most of these Operating Systems, IBM makes available (free-of-charge) a multipathing kit to install on these hosts. We call this the Host Attachment Kit, or HAK. You can find all of the available Host Attachment Kits at the IBM Support site found here. You will find HAKs for AIX, HP-UX, Linux, Solaris and Microsoft Windows.
What is important is that if the HAK is available for your Operating System, we need you to always install it on every host that attaches to IBM XIV. We ask this for the following reasons:
Here is an example of what xiv_devlist will tell you. In this example I have run it on a Windows 2008 machine, but the output is basically the same regardless of host operating system. You can see the operating system identifier (the Device as reported by the operating system, in my example PHYSICALDRIVE0), the name of the volume (as seen on the XIV, in my example W2K8X64-H02_BOOT - Exchange) and the serial number of the XIV providing the volume (in my example 6000081).
The operating system device identifier lets you map an XIV volume from XIV to host. So in this example, I know that the Windows C: drive (which is Windows Disk 0) maps to a volume on the XIV known as W2K8X64-H02_BOOT - Exchange.
And to finish, there are several other commands that are very helpful. For instance the xiv_fc_admin -P command will tell you your WWPNs.
C:\Windows\system32> xiv_fc_admin -P
21:00:00:0d:60:13:b0:8c: [QLogic IBM FCEC Fibre Channel Adapter]: IBM FCEC
21:00:00:0d:60:13:b0:8d: [QLogic IBM FCEC Fibre Channel Adapter]: IBM FCEC
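If you need to feed those WWPNs into zoning scripts, they are easy to scrape from the command output. A small Python sketch (the captured text is from the example above; the regex simply matches eight colon-separated hex pairs):

```python
import re

# Captured output of `xiv_fc_admin -P` (from the example above)
output = """21:00:00:0d:60:13:b0:8c: [QLogic IBM FCEC Fibre Channel Adapter]: IBM FCEC
21:00:00:0d:60:13:b0:8d: [QLogic IBM FCEC Fibre Channel Adapter]: IBM FCEC"""

# A WWPN is eight colon-separated hex byte pairs at the start of each line
WWPN = re.compile(r"^((?:[0-9a-f]{2}:){7}[0-9a-f]{2})", re.IGNORECASE)

wwpns = [m.group(1) for line in output.splitlines()
         if (m := WWPN.match(line.strip()))]
print(wwpns)
```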
Another useful command is xiv_fc_admin -R because it rescans your bus. In some operating systems it is not obvious how to do this (other than reboot of course).
The nice thing is that regardless of your host operating system, the commands are the same. This is possible because they use the Python programming language. You may notice Python being installed as xpyv when you install the HAK (it is so named to ensure it doesn't interfere with any other Python installs you have).
So please install the HAK on every host that attaches to XIV. You will be making everyone's life a lot easier (especially your own).
Oh and by the way, you can confirm whether your Host Operating System can be attached to the XIV by consulting the IBM System Storage Interoperation Center (or SSIC). If the HAK is not available for your Operating System, the SSIC will list other Vendor approved multipathing solutions (such as Veritas DMP).
A quick blog post about XIV call home..... As with most IBM products, the XIV can call home to IBM using e-mail notifications. I still meet people who call this dial-home, which reflects the 20th century practice of using modems to provide a Remote Support Facility (RSF). The e-mail notifications sent by the XIV allow IBM to track any issues that may occur and respond where appropriate.
This is all good, provided IBM know how to get hold of you if there actually is an issue. I had a situation recently where our internal client records had an out-of-date phone number. This led to a delay in problem resolution, a delay which was avoidable.
One way to help prevent delays is by keeping the XIV up to date with your contact details and as usual, the XIV GUI makes this easy.
From the XIV GUI, head to the Support menu as per the screen capture below:
Actually, don't hesitate to fill in ALL the tabs, but the point of this exercise is to at least ensure IBM knows where the machine is and who to call.
It's worth ensuring the XIV is updated if your support center phone numbers change, or if you relocate the machine to a different site. At some client sites, I find the primary contact is a single person (whose mobile number sadly ends up being the 24 hour storage help desk). If you are that person.... and you're leaving the company.... ensure your name and number gets updated by your replacement. After all, it's one thing to have IBM calling you at 3am when you manage the machine... but to be rung after you have left the company? Mmmm... that's just plain annoying.
Storage IT offers up many choices, some of which provoke argument so heated, you could almost describe the adherents as religious. I think you might know the sort of arguments I am talking about:
OK.... so maybe that last one isn't quite in the same league. But it is still fascinating to see the variation in usage patterns from sites where every command (of any description) is run via a command line interface (a CLI), to sites where the CLI is viewed with either great fear... or even greater distaste. There are those who view the CLI as... well... so 1970s.....
But the reality is that the CLI will always be with us for one principal reason: scripting. If you cannot script it, you cannot automate it (well actually that's not true, but stick with me here, I am on a roll). Every single major implementation I have ever done (whether it be SVC, XIV or DS8000), I have automated with scripting. I regularly use the CONCATENATE function in Excel to build large numbers of commands that I can then run as a script.
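As a sketch of what I mean, here is the same trick in Python instead of Excel. The volume names and pool below are invented, and while vol_create is the XCLI command for this job, double-check the exact parameters against your own XIV's command log before running anything:

```python
# Stamp out one XCLI command per row, the way CONCATENATE does in Excel.
# Volume names, sizes and the pool name are made-up examples.
volumes = [("ESX_DATA_01", 1013), ("ESX_DATA_02", 1013), ("ESX_BOOT", 17)]

script = [f"vol_create vol={name} size={size} pool=ESX_POOL"
          for name, size in volumes]

print("\n".join(script))
```

Redirect that output to a file and you have a script ready to paste into an XCLI session.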
So it's pleasing to see that all of our products are working towards making the scripter's life even easier. For example, the XIV has offered a command log in the GUI for some time. I blogged about it here. You simply run a command once in the GUI and then consult the log to find the syntax, making scripting very easy:
With last year's release of SVC 6.1 and Storwize V7000, we added this level of smarts to those two products as well. Now every command you run in the GUI will show you the exact CLI command that was used under the covers to do the work. Simply toggle the details tab on the completion panel to see the command (or toggle it back to hide it!).
This week's announcement of release 6.2 of the SVC and Storwize V7000 firmware has brought in two more important usability improvements:
And there are more improvements coming, so as always, watch this space....
.... and please... share with me... are you a GUI... or a CLI person? What's your reasoning behind your choice?
With the announced release of DS8000 6.1 code, IBM has moved its three major storage systems to a common GUI platform. This makes me think of aircraft manufacturers who utilize a common cockpit design. For airlines, this is a major drawcard when choosing aircraft models: it cuts down on training costs for your pilots. Except in storage IT, there is a major difference in motivation....
First and foremost, the design of the XIV GUI (which has inspired such dramatic change in IBM's other GUIs) was made possible not just by clever XIV GUI developers (don't get me wrong - they ARE clever), but by a remarkably user-friendly architecture. The XIV GUI is a miracle of ease of use for end users, made possible because, by design, the XIV made it almost impossible to make things hard.
The good news for Storage administrators, is that unlike a jet aircraft, where a pilot needs to spend hundreds of hours in the cockpit before they are considered potentially competent, the XIV GUI can be picked up in minutes and lends itself very well to casual contact. You don't need to keep using it to stay competent.
The challenge for IBM was to take more complex products, which require more user decisions, and make the usage experience just as easy. To add to this, the SVC and DS8000 GUIs were driven by WebSphere. Changing these GUIs would require a complete re-write to employ JavaScript.
First off the rank were the SVC and Storwize V7000. With the release last year of the SVC 6.1 update, the transformation was nothing less than remarkable. End user experience ruled every decision. The key again is that the user does not need to spend hundreds of hours learning this GUI, or re-learning it every time they go to perform a configuration task. Everything is in its right place. It's much more than an XIV-like GUI. It's a GUI that took the ease of use experience of the XIV and used that to inspire something just as remarkable.
With the release of the 6.1 update for the DS8000, we complete another fundamental step towards a truly common GUI. The DS8000 GUI has undergone a complete re-write. Essentially it has been rebuilt from the ground up. This highlights something fundamental: It confirms the DS8000 has a very strong roadmap.
As you can see from the image below, the transformation from the old design (to the left) to an ease of use model is complete:
Back from a short break (for Easter and the School Holidays) to three great pieces of news:
Ok... maybe the Royal Wedding has no place in my blog, but the VAAI link is very appreciated.
Two ways to get to the driver:
Remember, your XIV needs to be on 10.2.4a firmware, so you need to be talking to your IBM Service Representative to schedule a concurrent firmware update before you turn the VAAI functions on.
Now if you're going, um... what is VAAI and how does it help? Check this blog post out:
If you're asking, hey, what else will 10.2.4a code bring me?
Last week IBM released Version 2 of the management plug-in for VMware vCenter. The main benefit of Version 1 (the previous release) was that it allowed you to map your datastores to XIV volumes (i.e. which XIV volume equates to which VMware datastore). This was very handy (especially if you were not paying attention as you allocated volumes to your VMware farm), but you still needed the (very easy to use) XIV GUI as well as (obviously) vCenter to manage your landscape end to end.
With the release of Version 2 of the XIV plug-in, we suddenly have the tantalizing possibility that the VMware administrator will not need to talk to their storage administrator or turn to the XIV GUI for day to day operations.
Well Version 2 offers a new and improved graphical user interface (GUI), as well as brand new and powerful management features and capabilities, including:
So from vCenter you can now for instance map yourself some new volumes to create data stores, or re-size existing ones. You can also confirm that each of your datastores is being mirrored.
You can get the plug-in free of charge from here:
There is a users guide here. I urge you to download it and have a read. The Users Guide contains lots of really good examples of how the plug-in can be used with some great screen captures. The release notes are here and also make for very good reading.
I honestly think every VMware installation should be using this plug-in. But I am curious about how it will affect the responsibility divide. If you're a one-person shop, the chances are that you love your XIV quite simply because you don't need to administer it. The XIV leaves you free to focus on your VMware farm, rather than fret about hot spots or hot spares or RAID groups. For you, this plug-in just makes your life even easier.
But what about larger companies? Firstly, it's important to understand that to perform storage administration, the vCenter plug-in will need an XIV userid that has Storage Admin privileges. Why is this significant? Well, what if the team who manage the XIV and the team who manage VMware are not the same people? What if they are different teams, who maybe have different managers, who may work in different buildings or different cities? What if they work for different companies? Do plug-ins like this one erode the lines and bring these teams together? Or are the functional divides still too strong?
I would love to hear your experiences, both in using the plug-in.... and tearing down the walls.
For someone who blogs so frequently about the IBM XIV, I will let you in on a little pet hate of mine: The XIV uses decimal volume sizes.
The XIV GUI and CLI have the user create volumes using decimal sizing, meaning 1 GB = 1,000,000,000 bytes (1000 to the power of three).
This disparity has a quirky consequence. If the XIV says a volume is 17 GB, the host that uses that volume says it is 16 GiB (which the host often then mis-states as GB). This doesn't mean there is a loss of space - this isn't headroom or formatting - it's just a different way of counting bytes. It's not a road block and it's easy to understand and work with. But it is a little annoying. (Then again, so is my 32 GB iPhone reporting it has 29.3 GB of space).
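If you like to check the numbers, the conversion is simple arithmetic. (The XIV's "17 GB" volume is really a 16 GiB allocation of 17,179,869,184 bytes, which gets rounded and reported as 17 GB.)

```python
def gb_to_gib(gb):
    """Convert decimal gigabytes (10**9 bytes) to binary gibibytes (2**30)."""
    return gb * 10**9 / 2**30

# The XIV "17 GB" volume: 17,179,869,184 bytes is exactly 16 GiB
print(round(gb_to_gib(17.179869184)))

# And the "32 GB" iPhone: 32 decimal GB is under 30 GiB before formatting
print(round(gb_to_gib(32), 1))
```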
The other point is that the IBM SVC, Storwize V7000, DS8000 and DS3000/DS4000/DS5000 families have always used binary sizing (even if their respective interfaces use the term GB as opposed to GiB - yet another pet hate of mine and the Storage Buddhist).
So what's the point of this rant?
The IBM XIV Storage System management GUI (Version 3.0) will allow volume creation in Gigabytes (GB), Gibibytes (GiB) or Blocks (where each block is 512 bytes).
So this is a really good change.
The new GUI has not hit the download site yet... but I will be sure to tell you as soon as it has!
*** Update 08/09/2011 - corrected GUI version from 2.5 to 3.0, removed some confusing terms ***
I have some great news regarding VAAI support for XIV.
Let me detail the current situation:
So what should your plan be?
A question I get routinely asked relates to Windows disk partition alignment with XIV. If you don't know what I am talking about, take some time to read these very useful pages from our friends at Microsoft. Once you have had a look, come on back and read my perspective.
Disk Partition Alignment (Sector Alignment): Make the Case: Save Hundreds of Thousands of Dollars
How to Align Exchange I/O with Storage Track Boundaries
Over on my Wordpress blog, I have posted an entry on migrating a Linux RHEL host from EMC to XIV.
If that subject interests you, check out my article here:
The XIV 10.2.4 release notes report performance improvements that are worth investigating. Two of the reported improvements listed are:
I visited a client running 10.2.4 to see if these could be detected in the XIV performance statistics. In this clients case, the upgrade occurred on Feb 14. First up I wanted to show that in the period I am examining there was no major variation in write IO. In other words, before and after the code load, I wanted to confirm the client performed the same level of IO.
Having confirmed that the write IOPS did not vary over the period in question, did the latency change? Here we have some good news. Firstly the latency for Write Hits improved (slightly). A write hit is a write into a 1MB partition that already has some data in cache. It is faster than a write miss because some of the address allocation work has already been done. Write hits and misses both hit cache as I explained here. You can see a change on Feb 14 (when the code was updated):
I then looked at the latency for write misses. Again the latency dropped. This suggests that cache operations in general are being handled faster.
I then started thinking.... are we getting more write cache hits? The answer was YES! This is curious because the client normally does not have much control over where they actually write data to... Clearly the XIV firmware is managing the write cache in a more efficient manner. This is good not only because write hits normally have lower latency than write misses, but also because a write hit can save us destaging a block of data to disk. This is because a write hit could involve over-writing data that had not yet been destaged to disk, so two writes to the same LBA would only result in one write to backend disk.
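To see why coalesced writes save destages, here is a toy model. This is entirely my own simplification - the real cache is far more sophisticated - but it captures the idea that a dirty LBA only has to be destaged once, no matter how many times it is over-written in cache.

```python
def destages_needed(write_lbas):
    """Count backend destages if each dirty LBA is only destaged once.

    Repeated writes to the same LBA overwrite the dirty copy in cache,
    so they cost no extra backend write.
    """
    return len(set(write_lbas))

writes = [100, 200, 100, 300, 100]   # three host writes land on LBA 100
print(len(writes), "host writes ->", destages_needed(writes), "destages")
```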
So in conclusion, the upgrade to 10.2.4 code resulted in a measurable improvement in write IO performance at a real world client. Nice!