IBM SAN Volume Controller (SVC) has offered Fibre Channel storage virtualization since June 2003. Two SVC nodes communicate with each other via Fibre Channel to form a high availability I/O group. They also use Fibre Channel to communicate with the storage they virtualize and with the hosts they serve that virtual storage to. When IBM added real-time (Metro Mirror) and near real-time (Global Mirror) replication, that too was done using Fibre Channel, with each SVC cluster connecting to its partner over dark fibre (with or without a WDM) or via FCIP (Fibre Channel over IP) routers.
Each Fibre Channel port on an SVC node can act as a SCSI initiator to backend storage and a SCSI target to hosts, all while communicating with its peer nodes over those same ports. With every generation of SVC node these ports got faster, going from 2 Gbps to 4 Gbps to 8 Gbps. In SVC firmware V5.1, IBM added iSCSI capability to the SVC using the two 1 Gbps Ethernet ports in each node, allowing each node to also be an iSCSI target for LAN-attached hosts.
When the Storwize V7000 came out in October 2010 it offered all of this capability, plus two fundamental design changes.
Firstly, the two controllers in a Storwize V7000 can communicate with each other across an internal bus, eliminating the need to zone them together (or even to attach the Storwize V7000 to Fibre Channel fabrics).
The other more obvious difference is that a Storwize V7000 comes with its own disks, which it communicates with via multi-lane 6 Gbps SAS.
When IBM added 10 Gbps Converged Enhanced Ethernet adapters to the SVC and to the Storwize V7000, these adapters operated as iSCSI targets, allowing clients to access their volumes via a high-speed iSCSI network. In V6.4 code IBM allowed these adapters to also be used for FCoE (Fibre Channel over Ethernet). These are effectively also SCSI target ports, allowing hosts with CEE adapters to connect to the SVC or V7000 over a converged network.
If you have a look at the Configuration limits page for SVC and Storwize V7000 version 6.4 (the Storwize V7000 one is here), you will see this interesting comment:
"Partnerships between systems, for Metro Mirror or Global Mirror replication, do not require Fibre Channel SAN connectivity and can be supported using only FCoE if desired"
So does this mean we can stop using FCIP routers to achieve near real-time replication between SVC clusters or Storwize V7000s? The short answer is: most likely not. Let's look at why...
The whole reason Fibre Channel became the standard method to interconnect enterprise storage to enterprise hosts is simple: packet loss is prevented by buffer credit flow control. Frames are not allowed to enter a Fibre Channel network unless there are buffers in the system to hold them; frames are normally only dropped if there is no destination to accept them. Fibre Channel is a highly reliable, scalable and mature architecture. When we extend Fibre Channel over a WAN we do not want to lose this reliable nature, so we use FCIP routers like the Brocade 7800, which continue to ensure frames are reliably delivered, in order, from one end point to another.
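To make the buffer credit idea concrete, here is a toy sketch in Python (my own illustration, not the real FC-PH state machine): the sender spends one credit per frame and can only transmit while credits remain; the receiver hands a credit back as each buffer drains, so a frame never enters the link without a buffer waiting for it.

from collections import deque

# Toy model of buffer-to-buffer credit flow control (illustration only)
class Link:
    def __init__(self, buffer_slots):
        self.credits = buffer_slots        # BB_Credit: one credit per receive buffer
        self.in_flight = deque()

    def send(self, frame):
        if self.credits == 0:
            return False                   # sender must wait; the frame is never dropped
        self.credits -= 1
        self.in_flight.append(frame)
        return True

    def receive_one(self):
        frame = self.in_flight.popleft()
        self.credits += 1                  # receiver returns the credit (like an R_RDY)
        return frame

link = Link(buffer_slots=2)
print(link.send("f1"), link.send("f2"), link.send("f3"))   # True True False: no credit, f3 waits
link.receive_one()
print(link.send("f3"))                                     # True: a credit came back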
Converged Enhanced Ethernet allows Fibre Channel to be transported inside enhanced Ethernet frames. The one fundamental principle that CEE brings to the table is the same one: a frame should not enter the network without a buffer to hold it. Extending FCoE over distance brings the same challenge: the moment you start moving those frames over a WAN connection you need to ensure frames are not lost due to congestion. How do we do this? The same way we did with Fibre Channel: we use dark fibre, we use WDMs or we use routers. The same issues and requirements exist.
For more information on FCoE over distance, check out this fantastic Q&A from Cisco.
It's a story that has been repeated many times: You buy a shiny new storage system... and it is beautiful.
Then... a disk fails, which takes just the tiniest bit of shine off.
No problem you declare! You place a service call and the disk is replaced. So far so good.
But then as the vendor service representative is walking out the door, it suddenly occurs to you... hey, that person is taking away the failed disk. Doesn't that disk have my data on it?
The short answer is that unless you have purchased self-encrypting drives or are encrypting your data prior to writing it, that failed drive will almost certainly contain some readable data. How readable will depend on the product. If the disk contains de-duplicated, compressed data, it would present a great (but I suppose not insurmountable) challenge to any would-be data snooper. But a failed disk removed from a standard RAID array would contain data in sequential chunks (perhaps 256 KB in size). Whether that data would be useful is another question.
So what to do?
First up, every responsible vendor takes great pains to ensure failed hard drives are not simply thrown in the dumpster or sold in job lots. As RailCorp in Australia found out the hard way (when they started selling off the media they had in the lost and found department), not controlling media with client data on it is a very bad idea. Instead, responsible vendors usually return failed drives either to the original manufacturer (to get a warranty rebate) or to a reutilization center (either their own, or a third party's). In either case there is a financial benefit to them in doing this. The shipment will be done in a secure fashion and any disk drive that can be repaired will be thoroughly wiped; if not, it will be securely destroyed. Again, all the major vendors should be able to produce a policy document explaining how this is done. For the majority of clients out there, I personally think this is good enough.
But what if you don't think this is good enough? What if your data is way too sensitive to take any risks?
Simple answer: Keep the failed disks.
A quick Google search turned up easy-to-find programs from most major storage vendors. Just search for something like disk retention service (retention is the key word here). Here are some examples:
The only fly in the ointment is that these services are generally not free... and if you realize this only after the first drive has failed, you may find yourself negotiating with your vendor on price well after the main purchase is complete. The only exception I have found so far is that IBM Australia lets you retain failed drives for free, provided the machine is covered by a ServicePac.
Of course maybe you knew this already and have always retained failed drives, but now your store-room is slowly filling with failed disks. Now what? Well, I do not suggest you do this, but I sure laughed while watching it (sorry if there is an advertisement beforehand):
Instead, Google for secure hard disk shredding or secure hard disk recycling. Examples I found in Australia very quickly (I have not contacted or dealt with either of these) included this one and this one. I am sure there are plenty of choices out there.
A few weeks ago I wrote a piece called How to spot an old IBMer. It was a sort of reminiscence about my early days with IBM, but it turned out to be one that really struck a chord with many Big Blue veterans. In fact the response was overwhelming; I have never received more hits or more comments for anything I have written. It was also pleasing that these responses were almost universally positive.
So it's ironic that today I am becoming an ex-IBMer.
Yes it's time for me to move on, so I have decided to try something new. I am joining a really exciting IT startup called Actifio.
So for all of you who have worked with me and helped me over the past 23 years: Thank you. It has been an honor to work at IBM and I wish Big Blue and all who continue to work there, nothing but success and happiness.
So you need to do some disk performance testing? Maybe some benchmarking? What tools are out there to help you out? Well I am glad you asked... here are some that I use on my daily travels:
IOmeter is an old classic, with emphasis on the word old. At the time of writing, the most recent update was from 2006. However, it remains very popular, mainly because it is free and easy to use.
Some tips when using IOmeter:
On Windows, IOmeter needs to be run as an Administrator; not running it as Administrator is the most common mistake people make (it means you will not see any drives).
Only one instance of IOmeter can run on a Windows server, so if multiple users log on to the same server, only one of them can run IOmeter.
Run IOmeter with a queue depth (number of outstanding I/Os) greater than one, and with multiple workers. If you don't, you will not be able to drive the storage to saturation.
For instance, here are some results running 75% read, 0% random, 4 KB block I/O on a Windows 2008 machine with 4 workers, in each case against the same 128 GB volume on a Storwize V7000 backed by 4 x 300 GB SSDs in a RAID 10 array. In each case I let the machine run for 10 minutes before taking the screen capture, to ensure the performance was steady state and not peaking.
Firstly I used a queue depth of one. Aggregate performance was around 27000 IOPS.
Then I used a queue depth of 10. Aggregate performance was around 81000 IOPS.
I then used a queue depth of 20. Aggregate performance was around 113000 IOPS.
What I am trying to show is that taking the defaults (one worker with a queue depth of 1) will not drive the storage hard enough to produce a useful value for comparison... you need to do some tuning and some experimenting to get valid results. At some point increasing the queue depth will stop improving performance (and may actually decrease it).
There is an alternative to IOmeter called IOrate (created by an EMC employee). It is also very popular and appears to still be in active development. It is not unusual to see IBM performance whitepapers that used IOrate to generate the workload.
This is a fairly recent tool that I have not yet had a chance to try out (due to time pressures). It uses virtual machines under VMware to generate the I/O, and includes some very nice workload capture and playback tools as well as reporting tools.
Jetstress is a benchmarking tool created by Microsoft to simulate Microsoft Exchange workloads. I like the fact you can configure it to run for very long periods, and it has a more real-world feel about it than just running empty I/Os. You can get the base software here, but you will also need some files from a Microsoft Exchange install DVD (or from an installed instance of Microsoft Exchange). If you cannot get to those files, you cannot complete the startup process inside Jetstress.
Oracle offer a tool on their website called Orion, which will simulate the workload of an Oracle database. You can get the tool from here (although you will need to create a free Oracle user account before you can download it).
SDelete is not a benchmarking tool or a performance modelling tool, but it is a great way to generate I/O with very little effort. Just create a new drive in Windows and then run SDelete against it with the -c parameter. This parameter is meant for secure deletion, so it writes random data patterns (real data traffic, albeit as 100% sequential writes). The syntax is like:
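sdelete -c e:

(Here e: is just an example; point it at whichever drive you created for the test.)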
(updated April 20, 2012 - I found in version 1.6 of SDelete the meaning of the -z and -c parameters got swapped. In version 1.6 if you want random patterns use -c, if you want zeros use -z. In previous versions it is the other way around!).
Just doing file copies is probably the worst way to generate benchmarks, especially as a single copy is usually a single-threaded operation.
I am sure there are plenty of other tools out there to generate benchmarks and simulate workload. My main concern with many of them is that synthetic (artificial) workloads do not reflect real world workloads.
Right now I am working on giving a client a recommended version of firmware for their Cisco MDS Fibre Channel switches. For FICON, the recommendations are easy, but for Open Systems there are so many choices. So what am I going to recommend?
FICON Switches and Directors
For FICON switches, sticking to the FICON (the IBM mainframe Fibre Connection protocol) recommended versions (which are determined by the IBM System z mainframe team) is a very good strategy. The best place to get these is here (a standard IBM logon is required). Just look along the right-hand column for the release letters.
The SAN-OS and NX-OS release notes found on the Cisco website also show recommended versions for FICON. For instance, have a look at the FICON recommendations table in the release notes for version 5.2.2a, which you can find here. The upgrade path is just below the table I have linked to. This link will get outdated over time (as newer versions come out), but you can list all the release notes here.
If you are using an IBM TS7700, you should also be aware of this page on the IBM Techdocs site.
So based on current versions, if you are running SAN-OS 3.3.1c or below you need to move to 4.2.7b (as per the non-disruptive upgrade path). I strongly recommend you get to at least version 4.2.7b and start planning to move to release 5.2.2 (provided your hardware supports it).
For open-systems-attached Fibre Channel switches there are a number of versions to choose from. Here are the things to consider:
Being on the very latest version carries a small potential risk (of undiscovered bugs). However, being on a very old version carries a greater implicit risk (of being exposed to KNOWN bugs). Just because you have not hit a bug yet does not insure you against potential issues, especially if your SAN is growing.
Your hardware. Some older-generation hardware is not supported at higher levels (for example, Supervisor-1 cards cannot go past SAN-OS 3.3.5b), while later-generation hardware is not supported at lower levels (for example, Fabric 3 modules need NX-OS 5.2.2). The Cisco recommended versions page is the best place to confirm this.
End of life. As SAN-OS reached end of development in 2011, 3.3.5b is the best choice for all hardware that cannot upgrade to NX-OS. Be aware, however, that some Cisco Generation 1 hardware (the 2 Gbps capable hardware) will go end of service in September 2012 (for example, Supervisor-1 cards and MDS 9120 switches). Links for this are below. Of course your service provider may choose to offer support beyond the Cisco end of life date, but instead of updating code, maybe you should be updating hardware.
Fabric Manager. You also need to upgrade your Fabric Manager to the same or a higher version than your switches are running. One important thing to be aware of is that from version 5.2, Cisco Fabric Manager has been merged into a new product called Cisco Data Center Network Manager (DCNM).
We just updated our Cisco MDS9509s to NX-OS 4.2.7b (from Cisco SAN-OS 3.3.1c) and now we are getting emails from this source: GOLD-major.
The actual message looks like this:
Time of Event: 2012-03-05 15:07:21 GMT+00:00
Message Name: GOLD-major
Message Type: diagnostic
System Name: xxxx
Contact Name: xxx@xxx.com
Contact Email: xx@xxx.com
Contact Phone: +61-3-xxxx-xxxx
Street Address: x Road, xxxx, VIC, Australia
Event Description: RMON_ALERT WARNING(4) Falling:iso.18.104.22.168.22.214.171.124.1.10.18366464=2401032512 <= 4680000000:135, 4
Event Owner: ifHCOutOctets.fc4/5@w5c260a03c162
So who is GOLD-major?
GOLD actually stands for Generic OnLine Diagnostics. From Cisco's website: "GOLD verifies that hardware and internal data paths are operating as designed. Boot-time diagnostics, continuous monitoring, and on-demand and scheduled tests are part of the Cisco GOLD feature set. GOLD allows rapid fault isolation and continuous system monitoring." GOLD was introduced in Cisco NX-OS Release 4.0(1). It is enabled by default and Cisco do not recommend disabling it.
So in our example GOLD is reporting a major event: an exceeded RMON threshold, in this case on the utilisation (ifHCOutOctets) of interface fc4/5.
Most clients using Cisco MDS switches are now moving to NX-OS (from SAN-OS, the name Cisco used for MDS firmware between versions 1 and 3), so this question will become more common. I am working on a post that discusses recommended versions (and the sunsetting of SAN-OS), so expect something soon. If, on the other hand, you are thinking... how do I set up Call Home on a Cisco MDS switch? The information for NX-OS is here.
Curiously, my brain cannot help itself: when I hear GOLD-major I think Gold Leader, which leads me to Red Leader, which leads me to Red October. Maybe it's just me? Enjoy:
Because if a product uses a 32-bit counter to record uptime, and that counter records a tick every 10 msec, then that counter will overflow after approximately 497.1 days. A 32-bit counter can hold 2^32 = 4,294,967,296 ticks, and at one tick every 10 msec we create 8,640,000 ticks per day (100 * 60 * 60 * 24). So after 497.102696 days, the counter will overflow. What happens next depends on good programming: normally the counter just starts again at zero, but in the worst case a function might stop working or the product might even reboot.
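You can check the arithmetic with a few lines of Python:

# Uptime counter overflow: 32-bit counter, one tick every 10 msec
ticks_per_day = 100 * 60 * 60 * 24        # 8,640,000 ticks per day
counter_limit = 2 ** 32                   # 4,294,967,296 ticks
print(counter_limit / float(ticks_per_day))        # ~497.1 days at one tick per 10 msec
print(counter_limit / float(ticks_per_day / 2))    # ~994.2 days at one tick per 20 msec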
Fortunately we are seeing fewer and fewer of these issues, but occasionally one still slips out. Recently IBM released details of a 994 day reboot bug in the ESM code of some of their older disk enclosures (EXP100, EXP700 and EXP710). Details about this bug can be found here. What I find interesting is the number of days it takes to occur, since 994 is 497 times two, which suggests this product records a tick every 20 msec. That meant we got past 497 days without an issue, but hit a problem at exactly double that number. So if you still have these older storage enclosures, you will need to reboot the ESMs (after checking the alert).
I googled 497 to see what images that number brings up and was amazed to find the M-497 jet powered train. More details on this rather interesting attempt at speeding up the commute home can be found here and here. It adds a whole new meaning to keeping behind the yellow line.
Many years ago I picked up a book that blew my mind: The Cuckoo's Egg by Clifford Stoll. It's a genuine classic, a true tale of hackers and how one was tracked down in the very early days of the internet.
Now the story is about events in 1986, so it captures the state of technology at the time (which rather dates the book), but wow, what a great story.
So why mention the book? Well apart from the fact that it is well worth a read, the key issue that Clifford saw again and again was default passwords. The hacker would identify a target and then try to logon using default IDs and default passwords, usually with great success.
Now I have blogged in the past about the determined (but often ignored) way that Brocade switches berate you into changing default passwords. But pretty well all products need to do this, as they all share the same issue (and, as we will see, a problematic flip side). You absolutely need to do two things with every product in your data center:
Change the default passwords on every device you deploy.
Record what those passwords got set to (preferably using a logical or physical password safe).
Now don't laugh, but forgotten or lost passwords on data center kit (like switches) are a VERY common problem. When I worked in the IBM Storage Support team I took calls EVERY WEEK from clients who had devices they could not log on to, for all manner of reasons. For some, supplying them with the default passwords saved them (and condemned their employer?), but others needed much more detailed assistance.
My preferred solution to this challenge is to use external authentication (like LDAP) but being able to reset passwords with an external tool is also a nice option to have available.
The reason I started thinking about this is a nice tool IBM offer for the Storwize V7000 called the Initialization Tool, which you can download from here. Using this tool you can reset the password of the superuser ID on a Storwize V7000 back to the default (passw0rd). The tool runs from a USB key. After asking the tool to help you reset the superuser password, you insert the USB key into the Storwize V7000, wait for the orange indicator light on the relevant node canister to stop blinking, and the task is complete. Then put the USB key back into your laptop and run the init tool again to get a completion report that should look like this:
This is great to rescue customers who have lost their passwords, but the question then gets raised: Can I block this?
My first response is: if you are concerned about unauthorized people with malicious intent inserting USB keys into your Storwize V7000, then don't let them into your computer room (presuming you can spot them by the colour of the hat they are wearing). If that is not an option, lock the rack that the Storwize V7000 resides in (change control does have its benefits). If that is not an option either, there is one more alternative, but it is a tad extreme.
What we can do is prevent password reset via USB key (or in the case of the SVC, via the front panel). We do this by issuing the following CLI command: setpwdreset -disable
In the following example, I confirm that password reset is possible (value 1); I then disable it and confirm that password reset is no longer possible (value 0). If curious, I could then get some help on that command:
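Here is a sketch of that session from memory, so treat the exact prompt, output text and help flag as approximate:

svctask setpwdreset -show
Password status: [1]
svctask setpwdreset -disable
svctask setpwdreset -show
Password status: [0]
svctask setpwdreset -h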
So should you do this? Only if your paranoia is matched by your attention to detail.
My reason for hesitating to recommend it is simple: if you prevent password reset and then forget your password (and have no other local Security Administrator accounts), you have locked the door and thrown away the key. Far better to physically lock the rack.
In the end though, your company needs to set a policy that is actively enforced (with no exceptions). So get to it.
When IBM brought out the SAN Volume Controller (SVC) in 2003, the goal was clear: support as many storage vendors and products as possible. Since then IBM has put a huge ongoing effort into interoperation testing, which has allowed them to continue expanding the SVC support matrix, making it one of the most comprehensive in the industry. When the Storwize V7000 was released in 2010 it was able to leverage that testing heritage, allowing it to have an amazingly deep interoperation matrix on launch date. It almost felt like cheating.
However I recently got challenged on this with a simple question: Where is the VNX? If you check out the Supported Hardware list for SVC V6.3 or Storwize V7000 V6.3, you can find the CLARiiON up to a CX4-960, but no VNX.
The short answer is that while the VNX is not listed there yet, IBM are actively supporting customers using VNX virtualized behind SVC and Storwize V7000. If you have a VNX 5100, 5300, 5500, 5700 or 7500, ask your IBM pre-sales technical support to open an Interoperation Support Request; the majority are being approved very quickly. The official support sites that I referenced above will be updated soon (but don't wait; if you need support, ask for it). IBM is working methodically with EMC to be certain that when a general publication of support for VNX is released (soon!), both companies will agree on the published details.
And for the wags who think that this is a ringing encouragement to buy VNX, you would be missing the point. You cannot be a serious storage virtualization vendor if you are not willing to support your clients' purchasing decisions, regardless of which vendor they buy their storage from. IBM have been staying that course and demonstrating that willingness since 2003. It's a pretty good track record and one that they are determined to maintain.
It's been a long time coming, but I finally joined the cult of Mac in the form of a new MacBook Pro. Having not used an Apple Mac for over 15 years, I must say I am truly loving what they have done with the operating system and the hardware (my last Mac was a Mac SE, bought in 1990).
Now this post is not a rant from a new convert to everything Apple. In fact my main gripe is something you rapidly discover when you move to Mac OS: not every piece of software is going to work in your new world. While Lotus Notes and Sametime have very nice Mac versions, my day-to-day work involves IBM Storage and there are several tools I need that are Windows only. These include Capacity Magic and Disk Magic (used to size solutions) and eConfig (used to order IBM products). This means that for certain applications I need to use a hypervisor (such as VMware Fusion or Parallels).
But what about managing IBM Storage? Well I have some good news on that front:
SAN Volume Controller and Storwize V7000: Because these products are managed from a web page, they are operating system agnostic. To be clear, officially only Firefox 3.5 (and above) and IE 7.0 and 8.0 are supported (support details are right at the bottom of this page, while setup details are here). Since IE is no longer available for Mac, you should install Firefox (or try out Safari or Chrome; I have tried all three without issue).
XIV: The XIV GUI is available in a native Mac OS version from here. The release notes state that the XIV GUI works on Mac OS X 10.6 but I am happily using it on Mac OS X 10.7 (Lion). The Mac OS X installation process is simply beautiful (just drag and drop, one of the truly nice features of Mac OS X) and of course it works just as nicely on Mac as it does on the other supported operating systems.
Drag and drop done right.
Attaching OS X to IBM Storage
Of course maybe you want to attach your Mac OS X box to IBM Storage. If you visit the SSIC you will find IBM supports OS X on pretty well its entire range, including SVC, Storwize V7000, XIV, DS3500 and DCS3700. Mostly these attachments use the ATTO HBA and its multipath device driver. If your particular setup is not there, get your IBM pre-sales support to open a support request; depending on your request, approvals are normally very fast.
Of course I have to mention the iPhone and iPad. IBM have the XIV Mobile Dashboard for both devices, which I previously blogged about here (iPad) and here (iPhone). These are really elegant apps that even have a cool YouTube video.
Of course now I want all the goodies promised in Mountain Lion. With the convergence of OS X and iOS, I would love to see even more converged tools. A man can dream....
1) Demo mode
There is a demo mode, but right now there is no tick box to activate it. Simply use the word demo in all three fields at login. In other words:
IP address: demo
User ID: demo
Password: demo
2) Retina display requirement
The Mobile Dashboard was written for the Retina display (that comes with an iPhone 4 or iPhone 4S). This sadly means that the iPhone 3GS and earlier will not be able to use the new Mobile Dashboard. This wasn't done as part of some devious plan on IBM's part to force you to buy a new iPhone, the developers simply needed the better resolution to draw those graphs and provide the richest and clearest display of information on a single page (you will believe it when you see it, the detail is quite stunning).
The App Store clearly states the hardware and iOS requirements on the download page; however, you can still try to install it on an iPhone 3GS. Curiously, what you get is this rather bizarre message:
The reason you get this message is simple: there is no way, when uploading a new app to Apple, to specify that it needs the Retina display. So instead the developer has to specify a feature that is not found on earlier iPhones, such as a camera flash. It is not that the XIV Mobile Dashboard needs a flash in your camera; it is simply a quirk of the App Store.
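For the curious, this sort of requirement is declared in an app's Info.plist. A sketch of what such an entry looks like (my illustration of the mechanism, not IBM's actual file):

<key>UIRequiredDeviceCapabilities</key>
<array>
    <string>camera-flash</string>
</array>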
And for those of you who are using Android devices, your calls are being heard. Watch this space for developments in that direction.
Here are two common statements I often hear from clients:
I don't just want SAS drives, I also want SATA drives. SATA drives are cheaper than SAS drives.
Nearline SAS drives are just SATA drives with some sort of converter on them.
So are these statements right?
First up, if your storage uses a SAS-based controller with a SAS backplane, then normally you can plug either SAS drives or SATA drives into that enclosure. This is great because when you plug SATA drives into a SAS backplane, you can actually send SCSI commands to the drive, plus you can send native SATA commands too (which is handy when you are writing software for RAID array drivers).
But (and this is a big but) what we do know is that equivalent (size and RPM) SAS drives perform better than SATA drives. This is because:
SAS is full-duplex; SATA is half-duplex.
SAS uses the native SCSI command set which has more functionality (which leads to the next point).
A SAS drive uses SCSI error checking and reporting which is much more robust than the SATA error reporting. This allows your storage system to collect richer information from the drive if errors are occurring (such as a failing or marginal disk).
SAS drives are dual ported which is vital in dual controller enclosures.
So given a choice (and a very small price differential), why would you choose SATA over SAS? SAS is the clear winner. What we should differentiate on instead is rotational speed (7.2K RPM vs 10K RPM vs 15K RPM vs SSD) and size (2.5 inch vs 3.5 inch form factor).
Which leads us to Nearline SAS
It is a common belief that if you buy a Nearline SAS (or NL-SAS) drive, it is really a SATA drive with a SAS connector (interposer) stuck on it. But this is confusion carried over from the past.
What led to the confusion?
Until recent years, most midrange and enterprise storage controllers and enclosures used disks with Fibre Channel interfaces on them, plugged into Fibre Channel enclosures. Examples include the DS4700 and the DS8100. And yet these devices also offered SATA drives. How did they do this?
They took a SATA drive and added a SATA-to-Fibre Channel converter card to it. We call this extra piece of hardware an interposer or bridge card. So people started assuming this is common practice in every product. In fact, we are now seeing SAS drives being put into Fibre Channel disk enclosures by using a SAS-to-Fibre Channel interposer.
There are indeed older products that took a SATA drive and added a SATA-to-SAS interposer to achieve a similar thing. But that really is not necessary any more. The reason? The same hard drive can now be ordered from the factory as either a SAS drive or a SATA drive.
Seagate have a nice selector tool to let you see all their possible combinations. For instance you can order a 2 TB drive with a 6 Gbps SAS interface, which is a model ST32000444SS:
Or you can order a 2 TB drive with a 6 Gbps SATA interface, which is a model ST2000NM0011:
So what you get is very similar drive hardware (same spindles, heads and motors) but with different interface hardware, built with the desired interface at manufacture time. This means that if we install this drive into a SAS enclosure, there is no need to add an interposer or bridge card to the drive after you buy it.
This leads to the next question:
OK, so this is good: Nearline SAS drives are MADE as SAS drives. But is a drive manufactured with a SAS interface then a SAS drive or a Nearline SAS drive?
Now we are mixing up two different things. SAS as a standard is a combination of a connection technology (the Serial Attached part) and a command set (the SCSI part). Actually, SCSI as a standard also defines both connection methods and command sets. So SAS really describes how we connect to the disk and what command set we use to control it.
Nearline, on the other hand, is a statement about the disk's rotational speed and its mean time between failures (MTBF). A Nearline SAS drive is Nearline because:
It rotates more slowly (7,200 RPM) than the higher-specified Enterprise drives (which spin at 10K or 15K RPM). Because they are slower, they can also hold far more data.
It has a lower MTBF (1.2 million hours) than the higher-specified Enterprise drives (which are normally specified at 1.6 million hours).
So we have now gone full circle. A Nearline SAS drive can use the same physical disk hardware as a SATA drive, but with a superior interface that uses a superior command set, built onto the drive at manufacture time.
Still confused or want to read some more? Check out these links:
I got a question about Veritas DMP and XIV, so I thought I would write a quick post with some details on the subject.
A fundamental requirement for a host attached to a Fibre Channel SAN is the use of multipathing software. One option (which IBM support for most operating systems attaching to XIV) is Veritas Dynamic Multi-Pathing (DMP) from Symantec. A nice way to find out whether this is the case for your particular operating system is to head to the SSIC, choose Enterprise Disk → XIV Storage System → your product version, and then export the selected product version to get a spreadsheet of every supported environment. Under the multi-path heading of each page you will see which choices are supported.
So why consider DMP? Two benefits stand out:
It works with heterogeneous storage and server platforms (so you could have EMC and IBM attached to the same server at the same time).
You can centrally manage all storage paths from one central management GUI.
Then the question becomes, if I choose to go down the DMP route, do we still need the XIV Host Attachment Kit (HAK)?
The answer is a definite yes!
Veritas DMP and Solaris
If you're using DMP with Solaris: when you run the XIV HAK wizard, it will scan for existing dynamic multipathing solutions. Valid solutions for the Solaris operating system are Solaris Multiplexed I/O (MPxIO) and Veritas Dynamic Multi-Pathing (VxDMP). If VxDMP is already installed and configured on the host, it is preferred over MPxIO.
Veritas DMP and Windows
For a Windows host the important point is that Veritas Storage Foundation Dynamic Multipathing (DMP) does not rely on the native multipath I/O (MPIO) capabilities of the Windows Server operating system; instead, it provides its own multipath I/O solution. Because these two solutions cannot coexist on the same host, perform the following procedure if you intend to use the Veritas solution:
Install the Veritas Storage Foundation package (if it is not already installed).
Restart the host.
Install the IBM XIV Host Attachment Kit (or run the portable version).
The HAK will perform whatever system changes it detects are necessary while still allowing DMP to perform the multipathing. This may require a reboot (to install Windows hot fixes).
As I said, the HAK will ensure that the required hot fixes are present, and these hot fixes are fairly important. To see what tasks the HAK wants to perform WITHOUT actually performing them, use the portable HAK and run:
This will tell you what tasks will be undertaken when you run the command without the -i parameter. I detailed this behaviour here.
One benefit of the HAK is the wonderful xiv_devlist command. Even if you are using DMP, the xiv_devlist command will still work, although you may need to specify veritas as per this example:
xiv_devlist -m veritas
Need more documentation?
This is all documented in the XIV Host Attachment User's Guide, which you can find here.
In a previous post I detailed how the XIV began as the Nextra, was then released as the IBM XIV, and was then updated to become the XIV Gen3. So this means last year we saw release 3.0 of the XIV.
At the risk of getting over excited, some of the achievements of the IBM XIV have been truly remarkable:
There are 59 clients with more than 1 PB each of usable XIV capacity.
There are 16 clients with more than 2 PB each of usable XIV capacity.
I am sure some competitors will find larger numbers to try to drown out this achievement, but the point is this: these are FANTASTIC numbers. They show that despite all the FUD, the XIV is a success for IBM and a success for IBM's customers.
At the time of the Gen3 release, IBM made no secret of the fact that they planned to add the option of SSD as a read cache layer. In fact, each and every Gen3 shipped so far has the mounting and attachment hardware needed to support those SSDs.
Now with release 3.1 IBM turns that promise into a reality.
So... to answer some possible questions:
How can I get some of this SSD goodness?
Order the feature! For existing machines, IBM will need to update the firmware of your XIV Gen3 (non-disruptively) to add SSD support. There will also be an updated version of the XIV GUI. Once those are in place, an IBM Service Representative will add an SSD to each module. All of this will be completed without interruption to your operations.
How much read cache will I get?
Each XIV Gen3 module already has 24 GB of RAM. Since an XIV can have from 6 to 15 modules (based on capacity), that gives you between 144 GB and 360 GB of RAM to provide read and write cache. If you add the SSD option you get a 400 GB SSD per module, which means between 2.4 TB and 6 TB of additional read cache (depending on module count). The SSDs are not used as write cache.
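If you like to see the arithmetic spelled out (a quick sketch using the figures above):

# XIV Gen3 cache capacity by module count
for modules in (6, 15):
    ram_gb = 24 * modules      # 24 GB of RAM per module
    ssd_gb = 400 * modules     # one 400 GB SSD per module
    print(modules, "modules:", ram_gb, "GB RAM cache,", ssd_gb / 1000.0, "TB SSD read cache")
# 6 modules:  144 GB RAM cache, 2.4 TB SSD read cache
# 15 modules: 360 GB RAM cache, 6.0 TB SSD read cache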
What administration will I need to perform?
How about none? This is XIV: it's all about making it simple. It's no surprise that practically every IBM Storage device now uses the XIV GUI. These guys wrote the book on making things easy to use.
But seriously, no administration? Well... there are two things you may want to do:
Check how many SSD based read hits you are getting (versus memory based read hits). It's always nice to see just how effective these SSDs are proving themselves to be.
Turn SSD read caching off or on at a per volume level (by default it is on for all volumes). I don't anticipate many clients will need or want to do this, but the option is there and it is very easy to do.
Won't these SSDs wear out or slow down over time?
These are the two great fears of SSD... and XIV development has combined their art with some great work from IBM Research to make sure this is not an issue. The way data is written out to the SSD is handled in a very sophisticated manner. The end result will be consistent and predictable performance with a very long operational life. I will give you more details about exactly how this is done in a future post.
What happens if one of these SSD fails?
Because the SSD is not used as write cache, no data can be lost. Data in memory cache is destaged by each module both to SAS disk and, asynchronously, to SSD (although not all data will necessarily go to SSD). So there are no bottlenecks and there is no risk. The other modules will keep using their SSDs, and IBM will replace the failed SSD non-disruptively.
What sort of performance improvement will I see?
Depending on application and data patterns you should see your IOPS more than double. A three times improvement is quite possible. Response times could drop by more than two thirds. In many ways these are obvious results.
IBM intend to demonstrate using industry standard benchmarks what the performance of an XIV Gen3 with SSD will be. I can tell you these numbers are going to be very impressive. Watch this space.
Is that it? Anything else in this release?
Release 3.1 also adds:
The ability to mirror between Generation 2 and Gen3 XIVs.
All the base support for IPv6 is now in place (although there are still some certification tests to complete).
Improvements to system thresholds (such as maximum pool size).
GUI enhancements (mainly to add panels for the SSD cache).
A new iPhone app (in addition to the existing iPad app).
If you are interested in the current state of play with XIV, there is a huge number of new resources that have been created or updated as part of the XIV 3.1 update, so I thought I would give you a list. If you are a customer, please scan down to see if there is anything here that interests you. If you are an IBMer or IBM Business Partner (or IBM competitor!), this is all mandatory reading. Either way, check out the new YouTube video; it is very cool.
As promised here is the new video on YouTube that shows the new XIV iPhone App!
I just checked the Apple App Store and cannot see the application yet (only the iPad version). I will update you the moment the iPhone version becomes available for download (and yes it will have a demo mode).
For more XIV-related materials (white papers, demos, videos, case studies...), I invite you to pop over to the XIV area of the IBM web site: ibm.com/storage/disk/xiv. You'll find links to materials throughout, such as the SPC report and ISV white papers; click on the Resources tab for a consolidated list of the most recent materials.