This blog recently learned that it was going to have to find a new home at some point in the reasonably near future. It could be 6 months, it could be a year. This blog had recently contemplated changing locations but decided where it is now was comfortable and just getting broken in. However, with the news that it will be asked to move, it decided to move before it collects any more blog posts. Moving now makes it easier since there is less to move.
Construction workers are hard at work on the new location, and in the next few days it will be moving. Once the new site is ready, new posts will be put there. Current posts will be eventually copied to the new blog but also left active here.
Modified on by DavidGreen
IBM Tech U Atlanta 2019
This is just a quick post to say that I will be at IBM Technical University in Atlanta. I will be there from April 29 through May 3.
My sessions for this event are:
- s106417 The Path of an FC Frame Through a Cisco MDS Director
- s106420 Proactive Monitoring of a Cisco Fabric
- s106421 Troubleshooting Cisco SAN Performance Issues - Part 1
- s106422 Troubleshooting Cisco SAN Performance Issues - Part 2
Find these sessions and many more at Technical University. Click the banner at the top of the page to register or for more information.
As always, if you have any questions, leave them in the comments at the end of this blog or find me on LinkedIn or Twitter.
Why Low I/O Rates Can Result in High Response Times for Reads and Writes
As IBM Storage Insights and Storage Insights Pro become more widely adopted, many companies who weren't doing performance monitoring previously are now able to see the performance of their managed storage systems. With the Alerting features on Storage Insights Pro, companies are much more aware of performance problems within their storage networks. One common question that comes up is why a volume with low I/O rates can have very high response times. Often these high response times are present even with no obvious performance impact at the application layer.
These response time spikes are generally measured in the tens or hundreds of milliseconds, but can be a second or greater. At the same time, the I/O rates are low - perhaps 10 I/Os per second or less. This can occur on either read or write I/Os. As an example, this picture shows a typical pattern of generally low I/O rates with a high response time. The volume in question is used for backups, so it is generally only written to during backups. The blue line is the I/O rate - in this case the write I/O rate, but the same situation can happen with reads from idle volumes. The orange line is the response time. You can see a pattern of generally low I/O rates overall to the volume, and that the write response time spikes up when the I/O rate goes up. It is easy to see why, if you were using Storage Insights Pro to alert on response time, you might be concerned about a response time greater than 35 ms.
This situation happens because of the way storage systems manage internal cache. This is generally true for all (IBM or non-IBM) storage subsystems and storage virtualization engines (VEs). If a volume has low I/O rates, then the volume is idle or nearly idle for extended periods of time. The volume can be idle for a minute or more. The storage device or VE will then flush the cache of that volume. It does this to free up cache space for other volumes which are actively reading or writing. The first I/O that arrives after an idle period for the volume requires re-initialization of the cache. For storage systems and VEs with redundant controllers or nodes, this also requires that the cache is synchronized across the nodes or controllers of the storage subsystem. All of this takes time. The processes are often expensive in performance terms, and the first I/O after an idle period can have significant delays. Additionally, for write I/O, the volume may operate in write-through mode until the cache has been fully synchronized. In write-through mode the data is written to cache and disk at the same time. This can cause further slowdowns because each write will be reported as complete only after the update has been written to the back-end disk(s). After the cache is synchronized, each write will be reported as complete after the update has been written to cache. This is a much faster process. You can see how, depending on the caching scheme of the storage subsystem, you would see a pattern of idle or almost idle volumes having extremely high response times. Unless you are seeing applications be impacted, this is generally not a concern.
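To make the effect concrete, here is a toy model of the pattern described above. The latency numbers are invented purely for illustration; real values depend entirely on the storage system and its caching scheme.

```python
# Toy model of why the first write after an idle period is slow.
# All latency values are made up for illustration only.
CACHE_WRITE_MS = 0.5      # ack once the write lands in synchronized cache
BACKEND_WRITE_MS = 30.0   # write-through: ack only after back-end disks complete
SYNC_PENALTY_MS = 50.0    # one-time cache re-initialization/sync after idle

def write_latencies(n_writes: int, idle_before: bool) -> list[float]:
    """Return per-write latencies for a burst of writes to one volume."""
    latencies = []
    cache_ready = not idle_before
    for _ in range(n_writes):
        if not cache_ready:
            # First write after idle: re-init/sync cache, run write-through
            latencies.append(SYNC_PENALTY_MS + BACKEND_WRITE_MS)
            cache_ready = True
        else:
            latencies.append(CACHE_WRITE_MS)
    return latencies

print(write_latencies(4, idle_before=True))   # [80.0, 0.5, 0.5, 0.5]
```

The first write after the idle period dominates the interval's average response time even though the I/O rate is tiny, which is exactly the spike pattern in the chart above.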
Response time spikes can also occur with large transfer sizes. This picture shows response time for the same volume as in the previous picture, except as it relates to transfer size. In this case it is the size of the write. As in the above picture, the orange line is the response time and the blue line is the transfer size. You can see that the transfer size is large - almost 500 KB per I/O. The volume for this performance data is not compressed. If it were compressed, there could be additional delays depending on the compression engine used in the storage. Barry Whyte gives an excellent writeup of Data Reduction Pools here that details how DRP gives better performance than IBM RACE or other compression technologies.
If you have any questions, leave them in the comments at the end of this blog or find me on LinkedIn or on Twitter.
IBM Storage Insights Pro Groups Feature
In addition to the Reporting and Alerting features that are not available in the free version of IBM Storage Insights, the subscription based offering, IBM Storage Insights Pro, has a very useful feature called Groups. Groups allow you to group or bundle together related storage resources for ease of management. For example, you might group a set of volumes together that are all related to a specific application, or server cluster. You might group the hosts that make up a cluster into a group. You can even group ports together - you could define a group of ports for an SVC Cluster that includes all the ports used for inter-node communication. Such a group would be very handy for your IBM Storage Support person. It would potentially save support from having to collect a support log and dig through it. It would certainly make analysis go faster when troubleshooting an issue.
So to get started with Groups, the first thing to do is to see if you have any Groups defined. Not surprisingly, they are listed under Groups -> General Groups; General Groups is the last entry on the Groups menu.
This is the type of group that will be used in this blog post.
If you open the Groups menu, you can see any groups that have been defined. For this IBM Storage Insights Pro instance, there are currently no groups defined.
You can add any storage resources you want to a group. The process for creating a group is to add a resource to a group and select the option to create a new group.
Adding A Resource To A Group
To add a resource to a group, in IBM Storage Insights Pro, browse to a resource, right-click the resource then select Add to General Group.
In this example we are adding a volume to a group. This can be done from either the listing of Block Storage Systems where you can add a storage system to a group, or it can be done from any of the internal resources in storage when you are looking at the resources in the Storage Properties view. You can select multiple resources and add them all to a group at one time. So you might list your volumes, filter on a volume name and then select all those that are filtered and add them to a group. To add the hosts from a cluster you might do the same in the host list.
You can mix resource types in a group. So for example, you might have all the storage systems, volumes and hosts for an application cluster in a group. This makes it easier to see the resources associated with that particular application - for both you and IBM Storage Support. This way, when you call IBM for a performance problem on an application, they can just look at the appropriate group to identify all the resources that are of interest for that application.
The next page that appears after you click Add to General Group is the Add to Group page.
You can either create a new group or add the selected resources to existing groups. It is possible for a resource to be in more than one group.
Adding Resources To A New Group
In this example, you create a group for storage resources related to the payroll database. You can add a volume to it, but you could (and should) add hosts and other storage systems to the group. In this way, all the resources used for our payroll application are in a single group and easily identifiable.
Adding A Resource To An Existing Group
The following image shows the Group selection if you opted to add a resource to an existing group. In this example, you only have one group defined, but if you had multiple groups, you could add the selected resources to multiple groups just by selecting all the groups that you want to add the resources to.
Viewing The Resources In A Group
The group listing looks much like the internal resources view of a storage system.
The last feature for Groups is Sub-groups. Sub-groups are exactly what they sound like - a way to further define relationships between resources in a group. In the following example, you can see the sub-groups for our Payroll group.
You might have hosts, volumes or other resources dedicated to different aspects of payroll processing. In this example, there is a sub-group dedicated to Pensions and another dedicated to Taxes. These sub-groups will appear in the listing of groups if you want to assign resources to them. Like all other resources, sub-groups can belong to multiple groups.
If you have any questions, leave them in the comments at the end of this blog or find me on LinkedIn or on Twitter.
Troubleshooting CRC Errors On Storage Networks
Before I say anything else, I will say that there is no "Easy Button" for troubleshooting CRC errors. It is an iterative process. You make a change, you monitor your fabric, and if necessary you make more changes until the issues are resolved. I frequently have customers who want it to be a one step process. It can be, but usually takes multiple steps. Having said that, before we can fix them, we need to know what CRC errors are and why they occur.
What Are CRC Errors? When Do they Occur?
The simple answer is that CRC errors are damaged frames. The more complicated answer is that before a fibre-channel frame is sent, some math is done. The answer is added to the frame footer. When the receiver gets the frame, the receiver repeats the math. If the receiver gets a different answer than what's recorded in the frame, then the frame was changed in flight. This is a CRC error. The only time these happen is if the physical plant (cabling, SFPs) is somehow defective. It is much less common, but still possible, to have a bad component in the switch. Troubleshooting that will be a separate blog post, some day. What the receiver does with the damaged frame depends on whether it's a switch or end device and, if it is a switch, what brand of switch.
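As a rough illustration of the math involved, here is a Python sketch of the sender/receiver check. Real FC frames carry a specific 32-bit CRC defined by the standard; zlib's CRC-32 stands in for it here, so treat this as a model of the process rather than the actual frame format.

```python
import zlib

def append_crc(payload: bytes) -> bytes:
    """Sender side: do the math over the payload, append the answer."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")

def crc_ok(frame: bytes) -> bool:
    """Receiver side: repeat the math and compare with the stored answer."""
    payload, stored = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == stored

frame = append_crc(b"SCSI write data")
assert crc_ok(frame)                  # undamaged frame passes the check

damaged = bytearray(frame)
damaged[3] ^= 0x01                    # one bit flipped in flight
assert not crc_ok(bytes(damaged))     # receiver detects a CRC error
```

A single flipped bit anywhere in the payload changes the answer, which is why a marginal cable or SFP shows up as CRC errors rather than silently corrupted data.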
Why Does Fixing These Matter?
At best, the effect of faulty links is a few dropped frames. Left unchecked, the problem will get worse and eventually cause performance problems. Also, you will go from 1 or 2 bad links to many. A customer I have been working with for the last several months was in this situation and is finally finishing a very long process of cleaning up many faulty links. Years ago I had a customer that was experiencing extremely long delays on their Brocade fabric. They had over-redundancy (there is such a thing) on the switches and links between the hosts and the storage. Many of the links were questionable and producing CRC errors. When the storage received a bad frame, it simply dropped it and did not send an ABTS. They also had an adapter in the host with a bug in it, and it would simply sit and wait for the storage to respond. 90 or so seconds later, the application would time out and initiate recovery for a problem that should never have happened.
Why Does It Matter What Brand Of Switch It Is?
First, the different brands of switches use different commands to obtain the data you need to troubleshoot these problems. Secondly, the way that they check and forward frames is different. This requires a different technique depending on the brand of switch. Cisco switches are what is called store-and-forward. This means that they wait for the entire frame to be received, then they check it, and then, if the frame is valid, it gets forwarded. If not, it is dropped. Brocade switches are cut-through. As soon as they receive enough of the frame to know where it's going, they start forwarding it. If the frame ends up being bad, they try to correct it using Forward Error Correction. If that doesn't work, the frame is tagged as bad. For the most part, end devices that receive frames that are already tagged as bad simply drop the frame and initiate recovery via ABTS. Troubleshooting commands and techniques vary for Brocade vs Cisco fabrics.
Identifying CRCs on Cisco Fabrics
Since Cisco fabrics are store-and-forward, you know that frames with CRC errors will be dropped as soon as they are detected. This can be either at the switch port they arrive on, or more rarely inside the switch. This post will focus on the CRC errors detected at the switch ports. If you suspect that you have questionable links, you can use these commands to check switch ports for CRC errors:
- 'show interface'
- 'show interface counters'
- 'show logging log'
For the above, the 'show interface' and 'show interface counters' commands can be run specifying a switch port that you are interested in. This is done in the format of fcS/P, where S is the slot and P is the port. For 'show logging log', you are looking for messages that a port was disabled because the bit error rate was too high. This is often an indicator of a faulty link. Once you find the ports that are detecting the CRC errors, you can then proceed to the repair phase.
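If you have many ports to check, you can script the search for non-zero CRC counters. This is a hypothetical Python sketch: the sample text below is a plausible excerpt, not verbatim NX-OS output, and the exact format varies by NX-OS version, so adjust the patterns for your switches.

```python
import re

# Hypothetical excerpt of 'show interface' output; real NX-OS formatting
# varies by version, so treat this sample and the regexes as illustrations.
SAMPLE = """\
fc1/1 is up
    5 minutes input rate 1048576 bits/sec
    0 discards, 0 errors, 0 CRC
fc1/2 is up
    5 minutes input rate 2097152 bits/sec
    0 discards, 12 errors, 12 CRC
"""

def crc_counts(show_interface_text: str) -> dict:
    """Map each fcS/P port to its CRC counter so bad links stand out."""
    counts, port = {}, None
    for line in show_interface_text.splitlines():
        m = re.match(r"(fc\d+/\d+) is", line)
        if m:
            port = m.group(1)
        m = re.search(r"(\d+) CRC", line)
        if m and port:
            counts[port] = int(m.group(1))
    return counts

# Ports with a non-zero CRC count are candidates for the repair phase.
suspect = {p: n for p, n in crc_counts(SAMPLE).items() if n > 0}
print(suspect)  # {'fc1/2': 12}
```

Run the same collection twice, some hours apart, and compare the dictionaries: a counter that is still increasing matters far more than a stale non-zero value.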
Identifying CRCs on Brocade Fabrics
Brocade fabrics use cut-through routing. As such, the link for the port that is detecting the CRC errors may not be the faulty link. Brocade has two statistics for CRCs: CRC and CRC_Good_EOF. If the CRC_Good_EOF counter is increasing, the link it is increasing on is the source of the problem. If the CRC counter is increasing, then the frame has already been marked as bad, and the problem is occurring elsewhere on the SAN. The CRC_Good_EOF counter should be the only counter that increases on a device port. If the CRC_Good_EOF counter is increasing on an ISL port, the link between the sending and receiving switch is bad. If the CRC counter is increasing on the ISL, this means the problem is occurring somewhere on the sending switch. So move to the sending switch and look for ports where CRC_Good_EOF is increasing. It is possible that both counters will increase on a link. If it is a device port, then the link is bad. If it is an ISL, then the link itself is a problem, and the sending switch has other bad links attached to it. As you can see, there are a few more steps to identify the source of the CRC errors on Brocade before you can proceed to the repair phase. The porterrshow may also show ports that do not have CRC_Good_EOF increasing, but do show a counter called PCS increasing. If so, this is also an indication of a bad link. Troubleshooting PCS errors is the same as troubleshooting CRC_Good_EOF errors.
You can use these commands to check Brocade switch ports for CRC errors:
- 'porterrshow'
- 'portstatsshow N'

The porterrshow command will display error stats for all ports. The portstatsshow N command, where N is a port index number, will display more detailed stats for the specified port. If you see PCS errors increasing for a port in the porterrshow, the link on that port is bad, regardless of what the CRC or CRC_Good_EOF counters show.
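The decision rules above can be condensed into a short sketch. This is just the logic from this section encoded in Python as an illustration, not a Brocade tool; "rising" means the counter is still incrementing between successive porterrshow samples.

```python
def diagnose(port_type: str, crc_good_eof_rising: bool, crc_rising: bool,
             pcs_rising: bool = False) -> str:
    """Apply the Brocade cut-through CRC rules described above.

    port_type is 'device' or 'isl'.
    """
    if pcs_rising:
        return "link on this port is bad (treat PCS like CRC_Good_EOF)"
    if crc_good_eof_rising:
        if port_type == "isl" and crc_rising:
            return ("this ISL is bad AND the sending switch has other "
                    "bad links attached")
        return "this link is the source of the CRC errors"
    if crc_rising:
        if port_type == "device":
            return "frames arrived already marked bad; look elsewhere"
        return ("frames damaged on the sending switch; check its ports "
                "for rising CRC_Good_EOF")
    return "no CRC problem indicated on this port"

# Example: CRC_Good_EOF rising on a device port means the attached link is bad
print(diagnose("device", crc_good_eof_rising=True, crc_rising=False))
```

Working through each ISL this way walks you back, switch by switch, to the link that is actually damaging frames.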
Correcting the Problem
Once you have identified the port(s) that have questionable links, you need to correct the problem. As I mentioned earlier, this is an iterative process. You replace a part, then clear the switch statistics, then monitor for anywhere from several hours to a day, depending on the rate of increase. Repeat the process until the errors are no longer increasing. You can replace multiple parts at once - such as replacing a cable and an SFP at the same time. Another option is to isolate further by just swapping a cable, or moving the device to a new port on the switch. Just remember that it is critical to reset the statistics immediately after any change you make. REMEMBER THAT PATCH PANELS ARE PART OF CABLING. I emphasize that because customers will often replace the cable between the switch/device and the panel and forget that there is cabling between patch panels which is also suspect. Some years ago I went onsite to troubleshoot connectivity between two storage systems. The storage systems were located at different campuses in the same city. The replication paths would not stay up. When I got there, the client had them direct connected through several patch panels with no switching. I assisted them in putting the cabling through switches at each campus and immediately saw CRCs showing up on the links. They had 8 hops across patch panels between the storage systems. We found CRCs at the second hop on each side. I stopped checking after that. Their eventual permanent fix was to run a new direct run of cable between the two locations.
If you have any questions, leave them in the comments or find me on LinkedIn or on Twitter.
Answers To Some Frequently Asked Questions about IBM Storage Insights
Over the last several months I have seen some common questions that are asked about IBM Storage Insights. I started collecting them and will answer them here. If you have any other questions, leave them in the comments or find me on LinkedIn or on Twitter. These questions are all about Storage Insights itself. Questions relating to managing specific types of storage with Storage Insights will be answered in future Blog posts. So, on to the questions.....
Q: Can I install a data collector on the same system as my IBM Spectrum Control server?
A: Yes. However, you need to pay attention to the memory and CPU usage of your system.
Data Collector Authentication
If you use username/password authentication, configure a dedicated user ID for the Data Collector on your storage systems. Do not use the default or other Admin accounts. This allows for effective auditing and reduces security risks.
Q: What Are The Recommended System Specifications for the Data Collector?
A: The minimum specifications are 1 GB of HDD space and 1 GB of RAM available on the system you install it on. The recommended specifications are 1 GB of HDD space, 1 CPU, and 4 GB of memory. For a data collector in a virtual machine, add these specs to whatever the operating system requires.
Q: Does Storage Insights Support Multi-Tenancy?
A: There is currently no support for multi-tenancy. This means that if you are managing storage from multiple datacenters, everyone with access to your Storage Insights instance will be able to see all storage. A suggestion is to edit the properties of the storage and fill out the location. You can then create a custom dashboard for each location. Setting the location property also helps IBM Storage Support know where storage is located. This assists with troubleshooting.
Q: Does The Data Collector Need To Be Backed Up? What about Redundancy?
A: Install at least two data collectors per instance for redundancy. Install at least two in each location if you are managing storage across multiple data centers. You do not need to back up the data collector. It does not store any collected data locally. All data collected is streamed to the cloud, and the data collector is always available for download if it needs to be re-installed. Downloading it also ensures that you always get the latest version. If you are using a virtual machine, you may want to back up the VM image, but this is to make it easier to re-deploy if there is an issue with the VM.
Q: What About Firewalls?
A: You need to open port 443 (the default HTTPS port) on your firewall to allow the Data Collector to communicate with the cloud service. This only needs to be open for outbound traffic. IBM will never send anything down to the Data Collector. If there is a firewall between the Data Collector and the storage it is managing, the firewall should be configured to pass SNMP traffic. Lastly, ensure that the data collector is in the VLAN used for SAN switch and storage management, or that VLAN routing is configured to allow the data collector to communicate across VLANs.
Q: You Just Said IBM Never Sends Anything To the Data Collector? How Does It Know What To Do?
A: The Data Collector is constantly checking a queue on the cloud for jobs to do, such as a support log collection. This ensures that communication is only one-way (the data collector pushes data up to the cloud).
Q: I Have A Proxy Server. How Do I Configure The Data Collector for a Proxy Server?
A: During the installation of the Data Collector, it will ask for your proxy server configuration. The proxy server itself should not need any additional configuration.
Q: Can I Control Whether IBM Storage Support Can Collect Support Logs?
A: Yes. Instructions are here.
Some considerations when setting permissions:
- If this is turned off, IBM Storage Support will not be able to collect logs as they need them, potentially delaying problem resolution
- If this is allowed, you are granting IBM Storage Support permission to collect support logs as needed for troubleshooting, without requesting permission each time
- This is a simple toggle that can be turned on and off as often as you wish
- When you are doing maintenance on a storage system it is recommended that you turn this off for the duration of the maintenance
Q: I Want To Configure a Performance Alert. What Are Some Suggested Values for Thresholds?
A: Performance monitoring thresholds are different for every environment. Use historical performance data to guide alerting decisions for response time and other thresholds. For new Storage Insights instances, it is recommended to wait until you have two weeks of performance data before configuring any alerts.
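As one hypothetical way to turn historical data into a starting threshold, you could compute a baseline from exported response-time samples. The mean-plus-N-standard-deviations heuristic and all the numbers below are illustrative judgment calls, not Storage Insights defaults.

```python
from statistics import mean, stdev

def suggest_threshold(samples_ms, sigmas=3, floor_ms=5.0):
    """Suggest a response-time alert threshold from historical samples.

    mean + N standard deviations is one common heuristic; the multiplier
    and the floor are judgment calls, not product defaults. The floor
    keeps quiet volumes from producing absurdly low thresholds.
    """
    baseline = mean(samples_ms) + sigmas * stdev(samples_ms)
    return max(round(baseline, 1), floor_ms)

# Two weeks of (hypothetical) per-interval response times, in ms
history = [1.2, 1.5, 0.9, 1.1, 2.0, 1.4, 1.3, 1.8, 1.0, 1.6]
print(suggest_threshold(history))
```

For a volume like the backup volume discussed earlier, whose first I/O after idle legitimately spikes, you would want to exclude those known spikes from the history or raise the floor accordingly.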
Q: I'm a Partner and my client has given me permission to monitor his Storage Insights free dashboard. Can I get SI Pro capabilities while he stays on free?
A: No. You cannot see the Pro capabilities. You see exactly what your customer sees.
This week I started working on a new case for a customer. I'm trying to diagnose repeated error messages being logged by an IBM SVC Cluster that indicate problems communicating with the back-end storage that is being virtualized by the SVC. These messages generally indicate SAN congestion problems. The customer has Cisco MDS 9513 switches installed. They're older switches but not all that uncommon. What is uncommon is finding the switches at NX-OS version 5.X.X. I see downlevel firmware often, but this one is particularly egregious. This revision is several years out of date. Later versions of code contain numerous bug fixes both from Cisco and from the associated upstream Linux security updates that get incorporated into NX-OS. Also, while NX-OS versions don't officially go out of support, any new bugs identified won't be fixed as this version is no longer being actively developed.
This level of firmware merits further investigation. Looking deeper on the switches I find this partial switch module list:
Mod Ports Module-Type Model Status
--- ----- ----------------------------------- ------------------ ----------
6 48 1/2/4 Gbps FC Module DS-X9148 ok
7 48 1/2/4 Gbps FC Module DS-X9148 ok
8 48 1/2/4 Gbps FC Module DS-X9148 ok
9 48 1/2/4 Gbps FC Module DS-X9148 ok
These modules are older than the firmware on the switches, and support ended 3 years ago. If this customer has problems with them (or the switches they are installed in) and the problem is traced back to the modules, there is not much that IBM Support can do. If a problem is traced to a bug in the firmware, the customer can't upgrade the firmware to something more current because of these old, unsupported modules still in the switches. This limits IBM's ability to provide support. The hardware is no longer supported and much of the data we can look at in the firmware was not introduced until the next major revision level of NX-OS - v6.2(13). There were also some options and improvements added to lower thresholds and timeout values to increase the frequency of some logging for performance issues.
I could see several 2Gb devices attached to these modules, which is probably why they are still installed. I could also see some of these slow devices zoned to the SVC which is connected to the SAN at 8Gbps. This violates a best practice of not zoning devices together where the port speeds are greater than 2x difference. So, a 2 Gb device should not be zoned to 8Gb. A 4Gb device should not be zoned to 16Gb, etc. The slow device will turn into a slow-drain device sooner rather than later. I suspect this is the customer's problem but can't confirm it because of lack of data due to the age of the hardware and firmware.
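The 2x rule is easy to encode if you want to audit your zoning programmatically. This is an illustrative sketch, not a vendor tool:

```python
def speed_mismatch(speed_a_gbps: float, speed_b_gbps: float) -> bool:
    """Flag zoned pairs where one port runs at more than 2x the speed
    of the other, per the best practice described above."""
    fast = max(speed_a_gbps, speed_b_gbps)
    slow = min(speed_a_gbps, speed_b_gbps)
    return fast > 2 * slow

# A 2 Gb host zoned to an 8 Gb SVC port violates the rule; the slow
# device will eventually become a slow-drain device.
assert speed_mismatch(2, 8)
assert speed_mismatch(4, 16)
assert not speed_mismatch(4, 8)   # within 2x, acceptable
```

Running every initiator/target pair in your zoning database through a check like this is a quick way to find slow-drain candidates before they cause congestion.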
The recommendations I gave this customer:
1. Move the applications on those slow servers to servers with a 4 Gb or (ideally) 8 Gb connection to the SAN on the other, newer modules in the switches. This will allow for decommissioning of those modules and a move to a best-practice solution.
2. Decommission those old modules and replace them with supported ones if the port density is needed. This will allow for firmware upgrades, which are beneficial for all the reasons noted above.
3. Start planning for a refresh of the switches themselves. While the switch chassis will be supported for some time yet, they have already been end of life for a few years.
The first week in May IBM announced IBM Storage Insights. As of 11 June, Storage Insights has these key new features:
- IBM Blue Diamond support
- Worldwide support for opening tickets
- Custom dashboards
- New dashboard table view
- Clients can now specify whether IBM Storage Support can collect support logs. This is done on a per-device basis.
You can get a complete list of the new features here: Storage Insights New Features
There are some other new features such as new capacity views on the Storage Insights Dashboard. With these new features, especially support for IBM Blue Diamond customers, Storage Insights is an increasingly important and valuable troubleshooting tool. My team here is seeing more and more customers that are using Storage Insights. I thought I would discuss the potential benefits of Storage Insights as a troubleshooting tool.
The problems my team fixes can be categorized as either:
- Root-cause analysis (RCA) - meaning a problem happened at some point in the past, and the customer wants to know why it happened
- Ongoing issue - the problem is happening now (or happens repeatedly, also called intermittent)
The above two types of problems can further be broken down into partially working or completely broken. Of the two, partially working can be much more difficult to troubleshoot, especially if it's an intermittent issue and not constant. As an example, some years ago my van had a misfire on one of its cylinders, but we didn't know which one. Of course it never occurred when my mechanic was driving it. It finally took several hours at the dealer with the dealer hooking the car up to a test rig to record the failure and identify the misfire. Had the problem been a completely broken spark plug wire instead of partial, it would have been much easier to identify.
You can imagine the difficulty of attempting to root-cause a problem that happened hours or days ago on a large and busy SAN if the problem is/was not severe enough to cause the switches to record any time-stamped errors or other indicators of problems. As an example, I'm confident you've been in slow-moving traffic where the cause of the problem isn't readily apparent. The analogy isn't perfect, but suppose the traffic cameras in your city were configured to only start recording when traffic is moving less than 30 mph for 2 minutes and/or generate an alert back to the traffic center. They do record the number of cars passing by and the number of cars exiting and entering the freeway at each ramp, but they don't timestamp these numbers. They only timestamp the video and/or alerts. Now further suppose you were stuck in traffic last week that was moving at 32 mph. Since it didn't meet the threshold, the cameras never recorded anything and no alerts were sent. You could collect the statistics on the number of cars counted by the cameras, but without anything recorded from last week it would be extremely difficult to provide an explanation as to why traffic was slow, since you can't reconcile the count of cars to any specific point in time. If traffic had been completely stopped, the cameras would have started recording and then you'd be able to see the car fire, or accident, or whatever the cause of the problem was last week. The same limitations exist for ongoing issues. If traffic is moving slowly but not completely stopped, then identifying the cause of the slow traffic can be difficult.
How Storage Insights Can Help Provide Root-Cause
Storage Insights has the potential to provide an explanation for these partially working root-cause investigations by regularly sampling the performance statistics and providing timestamps on this data. If we had something like Storage Insights regularly sampling statistics from our traffic cameras, we could go back and analyze these for the time period where you were sitting in traffic. We might find a certain exit ramp from the freeway was congested at the time of the problem. We could take this information and correlate that with other data to try and determine why the ramp was backed up. We might find a concert was going on at a venue near the ramp, or some other event that caused an increase in traffic to that ramp.
How Storage Insights Can Decrease Problem Resolution Time
For a problem that is happening now, Storage Insights can help provide resolution more quickly than without it. Going back to our traffic example, suppose there is an accident or some other problem on a surface street that an exit ramp connects to. Traffic will eventually back up onto the exit ramp and then onto the freeway. Without Storage Insights, you'd have to look at each of your traffic cameras in turn, trying to figure out where the congestion starts. With Storage Insights, since it is collecting the statistics, you can filter them to find out which of your exit ramps is congested.
A few years ago IBM introduced the virtual WWPN (NPIV) feature to the Spectrum Virtualization (SVC) and Spectrum Storwize products. This feature allows you to zone your hosts to a virtual WWPN (vWWPN) on the SVC/Storwize cluster. If the cluster node has a problem, or is taken offline for maintenance the vWWPN can float to the other node in the IO Group. This provides for increased fault tolerance as the hosts no longer have to do path failover to start I/O on the other node in the I/O group.
All of what I've read so far on this feature is from the perspective of someone who is going to be configuring this feature. My perspective is different, as I troubleshoot issues on the SAN connectivity side. This post is going to talk about some of the procedures and data you can use to troubleshoot connectivity to the SVC/Storwize when the NPIV feature is enabled, as well as some best-practice to hopefully avoid problems.
If you are unfamiliar with this feature, there is an excellent IBM RedPaper that covers both this feature and the Hot Spare Node feature:
An NPIV Feature Summary
1: SVC/Storwize has three modes for NPIV - "Enabled", "Transitional" or "Off".
2: Enabled means the feature is active. Hosts attempting to log in to the physical WWPN (pWWPN) of the Storwize port will be rejected. Transitional means it is enabled, but the SVC/Storwize will accept logins to either the vWWPN or the pWWPN. Off means the feature is not enabled.
3: Transitional mode is not intended to be left on permanently. You would use it while you are in the process of re-zoning hosts to use the vWWPNs instead of the pWWPNs.
4: For the NPIV failover to work, each of the SVC/Storwize nodes has to have the same ports connected to each fabric. For example, assuming this connection scheme for an 8-port node with redundant fabrics:
All the nodes must follow the same connection scheme. Hot-spare node failover will also fail if the nodes are miscabled. To be clear, I am not advocating the above scheme per se, only that all the nodes must match as to which ports are connected to which fabrics.
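Since the original connection-scheme figure is not reproduced here, a hedged sketch may help. All node names, port numbers, and fabric labels below are hypothetical; the point is only that every node's port-to-fabric mapping must be identical, which is easy to check programmatically:

```python
# Hypothetical port-to-fabric cabling per node for an 8-port configuration.
# Node names, port numbers, and fabric labels are illustrative only.
cabling = {
    "node1": {1: "A", 2: "B", 3: "A", 4: "B", 5: "A", 6: "B", 7: "A", 8: "B"},
    "node2": {1: "A", 2: "B", 3: "A", 4: "B", 5: "A", 6: "B", 7: "B", 8: "A"},
}

def miscabled_nodes(cabling):
    """Return nodes whose port-to-fabric mapping differs from the first node's."""
    nodes = list(cabling)
    reference = cabling[nodes[0]]
    return [n for n in nodes[1:] if cabling[n] != reference]

print(miscabled_nodes(cabling))  # -> ['node2'] (ports 7 and 8 are swapped)
```

A node flagged by a check like this would break both NPIV failover and hot-spare node failover until its cabling matches the rest of the cluster.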
5: I was asked at IBM Tech U in Orlando if the SVC/Storwize Port Masking feature is affected by the NPIV feature. The answer is no. Any existing port masking configuration is still in effect.
6: pWWPNs are used for inter-node and inter-cluster (replication) as well as controller/back-end. vWWPNs are only used for hosts.
A Suggestion: A recommendation I heard at IBM Technical University in May is, if you are using FC aliases in your zoning, to add the vWWPN to the existing alias for each SVC/Storwize cluster port so that you don't have to rezone each individual host. While that is an easy way to transition, it creates a potential problem. After you move the cluster from Transitional to Enabled, the cluster starts rejecting fabric logins (FLOGIs) to the pWWPNs. At best, this fills up a log with rejected logins, at which point you call support because you notice a host logging FLOGI rejects. At worst, it causes real problems when the adapter, and possibly the multipath driver, attempt to deal with the FLOGI rejects. Prior to moving the NPIV mode to Enabled, you need to remove the pWWPN from the FC alias, but you must first ensure you are not using the same aliases for zoning your back-end storage. If you are, and you remove the pWWPN from the alias, you will lose controller connectivity. If you are using a Storwize product with internal storage and no controllers, this is not an issue and the pWWPN can be removed from the alias. If you do have back-end storage and are currently using the same aliases for both host and controller zoning, it might be easier to establish new aliases for the pWWPNs and rezone the controllers to them, or simply rezone the controllers directly to the pWWPNs before modifying the existing aliases to use the vWWPNs. There will be fewer zones to modify for the controllers than for the hosts.
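To illustrate the pitfall above, here is a minimal sketch of checking whether any alias is shared between host and controller zones before you strip the pWWPN out of it. All zone names, alias names, and memberships are made up for this example; in practice you would build these sets from your fabric's zoning configuration:

```python
# Hypothetical zoning data; zone names, alias names, and memberships
# are illustrative only.
controller_zones = {
    "ctrl_zone_1": {"svc_n1_p1", "flash_a_p1"},
    "ctrl_zone_2": {"svc_n2_p1", "flash_b_p1"},
}
host_zones = {
    "host_zone_1": {"svc_n1_p1", "host1_hba0"},
    "host_zone_2": {"svc_n2_p1", "host2_hba0"},
}

def shared_aliases(controller_zones, host_zones):
    """Aliases that appear in both controller and host zones; removing the
    pWWPN from one of these aliases would break back-end connectivity."""
    ctrl_members = set().union(*controller_zones.values())
    host_members = set().union(*host_zones.values())
    return sorted(ctrl_members & host_members)

print(shared_aliases(controller_zones, host_zones))  # -> ['svc_n1_p1', 'svc_n2_p1']
```

Any alias that shows up in the result needs the controller zoning moved off it (or onto new pWWPN-only aliases) before the pWWPN is removed.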
Troubleshooting Connectivity to the SVC/Storwize Cluster
One of the most common causes of connectivity issues, or of path failover not working as it should, is incorrect zoning. To that end, you first need to verify the vWWPNs that you should be using. The easiest way is to run lstargetportfc on the cluster CLI to get a listing of these vWWPNs; lsportfc will list the pWWPNs. This command output is included by default in the svcout file starting with version 8.1.1; in earlier versions it must be run separately. Once you have that list, you can use the Fibre Channel Connectivity listing in the SVC/Storwize GUI and its filtering capabilities to filter on the vWWPNs and/or the pWWPNs to determine whether you have any hosts connected to the pWWPNs. You can also capture the output of lsfabric -delim , and import that CSV into Excel or similar to get better sorting and filtering than the system GUI provides. If a host is missing, or is connected to the pWWPNs, you will need to verify zoning. This is also a good time to verify that controllers are connecting to the pWWPNs and, if you are using a hot-spare node, that the controllers are zoned to the ports on the hot-spare. I had a case recently where, while it wasn't the reason the customer opened the ticket, I noticed they had not zoned one of their controllers to the hot-spare node. In the event of a node failure, the failover would not have worked as expected.
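As a sketch of the kind of filtering you can do once you have the CSV export, the snippet below flags hosts still logged in to a physical WWPN. The column names and WWPN values are illustrative assumptions, not the CLI's exact field layout; adjust the field names to match your actual lsfabric output:

```python
import csv
import io

# Hypothetical lsfabric-style CSV output; columns and WWPNs are made up
# for illustration and do not reflect the CLI's exact field layout.
sample = """remote_wwpn,remote_nportid,local_wwpn,name,type
50:05:07:68:0c:aa:bb:01,010203,50:05:07:68:0c:11:22:01,host1,host
50:05:07:68:0c:aa:bb:02,010204,50:05:07:68:0c:99:88:01,host2,host
"""

# Physical WWPNs gathered from lsportfc (also made up for this example).
pwwpns = {"50:05:07:68:0c:99:88:01"}

def hosts_on_physical_ports(csv_text, pwwpns):
    """Return names of hosts logged in to a pWWPN instead of a vWWPN."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [r["name"] for r in rows
            if r["type"] == "host" and r["local_wwpn"] in pwwpns]

print(hosts_on_physical_ports(sample, pwwpns))  # -> ['host2']
```

Any host the script reports would start failing logins the moment NPIV mode moves from Transitional to Enabled, so its zoning should be corrected first.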
I am in Orlando, Florida for IBM Technical University this week. If I had to sum up the week so far in one word, it would be SPEED: how to get more of it (more flash and NVMe), and how to ensure that when you deploy new products you can keep that speed. Related to that is how to store more data in less space. The cost of storage is dropping, but not as quickly as it was, so deduplication and compression are becoming more and more important.
IBM Distinguished Engineer Brian Sherman gave a good talk on trends in storage and how IBM is implementing NVMe over FC.
Barry Whyte, an IBM Master Inventor, presented the implementation of Data Reduction Pools in version 8.1.2, the latest release of the IBM SAN Volume Controller (SVC) software.
DR pools support a mix of fully-allocated, thin-provisioned, and compressed-thin volumes. Deduplication will be supported in the future, but only on the new DR pools. Legacy pools will still be supported; however, IBM expects DR pools to be the most-used type moving forward.
There are restrictions with the DR pools available today. A few of the migration methods that Legacy Storage Pools use are not available. There is a limit of four DR Pools per cluster. The capacity of each pool depends on the number of I/O groups supported and the size of the extents.
With DR pools, all CPU cores can be used for either I/O or compression. This is a change from existing compression, which uses a single core.
I presented two sessions today.
The first gives an overview of the tools available for monitoring a Cisco fabric - including Cisco DCNM and Port Monitoring - both to detect problems more quickly and to isolate a problem device from the rest of the SAN. The isolation analogy is not my example, but I thought it was a good one: it's like picking up the car that's limping along the interstate on its 50 mph spare and dropping it onto the frontage road. I had a long conversation after this session with a client and gave him some tips and tricks on what he can do now from the CLI to gain some visibility into his SAN.
My second session is more focused on what goes on inside a Cisco Director and the stages a frame goes through to make it through the switch. I also had a few tips on some areas customers can check if they suspect problems. It also explains some general FC concepts such as store-and-forward vs cut-through switching and exchange vs source/destination routing. I got some great questions from some of the attendees at this session.
You can register here:
The schedule has been set! My session schedule:
Wednesday 11:30 AM - Proactive Monitoring Of a Cisco Fibre-Channel Fabric - this session introduces some of the tools available for monitoring Cisco Fabrics
Wednesday 3:15 PM - The Life Of a Frame Through A Cisco Switch
Thursday 10:15 AM - Best Practices for Spectrum Storwize SAN Design and Zoning - this includes a case study on what happens when you don't follow best practice
Thursday 11:30 AM - Spectrum Storwize Port Masking and Best Practices - the SAN Design session is not required but is recommended. This session also includes a case study
I also highly recommend the TechU Agenda app. It's useful for searching the sessions and setting your agenda. You can find it on the App Store or on Google Play - search for TechU Agenda Guide. Of course, I already know 4 sessions you'll be attending....
I’ll be a speaker at the upcoming IBM Systems Technical University in Orlando, Florida and I'm looking forward to it.
These are some of my favorite events. You can attend technical sessions, see demos, and try things yourself in hands-on labs, all at the same event. There are even open technology discussions.
The sessions I am scheduled to present are:
- The Life Of A FC Frame Through A Cisco Fibre-Channel Director - this will include some troubleshooting suggestions
- Proactive Monitoring of Cisco Fibre-Channel Fabrics - this session includes some case studies
- Best Practices For IBM Storwize Virtualization Controller Connectivity and Zoning - this session includes a case study on the effects of best practice deviation
- IBM Storwize Virtualization Controller Port Masking - this session also includes some tips and tricks on analyzing the Storwize Device Connectivity Listing
I hope to see you in Orlando!
You can register here: http://ibm.com/training/events/Orlando2018
I received the official approval today for sessions accepted for Tech U in Orlando. Dates are April 30 - May 4. You can register here.
The sessions I have are:
Once I know my schedule for the week I will post it here.
This past summer, I was brought into a SAN performance problem for a customer. When I was initially engaged, it was a host performance problem. A day or two later, the customer had an outage on a Spectrum Scale cluster. That outage was root-caused to a misconfiguration on the Spectrum Scale cluster, which did not re-drive some I/O commands that timed out. The next logical question was why the I/O timed out. Both the impacted hosts and the Spectrum Scale cluster used an SVC cluster for storage. I already suspected the problem was an extremely flawed SAN design. More specifically, the customer had deviated from best-practice connectivity and zoning of their SVC cluster and controllers. A 'Controller' in Storwize/SVC-speak is any storage enclosure - Flash, DS8000, another Storwize product such as a V7000, or perhaps non-IBM-branded storage. In this case, the customer had three controllers. Two were IBM Flash arrays; for the purposes of this blog post, we will focus on those and how the customer's SAN design negatively impacted their IBM Flash systems.
Best-Practice SVC/Storwize SAN Connectivity and Zoning
The figure below depicts best-practice port connectivity and zoning for the SVC and controllers in a dual-core fabric design. (This assumes you have two redundant fabrics, each configured like the one below.) Ideally, our SVC cluster and controllers are connected to both of our core switches. A single-core design obviously does not have the same potential for misconfiguration, since all SVC and controller ports on a given fabric are connected to the same physical switch. In a mesh design, we would want to apply the same basic principle of connecting SVC and controller ports to the same physical switch(es). Zoning must be configured such that the SVC ports on each switch are zoned only to the controller ports attached to the same switch. The goal is to avoid unnecessary traffic flowing across the ISL between the switches. In the example below, we have two zones: Zone 1 includes the SVC and controller ports attached to the left-most switch, and Zone 2 includes the SVC and controller ports attached to the right-most switch.
Customer Deviations from Best-Practice on a Dual-Core Fabric
The next figure shows the design the customer had. The switches in question were Brocade-branded, but the design would be flawed regardless of switch vendor. The problem should be obvious: with the design below, all traffic moving from the SVC to the back-end controllers has to cross the ISL, in this case a 32 Gbps trunk. The switch data showed multiple ports in the trunk were congested - there were transmit discards and timeouts on frames moving in both directions, and both switches were logging bottleneck messages on the trunk ports. The SVC was logging repeated command timeouts and errors indicating it was having problems talking to ports on the controllers. Lastly, the SVC was showing elevated response times to the Flash storage. All of this was due to the congested ISL. With this design, the client was not getting the ROI or the response times it should have been getting from the Flash storage. Of course, all of the error correction and recovery increased the load on the fabric and caused re-transmission of frames, which made an already untenable situation worse. The immediate fix, to provide some relief, was to double the bandwidth of the ISL on both fabrics. The long-term fix was to re-connect ports and zone appropriately to get to best practice.
Customer Host Connectivity and a Visual of the Effect on the Fabric
The last figure shows the customer's host connectivity and the effect of this flawed design on the fabric. We can see from the figure that the client had both the underperforming hosts and the GPFS/Spectrum Scale cluster connected to DCX 2, where the controllers were connected. With this design, data must traverse the ISL four times. Traffic on the ISLs could be immediately cut in half by moving half of the SVC ports to DCX 2 and half of the controller ports to DCX 1, and then zoning to best practice as in the first figure in this blog post. In addition to the unnecessary traffic on the congested ISL, redundancy is reduced, since this design is vulnerable to a failure of either DCX 1 or DCX 2. While the client did have a redundant fabric, a failure of either of those switches would mean a total loss of connectivity from the SVC to the controllers on one of the fabrics. That is significant. ISL traffic could be further reduced (and reliability increased at the host level) by moving half of the GPFS cluster (and other critical host ports) to DCX 1 and zoning appropriately. That way, the only traffic crossing the ISLs would be from hosts or other devices that don't have enough ports to connect to both cores, plus whatever traffic is necessary to maintain the fabric. Both the SVC-to-controller and host-to-SVC traffic would then be much less vulnerable to delays on the ISLs or congestion in either fabric.
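The ISL-crossing arithmetic above can be sketched with a toy model. The switch names match the figures, but the model itself is purely illustrative, counting one crossing for each hop in the host-to-SVC-to-controller path whose endpoints sit on different switches:

```python
# Hypothetical device placement mirroring the flawed design described above:
# hosts and controllers on DCX 2, all SVC ports on DCX 1.
flawed = {"host": "DCX2", "svc": "DCX1", "controller": "DCX2"}

def isl_crossings(placement):
    """Count ISL hops for one I/O: host -> SVC -> controller, plus the
    response retracing the same links."""
    one_way = [("host", "svc"), ("svc", "controller")]
    crossings = sum(placement[a] != placement[b] for a, b in one_way)
    return 2 * crossings  # request and response each cross the same links

print(isl_crossings(flawed))  # -> 4, matching the case study
```

Splitting the SVC and controller ports across both cores removes the SVC-to-controller crossing entirely, which is exactly the halving of ISL traffic described above.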
Is it still plagiarism if you are crediting the original author? Anyway, this should have been my first post, or at least included in my first post. I stole the idea from Sebastian Thaele.
Yay! Yet another storage-related blog! As if the world needed another one. The difference with this one is it is written by me, which by itself should be enough to make everyone want to read it. Seriously, while I think I'm pretty good at what I do, I'm not that hubristic. I know that I don't have all the answers. Nobody does. That being said, here's why I created this blog.
First, I wanted to give a bit of an insider's view into IBM Storage Support. This previous post of mine is an example of that.
Second, I work as a worldwide Product Field Engineer for IBM SAN Central. My team provides Level 2/Level 3 (depending on who's asking) support for problems related to storage networking. If the other IBM product teams can't solve a problem, or suspect it may be the SAN, the case is escalated to my team. Many other professions exchange knowledge with peers from other companies, but this rarely happens for members of support organizations. By necessity there is a lot of knowledge sharing among the members of my team and across the support teams inside IBM, but it's too often limited to just IBM.
Third, most of the blogs I read focus on marketing or high-level overviews of a new product or feature. There isn't much technical content, and what technical content there is doesn't have a support perspective. For instance, I've stumbled across more than one blog post about IBM announcing Spectrum Storwize code v8.1. Among other things, that code supports a new feature for hot-spare SVC nodes. That's a great feature, but there are some requirements for implementing a hot-spare node, along with some best-practice recommendations. I will have a future post detailing both the requirements and how best to implement a hot-spare node. The best way to fix a problem is to prevent it in the first place, and I'm hoping posts like that one will help prevent future problems from occurring for IBM's customers.