I recently listened to another great podcast from the Freakonomics team in which they recounted the story of Doctor Ignaz Semmelweis. It inspired me to make a connection to something I see in my day-to-day job.
Doctor Semmelweis worked at the Vienna General Hospital in the 1840s, delivering babies, teaching students and performing autopsies. While working there he realized that something was going horribly wrong at the hospital: up to 1 in 6 of the women whose babies were delivered by the male doctors were dying either during or after childbirth. This rate was far higher than the death rate for women whose babies were delivered by midwives, and much higher even than the death rate for women who gave birth on the street!
Semmelweis studied this issue very closely and concluded (quite rightly) that the issue was invisible cadaverous particles on the hands of the doctors. The doctors were going straight from performing autopsies to delivering babies... and transmitting all sorts of foul material to the birthing mothers, killing some of them in the process.
His solution was simple: He made the doctors wash their hands.
The result? The rate of women dying after giving birth at that hospital went from a peak of 15% to less than 2%.
So you would like to think that this story ends with Semmelweis declared a hero and hospital hygiene achieving new heights. Sadly it instead ends with Semmelweis being mostly ignored, going mad and dying from injuries sustained from a beating he received in a mental asylum. His discoveries only really began getting wider recognition after work by greats such as Louis Pasteur and Joseph Lister.
So what on earth does this have to do with Fibre Channel attached storage?
Well, the answer is invisible dirt particles and their role in causing hard-to-explain issues (work with me here, your honour, I will make my point).
Fibre optic cable relies on the exposed fibre being absolutely clean. The lit spot in the center of the image below is light from a light source, seen through a fibre microscope. While that lit spot looks large, it is actually only 62.5 microns across (which is tiny).
If you are using single mode (9 micron) fibre (commonly used with long wave adapters) that lit spot is even smaller:
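To put a rough number on the difference between those two sizes, here is a small Python sketch. It is purely geometric: it treats the lit spot as a flat circle and a speck of dirt as a smaller circle sitting entirely on top of it. The 5 micron particle size is a hypothetical value picked for illustration, and the model ignores how light is really distributed across the core, so treat the result as a back-of-the-envelope comparison only.

```python
import math

MULTIMODE_CORE_UM = 62.5    # lit spot diameter quoted above
SINGLE_MODE_CORE_UM = 9.0   # single mode diameter quoted above
PARTICLE_UM = 5.0           # hypothetical dirt particle, for illustration only


def core_area_um2(core_diameter_um: float) -> float:
    """Area of a circular fibre core in square microns."""
    return math.pi * (core_diameter_um / 2) ** 2


def fraction_blocked(core_diameter_um: float, particle_diameter_um: float) -> float:
    """Share of the core face covered by one round particle lying fully on it.

    Simplified model: ratio of the two circle areas, capped at 1.0.
    """
    return min(1.0, (particle_diameter_um / core_diameter_um) ** 2)


# How much smaller is the single mode target than the multimode one?
area_ratio = core_area_um2(SINGLE_MODE_CORE_UM) / core_area_um2(MULTIMODE_CORE_UM)

# How much of each core would the same speck of dirt cover?
blocked_mm = fraction_blocked(MULTIMODE_CORE_UM, PARTICLE_UM)
blocked_sm = fraction_blocked(SINGLE_MODE_CORE_UM, PARTICLE_UM)

print(f"Single mode core area is {area_ratio:.1%} of the multimode core area")
print(f"A {PARTICLE_UM} micron particle covers {blocked_mm:.2%} of the multimode core")
print(f"A {PARTICLE_UM} micron particle covers {blocked_sm:.1%} of the single mode core")
```

Under these assumptions the single mode spot has roughly 2% of the area of the 62.5 micron one, and the same speck that covers well under 1% of the larger spot covers about 31% of the smaller one.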
So what does a dirty fibre look like? How about this:
What about a badly cleaned one?
Now these images are scary. Even worse, the contamination is invisible to the naked eye. It is almost impossible to see dirt on your fibres (and staring at the end of a cable is not recommended anyway, regardless of what is at the other end). So this leads to some obvious questions:
How can I keep my cables from getting dirty?
Quite simply, don't expose them to dirt. Always leave dust covers in place on the cable ends and in the SFPs until they need to be used. Don't drag unprotected cables under the floor or leave them hanging in the racks. Don't re-use cables without cleaning them. In fact, I recommend cleaning new cables before you start using them. Finally, your dust covers need to be protected from dust too. Store dust covers in a sealed bag so that if you re-use them, they have not become contaminated.
How can I clean my cables?
Every site should have a cleaning kit on hand and always available (like hand sanitizer for doctors!). Google "fibre optic cleaning kit" for lots of products. I have used Cletops devices, but there are plenty of other choices on the market.
Can I create images like the ones above?
You sure can. Google "fibre microscopes" for lots of products that can do the job for less than $500. There are plenty of choices on the market. Even if you are not willing to bear the expense yourself, make sure your cable provider has one available. If they are testing your cables with a flashlight, get another provider.
Can my SAN switch tell me I have dirty cables?
The two commands I use most often (on Brocade switches) are porterrshow and statsclear. If you see any non-zero values in the highlighted six columns of evil, you may have bad SFPs, damaged cabling or dirty cables. Just be careful that what you are looking at is not ancient history, since the counters keep accumulating until they are cleared. Clear the stats (with statsclear) and wait a decent interval before checking again with porterrshow.
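The "clear, wait, check again" routine can be sketched in Python as a simple filter over the counters you read after the waiting interval. This is only a sketch under stated assumptions: it does not run or parse porterrshow itself, it assumes you have already copied the counters into a dictionary keyed by port number, the column names (crc_err, enc_out) are placeholders rather than a confirmed list of the six columns, and the handling of abbreviated values such as 1.2k (k, m and g read as thousand, million and billion) is an assumption about how large counts are displayed.

```python
from typing import Dict, Iterable

# Assumed meaning of abbreviated counter suffixes (hypothetical).
SUFFIX_MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert a counter such as '0', '37' or '1.2k' to an integer."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIX_MULTIPLIERS:
        return int(round(float(text[:-1]) * SUFFIX_MULTIPLIERS[text[-1]]))
    return int(text)


def ports_with_errors(
    counters: Dict[int, Dict[str, str]],
    watched: Iterable[str],
) -> Dict[int, Dict[str, int]]:
    """Return only the ports that show a non-zero value in a watched column."""
    watched = tuple(watched)
    flagged: Dict[int, Dict[str, int]] = {}
    for port, row in counters.items():
        nonzero = {}
        for name in watched:
            if name not in row:
                continue  # column not collected for this port
            value = parse_count(row[name])
            if value > 0:
                nonzero[name] = value
        if nonzero:
            flagged[port] = nonzero
    return flagged


# Made-up counters, read some time AFTER the stats were cleared.
sample = {
    0: {"crc_err": "0", "enc_out": "0"},
    1: {"crc_err": "12", "enc_out": "1.2k"},
    2: {"crc_err": "0", "enc_out": "0"},
}

flagged = ports_with_errors(sample, watched=("crc_err", "enc_out"))
for port, errors in flagged.items():
    print(f"port {port}: {errors}")
```

Because the counters were cleared first, anything this flags has happened during the waiting interval, which is what separates a live problem from ancient history. In the made-up sample only port 1 is reported.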
I could talk in even more detail about monitoring at the switch, but I think that is a whole other blog post.
Feel free to share your horror stories. Who knows, maybe dirty cables are causing your current horror story?