Last week I had a discussion with one of our customers about the sense and nonsense of monitoring. What is the difference between monitoring and analysis?
The trigger for this discussion was, once again, an article on the web with a performance analysis that stopped at the point where the problem became visible in the graphs. There was no further step to understand the problem more deeply and solve it.
The main reason for this was that there was no tool at hand which allowed a deeper analysis.
The analysis tool used there was a common instrument for many different platforms (not from IBM, something else), so it delivered only the parameters that are commonly available on all these different systems. And these were just the top-level indicators like IOPS, data rate, transfer size and response time.
With these indicators you will only be able to see that something is going wrong, but you will not be able to find out what the reason is.
And without knowing the reason, you cannot solve the problem.
And this is exactly the difference between monitoring and analysis.
Analysis starts when monitoring ends
Find out the reason for this bad response time.
You now need access to storage indicators from much deeper inside the system. We actually had a case where the customer noticed bad response times without any change in the I/O patterns. Everything looked the same. What we found in the end was that there was no higher load on the storage frontend but a much higher load on the storage backend. The reason was a VMware virtual machine migration using VAAI. Because VAAI offloads the copy work to the storage system itself, you will not see this load where you expect it, in the host-facing statistics. You will only see it a few levels deeper.
No chance for simple monitoring tools.
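To make the difference a bit more concrete, here is a minimal sketch in Python. It is not taken from any real tool: the field names, the thresholds and the numbers are all made up for illustration, and I simply assume that frontend and backend IOPS are available per measurement interval. The first function is the "monitoring" view (response time is bad), the second is a small piece of "analysis" (the backend load grew while the frontend load did not).

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One measurement interval. All field names are hypothetical."""
    frontend_iops: float     # host-facing I/O operations per second
    backend_iops: float      # I/O operations per second behind the cache
    response_time_ms: float  # average response time seen by the host


def monitoring_alert(sample, max_response_ms=10.0):
    """Monitoring view: only tells us THAT response time is bad."""
    return sample.response_time_ms > max_response_ms


def hidden_backend_load(baseline, current, tolerance=0.2):
    """Analysis view: backend load grew clearly while frontend load did not.

    'tolerance' is an assumed relative growth (20 %) that we still
    treat as "unchanged".
    """
    frontend_growth = current.frontend_iops / baseline.frontend_iops - 1.0
    backend_growth = current.backend_iops / baseline.backend_iops - 1.0
    return frontend_growth <= tolerance and backend_growth > tolerance


# Invented example values, roughly in the spirit of the case above.
baseline = Sample(frontend_iops=5000, backend_iops=6000, response_time_ms=4.0)
current = Sample(frontend_iops=5100, backend_iops=15000, response_time_ms=18.0)

print(monitoring_alert(current))               # the symptom
print(hidden_backend_load(baseline, current))  # a hint at the cause
```

A tool that only has the top-level indicators can evaluate the first function, but it has no data to feed into the second one.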
Anybody who wants to discuss this in more depth?