|Michael Factor, IBM Fellow|
Let’s dive into Swift and what we learned from the data collected during monitoring.
|Dmitry Sotnikov, IBM storage engineer|
Increasing system complexity means increased monitoring complexity, since huge amounts of data need to be analyzed to find out what’s really going on. That’s where our work comes in. Our methodology uses an open source toolbox to explain the behavior of Swift clusters from the data collected during monitoring.
Today, performance monitoring and troubleshooting of a running cloud-based object store is as much an art as a science. Although there is a plethora of open source monitoring tools for gathering system metrics, the real challenge is how to use them to find the root cause of a problem.
We developed a general, open-source-based, step-by-step methodology to understand performance bottlenecks in a Swift system. Our solution uses standard tools including Logstash, collectd, StatsD, Elasticsearch, Kibana and Graphite. It also includes an additional simple Swift middleware we developed to gain further insights into the source of system bottlenecks.
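To give a feel for what a small timing middleware of this kind might look like, here is a minimal Python sketch. It is not the middleware we developed: the class name `TimingMiddleware`, the helper `format_timing`, the metric prefix `swift.sketch`, and the StatsD address `127.0.0.1:8125` are all assumptions made for illustration. It wraps any WSGI application and sends one timing sample per request in the StatsD line format (`name:value|ms`) over UDP.

```python
import socket
import time


def format_timing(metric, millis):
    """Render one timing sample in the StatsD line format: name:value|ms."""
    return f"{metric}:{millis:.3f}|ms"


class TimingMiddleware:
    """Hypothetical WSGI middleware that reports request latency to StatsD.

    It times only the call into the wrapped application, not the streaming
    of the response body, so it underestimates latency for large downloads.
    """

    def __init__(self, app, host="127.0.0.1", port=8125, prefix="swift.sketch"):
        self.app = app
        self.addr = (host, port)  # assumed StatsD address
        self.prefix = prefix      # assumed metric namespace
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def __call__(self, environ, start_response):
        start = time.monotonic()
        try:
            return self.app(environ, start_response)
        finally:
            elapsed_ms = (time.monotonic() - start) * 1000.0
            method = environ.get("REQUEST_METHOD", "UNKNOWN")
            line = format_timing(f"{self.prefix}.{method}.timing", elapsed_ms)
            try:
                self.sock.sendto(line.encode("ascii"), self.addr)
            except OSError:
                pass  # monitoring must never break the request path
```

Sending over UDP keeps the overhead on the request path small: if the StatsD daemon is down, samples are simply lost and requests are unaffected.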
|Swift monitoring flow|
For example, if Swift’s network configuration needs to be validated, say to understand unexpectedly low performance, this can be done using our methodology and the open source toolkit on which it is based. This can be seen in the following charts, which present the network utilization between the Proxy and the Object servers, as well as the Proxy’s public network utilization, for a write-only workload. Based on these charts, one can easily validate that all the data received by the Proxy is replicated three times and sent to the Object servers.
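The same check can be expressed numerically. The sketch below is an illustration only: the function names, the 10% tolerance, and the idea of feeding in byte counters are assumptions, not part of our toolkit. Given the bytes the Proxy received on its public interface and the bytes it sent to the Object servers over the same interval, it tests whether the ratio is close to the expected replica count of three.

```python
def replication_ratio(public_rx_bytes, backend_tx_bytes):
    """Bytes sent to the Object servers per byte received from clients."""
    if public_rx_bytes <= 0:
        raise ValueError("public_rx_bytes must be positive")
    return backend_tx_bytes / public_rx_bytes


def matches_replica_count(public_rx_bytes, backend_tx_bytes,
                          replicas=3, tolerance=0.1):
    """True if the observed ratio is within `tolerance` of `replicas`.

    The tolerance (an assumed 10% by default) allows for protocol overhead
    and for counters that were not sampled at exactly the same moment.
    """
    ratio = replication_ratio(public_rx_bytes, backend_tx_bytes)
    return abs(ratio - replicas) <= tolerance * replicas
```

For a write-only workload, a ratio well below three would suggest that some replicas are not being written over the expected network, or that traffic is flowing over a different interface than the one being measured.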