How to start monitoring your infrastructure using IBM Cloud Monitoring.
The foundation of continuous software improvement is measurement. Businesses have found that additional latency will affect customer satisfaction and sales. What are the latencies observed by your customers? Are servers overloaded? Underloaded? How do you know if you do not measure? A proven way to improve the services you provide is to choose the metrics that affect your business, then measure, observe and improve them over time.
IBM Cloud has great out-of-the-box support for observing account resources in real time. As an example, enable platform metrics in the same region as an existing VPC. Open a VPC Virtual Server Instance and look at the Monitoring preview:
Click on Launch monitoring to see a comprehensive metrics view focused on the instance:
Metrics can also be collected in application source code. A monitoring agent process running on the same VPC instance as the application will push the metrics to the IBM Cloud Monitoring instance. The following example was implemented on a Linux host:
In the diagram, notice the following:
- Program example.py is running on a VPC instance.
- Program sends StatsD metrics to the agent.
- Agent scrapes Prometheus metrics from the program.
- Monitoring agent sends metrics to the Monitoring instance.
You can find a companion repository that contains the source code for Python examples and instructions for creating the resources.
The Monitoring agent is a StatsD server and can be configured as a Prometheus forwarder. Modern programming languages have open-source libraries for both the StatsD client and Prometheus client exporter. For Python, check out the following:
Most programs will use either Prometheus or StatsD, but the code example.py has both:
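As a rough illustration of the two paths (this is not the repository's example.py), here is a standard-library-only sketch that pushes a StatsD counter over UDP and exposes a Prometheus counter on a /metrics endpoint. The metric names, the agent address 127.0.0.1:8125 and the port 8000 are assumptions made for the sketch. A real program would normally use the client libraries rather than hand-rolled code.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: the agent's StatsD listener is on localhost, UDP port 8125.
STATSD_ADDR = ("127.0.0.1", 8125)

# Hypothetical in-process counter, exposed on the pull path.
REQUESTS = {"count": 0}


def statsd_line(name, value, kind):
    """Format one StatsD datagram: <name>:<value>|<type> (c=counter, g=gauge, ms=timer)."""
    return f"{name}:{value}|{kind}"


def push_statsd(name, value, kind, addr=STATSD_ADDR):
    """Push path: send a fire-and-forget UDP datagram to the agent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_line(name, value, kind).encode("ascii"), addr)


def prometheus_text():
    """Pull path: render the counter in the Prometheus text exposition format."""
    return (
        "# HELP app_requests_total Requests handled by the app.\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {REQUESTS['count']}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current metrics on GET /metrics for a scraper to collect."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = prometheus_text().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def handle_request():
    """Pretend to handle one unit of work and record it on both paths."""
    REQUESTS["count"] += 1
    push_statsd("app_requests", 1, "c")


def serve(port=8000):
    """Block forever, serving /metrics on the given (assumed) port."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling handle_request() a few times and then serve() would let a scraper read the counter from http://127.0.0.1:8000/metrics while the same events also arrive at the agent as StatsD datagrams.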
The Python decorator @statsd.timer("custom_timing") can be prepended to a function and will capture the execution time as a metric. The metrics can then be visualized in the Monitoring instance. Here is a snapshot from the example:
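To show what a timing decorator of this kind does, here is a minimal standard-library sketch. It is not the StatsD client library's implementation: the timer function and its emit hook are hypothetical names introduced for illustration, and the result is collected in a list instead of being sent to the agent.

```python
import functools
import time


def timer(name, emit=print):
    """Return a decorator that reports the wrapped function's run time.

    `emit` is a hypothetical hook that receives one StatsD-style timer
    line (<name>:<milliseconds>|ms); a real client would send it over UDP.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if func raises, so failures are timed as well.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                emit(f"{name}:{elapsed_ms:.3f}|ms")
        return wrapper
    return decorator


captured = []  # stands in for the agent in this sketch


@timer("custom_timing", emit=captured.append)
def do_work(n):
    """Some work whose duration we want to observe."""
    return sum(range(n))


result = do_work(1000)
```

Each call to do_work appends one line of the form custom_timing:<milliseconds>|ms to captured, which is the shape of a StatsD timer sample.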
There are open-source Prometheus exporters that can be installed on the VPC instance to gather metrics directly from the environment. Once installed, they can be configured so that the Monitoring agent (dragent) scrapes and forwards their metrics:
The node_exporter can capture some additional metrics directly from the VPC instance operating system (NFS metrics, for example).
The statsd_exporter will capture additional timing quantiles and acceptable error metrics.
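To make "timing quantiles" concrete, here is a small sketch that summarizes a batch of timer samples with a nearest-rank quantile. The sample values and the quantile levels (0.5, 0.9, 0.99) are assumptions chosen for illustration, and exporters typically use streaming estimators with a bounded error rather than sorting every sample as this sketch does.

```python
import math


def quantile(samples, q):
    """Nearest-rank quantile: the smallest sample with at least q of the data at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def summarize(samples, levels=(0.5, 0.9, 0.99)):
    """Map each quantile level to its value for one batch of timings."""
    return {q: quantile(samples, q) for q in levels}


# Hypothetical latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 14]
summary = summarize(latencies_ms)
```

In this made-up batch the median and the 0.9 quantile stay low while the 0.99 quantile jumps to the slow sample, which is why high quantiles are useful for spotting latency outliers that an average would hide.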
I put together a dashboard to monitor my application:
The App latency outlier in my dashboard looks like a problem that I need to look into. The Monitoring service supports alerts to notify my team of these anomalies:
The repository explains how to create the alert.
Try it yourself
Start monitoring your infrastructure using the IBM Cloud Monitoring service. Create custom metrics to get visibility into your software. Be alerted when systems fall outside your defined parameters. Continuously improve outcomes by observing metrics over time and driving change.
The source code for this blog post can be found here, along with instructions.