April 20, 2022 By Powell Quiring 3 min read

How to start monitoring your infrastructure using IBM Cloud Monitoring.

The foundation of continuous software improvement is measurement. Businesses have found that additional latency hurts customer satisfaction and sales. What latencies do your customers observe? Are servers overloaded? Underloaded? How do you know if you do not measure? A proven way to improve the services you provide is to choose the metrics that affect your business, then measure, observe and improve them over time.

IBM Cloud has great out-of-the-box support for observing account resources in real-time. As an example, enable platform metrics in the same region as an existing VPC. Open a VPC Virtual Server Instance and look at the Monitoring preview:

Click on Launch monitoring to see a comprehensive metrics view focused on the instance:

Metrics can also be collected in application source code. A monitoring agent process running on the same VPC instance as the application will push the metrics to the IBM Cloud Monitoring instance. The following example was implemented on a Linux host:

In the diagram, notice the following:

  1. Program example.py is running on a VPC instance.
  2. Program sends StatsD metrics to the agent.
  3. Agent scrapes Prometheus metrics from the program.
  4. Monitoring agent sends metrics to the Monitoring instance.
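Step 2 above can be illustrated without any client library. The following is a minimal sketch, assuming the agent's StatsD listener accepts UDP datagrams on localhost port 8125 (the conventional StatsD port; the actual port depends on how the agent is configured). The helper names `format_statsd_timing` and `send_statsd` are hypothetical and exist only for this illustration:

```python
import socket


def format_statsd_timing(name: str, millis: float) -> bytes:
    """Encode one timing sample in the StatsD line format: <name>:<value>|ms"""
    return f"{name}:{millis:g}|ms".encode("ascii")


def send_statsd(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one fire-and-forget UDP datagram to a StatsD listener.

    host and port are assumptions; adjust them to match the agent configuration.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Calling `send_statsd(format_statsd_timing("custom_timing", 12.5))` emits a single timing sample. Because the transport is UDP, the program does not block or fail if no agent is listening, which is one reason StatsD is a low-risk way to instrument application code.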

You can find a companion repository that contains the source code for Python examples and instructions for creating the resources.

The Monitoring agent is a StatsD server and can be configured as a Prometheus forwarder. Modern programming languages have open-source libraries for both the StatsD client and Prometheus client exporter. For Python, check out the following:

Most programs will use either Prometheus or StatsD, but the code example.py has both:

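import prometheus_client
from statsd import StatsClient

# buckets (the histogram bucket boundaries) is assumed to be defined earlier in example.py
h = prometheus_client.Histogram('custom_histogram', 'application prometheus example', buckets=buckets)
statsd = StatsClient()

@h.time()
def prometheus_example(i):
  ... # do something

@statsd.timer("custom_timing")
def statsd_example(i):
  ... # do something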
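For readers curious about what such timing decorators do internally, here is a minimal standard-library sketch. It is not the implementation used by either client library; `timed`, `recorded_ms` and `work` are hypothetical names, and the in-memory list stands in for the step where a real client would hand the sample to the agent:

```python
import functools
import time

# Hypothetical in-memory sink; a real client would send each sample to the agent.
recorded_ms = []


def timed(sink):
    """Return a decorator that appends each call's wall-clock duration (ms) to sink."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if func raises.
                sink.append((time.perf_counter() - start) * 1000.0)
        return wrapper
    return decorator


@timed(recorded_ms)
def work(i):
    time.sleep(0.01)  # stand-in for real work
    return i * 2
```

After a call to `work(21)`, `recorded_ms` holds one sample of roughly 10 ms. The `try`/`finally` means failed calls are timed too, which matters when slow failures are exactly what you want to see.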

The Python decorators @h.time() and @statsd.timer("custom_timing") can be placed above a function definition to capture its execution time as a metric. The metrics can then be visualized in the Monitoring instance. Here is a snapshot from the example:

Prometheus exporters

There are open-source Prometheus exporters that can be installed on the VPC instance to gather metrics directly from the environment. The Monitoring agent (dragent) can then be configured to scrape and forward their metrics:

The node_exporter can capture additional metrics directly from the VPC instance operating system (NFS metrics, for example).

The statsd_exporter captures additional timing quantiles and acceptable error metrics.
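Exporters publish their metrics in the Prometheus text exposition format, which is what the agent reads when it scrapes them. To show what a scraped histogram looks like, here is a minimal standard-library sketch. The function `render_histogram` is hypothetical and written only for this illustration; real exporters and the Prometheus client library generate this output themselves:

```python
import bisect


def render_histogram(name, help_text, bounds, samples):
    """Render samples as a Prometheus text-format histogram with cumulative buckets."""
    bounds = sorted(bounds)
    counts = [0] * (len(bounds) + 1)  # the last slot is the +Inf bucket
    for value in samples:
        # bisect_left finds the first bound >= value, i.e. the smallest bucket
        # whose upper limit (le, "less than or equal") admits the sample.
        counts[bisect.bisect_left(bounds, value)] += 1

    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} histogram"]
    running = 0
    for bound, count in zip(bounds + [float("inf")], counts):
        running += count  # buckets are cumulative in the exposition format
        le = "+Inf" if bound == float("inf") else f"{bound:g}"
        lines.append(f'{name}_bucket{{le="{le}"}} {running}')
    lines.append(f"{name}_sum {sum(samples):g}")
    lines.append(f"{name}_count {len(samples)}")
    return "\n".join(lines) + "\n"


text = render_histogram(
    "custom_histogram",
    "application prometheus example",
    bounds=[0.1, 0.5, 1.0],           # illustrative bucket limits, in seconds
    samples=[0.05, 0.1, 0.3, 2.0],    # illustrative observations
)
```

Each `_bucket` line counts every sample at or below its `le` limit, so the `+Inf` bucket always equals `_count`. The monitoring backend estimates quantiles from these cumulative counts, which is why the choice of bucket limits affects how precise latency percentiles can be.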

Application dashboard

I put together a dashboard to monitor my application:

Alerts

The App latency outlier in my dashboard looks like a problem that I need to look into. The Monitoring service supports alerts to notify my team of these anomalies:

The repository explains how to create the alert.

Try it yourself

Start monitoring your infrastructure using the IBM Cloud Monitoring service. Create custom metrics to get visibility into your software. Get alerted when systems drift outside your defined parameters. Continuously improve outcomes by observing metrics over time and driving change.

The source code for this blog post can be found here, along with instructions.
