PromQL, the Prometheus query language, is now available to all IBM Cloud Monitoring users. Alongside this release, we will perform a metric format migration and increase metrics limits.
Read on to understand the impact on your Prometheus metrics monitoring.
PromQL general availability on IBM Cloud Monitoring
Since September 2020, PromQL functionality has been available on IBM Cloud Monitoring with Sysdig. We are now excited to announce that the PromQL functionality graduates to GA, becoming available to all existing and new monitoring instances.
While the graphical Query UI is the simplest and most straightforward way to query metrics, users can also leverage PromQL, the Prometheus query language. You can use PromQL within both Dashboards and Alerts to create advanced data queries and perform arithmetic operations on all metrics: not only custom metrics such as Prometheus and statsd, but also the out-of-the-box Sysdig agent infrastructure and container metrics.
Prometheus metrics native ingest and migration
Users ingesting Prometheus metrics will be affected by this migration. If you are only using infrastructure metrics and statsd, or you created your account after September 2020, you are not impacted.
PromQL requires Prometheus metrics native ingestion. Until now, Prometheus metrics were ingested as calculated metrics — a format that allowed simpler aggregations based on statsd-like monitoring. With the adoption of PromQL, and to align with the OpenMetrics standard and the Prometheus monitoring ecosystem, we started to ingest Prometheus metrics in native format (raw) in parallel.
This migration will automatically update Dashboards and Alerts that were created using the old Prometheus metrics format (metric names carrying the promlegacy. prefix) so that they use the native metric format, with the prefix removed. Agents will stop sending metrics in calculated format.
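As a rough illustration of the renaming described above, the sketch below strips the legacy prefix from metric names. The migration itself is automatic and needs no code on your side; the function name `to_native_name` and the sample metric names are hypothetical and only show the effect on a name.

```python
LEGACY_PREFIX = "promlegacy."


def to_native_name(metric: str) -> str:
    """Return the metric name without the legacy prefix, if it has one."""
    if metric.startswith(LEGACY_PREFIX):
        return metric[len(LEGACY_PREFIX):]
    # Names without the prefix are returned unchanged.
    return metric


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    for name in ["promlegacy.http_requests_total", "cpu.used.percent"]:
        print(name, "->", to_native_name(name))
```

A helper like this could be handy for spotting which of your own saved queries still reference the old names before the migration runs.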
During the automated migration, alerts will be temporarily disabled. This process takes less than a minute, so your alerting service should not be noticeably affected.
What do I need to do?
You should upgrade your agent to the latest version. This is a good practice and will allow you to benefit from increased metrics limits. If you are running agents older than Sysdig agent version 10.0, Prometheus metrics might stop working, since those agents do not support Prometheus native ingestion.
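The version thresholds mentioned in this post (10.0 for native ingestion, 10.8.0 for the increased limits) can be checked with a small script. This is a minimal sketch that assumes agent versions are plain dotted numbers such as "10.8.0"; the function names are hypothetical, and how you obtain the version string from your agents is left out.

```python
def parse_version(version: str) -> tuple:
    """Turn '10.8.0' into (10, 8, 0) so versions compare numerically."""
    parts = [int(p) for p in version.split(".")]
    # Pad short versions such as '10' or '10.8' with zeros.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def supports_native_ingestion(agent_version: str) -> bool:
    """Agents older than 10.0 do not support Prometheus native ingestion."""
    return parse_version(agent_version) >= (10, 0, 0)


def has_increased_limits(agent_version: str) -> bool:
    """Agents older than 10.8.0 keep the lower time series limits."""
    return parse_version(agent_version) >= (10, 8, 0)


if __name__ == "__main__":
    for v in ["9.7.0", "10.0.0", "10.8.0"]:
        print(v, supports_native_ingestion(v), has_increased_limits(v))
```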
If you are using the .avg or .count metrics generated by the calculated format from Histogram Prometheus metric types in Dashboards or Alerts, you will have to manually update them. Check the migration details for more information on this.
Increasing metrics limits
Together with this change, we are updating the limit on the number of custom metric time series that agents can send to the IBM Cloud Monitoring service. The new default limits will be as follows:
- Prometheus metrics: 17k
- statsd metrics: 1k
- JMX metrics: 1k
- AppCheck metrics: 1k
Agents older than 10.8.0 remain capped at 10k time series in total, of which Prometheus metrics are limited to 7k time series.
You can manually reconfigure your agent to customize any of these limits, whether that means disabling the metrics that you don’t need or raising a limit, up to a maximum of 20k time series. The aggregate of all custom metrics should not exceed 20k time series.
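Before changing limits, it can help to confirm that a planned combination stays within the 20k aggregate. The sketch below uses the default values listed above; the dictionary keys and the `check_limits` function are hypothetical names chosen for illustration and are not the agent's actual configuration keys.

```python
# Default per-type limits from this announcement (time series per agent).
DEFAULT_LIMITS = {
    "prometheus": 17_000,
    "statsd": 1_000,
    "jmx": 1_000,
    "app_checks": 1_000,
}
MAX_TOTAL = 20_000  # Aggregate cap across all custom metrics.


def check_limits(overrides: dict) -> tuple:
    """Merge overrides into the defaults and return (total, within_cap)."""
    merged = {**DEFAULT_LIMITS, **overrides}
    total = sum(merged.values())
    return total, total <= MAX_TOTAL


if __name__ == "__main__":
    # Disable statsd and give its budget to Prometheus.
    print(check_limits({"statsd": 0, "prometheus": 18_000}))
    # Raising Prometheus alone pushes the aggregate over the cap.
    print(check_limits({"prometheus": 19_000}))
```

The point of the design is that raising one limit has to be paid for by lowering another, since the defaults already add up to the 20k cap.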
You can check your current agent versions, time series usage, and configured limits on the Sysdig agent health and status dashboard.
Further improvements and exciting features to improve Prometheus monitoring compatibility, Kubernetes monitoring and IBM Cloud service monitoring are coming to the IBM Cloud Monitoring service really soon, so stay tuned for updates.
Migration details and roll-out
Time aggregation changes
The calculated metrics format performs some aggregations within the agent for Prometheus counter-type metrics. When users query these metrics using the Query UI, the rate has already been calculated at ingestion time.
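To make the difference concrete: a raw counter only ever increases (apart from resets), so a rate has to be derived from successive samples at query time rather than at ingestion. The sketch below is a simplified illustration of that idea, not the service's implementation. It ignores the window extrapolation that PromQL's rate() performs, and the function name and sample data are hypothetical.

```python
def counter_rate(samples):
    """Average per-second increase of a counter.

    samples: list of (timestamp_seconds, value) pairs sorted by time.
    Returns None when there are too few samples or no elapsed time.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop in value means the counter was reset and restarted from zero.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


if __name__ == "__main__":
    # Hypothetical raw samples scraped every 10 seconds.
    print(counter_rate([(0, 100), (10, 150), (20, 200)]))
```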
With the new Prometheus native format (where metrics are ingested raw), time aggregations will be automatically replaced in users’ dashboards and alerts, based on the type of the metric, wherever calculated metrics (with the promlegacy. prefix) are found.
Prometheus Histogram support (manual action required)
We are deprecating the special handling that the calculated format applied to Prometheus Histogram metrics, where each Prometheus Histogram metric automatically produced two generated metrics: .avg and .count.
These generated metrics will no longer be available, so a manual update is required. You should replace queries on Histograms with the PromQL function that fits your use case. Typically, those would be avg, count, rate or histogram_quantile. You can do this any time before or after the migration, but after the migration, the dashboard panels and alerts with this metric type will stop working until they are updated.
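As a starting point for the manual update, the sketch below builds PromQL expressions that are commonly used in place of generated average and count metrics. It assumes your histogram is exposed with the standard Prometheus series suffixes (_bucket, _sum, _count); the function name, the 5m window, the 0.95 quantile, and the example metric name are illustrative assumptions, so adjust them to your own metrics.

```python
def histogram_queries(metric: str, window: str = "5m", quantile: float = 0.95) -> dict:
    """Build candidate PromQL expressions for a Prometheus histogram metric."""
    return {
        # Observations per second, a common stand-in for a count metric.
        "count_rate": f"rate({metric}_count[{window}])",
        # Mean observed value over the window, a common stand-in for an average.
        "average": f"rate({metric}_sum[{window}]) / rate({metric}_count[{window}])",
        # Estimated quantile computed from the bucket series.
        "quantile": (
            f"histogram_quantile({quantile}, "
            f"sum by (le) (rate({metric}_bucket[{window}])))"
        ),
    }


if __name__ == "__main__":
    # Hypothetical metric name, for illustration only.
    for label, query in histogram_queries("http_request_duration_seconds").items():
        print(f"{label}: {query}")
```

The resulting strings can be pasted into a PromQL panel or alert and refined from there.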
Migration roll-out timelines
The migration will roll out across all IBM Cloud Monitoring regions according to the following schedule:
- AU-SYD: January 25, 2021
- EU-DE: January 25, 2021
- JP-TOK: January 26, 2021
- EU-GB: January 27, 2021
- US-EAST: January 27, 2021
- US-SOUTH: January 26, 2021
We’d love your feedback and an opportunity to help in the event you have any questions, so feel free to email us directly.
Learn more about IBM Cloud Monitoring with Sysdig.