Displaying PMI metrics in Prometheus format with the metrics app
You can use the metrics.ear file to create a Prometheus endpoint that displays the PMI metrics of your WebSphere® Application Server runtimes in Prometheus format. The metrics app:
- Retrieves the PMI data objects by using the JMX Perf MBean.
- Renders the data from the PMI data objects into Prometheus format output.
Before you begin
Similar to the PerfServlet, the metrics.ear application provides a way to use HTTP requests to query the performance metrics for an entire WebSphere Application Server administrative domain. Unlike the PerfServlet, which returns PMI data in XML format, metrics.ear converts PMI data into Prometheus format, so that Prometheus can scrape metrics from your application servers.
The metrics available on the Prometheus endpoint correspond to the set of metrics enabled in the PMI configuration. For the Prometheus output, some PMI metrics are suppressed or split into two metrics to better follow Prometheus best practices. See Prometheus metrics for a mapping of the original PMI metrics to their corresponding Prometheus metrics.
To include cell, node, and server labels in the metrics output, set the system property com.ibm.ws.pmi.prometheus.includeCellNodeServerLabels=true.
Tuning performance
With a WebSphere Application Server (base) server, the metrics.ear endpoint contacts the server to gather metrics. If the server is CPU-bound, or slow to respond for other reasons, the response time of the metrics endpoint is adversely affected. Monitor the response time of metrics endpoint requests to determine whether tuning is needed.
The metrics.ear endpoint response time scales linearly with the number of metrics available at the endpoint. If response time is too slow, reduce the number of metrics that are collected by adjusting the PMI configuration.
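One way to monitor endpoint response time, and to see how it tracks the number of metrics, is to time a single scrape and count the sample lines it returns. This is a minimal Python sketch that uses only the standard library; `time_scrape` and `count_samples` are hypothetical helper names, and the URL you pass in (host, port, and path) depends on your installation.

```python
import time
import urllib.request


def count_samples(exposition_text: str) -> int:
    """Count sample lines in Prometheus text format.

    Comment lines (# HELP, # TYPE, and any other # line) and blank lines are skipped.
    """
    return sum(
        1
        for line in exposition_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


def time_scrape(url: str, timeout: float = 30.0) -> tuple:
    """Fetch `url` once and return (elapsed_seconds, sample_count).

    `timeout` is passed to urlopen and applies to connecting and to each blocking read.
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    elapsed = time.monotonic() - start
    return elapsed, count_samples(body)
```

For example, `time_scrape("http://localhost:9080/metrics")` returns the elapsed seconds and the number of samples; the host and port in that call are assumptions. Running it before and after a PMI configuration change shows whether reducing the number of metrics shortened the response.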
Only the metrics from the JVM where the metrics.ear application is installed are displayed.
- PMI settings: Enable only the PMI metrics that are relevant to your business needs. Review the PMI settings and use a custom setting to enable or disable individual metrics. If possible, avoid the All metrics setting. For servers that do not require metrics collection, disable PMI.
- URL filtering: You can use the metrics endpoint to query metrics from a single WebSphere Application Server (base) server. The default endpoint, /metrics, shows the PMI metrics that are collected from the server. To select a specific server, use the URL /metrics/<node_name>/<server_name>.
- Prometheus scrape_duration: The default Prometheus scrape_duration is 15 seconds. If the response time of your Prometheus endpoint is a few seconds, increase the Prometheus scrape_duration value. Alternatively, you can reduce the number of PMI metrics available at the endpoint by modifying the PMI settings or by using URL filtering.
- Result caching: By default, the metrics app caches the most recent Prometheus metrics result for 5 seconds. A request that arrives within 5 seconds of the previous request is served the cached result. You can configure this interval with the com.ibm.ws.pmi.prometheus.resultCacheInterval system property.
- Servers list refresh: The server list to be scraped for PMI data, which for a WebSphere Application Server (base) server has a single entry, is refreshed when the metrics endpoint is accessed. To reduce the cost, by default the metrics app refreshes this list no more often than every 600 seconds. You can configure this refresh interval with the com.ibm.ws.pmi.prometheus.serverListUpdateInterval system property.
- Server metrics scrape response time: When Prometheus calls the /metrics endpoint of the metrics.ear application, the application makes JMX calls to the server to collect metrics. If the WebSphere Application Server (base) server is slow to respond, the /metrics endpoint response time can become long enough that Prometheus times out without a response, according to the Prometheus scrape_timeout configuration setting. By default, the metrics.ear application uses an 8-second timeout when it communicates with the servers in the cell. When this timeout is reached, it returns a response to Prometheus even if some server data is omitted, which limits the Prometheus endpoint response time. You can configure the server scrape timeout with the com.ibm.ws.pmi.prometheus.serverScrapeTimeout system property. If the value is 0 or negative, no timeout is set for server scrapes.
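The URL filtering option selects one server by path. A small Python helper that builds such URLs might look like the following; the `metrics_url` name, the host, and the port 9080 used in the usage note are assumptions for illustration, and only the `/metrics` and `/metrics/<node_name>/<server_name>` path shapes come from this topic.

```python
from typing import Optional
from urllib.parse import quote, urlunsplit


def metrics_url(
    host: str,
    port: int,
    node: Optional[str] = None,
    server: Optional[str] = None,
    scheme: str = "http",
) -> str:
    """Build /metrics, or /metrics/<node_name>/<server_name> when both names are given."""
    if (node is None) != (server is None):
        raise ValueError("node and server must be given together")
    path = "/metrics"
    if node is not None:
        # Percent-encode each name so that spaces or slashes stay inside one path segment.
        path += "/" + quote(node, safe="") + "/" + quote(server, safe="")
    return urlunsplit((scheme, f"{host}:{port}", path, "", ""))
```

For example, `metrics_url("was.example.com", 9080, "node01", "server1")` builds a URL that ends in `/metrics/node01/server1`, which you could use as a Prometheus target path.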
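The result caching behavior can be modeled as a time-to-live cache. The following Python sketch is an assumption about how such a cache behaves, not the metrics app's code: `ResultCache` and its parameters are hypothetical, the interval is in seconds, and the cache age is measured from when the result was produced. An injectable clock makes the behavior easy to check.

```python
import threading
import time
from typing import Callable, Optional


class ResultCache:
    """Serve a cached result until it is `interval` seconds old (a sketch)."""

    def __init__(
        self,
        produce: Callable[[], str],
        interval: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._produce = produce      # expensive call, such as a full metrics scrape
        self._interval = interval    # seconds; 5.0 mirrors the documented default
        self._clock = clock
        self._lock = threading.Lock()  # concurrent requests share one cached result
        self._value: Optional[str] = None
        self._produced_at = 0.0

    def get(self) -> str:
        with self._lock:
            now = self._clock()
            expired = now - self._produced_at >= self._interval
            if self._value is None or expired:
                # Nothing cached yet, or the entry is stale: recompute and restamp.
                self._value = self._produce()
                self._produced_at = now
            return self._value
```

With a 5-second interval, a request at 4.9 seconds reuses the first result, and a request at 5.0 seconds triggers a new scrape.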
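To see how much work the servers list refresh interval saves, the following Python sketch computes which requests would trigger a refresh when the list is refreshed on access but no more often than every 600 seconds. The `refresh_points` helper is hypothetical and models only the timing rule described for the servers list, not the app's code.

```python
from typing import Iterable, List, Optional


def refresh_points(request_times: Iterable[float], min_interval: float = 600.0) -> List[float]:
    """Return the request times (in seconds) that would trigger a list refresh.

    The list is refreshed on access, but only if at least `min_interval`
    seconds have passed since the previous refresh.
    """
    refreshed: List[float] = []
    last_refresh: Optional[float] = None
    for t in sorted(request_times):
        if last_refresh is None or t - last_refresh >= min_interval:
            refreshed.append(t)
            last_refresh = t
    return refreshed
```

For example, with Prometheus scraping every 15 seconds for 20 minutes (81 requests, at 0 through 1200 seconds), the list is rebuilt only 3 times, at 0, 600, and 1200 seconds.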
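The server scrape timeout trades completeness for a bounded response time. The following Python sketch models that trade-off with `concurrent.futures`: each server is scraped concurrently, and whatever finishes before the deadline is returned. The `scrape_all` function and the callables that stand in for per-server JMX calls are hypothetical; the 8-second default and the rule that 0 or a negative value disables the timeout mirror the behavior described for com.ibm.ws.pmi.prometheus.serverScrapeTimeout.

```python
import concurrent.futures as cf
from typing import Callable, Dict, Optional


def scrape_all(
    servers: Dict[str, Callable[[], str]],
    timeout: Optional[float] = 8.0,
) -> Dict[str, str]:
    """Scrape every server concurrently; return what finished before the deadline.

    `servers` maps a server name to a callable that stands in for the JMX calls
    made to that server. A timeout of None, 0, or a negative value waits for all.
    """
    if timeout is not None and timeout <= 0:
        timeout = None  # no deadline for server scrapes
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(servers)))
    futures = {pool.submit(scrape): name for name, scrape in servers.items()}
    results: Dict[str, str] = {}
    try:
        for future in cf.as_completed(futures, timeout=timeout):
            try:
                results[futures[future]] = future.result()
            except Exception:
                pass  # a failing server is omitted, like a slow one
    except cf.TimeoutError:
        pass  # deadline reached: keep the partial results gathered so far
    finally:
        pool.shutdown(wait=False)  # do not wait for stragglers; their output is dropped
    return results
```

A slow server is simply missing from the returned dictionary, which corresponds to the description above of server data being omitted from the response when the timeout is reached.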
Performance improvement recommendations
- Response time is proportional to the number of metrics returned. You can turn off some of the more verbose metrics, such as URI and EJB metrics, to help improve performance.
- Scrape times become long when the CPU usage of the application server is near 100%. When you use Prometheus with Grafana to consume metrics, gaps can appear in your Grafana graphs whenever the scrape time exceeds the Prometheus scrape timeout.