[9.0.5.7 or later]

Displaying PMI metrics in Prometheus format with the metrics app

You can use the metrics.ear file to create a Prometheus endpoint for your WebSphere® Application Server runtimes to display PMI metrics in Prometheus format.

The metrics.ear performs two operations:
  • Retrieves the PMI data objects by using the JMX Perf MBean.
  • Renders the data from the PMI data objects into Prometheus format output.
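To make the second operation concrete, the following Python sketch renders samples in the Prometheus text exposition format (HELP and TYPE comment lines followed by `name{labels} value` lines). It is an illustration only, not the metrics.ear implementation; the function names and the metric name `demo_pool_size` are hypothetical.

```python
def escape_label_value(value: str) -> str:
    # The text format requires backslash, double quote, and newline to be escaped.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name: str, labels: dict, value) -> str:
    # One sample line: metric name, optional {label="value",...} block, then the value.
    if labels:
        pairs = ",".join(
            f'{key}="{escape_label_value(str(val))}"'
            for key, val in sorted(labels.items())
        )
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


def render_family(name: str, help_text: str, metric_type: str, samples) -> str:
    # samples is an iterable of (labels_dict, value) pairs for a single metric name.
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    lines += [render_sample(name, labels, value) for labels, value in samples]
    return "\n".join(lines) + "\n"
```

For example, `render_family("demo_pool_size", "Hypothetical pool size", "gauge", [({"server": "server1"}, 5)])` produces a three-line block that Prometheus can parse.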

Before you begin

Similar to the PerfServlet, the metrics.ear provides a way to use HTTP requests to query the performance metrics for an entire WebSphere Application Server administrative domain. In contrast to the PerfServlet, which returns PMI data in XML format, the metrics.ear converts PMI data into Prometheus format, so that Prometheus can scrape metrics from your application servers.

The metrics available on the Prometheus endpoint correspond to the set of metrics enabled in the PMI configuration. For the Prometheus output, some PMI metrics are suppressed or split into two metrics to better follow Prometheus best practices. See Prometheus metrics for a mapping of the original PMI metrics to their corresponding Prometheus metrics.

[9.0.5.9 or later] Cell, node, and server labels are added automatically to metrics when the metrics.ear application is running on a WebSphere Application Server Network Deployment server. To omit the cell, node, and server name labels from the Prometheus output, set the following system property to false:
com.ibm.ws.pmi.prometheus.includeCellNodeServerLabels=false
[9.0.5.10 or later] Node agent metrics are included automatically in the Prometheus output when the metrics.ear application is running. You can exclude the node agent metrics so that only application server metrics are included in the Prometheus output. To exclude the node agent metrics, set the com.ibm.ws.pmi.prometheus.includeNodeAgents system property to false, as shown in the following example:
com.ibm.ws.pmi.prometheus.includeNodeAgents=false

Tuning performance

With network deployment, the metrics.ear endpoint contacts all servers in the cell to gather metrics. If any servers in the cell are CPU-bound, or slow to respond for other reasons, they adversely affect the response time of the metrics endpoint. Monitor the response time of metrics endpoint requests to determine whether tuning is needed.
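One way to monitor the response time is to time a request to the endpoint from a script. The following Python sketch is a minimal example that uses only the standard library; the function name `timed_fetch` and its `opener` parameter are hypothetical, and the sketch assumes an endpoint that does not require authentication.

```python
import time
import urllib.request


def timed_fetch(url, timeout=30.0, opener=urllib.request.urlopen):
    """Return (elapsed_seconds, body_bytes) for one GET request to the given URL."""
    start = time.perf_counter()
    # The opener is injectable so that the function can be exercised without a live server.
    with opener(url, timeout=timeout) as response:
        body = response.read()
    return time.perf_counter() - start, body
```

Calling `timed_fetch("http://localhost:9080/metrics")` at intervals and recording the elapsed value gives a rough picture of whether tuning is needed. The host and port here are assumptions taken from the sample Prometheus target in this topic.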

The metrics.ear endpoint response time scales linearly with the number of metrics available at the endpoint. If response time is too slow, reduce the number of metrics that are collected by adjusting the PMI configuration.

PMI settings
Enable only the PMI metrics that are relevant for your business needs. Review the PMI settings and use a custom setting to enable or disable individual metrics. If possible, avoid the use of the All metrics setting. For servers that do not require metrics collection, disable PMI.
URL filtering
You can use the metrics endpoint to query for metrics from a single node or single server. The default endpoint /metrics shows the PMI metrics that are collected from all of the servers and node agents in the cell. To limit the output to a specific node or server, use the URL /metrics/<node_name> or /metrics/<node_name>/<server_name>.
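The three URL forms can be sketched as a small Python helper. The function name `metrics_url` and the base URL in the usage note are hypothetical; only the /metrics, /metrics/<node_name>, and /metrics/<node_name>/<server_name> paths come from the description above.

```python
from urllib.parse import quote


def metrics_url(base_url, node_name=None, server_name=None):
    """Build the cell-wide, node-level, or server-level metrics URL."""
    if server_name and not node_name:
        # The server-level path is nested under the node name.
        raise ValueError("a server filter requires a node name")
    parts = [base_url.rstrip("/"), "metrics"]
    for part in (node_name, server_name):
        if part:
            # Percent-encode each path segment so unusual characters stay in one segment.
            parts.append(quote(part, safe=""))
    return "/".join(parts)
```

For example, `metrics_url("http://localhost:9080", "node1", "server1")` returns `http://localhost:9080/metrics/node1/server1`.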
Prometheus scrape_interval
A commonly used Prometheus scrape_interval is 15 seconds. If the response time for your Prometheus endpoint is a few seconds or more, increase the Prometheus scrape_interval value, and make sure that the scrape_timeout value is long enough for the endpoint to respond. Alternatively, you can scale down the number of PMI metrics available at the endpoint by modifying the PMI settings or by using URL filtering.
Result caching
The metrics app stores the most recent Prometheus metrics result in a cache for 5 seconds by default. A request that is made within 5 seconds of the previous request is served with the cached result. This default interval time value is configurable with the com.ibm.ws.pmi.prometheus.resultCacheInterval system property.
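The caching behavior described here is a time-to-live cache with a single entry. The following Python sketch illustrates the idea under stated assumptions; it is not the metrics.ear implementation, and the class name `ResultCache` and its parameters are hypothetical.

```python
import time


class ResultCache:
    """Serve a stored result until it is older than ttl_seconds, then recompute it."""

    def __init__(self, compute, ttl_seconds=5.0, clock=time.monotonic):
        self._compute = compute      # zero-argument callable that produces the result
        self._ttl = ttl_seconds
        self._clock = clock          # injectable clock, which makes the cache testable
        self._value = None
        self._stamp = None           # time at which the stored result was computed

    def get(self):
        now = self._clock()
        if self._stamp is None or now - self._stamp >= self._ttl:
            self._value = self._compute()
            self._stamp = now
        return self._value
```

With a 5-second interval, a scraper that polls every 2 seconds triggers an actual collection only on some requests, and the rest are served from the stored result.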
Servers list refresh
The list of servers to be scraped for PMI data is refreshed when the metrics endpoint is accessed. To reduce the cost, the metrics app does not refresh this list more often than every 600 seconds by default. New servers that are added to the cell with PMI enabled are picked up by the metrics.ear at the next refresh. You can configure this default refresh interval with the com.ibm.ws.pmi.prometheus.serverListUpdateInterval system property.
[9.0.5.9 or later] Server metrics scrape response time
When Prometheus calls the /metrics endpoint of the metrics.ear application, the application makes JMX calls to each server in the cell to collect metrics. If one of the servers is slow to respond, the /metrics endpoint response time might be long, and Prometheus might time out without receiving a response, according to the Prometheus scrape_timeout configuration setting. To limit the endpoint response time, the metrics.ear application applies a timeout when it communicates with the servers in the cell. The default timeout is 8 seconds. After this timeout is reached, the application returns a response to Prometheus, even if some server data is omitted from the response. You can configure the default server scrape timeout value with the com.ibm.ws.pmi.prometheus.serverScrapeTimeout system property. If the value is set to 0 or a negative value, no timeout is set for the server scrapes.
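The pattern of returning a partial response after a deadline can be sketched in Python as follows. This is an illustration of the general technique, not the metrics.ear implementation; the function name `scrape_servers` and the shape of its `scrapers` argument are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def scrape_servers(scrapers, timeout_seconds=8.0):
    """Collect results from each server, omitting servers that miss the deadline.

    scrapers maps a server name to a zero-argument callable that returns its metrics.
    """
    # A value of 0 or less means no timeout, mirroring the behavior described above.
    deadline = timeout_seconds if timeout_seconds > 0 else None
    pool = ThreadPoolExecutor(max_workers=max(1, len(scrapers)))
    try:
        futures = {pool.submit(scrape): name for name, scrape in scrapers.items()}
        done, _not_done = wait(futures, timeout=deadline)
        # Keep only the servers that finished in time and did not raise an error.
        return {futures[f]: f.result() for f in done if f.exception() is None}
    finally:
        # Do not block the response on slow servers; their threads finish on their own.
        pool.shutdown(wait=False)
```

The trade-off is the same as in the description: the response time is bounded, but a slow server is missing from that response.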

Performance improvement recommendations

As you create a Prometheus endpoint for your WebSphere Application Server runtimes to display PMI metrics in Prometheus format, consider these recommendations.
  • In cells with many servers, use longer scrape_interval and scrape_timeout values than typical because each scrape covers the entire cell.
  • Response time is proportional to the number of metrics returned. You can turn off some of the more verbose metrics, such as URI and EJB metrics, to help improve performance.
  • Long scrape times occur when CPU usage of any node in the cell is near 100%. When you use Prometheus with Grafana to consume metrics, gaps can occur in your Grafana graphs when scrape time exceeds the Prometheus scrape timeout.
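To decide which metrics to turn off, it can help to count how many samples each metric name contributes to the endpoint output. The following Python sketch does this for text in the Prometheus exposition format; the function name `count_samples` is hypothetical, and the sketch assumes one sample per non-comment line.

```python
from collections import Counter


def count_samples(exposition_text):
    """Count the sample lines per metric name in Prometheus text format output."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first "{" (labels) or the first space (no labels).
        ends = [i for i in (line.find("{"), line.find(" ")) if i != -1]
        name = line[:min(ends)] if ends else line
        counts[name] += 1
    return counts
```

Calling `count_samples(body).most_common(10)` on a saved endpoint response lists the ten metric names with the most samples, which are the likeliest candidates for adjusting in the PMI configuration.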

Procedure

  1. Install the metrics.ear file that is located in the app_server_root/installableApps directory, where app_server_root is the installation path for WebSphere Application Server.
    1. Deploy metrics.ear on a single WebSphere Application Server instance within the domain.

      For a network deployment cell, deploy the metrics.ear on a single server instance within the cell.

    2. After the metrics application deploys, access it with the http://hostname/metrics default URL.
  2. Enable application security so that only users assigned the monitor role can access the endpoint. You can configure the role assignment to the user in the All Applications > metrics.ear > Security role to user/group mapping page of the administrative console.
  3. After the WebSphere Application Server user is assigned to the monitor role, configure the Prometheus YAML configuration file to include the login information of that user, as shown in the following example:
    - job_name: 'was-nd'
      basic_auth:
        username: someUser
        password: somePassword
      metrics_path: /metrics
      static_configs:
        - targets: ['localhost:9080']
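Before you point Prometheus at the endpoint, it can be useful to confirm that the monitor-role credentials are accepted. The following Python sketch builds a request that carries HTTP basic authentication, which is the same scheme that the basic_auth setting uses. The function name `build_metrics_request` is hypothetical, and the user name and password are the placeholder values from the example.

```python
import base64
import urllib.request


def build_metrics_request(url, username, password):
    """Return a urllib request for the metrics URL with a Basic Authorization header."""
    # Basic authentication sends base64("username:password") in the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing the result to `urllib.request.urlopen` sends the request. A successful response suggests that the same credentials should work in the Prometheus configuration. Basic authentication does not encrypt the credentials, so use it over a trusted network or a secured connection.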