Part 2 of 3: Analyzing data with Explorer reports and the Spark resource usage aggregation data loader - Search, analyze, and visualize Spark application data with IBM Spectrum Conductor
TinaLangridge
IBM Spectrum Conductor 2.3 offers Explorer reports, which include a Spark Charge Back chart. The cluster management console within IBM Spectrum Conductor uses data from an Explorer reports data loader (called the Spark resource usage aggregation data loader). IBM Spectrum Conductor aggregates the data and writes it back into Elasticsearch so that the Spark Charge Back chart can consume it.
This blog highlights how the data loader rolls up the Spark application resource metrics shown in the Spark Charge Back chart, and how to explore aggregations of these metrics with out-of-the-box visualizations in Explorer reports.
This blog is the second part of a three-part blog series:
Collecting and storing Spark resource usage metrics
Spark resource usage metrics include allocated slots, cores, and memory. IBM Spectrum Conductor collects these metrics for all executors, drivers, and applications, across all Spark instance groups. It collects allocated slots whenever they change, and collects memory and cores every 30 seconds.
Each combination of metric name and metric value is indexed as a key-value pair in Elasticsearch, along with metadata to correlate the metric back to the corresponding Spark activity and application. These metrics are stored in distinct indices, one per Spark instance group. Organizing the metrics this way reduces the disk space required to store them and keeps queries and aggregations efficient.
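As a rough illustration of this storage layout, the following Python sketch builds one key-value metric record and derives a per-instance-group index name. The field names (`metric_name`, `metric_value`, `application_id`, `activity_id`, `instance_group`) and the `spark-metrics-` index prefix are hypothetical, chosen only for illustration; they are not the names that IBM Spectrum Conductor uses.

```python
from datetime import datetime, timezone


def index_name(instance_group_id, prefix="spark-metrics-"):
    """Return a per-instance-group index name (hypothetical naming scheme)."""
    # Elasticsearch index names must be lowercase.
    return f"{prefix}{instance_group_id}".lower()


def metric_document(name, value, application_id, activity_id,
                    instance_group_id, timestamp):
    """Build one key-value metric record plus the metadata needed to
    correlate it back to its Spark activity and application."""
    return {
        "@timestamp": timestamp.isoformat(),
        "metric_name": name,            # the key, for example "cores"
        "metric_value": value,          # the value sampled for that key
        "application_id": application_id,
        "activity_id": activity_id,     # an executor or driver
        "instance_group": instance_group_id,
    }


doc = metric_document(
    "cores", 4, "app-001", "executor-3", "SIG-A",
    datetime(2018, 6, 1, 12, 0, 30, tzinfo=timezone.utc),
)
target_index = index_name("SIG-A")
```

Keeping one index per instance group means that a query scoped to a single group only has to touch that group's index.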
When IBM Spectrum Conductor generates visualizations that aggregate activity metrics for one application or Spark instance group, it issues Elasticsearch queries that involve bucketing, aggregations, and non-trivial logic to rebuild values for metrics that are not collected frequently.
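To see why rebuilding infrequently collected data takes extra logic, consider allocated slots, which are recorded only when they change. One plausible way to turn such change events into usage for a time window is to treat them as a step function and integrate it. The sketch below does this in plain Python; the function name, the event layout, and the approach itself are assumptions for illustration, not the product's actual query logic.

```python
def slot_seconds(events, window_start, window_end, initial_slots=0):
    """Integrate a step function of allocated slots over a time window.

    events: iterable of (timestamp_seconds, allocated_slots) pairs, each
    recorded at the moment the allocation changed (hypothetical layout).
    Returns slot-seconds used inside [window_start, window_end).
    """
    total = 0
    current = initial_slots
    cursor = window_start
    for ts, slots in sorted(events):
        if ts <= window_start:
            # Carry forward the last value known before the window opens.
            current = slots
            continue
        if ts >= window_end:
            break
        # The previous allocation held from cursor until this change.
        total += current * (ts - cursor)
        current = slots
        cursor = ts
    # The final allocation holds until the window closes.
    total += current * (window_end - cursor)
    return total


# 2 slots for 20 minutes, 4 slots for 30 minutes, then none for 10 minutes.
events = [(0, 2), (1200, 4), (3000, 0)]
usage = slot_seconds(events, 0, 3600)
slot_hours = usage / 3600
```

Running this kind of carry-forward logic at query time for every chart is costly, which is the motivation for aggregating once and storing the result.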
Data loader for Spark resource usage aggregation
IBM Spectrum Conductor 2.3 introduces Explorer reports and a new Spark resource usage aggregation data loader (called spar
Here are example formulas used to total allocated slots usage for one hour for a single application:
This is an example formula used to total core usage for one hour for a single activity, totaled across all activities for one application:
This is an example formula used to total memory usage for one hour for a single activity, totaled across all activities for one application:
Elasticsearch index template
The aggregated metrics are stored in a new Elasticsearch index that covers all Spark instance groups. This index can in turn be used to create Spark Charge Back charts with little to no additional effort.
Each record stores the Spark instance group, user name, and top consumer data, which in turn can be used as filters or to further aggregate metrics. Here is an example of the ibm-
Explorer report charts
You can view the allocated slots, cores, and memory data that the spar
By default, the horizontal dimension shows the time (day), and the vertical or color dimension is the Spark instance group. You can modify both dimensions to partition the data by time, Spark instance group, consumer name, or user name. Use the drop-down list to switch between CPU or GPU metrics. Use the additional drop-down lists to filter data by selecting one or more Spark instance groups, consumer names, or user names.
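The partitioning and filtering described above can be sketched in plain Python as a small pivot over aggregated records. The record fields (`day`, `instance_group`, `user`, `core_hours`) and the `pivot` helper are hypothetical stand-ins for illustration; the real charts are rendered by Explorer reports from the aggregated Elasticsearch index.

```python
from collections import defaultdict


def pivot(records, x_field, color_field, value_field, filters=None):
    """Sum value_field for every (x, color) pair, after applying filters.

    filters maps a field name to the set of values to keep, mirroring the
    drop-down lists that select instance groups, consumers, or users.
    """
    filters = filters or {}
    totals = defaultdict(float)
    for rec in records:
        if any(rec[field] not in allowed for field, allowed in filters.items()):
            continue
        totals[(rec[x_field], rec[color_field])] += rec[value_field]
    return dict(totals)


records = [
    {"day": "2018-06-01", "instance_group": "SIG-A", "user": "alice", "core_hours": 3.0},
    {"day": "2018-06-01", "instance_group": "SIG-A", "user": "bob", "core_hours": 2.0},
    {"day": "2018-06-01", "instance_group": "SIG-B", "user": "alice", "core_hours": 1.5},
    {"day": "2018-06-02", "instance_group": "SIG-A", "user": "alice", "core_hours": 4.0},
]

# Default view: day on the horizontal axis, instance group as the color.
by_group = pivot(records, "day", "instance_group", "core_hours")

# Same view, filtered to a single user.
alice_only = pivot(records, "day", "instance_group", "core_hours",
                   filters={"user": {"alice"}})
```

Swapping `x_field` or `color_field` for `user` corresponds to changing a chart dimension.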
Explorer reports dashboard
IBM Spectrum Conductor 2.3 is also preloaded with an Explorer reports dashboard that contains all three Spark Charge Back charts.
For the dashboard, the filters are situated to the left, and apply to all charts. The drop-down lists to switch between CPU or GPU metrics, and to filter data, remain above each chart; they apply only to the chart below and do not affect the remaining charts.
Installation and configuration
When you install IBM Spectrum Conductor 2.3, Explorer reports and the Spark resource usage aggregation data loader (spa
Before enabling the Spark resource usage aggregation data loader, review the data loader’s configuration file and modify it where required; otherwise, the defaults are used.
The metrics to aggregate are identified in the aggMetric and aggIntervalMetric properties. The aggBucket and aggIntervalFunction properties let you adjust the formula used to calculate memory or core usage for one hour.
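To give a feel for how a bucket size and an interval function could shape an hourly total, here is a minimal Python sketch. It assumes that the bucket setting is a width in seconds and that the interval function is a reducer such as average or maximum applied to the samples in each bucket; these semantics, and the names `bucket_seconds`, `interval_function`, and `hourly_usage`, are assumptions for illustration and may not match how aggBucket and aggIntervalFunction actually behave.

```python
from statistics import mean

# Hypothetical reducers that an interval function setting might select.
INTERVAL_FUNCTIONS = {"avg": mean, "max": max, "min": min}


def hourly_usage(samples, hour_start, bucket_seconds=300, interval_function="avg"):
    """Estimate usage for one hour, in unit-hours (for example core-hours).

    samples: iterable of (timestamp_seconds, value) pairs, such as the
    cores or memory of one activity sampled every 30 seconds.
    """
    reduce_bucket = INTERVAL_FUNCTIONS[interval_function]
    buckets = {}
    for ts, value in samples:
        offset = ts - hour_start
        if 0 <= offset < 3600:
            buckets.setdefault(offset // bucket_seconds, []).append(value)
    # Each non-empty bucket contributes its reduced value for the length of
    # the bucket; buckets with no samples contribute nothing.
    reduced_total = sum(reduce_bucket(values) for values in buckets.values())
    return reduced_total * bucket_seconds / 3600


# 4 cores for the first half hour, 2 cores for the second, sampled every 30 s.
steady = [(t, 4 if t < 1800 else 2) for t in range(0, 3600, 30)]
steady_core_hours = hourly_usage(steady, 0)

# A short spike is billed differently depending on the reducer.
spiky = [(0, 2), (30, 8)]
spiky_avg = hourly_usage(spiky, 0, interval_function="avg")
spiky_max = hourly_usage(spiky, 0, interval_function="max")
```

Under these assumptions, a maximum-based reducer charges for peak allocation within each bucket, while an average-based reducer smooths short spikes.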
You can get full instructions to enable the Explorer reports and its data loader from IBM
Now that you have a better understanding of Explorer reports, the Spark resource usage aggregation data loader, and how to explore aggregations of these metrics with out-of-the-box visualizations, check out the final blog in this series: Part