Explorer reports

In addition to standard and custom reports, you can use Explorer reports, which are available in the cluster management console.

To use Explorer reports, ensure that the following requirements are met:

Enable the Explorer reports feature.
To view an Explorer dashboard, you must have EXPLORER permission.
To work with the Spark Charge Back chart, you do not require special permission; however, to create, edit, or delete other Explorer reports, you must have EXPLORER permission.

Explorer reports, dashboards, and charts

Reports: Explorer reports is the umbrella term for reporting on data that is stored in database tables, through the Reports & Logs > Explorer Reports option in the cluster management console. IBM® Spectrum Conductor also provides standard and custom reports within the console.

Dashboards: To see multiple Explorer charts in one view, use an Explorer dashboard. Access dashboards by selecting Reports & Logs > Explorer Reports > Dashboards. All charts in a dashboard share filters. You can create custom dashboards and add charts to each dashboard.
Currently, IBM Spectrum Conductor provides one dashboard, called Spark Charge Back. You can view Spark Charge Back data by user, consumer, or application ID, either for a single Spark instance group or across all Spark instance groups.

Charts

IBM Spectrum Conductor shows the data in the form of Explorer charts, available under Reports & Logs > Explorer Reports > Charts. By default, the charts display data as bar charts; you can also customize charts to display differently (for example, as a pie, area, line, or table chart).

For the Spark Charge Back dashboard, three charts are available:

Charge Back for Allocated Slots
Charge Back for Memory
Charge Back for Cores

IBM Spectrum Conductor uses the Spark resource usage data loader to record CPU and memory metrics for running executors and drivers every 30 seconds. It uses the Spark resource usage aggregation data loader to aggregate the recorded slot, core, and memory usage (across all executors and drivers) for each application over one-hour periods.
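The two-stage flow described above (30-second samples rolled up into one record per application per hour) can be illustrated with a minimal Python sketch. This is not the data loader's implementation: the UsageSample record, its field names, and the choice to express usage as resource-hours (sample value multiplied by the 30-second interval) are all assumptions made for illustration, because the exact aggregation formula is not stated here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

SAMPLE_INTERVAL_SECONDS = 30  # sampling period stated in the text


@dataclass(frozen=True)
class UsageSample:
    """Hypothetical record: one 30-second reading for one executor or driver."""
    app_id: str
    timestamp: datetime
    cores: float
    memory_mb: float


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_hourly(samples):
    """Sum samples into one record per (application, hour).

    Assumption: each sample is weighted by the fraction of an hour it
    covers, so the totals are core-hours and MB-hours.
    """
    totals = defaultdict(lambda: {"core_hours": 0.0, "memory_mb_hours": 0.0})
    fraction = SAMPLE_INTERVAL_SECONDS / 3600
    for s in samples:
        key = (s.app_id, hour_bucket(s.timestamp))
        totals[key]["core_hours"] += s.cores * fraction
        totals[key]["memory_mb_hours"] += s.memory_mb * fraction
    return dict(totals)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
    # One executor holding 2 cores and 1024 MB for a full hour (120 samples).
    samples = [
        UsageSample("app-1", start + timedelta(seconds=30 * i), 2.0, 1024.0)
        for i in range(120)
    ]
    for (app_id, hour), usage in aggregate_hourly(samples).items():
        print(app_id, hour.isoformat(), usage)
```

Samples from several executors and drivers of the same application fall into the same (application, hour) key, so their usage is summed, which matches the "across all executors and drivers" wording above.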

Data loaders for Explorer reports

The cluster management console uses data from data loaders. For Explorer reports, IBM Spectrum Conductor additionally aggregates Elasticsearch data into a form that the Spark Charge Back report can consume, and then writes the aggregated data back to Elasticsearch.

Spark resource usage aggregation data loader (sparkresusageaggloader)

The sparkresusageaggloader data loader aggregates core, slot, and memory usage, for both GPU and CPU, by application ID, user name, and top consumer within each Spark instance group. Each aggregation covers one hour for one application. This data loader is disabled by default.
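As a rough illustration of grouping by application ID, user name, or top consumer, the following Python sketch rolls up hourly records along one of those dimensions, either for a single Spark instance group or across all of them. The record layout (plain dictionaries with instance_group, app_id, user, top_consumer, and core_hours keys) and the roll_up function are hypothetical; they are not the documents that the data loader stores in Elasticsearch.

```python
from collections import defaultdict

# Dimensions named in the text; the key names themselves are assumptions.
DIMENSIONS = ("app_id", "user", "top_consumer")


def roll_up(records, dimension, instance_group=None, metric="core_hours"):
    """Total one metric per value of the chosen dimension.

    records        -- iterable of dicts (hypothetical hourly aggregates)
    dimension      -- one of DIMENSIONS
    instance_group -- restrict to one Spark instance group, or None for all
    metric         -- name of the numeric field to sum
    """
    if dimension not in DIMENSIONS:
        raise ValueError(f"dimension must be one of {DIMENSIONS}")
    totals = defaultdict(float)
    for record in records:
        if instance_group is not None and record["instance_group"] != instance_group:
            continue
        totals[record[dimension]] += record[metric]
    return dict(totals)


if __name__ == "__main__":
    records = [
        {"instance_group": "sig-a", "app_id": "app-1", "user": "alice",
         "top_consumer": "/Finance", "core_hours": 1.5},
        {"instance_group": "sig-a", "app_id": "app-2", "user": "bob",
         "top_consumer": "/Finance", "core_hours": 2.0},
        {"instance_group": "sig-b", "app_id": "app-3", "user": "alice",
         "top_consumer": "/Research", "core_hours": 0.5},
    ]
    print(roll_up(records, "user"))
    print(roll_up(records, "top_consumer", instance_group="sig-a"))
```

Passing a different metric name (for example, a slot or memory field, if the records carry one) reuses the same grouping logic.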

The sparkresusageaggloader maintains the file $EGO_CONFDIR/../../integration/elk/conf/indexcleanup/cws_reporting, which includes the following data:

Spark instance group UUIDs
Spark instance group name
Consumer service

Notes:

Spark instance groups created before IBM Spectrum Conductor version 2.3.0 are not included in this list and are excluded from the aggregation. Once a Spark instance group is upgraded to version 2.3.0, ascd is modified to add the Spark instance group to the list.
When the sparkresusageaggloader is running, only data from the prior day is guaranteed to be available for aggregation.