Workload
Learn how to manage workload in IBM® Spectrum Conductor. As part of managing workload, add Spark versions and notebooks (optional); then, create and manage instance groups. Once you create an instance group, submit Spark applications to the instance group and manage those applications.
Instance groups
Each instance group is an installation of Apache Spark that can run Spark core services (Spark master, shuffle, and history) and notebooks as configured.
Anaconda
An Anaconda distribution is the easiest way to work with Python for data science and machine learning in IBM Spectrum Conductor. IBM Spectrum Conductor manages your Anaconda distributions and environments with improved performance and security.
Notebooks
Notebooks provide an interactive environment for data analysis, enabling you to explore and visualize data analytics from your browser. You can add your own notebook (along with packages that contain the components that are required to run that notebook) and manage it in IBM Spectrum Conductor.
Spark applications
You can manage Spark applications that are associated with an instance group.
Data connectors
Data connectors manage the libraries and configurations that are required for hosts to connect to various data sources. A data connector contains the type, the URI, the authentication method, and all of the required libraries to access the data source.
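To make the pieces of a data connector concrete, here is a minimal Python sketch of the information one bundles. The class and field names are illustrative assumptions, not IBM Spectrum Conductor's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class DataConnector:
    # Hypothetical descriptor; fields mirror the pieces named above.
    type: str                 # e.g. "hdfs", "s3", "jdbc"
    uri: str                  # location of the data source
    auth_method: str          # e.g. "kerberos", "access-key", "none"
    libraries: list = field(default_factory=list)  # client libraries hosts need

    def classpath(self) -> str:
        """Join the connector's libraries into a colon-separated classpath."""
        return ":".join(self.libraries)

hdfs = DataConnector(
    type="hdfs",
    uri="hdfs://namenode:8020/warehouse",
    auth_method="kerberos",
    libraries=["hadoop-client.jar", "hadoop-hdfs.jar"],
)
print(hdfs.classpath())  # hadoop-client.jar:hadoop-hdfs.jar
```

Bundling type, URI, authentication, and libraries in one descriptor is what lets every host in the cluster connect to the same source without per-host setup.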
Spark versions
You can download and import new Spark versions for use with IBM Spectrum Conductor . These Spark versions are prepackaged to include Apache Spark binaries and other files required for IBM Spectrum Conductor capabilities.
Cloud bursting with host factory
The host factory framework for cloud bursting leverages the on-demand capabilities of cloud infrastructure for your on-premises cluster. With host factory, you can seamlessly extend your cluster infrastructure to include compute hosts from the cloud to handle the demands of excess workload. Your cluster dynamically grows or shrinks, supporting a hybrid mix of on-premises and cloud hosts, based on the resource demands of applications that run in the cluster.
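The grow-or-shrink decision can be sketched as a small sizing function: request cloud hosts only for the demand that on-premises capacity cannot absorb. The function name, slot model, and host size are assumptions for illustration, not the host factory API.

```python
def hosts_to_burst(pending_slots: int, on_prem_free_slots: int,
                   slots_per_cloud_host: int = 4) -> int:
    """Return how many cloud hosts to request for overflow workload.

    0 means on-premises capacity suffices (idle cloud hosts can be returned).
    """
    overflow = pending_slots - on_prem_free_slots
    if overflow <= 0:
        return 0
    # Round up to whole hosts: 20 overflow slots at 4 slots/host -> 5 hosts.
    return -(-overflow // slots_per_cloud_host)

print(hosts_to_burst(pending_slots=30, on_prem_free_slots=10))  # 5
print(hosts_to_burst(pending_slots=5, on_prem_free_slots=10))   # 0
```

A real framework adds policies on top of this arithmetic (budgets, provisioning delay, minimum lease times), but the core idea is the same: cloud hosts track the overflow, not the total demand.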
GPUs
For faster execution of Spark applications, you can enable GPU scheduling and define the GPU mode on your hosts.
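The effect of a GPU mode can be illustrated with a toy assignment function: an exclusive policy gives each task a whole GPU (extras wait), while a shared policy spreads tasks across GPUs. This is an illustrative sketch of the two policies, not Conductor's scheduler.

```python
def assign_gpu(gpu_tasks, mode, num_gpus):
    """Map task names to GPU indices under a toy scheduling policy.

    "exclusive": one task per GPU; tasks beyond num_gpus get None (queued).
    "shared":    tasks are spread round-robin across the GPUs.
    """
    assignment = {}
    for i, task in enumerate(gpu_tasks):
        if mode == "exclusive":
            assignment[task] = i if i < num_gpus else None
        else:  # shared
            assignment[task] = i % num_gpus
    return assignment

tasks = ["t0", "t1", "t2"]
print(assign_gpu(tasks, "exclusive", 2))  # {'t0': 0, 't1': 1, 't2': None}
print(assign_gpu(tasks, "shared", 2))     # {'t0': 0, 't1': 1, 't2': 0}
```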
Managing services
Services within an instance group encapsulate business logic; they are the part of your instance group that performs the actual computation.
Monitoring and the Elastic Stack
When you install and configure IBM Spectrum Conductor, the Elastic Stack (Elasticsearch, Logstash, and Beats) is integrated within IBM Spectrum Conductor and is used for data collection and visualization. Registered as system services, the Elastic Stack integration enables you to search, analyze, and visualize Spark application data for efficient monitoring.
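The shape of such a pipeline, in standard Logstash configuration syntax, is a Beats input feeding an Elasticsearch output. This is a minimal, generic fragment for orientation; the port, index name, and hosts are illustrative, not the configuration that ships with the product.

```
input {
  beats {
    port => 5044                # receive events from Beats shippers on the hosts
  }
}
filter {
  # parse and enrich Spark application log events here (grok, mutate, ...)
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "spark-apps-%{+YYYY.MM.dd}"   # illustrative daily index name
  }
}
```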