Spark service CLI

The Analytics Engine powered by Apache Spark CLI provides commands for interacting with instances. You can use it to manage instances, Spark applications, and the Spark history server.

Before you get started

The analytics-engine namespace is hidden behind a feature flag. Before you use the Spark service CLI, enable the namespace by exporting the CPDCTL_ENABLE_ANALYTICS_ENGINE environment variable in the shell where you run cpdctl:

$ export CPDCTL_ENABLE_ANALYTICS_ENGINE=1
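
A plain assignment sets only a shell variable. cpdctl runs as a child process and sees the flag only if the variable is exported. The following minimal sketch shows this, using a child shell as a stand-in for cpdctl. The helper name ae_flag_in_child is hypothetical and is not part of the CLI.

```shell
# Export the feature flag so that child processes such as cpdctl inherit it.
export CPDCTL_ENABLE_ANALYTICS_ENGINE=1

# Hypothetical helper: print the flag as a child process sees it.
# The child shell stands in for cpdctl here.
ae_flag_in_child() {
  sh -c 'printf "%s\n" "${CPDCTL_ENABLE_ANALYTICS_ENGINE:-unset}"'
}
```

Calling ae_flag_in_child after the export prints 1. To set the flag for a single command only, you can instead prefix the command with the assignment, for example CPDCTL_ENABLE_ANALYTICS_ENGINE=1 cpdctl analytics-engine --help.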

CLI help command

The Spark service CLI help command shows that the CLI supports commands for:

Instance management actions
Spark application actions
History server actions

$ cpdctl analytics-engine --help

Response to help command:

API for Analytics Engine Instance and jobs.

Usage:
 cpdctl analytics-engine [command]

Aliases:
 analytics-engine, ae

Available Commands:
 instance    Commands for instance resource
 spark-app   Commands for Spark application resource
 history-server Commands for History server resource

Flags:
 -h, --help        help for analytics-engine
 -j, --jmes-query string  Provide a JMESPath query to customize output.
   --output string    Choose an output format - can be 'json', 'yaml', or 'table'. (default "table")
 -q, --quiet        Suppresses verbose messages.

Global Flags:
   --context string   Name of the configuration context to use
   --cpd-config string  Configuration file path
   --cpdconfig string  [Deprecated] Use --cpd-config instead
   --raw-output     If set to true, single values in JSON output mode are not surrounded by quotes

For information about a particular command, use:

cpdctl analytics-engine [command] --help

Instance management commands

The instance management help command shows you the supported instance management CLI commands for:

Retrieving instance details
Changing the CPU and memory quota of an instance

$ cpdctl analytics-engine instance --help

Response to help command:

Instance management commands.

Usage:
  cpdctl analytics-engine instance [command]

Available Commands:
  get         Get instance details.
  set-quota   Change the default cpu quota and memory quota of the instance.

Flags:
  -h, --help   help for instance

Global Flags:
      --context string      Name of the configuration context to use
      --cpd-config string   Configuration file path
      --cpdconfig string    [Deprecated] Use --cpd-config instead
  -j, --jmes-query string   Provide a JMESPath query to customize output.
      --output string       Choose an output format - can be 'json', 'yaml', or 'table'. (default "table")
  -q, --quiet               Suppresses verbose messages.
      --raw-output          If set to true, single values in JSON output mode are not surrounded by quotes

For information about a particular instance management command, use:

cpdctl analytics-engine instance [command] --help

instance `get`

Use this command to get the details of a provisioned instance, for example, the instance home volume, the available resource quota, and other configuration settings. For help on the command syntax, enter:

$ cpdctl analytics-engine instance get --help

Response to help command:

Retrieve the details of a single instance.

Usage:
  cpdctl analytics-engine instance get --instance-id INSTANCE-ID

Flags:
  -h, --help                 help for get
  -i, --instance-id string   Identifier of the instance to retrieve.

Global Flags:
      --context string       Name of the configuration context to use
      --cpd-config string    Configuration file path
      --cpdconfig string     [Deprecated] Use --cpd-config instead
  -j, --jmes-query string    Provide a JMESPath query to customize output.
      --output string        Choose an output format - can be 'json', 'yaml', or 'table'. (default "table")
      --output-file string   If set, all output is redirected to a file of the given path
      --output-path string   If set, all output is redirected to a file of the given path (DEPRECATED: use --output-file instead)
  -q, --quiet                Suppresses verbose messages.
      --raw-output           If set to true, single values in JSON output mode are not surrounded by quotes

Example of using the instance get command:

$ cpdctl analytics-engine instance get --instance-id 62f8f5de-6c56-499a-a01a-744c6e16caa1 --output json

{
  "configs": {},
  "context_id": "d57ea5e1-fbca-44ea-b72a-bb63ebecae9c",
  "context_type": "space",
  "home_volume": "volumes-silpi-test-vol-pvc",
  "instance_id": "62f8f5de-6c56-499a-a01a-744c6e16caa1",
  "resource_quota": {
    "avail_cpu_quota": 64,
    "avail_memory_quota_gibibytes": 200,
    "cpu_quota": 64,
    "memory_quota_gibibytes": 200
  }
}
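
To read a single value instead of the whole JSON document, you can combine the --jmes-query and --raw-output flags listed in the help output. The following is a minimal sketch. The function name ae_instance_field is hypothetical, and the sketch assumes that cpdctl is on the PATH and that a configuration context is already set up.

```shell
# Hypothetical helper: print one field of the instance details.
# $1 = instance ID
# $2 = JMESPath expression, for example resource_quota.avail_cpu_quota
ae_instance_field() {
  cpdctl analytics-engine instance get \
    --instance-id "$1" \
    --output json \
    --jmes-query "$2" \
    --raw-output
}
```

For the sample response shown in the example, the expression resource_quota.avail_cpu_quota selects the value 64:

$ ae_instance_field 62f8f5de-6c56-499a-a01a-744c6e16caa1 resource_quota.avail_cpu_quota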

instance `set-quota`

Use this command to set the CPU and memory quota of an instance. For help on the command syntax, enter:

$ cpdctl analytics-engine instance set-quota --help

Response to help command:

Change the default cpu quota and memory quota of the instance.

Usage:
  cpdctl analytics-engine instance set-quota --instance-id INSTANCE-ID [--cpu-quota CPU-QUOTA] [--memory-quota MEMORY-QUOTA]

Flags:
      --cpu-quota int        Max cpu quota for an instance.
  -h, --help                 help for set-quota
  -i, --instance-id string   Identifier of the instance to retrieve.
      --memory-quota int     Max mamory quota for an instance in gibibytes.

Global Flags:
      --context string       Name of the configuration context to use
      --cpd-config string    Configuration file path
      --cpdconfig string     [Deprecated] Use --cpd-config instead
  -j, --jmes-query string    Provide a JMESPath query to customize output.
      --output string        Choose an output format - can be 'json', 'yaml', or 'table'. (default "table")
      --output-file string   If set, all output is redirected to a file of the given path
      --output-path string   If set, all output is redirected to a file of the given path (DEPRECATED: use --output-file instead)
  -q, --quiet                Suppresses verbose messages.
      --raw-output           If set to true, single values in JSON output mode are not surrounded by quotes

Example of using the instance set-quota command:

$ cpdctl analytics-engine instance set-quota --instance-id 62f8f5de-6c56-499a-a01a-744c6e16caa1 --cpu-quota 64 --memory-quota 200
...
OK
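
Because --cpu-quota and --memory-quota take integer values, a script can check its input before it calls the CLI. The following is a minimal sketch. The function names are hypothetical, and the requirement that both quotas are greater than zero is an assumption of this sketch, not a documented rule.

```shell
# Hypothetical helper: succeed if $1 is a whole number greater than zero.
ae_is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -gt 0 ]
}

# Hypothetical wrapper around set-quota.
# $1 = instance ID, $2 = CPU quota, $3 = memory quota in gibibytes
ae_set_quota() {
  # Assumption: both quotas must be positive whole numbers.
  if ! ae_is_positive_int "$2" || ! ae_is_positive_int "$3"; then
    echo "quotas must be positive whole numbers" >&2
    return 2
  fi
  cpdctl analytics-engine instance set-quota \
    --instance-id "$1" --cpu-quota "$2" --memory-quota "$3"
}
```

With this wrapper, ae_set_quota 62f8f5de-6c56-499a-a01a-744c6e16caa1 64 200 runs the same command as the example, whereas a value such as 6.5 is rejected before cpdctl is called.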

Spark application commands

The Spark application help command shows you the supported Spark application CLI commands for:

Submitting Spark applications
Stopping Spark applications
Getting the details of an application by application ID

$ cpdctl analytics-engine spark-app --help

Response to help command:

spark-app management commands.

Usage:
  cpdctl analytics-engine spark-app [command]

Available Commands:
  submit      Creates an application.
  stop        Delete the application.
  get         Get application details by application id.

Flags:
  -h, --help   help for spark-app

Global Flags:
      --context string      Name of the configuration context to use
      --cpd-config string   Configuration file path
      --cpdconfig string    [Deprecated] Use --cpd-config instead
  -j, --jmes-query string   Provide a JMESPath query to customize output.
      --output string       Choose an output format - can be 'json', 'yaml', or 'table'. (default "table")
  -q, --quiet               Suppresses verbose messages.
      --raw-output          If set to true, single values in JSON output mode are not surrounded by quotes

For information about a particular spark-app command, use:

cpdctl analytics-engine spark-app [command] --help

With the Spark application CLI commands, you can perform the following operations:

spark-app submit
spark-app get
spark-app stop

spark-app `submit`

Use this command to submit a Spark application in an instance. For help on the command syntax, enter:

$ cpdctl ae spark-app submit --help

Response to help command:

Deploy a Spark application.

Usage:
  cpdctl analytics-engine spark-app submit --instance-id INSTANCE-ID [--application-details APPLICATION-DETAILS] [--volumes VOLUMES]

Flags:
      --application-details string   Application details.
  -h, --help                         help for submit
  -i, --instance-id string           The identifier of the instance where the Spark application is submitted.
      --volumes string               a list of pvcs to mount in spark cluster.

Global Flags:
      --context string       Name of the configuration context to use
      --cpd-config string    Configuration file path
      --cpdconfig string     [Deprecated] Use --cpd-config instead
  -j, --jmes-query string    Provide a JMESPath query to customize output.
      --output string        Choose an output format - can be 'json', 'yaml', or 'table'. (default "table")
      --output-file string   If set, all output is redirected to a file of the given path
      --output-path string   If set, all output is redirected to a file of the given path (DEPRECATED: use --output-file instead)
  -q, --quiet                Suppresses verbose messages.
      --raw-output           If set to true, single values in JSON output mode are not surrounded by quotes

Example of using the spark-app submit command:

$ cpdctl ae spark-app submit --instance-id 62f8f5de-6c56-499a-a01a-744c6e16caa1 --application-details "{\"application\":\"/opt/ibm/spark/examples/src/main/python/wordcount.py\",\"application_arguments\":[\"/opt/ibm/spark/examples/src/main/resources/people.txt\"]}" --output json
{
  "application_id": "03ae6297-d6a6-4032-adc6-861ead5f3ad2",
  "spark_application_id": "app-20220407090305-0000",
  "start_time": "Thursday 07 April 2022 09:03:05.686+0000",
  "state": "WAITING"
}
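
Escaping the JSON for --application-details by hand is error-prone. The following sketch builds the payload with printf and returns only the application_id of the submitted application. The function names ae_app_details and ae_submit are hypothetical. The sketch handles exactly one application argument and does no JSON escaping, so it assumes that neither value contains double quotes or backslashes.

```shell
# Hypothetical helper: build the --application-details JSON.
# $1 = application path, $2 = single application argument
# Assumption: the values contain no double quotes or backslashes (no escaping is done).
ae_app_details() {
  printf '{"application":"%s","application_arguments":["%s"]}' "$1" "$2"
}

# Hypothetical wrapper: submit the application and print only its application_id.
# $1 = instance ID, $2 = application path, $3 = application argument
ae_submit() {
  cpdctl analytics-engine spark-app submit \
    --instance-id "$1" \
    --application-details "$(ae_app_details "$2" "$3")" \
    --output json --jmes-query application_id --raw-output
}
```

A script can capture the result for later spark-app get or spark-app stop calls, for example app_id=$(ae_submit "$instance_id" /opt/ibm/spark/examples/src/main/python/wordcount.py /opt/ibm/spark/examples/src/main/resources/people.txt).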

spark-app `get`

Use this command to show the details of a submitted Spark application in an instance. For help on the command syntax, enter:

$ cpdctl analytics-engine spark-app get --help

Response to help command:

Retrieve the details of a given Spark application.

Usage:
  cpdctl analytics-engine spark-app get --instance-id INSTANCE-ID --application-id APPLICATION-ID

Flags:
      --application-id string   Identifier of the application for which details are requested.
  -h, --help                    help for get
  -i, --instance-id string      Identifier of the instance where the application runs.

Global Flags:
      --context string       Name of the configuration context to use
      --cpd-config string    Configuration file path
      --cpdconfig string     [Deprecated] Use --cpd-config instead
  -j, --jmes-query string    Provide a JMESPath query to customize output.
      --output string        Choose an output format - can be 'json', 'yaml', or 'table'. (default "table")
      --output-file string   If set, all output is redirected to a file of the given path
      --output-path string   If set, all output is redirected to a file of the given path (DEPRECATED: use --output-file instead)
  -q, --quiet                Suppresses verbose messages.
      --raw-output           If set to true, single values in JSON output mode are not surrounded by quotes

Example of using the spark-app get command:

The following example shows the details of a submitted Spark application:

$ cpdctl analytics-engine spark-app get --instance-id 62f8f5de-6c56-499a-a01a-744c6e16caa1 --application-id 4da21012-837b-4d61-b8c8-44140a4da956 --output json
{
  "application_details": {},
  "application_id": "4da21012-837b-4d61-b8c8-44140a4da956",
  "finish_time": "Tuesday 05 April 2022 16:35:42.355+0000",
  "mode": "stand-alone",
  "spark_application_id": "app-20220405163526-0000",
  "start_time": "Tuesday 05 April 2022 16:35:26.911+0000",
  "state": "FINISHED"
}
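
A submitted application starts in a state such as WAITING and later reports FINISHED, so scripts usually poll spark-app get until the state they expect is reached. The following is a minimal sketch of such a loop. The function name ae_wait_for_state and its default values (30 attempts, 10 seconds apart) are hypothetical. Only the WAITING and FINISHED states appear on this page, so the target state is passed in rather than hard-coded.

```shell
# Hypothetical helper: poll an application until it reaches a target state.
# $1 = instance ID, $2 = application ID
# $3 = target state (default FINISHED)
# $4 = maximum number of attempts (default 30)
# $5 = seconds to wait between attempts (default 10)
# Prints the last observed state. Returns 0 if the target state was reached,
# 1 if the attempts ran out, and 2 if cpdctl itself failed.
ae_wait_for_state() {
  ae_target=${3:-FINISHED}
  ae_attempts=${4:-30}
  ae_delay=${5:-10}
  ae_state=""
  ae_i=0
  while [ "$ae_i" -lt "$ae_attempts" ]; do
    ae_state=$(cpdctl analytics-engine spark-app get \
      --instance-id "$1" --application-id "$2" \
      --output json --jmes-query state --raw-output) || return 2
    if [ "$ae_state" = "$ae_target" ]; then
      echo "$ae_state"
      return 0
    fi
    ae_i=$((ae_i + 1))
    sleep "$ae_delay"
  done
  echo "$ae_state"
  return 1
}
```

An application that ends in any other state never matches the target, so the loop stops only when the attempts run out. Check the printed state in that case.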

spark-app `stop`

Use this command to stop a submitted Spark application in an instance. For help on the command syntax, enter:

$ cpdctl analytics-engine spark-app stop --help

Response to help command:

Stop a Spark application. This is an idempotent operation. Performs no action if the requested application is already stopped or completed.

Usage:
  cpdctl analytics-engine spark-app stop --instance-id INSTANCE-ID --application-id APPLICATION-ID

Flags:
      --application-id string   Identifier of the application that needs to be stopped.
  -h, --help                    help for stop
  -i, --instance-id string      Identifier of the instance to which the application belongs.

Global Flags:
      --context string       Name of the configuration context to use
      --cpd-config string    Configuration file path
      --cpdconfig string     [Deprecated] Use --cpd-config instead
  -j, --jmes-query string    Provide a JMESPath query to customize output.
      --output string        Choose an output format - can be 'json', 'yaml', or 'table'. (default "table")
      --output-file string   If set, all output is redirected to a file of the given path
      --output-path string   If set, all output is redirected to a file of the given path (DEPRECATED: use --output-file instead)
  -q, --quiet                Suppresses verbose messages.
      --raw-output           If set to true, single values in JSON output mode are not surrounded by quotes

Example of using the spark-app stop command:

The following example shows how to stop a Spark application:

$ cpdctl analytics-engine spark-app stop --instance-id 62f8f5de-6c56-499a-a01a-744c6e16caa1 --application-id 63acf863-1aea-4aa8-a93d-08e30112fae9 --output json
""

Spark history server commands

The Spark history server help command shows you that the CLI supports commands for:

Starting the Spark history server
Stopping the Spark history server

$ cpdctl analytics-engine history-server --help

Response to the help command:

Commands for HistoryServer resource.

Usage:
  cpdctl analytics-engine history-server [command]

Available Commands:
  stop        Stop history server.
  start       Start history server.

Flags:
  -h, --help   help for history-server

Global Flags:
      --context string      Name of the configuration context to use
      --cpd-config string   Configuration file path
      --cpdconfig string    [Deprecated] Use --cpd-config instead
  -j, --jmes-query string   Provide a JMESPath query to customize output.
      --output string       Choose an output format - can be 'json', 'yaml', or 'table'. (default "table")
  -q, --quiet               Suppresses verbose messages.
      --raw-output          If set to true, single values in JSON output mode are not surrounded by quotes

For information about a particular history-server command, use:

cpdctl analytics-engine history-server [command] --help

history-server `start`

Use this command to start the Spark history server. For help on the command syntax, enter:

$ cpdctl analytics-engine history-server start --help

Response to the help command:

Start the history server for provisioned instance of Analytics Engine.

Usage:
  cpdctl analytics-engine history-server start --instance-id INSTANCE-ID

Flags:
  -h, --help                 help for start
  -i, --instance-id string   The identifier of the instance where the history server is started.

Global Flags:
      --context string       Name of the configuration context to use
      --cpd-config string    Configuration file path
      --cpdconfig string     [Deprecated] Use --cpd-config instead
  -j, --jmes-query string    Provide a JMESPath query to customize output.
      --output string        Choose an output format - can be 'json', 'yaml', or 'table'. (default "table")
      --output-file string   If set, all output is redirected to a file of the given path
      --output-path string   If set, all output is redirected to a file of the given path (DEPRECATED: use --output-file instead)
  -q, --quiet                Suppresses verbose messages.
      --raw-output           If set to true, single values in JSON output mode are not surrounded by quotes

Example of using the history-server start command:

The following example shows how to start the Spark history server:

$ cpdctl analytics-engine history-server start --instance-id 62f8f5de-6c56-499a-a01a-744c6e16caa1 --output json
{
  "message": "History server started successfully"
}

history-server `stop`

Use this command to stop the Spark history server. For help on the command syntax, enter:

$ cpdctl analytics-engine history-server stop --help

Response to the help command:

Stop the history server for the provisioned instance of Analytics Engine.

Usage:
  cpdctl analytics-engine history-server stop --instance-id INSTANCE-ID

Flags:
  -h, --help                 help for stop
  -i, --instance-id string   The identifier of the instance where the history server is to be stopped.

Global Flags:
      --context string       Name of the configuration context to use
      --cpd-config string    Configuration file path
      --cpdconfig string     [Deprecated] Use --cpd-config instead
  -j, --jmes-query string    Provide a JMESPath query to customize output.
      --output string        Choose an output format - can be 'json', 'yaml', or 'table'. (default "table")
      --output-file string   If set, all output is redirected to a file of the given path
      --output-path string   If set, all output is redirected to a file of the given path (DEPRECATED: use --output-file instead)
  -q, --quiet                Suppresses verbose messages.
      --raw-output           If set to true, single values in JSON output mode are not surrounded by quotes

Example of using the history-server stop command:

The following example shows how to stop the Spark history server:

$ cpdctl analytics-engine history-server stop --instance-id 62f8f5de-6c56-499a-a01a-744c6e16caa1 --output json
{
  "message": "History stopped successfully"
}
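
Because the start and stop commands take the same arguments, a script can wrap both in one function. The following is a minimal sketch. The function name ae_history_server is hypothetical, and the sketch assumes that the response contains a message field, as in the examples on this page.

```shell
# Hypothetical wrapper: start or stop the history server and print the response message.
# $1 = action (start or stop), $2 = instance ID
ae_history_server() {
  case "$1" in
    start|stop) ;;
    *)
      echo "usage: ae_history_server start|stop INSTANCE-ID" >&2
      return 2
      ;;
  esac
  cpdctl analytics-engine history-server "$1" \
    --instance-id "$2" \
    --output json --jmes-query message --raw-output
}
```

For example, ae_history_server start 62f8f5de-6c56-499a-a01a-744c6e16caa1 runs the start command and prints only the message text, and any action other than start or stop is rejected before cpdctl is called.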