wx-data commands and usage

The wx-data command provides a set of commands for performing operations that are specific to watsonx.data. This topic lists each command with a brief description of the tasks that it performs.

watsonx.data on IBM Software Hub

watsonx.data on IBM Cloud®

The wx-data command performs operations such as ingesting data and managing engines, storage, and data sources in watsonx.data.

Syntax
./cpdctl wx-data [command] [options]
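The wx-data command supports the following commands, each described in its own section of this topic: ingestion, engine, bucket, database, sparkjob, tablemaint, service, component, and access-control.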

How to use wx-data command --help (-h)

  • To list all the commands in the wx-data plugin:
    ./cpdctl wx-data --help
  • To get details of all options and their descriptions for a specific command in the wx-data plugin:
    ./cpdctl wx-data [command] --help

    For example:

    ./cpdctl wx-data ingestion -h
    NAME:
      ingestion - Commands for Ingestion resource.
    
    USAGE:
       cpdctl wx-data ingestion [action]
    
    COMMANDS:
      list     List ingestion jobs.
      create   Create an ingestion job.
      get      Get ingestion job details.
    
    GLOBAL OPTIONS:
          --cpd-config string   Configuration file path
          --cpdconfig string    [Deprecated] Use --cpd-config instead
      -h, --help                Show help
          --profile string      Name of the configuration profile to use
      -q, --quiet               Suppresses verbose messages.
          --raw-output          If set to true, single values in JSON output mode are not surrounded by quotes
    
    Use "cpdctl wx-data ingestion service-command --help" for more information about a command.
    
  • To get details of all available options and arguments for a specific operation of a wx-data command:
    ./cpdctl wx-data [command] [options] --help
  • To use the wx-data plugin to execute an operation:
    ./cpdctl wx-data [command] [options]
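
The help lookups above can be scripted. The following is a minimal sketch, not an official utility. It assumes that cpdctl is in the current directory and that the command names match the sections of this topic. CPDCTL and wx_help_all are illustrative names introduced here and are not part of cpdctl. The sketch calls only --help, so it does not change anything in the instance.

```shell
# Sketch: print the help text of every wx-data command described in this topic.
# CPDCTL and wx_help_all are illustrative names, not part of cpdctl.
CPDCTL="${CPDCTL:-./cpdctl}"

wx_help_all() {
  for group in ingestion engine bucket database sparkjob tablemaint \
               service component access-control; do
    echo "== wx-data $group =="
    # --help only prints usage information for the command
    "$CPDCTL" wx-data "$group" --help || echo "help failed for $group" >&2
  done
}

# Example: save all help text to a file for offline reading.
#   wx_help_all > wx-data-help.txt
```

Setting CPDCTL to another path lets the same function work when cpdctl is installed elsewhere.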

ingestion

The ingestion command runs ingestion operations in watsonx.data.

Syntax
./cpdctl wx-data ingestion [options]
The ingestion command supports the following commands:

Command                             Description
./cpdctl wx-data ingestion list     Lists the ingestion jobs that were run in the watsonx.data instance.
./cpdctl wx-data ingestion create   Creates an ingestion job in the watsonx.data instance.
./cpdctl wx-data ingestion get      Gets the details of an ingestion job that was run in the watsonx.data instance.
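
As an illustration of how the ingestion commands combine with the global options shown in the help output earlier, the following sketch lists ingestion jobs with an optional configuration profile. It is a hedged example: WXD_PROFILE, CPDCTL, and wx_ingestion_list are hypothetical names, and any further options that list might require in your environment are not shown. Run ./cpdctl wx-data ingestion list --help to confirm them.

```shell
# Sketch: list ingestion jobs, optionally with a named configuration profile.
# WXD_PROFILE, CPDCTL, and wx_ingestion_list are illustrative names.
CPDCTL="${CPDCTL:-./cpdctl}"

wx_ingestion_list() {
  if [ -n "${WXD_PROFILE:-}" ]; then
    # --profile is listed under GLOBAL OPTIONS in the ingestion help output
    "$CPDCTL" wx-data ingestion list --profile "$WXD_PROFILE"
  else
    "$CPDCTL" wx-data ingestion list
  fi
}

# Example:
#   WXD_PROFILE=my-profile wx_ingestion_list
```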

engine

The engine command runs engine-related operations in watsonx.data.

Syntax
./cpdctl wx-data engine [options]

The engine command supports the following commands:

Command                          Description
./cpdctl wx-data engine list     Lists all the engines available in the watsonx.data instance.
./cpdctl wx-data engine create   Creates or registers an engine in the watsonx.data instance.
./cpdctl wx-data engine delete   Deletes an engine from the watsonx.data instance.
./cpdctl wx-data engine attach   Associates catalogs with a Presto engine in the watsonx.data instance.
./cpdctl wx-data engine detach   Disassociates catalogs from a Presto engine in the watsonx.data instance.
Note: In watsonx.data version 2.1.1, the create and delete commands can create and delete only a Presto (Java) engine.
Note: In watsonx.data version 2.1.2 and later, the create and delete commands can create and delete all available engines in watsonx.data.

bucket

The bucket command runs storage-related operations in watsonx.data.

Syntax
./cpdctl wx-data bucket [options]

The bucket command supports the following commands:

Note: With watsonx.data version 2.2.2 and later, watsonx.data automatically activates any newly created storage without requiring an associated catalog, eliminating the need for manual activation. Manual deactivation is also no longer required when deleting a storage.
Command                                Description
./cpdctl wx-data bucket list           Lists all the storages available in the watsonx.data instance.
./cpdctl wx-data bucket create         Registers a storage in the watsonx.data instance. The create command supports secrets from an external vault (HashiCorp). Custom S3 storage creation is supported in CPDCTL version v1.8.219 and later.
./cpdctl wx-data bucket get            Gets the details of a registered storage in the watsonx.data instance.
./cpdctl wx-data bucket delete         Deletes a storage from the watsonx.data instance.
./cpdctl wx-data bucket activate       Activates a storage bucket. Applies to watsonx.data on IBM Cloud instances only.
./cpdctl wx-data bucket list-objects   Lists all objects in the bucket.
./cpdctl wx-data bucket upload         Uploads a file from the local file system to a watsonx.data object storage bucket.
Limitations:
  • When using the list-objects command, buckets with a large number of objects might not list all objects because of API timeouts.
  • When using the --paginated parameter with the list-objects command, only top-level objects are listed. Nested objects are not expanded by default.
  • Listing objects with the list-objects command is currently not supported for ADLS and GCS buckets.

database

The database command runs operations on data sources in watsonx.data.

Syntax
./cpdctl wx-data database [options]

The database command supports the following commands:

Command                            Description
./cpdctl wx-data database list     Lists all the data sources available in the watsonx.data instance.
./cpdctl wx-data database create   Creates or adds a data source in the watsonx.data instance. The create command supports secrets from an external vault (HashiCorp).
./cpdctl wx-data database get      Gets the details of a registered data source in the watsonx.data instance.
./cpdctl wx-data database delete   Deletes a data source from the watsonx.data instance.

sparkjob

The sparkjob command runs Spark-related operations in watsonx.data, such as submitting a Spark application, listing all applications, and getting the status of a Spark application.

Syntax
./cpdctl wx-data sparkjob [options]

The sparkjob command supports the following commands:

Command                            Description
./cpdctl wx-data sparkjob list     Lists all applications available in a Spark engine.
./cpdctl wx-data sparkjob create   Submits a Spark application.
./cpdctl wx-data sparkjob get      Gets the status of a Spark application.
For more information about how to submit a Spark application by using IBM cpdctl in watsonx.data on IBM Software Hub, see Submitting Spark application by using IBM cpdctl.

tablemaint

The tablemaint command runs Iceberg table maintenance operations in watsonx.data.
Important: Table maintenance currently applies only to Amazon S3 storage.
Syntax
./cpdctl wx-data tablemaint [options]

The tablemaint command supports the following commands:

Command                                             Description
./cpdctl wx-data tablemaint rollback-to-snapshot    Rolls back (restores) a table to a specific snapshot ID.
./cpdctl wx-data tablemaint rollback-to-timestamp   Rolls back a table to the snapshot at a specific timestamp.
./cpdctl wx-data tablemaint set-current-snapshot    Sets the current snapshot ID for a table.
./cpdctl wx-data tablemaint cherrypick-snapshot     Cherry-picks changes from a snapshot into the current table state. Cherry-picking creates a new snapshot from an existing snapshot without altering or removing the original.
./cpdctl wx-data tablemaint expire-snapshot         Removes older snapshots and their files that are no longer needed.
./cpdctl wx-data tablemaint remove-orphan           Removes files that are not referenced in any metadata file of an Iceberg table and can therefore be considered orphaned.
./cpdctl wx-data tablemaint rewrite-data            Rewrites the data files.
./cpdctl wx-data tablemaint rewrite-manifests       Rewrites manifests for a table to optimize scan planning.
./cpdctl wx-data tablemaint register-table          Creates a table.
Each table maintenance command lists the following flags:
  • Force: If the value is set to TRUE, the SQL query that is about to run is not printed.
  • Debug: If the value is set to TRUE, a copy of the Spark application file is saved to your computer.
For more information about how to perform Spark table maintenance by using IBM cpdctl in watsonx.data on IBM Software Hub, see Spark table maintenance by using IBM cpdctl.

service

The service command runs serviceability-related operations in watsonx.data.

Syntax
./cpdctl wx-data service [options]

The service command supports the following commands:

Command                                         Description
./cpdctl wx-data service list-tables            Lists all table names of Hive or Iceberg connectors in the watsonx.data instance.
./cpdctl wx-data service get-qhmm-config        Gets the name of the QHMM-enabled bucket in the watsonx.data instance.
./cpdctl wx-data service monitor                Runs stats and QHMM-related queries in the watsonx.data instance.
./cpdctl wx-data service generate-engine-dump   Generates a heap or thread dump for a Presto worker or coordinator in the watsonx.data instance.

component

The component command gets the configurations of various components in watsonx.data.

Syntax
./cpdctl wx-data component [options]

The component command supports the following commands:

Command                                           Description
./cpdctl wx-data component get-mds-status         Gets the configuration for Metadata Service (MDS) in the watsonx.data instance.
./cpdctl wx-data component get-ces-status         Gets the CES status in the watsonx.data instance.
./cpdctl wx-data component get-cas-cpg-endpoint   Gets the CPG and CAS endpoints in the watsonx.data instance.
./cpdctl wx-data component get-hms-status         Lists all HMS metastores in watsonx.data.
./cpdctl wx-data component get-console-status     Checks the console status of the watsonx.data instance.
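
The component commands lend themselves to a quick health sweep. The following sketch runs each command in the table and reports whether it succeeded. It rests on two assumptions that this topic does not confirm: that each command can run with only the defaults of the active profile, and that cpdctl exits with a nonzero status on failure. CPDCTL and wx_component_report are illustrative names.

```shell
# Sketch: run every component command and summarize the results.
# Assumes cpdctl returns a nonzero exit status when a command fails.
# CPDCTL and wx_component_report are illustrative names, not part of cpdctl.
CPDCTL="${CPDCTL:-./cpdctl}"

wx_component_report() {
  failed=0
  for sub in get-mds-status get-ces-status get-cas-cpg-endpoint \
             get-hms-status get-console-status; do
    # Discard the command output; only the exit status is used here
    if "$CPDCTL" wx-data component "$sub" >/dev/null 2>&1; then
      echo "ok    $sub"
    else
      echo "FAIL  $sub"
      failed=$((failed + 1))
    fi
  done
  # Succeed only when every command succeeded
  [ "$failed" -eq 0 ]
}

# Example:
#   wx_component_report || echo "one or more components need attention"
```

To see the details behind a failed line, rerun that command directly without discarding its output.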

access-control

The access-control command manages access policies for resources. It is available from watsonx.data version 2.2.2 and CPDCTL version 1.8.33.

Syntax
./cpdctl wx-data access-control [options]

The access-control command supports the following commands:

Command                                             Description
./cpdctl wx-data access-control list-users-groups   Gets the users and groups who have access to the watsonx.data instance.
./cpdctl wx-data access-control list-access         Lists resource access policies.
./cpdctl wx-data access-control update-access       Updates resource access policies.
./cpdctl wx-data access-control revoke-access       Revokes resource access policies.
Note: The revoke-access command will be supported only from the watsonx.data 2.3.0 release.
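
The version requirements for access-control can be checked in a script before a command is attempted. The following sketch compares dotted version numbers in plain shell and reports which access-control commands this topic documents for a given watsonx.data version. The version string is supplied by the caller; how you obtain it is outside the scope of this sketch. version_ge and access_control_commands are hypothetical helper names and are not part of cpdctl.

```shell
# Sketch: gate access-control commands on the watsonx.data version.
# version_ge and access_control_commands are illustrative helpers.

# Returns 0 when dotted version $1 is greater than or equal to $2.
# Uses plain variables because POSIX sh has no local keyword.
version_ge() {
  a="$1"; b="$2"
  while [ -n "$a" ] || [ -n "$b" ]; do
    x="${a%%.*}"; y="${b%%.*}"
    # Drop the field just read; nothing remains when no dot is left
    case "$a" in *.*) a="${a#*.}" ;; *) a="" ;; esac
    case "$b" in *.*) b="${b#*.}" ;; *) b="" ;; esac
    x="${x:-0}"; y="${y:-0}"
    [ "$x" -gt "$y" ] && return 0
    [ "$x" -lt "$y" ] && return 1
  done
  return 0
}

# Prints the access-control commands documented for version $1.
access_control_commands() {
  ver="$1"
  # access-control is documented from watsonx.data 2.2.2
  version_ge "$ver" 2.2.2 || return 1
  cmds="list-users-groups list-access update-access"
  # revoke-access is documented from watsonx.data 2.3.0
  if version_ge "$ver" 2.3.0; then
    cmds="$cmds revoke-access"
  fi
  echo "$cmds"
}

# Example:
#   access_control_commands 2.2.2
```

The sketch checks only the watsonx.data version. The CPDCTL version requirement (1.8.33) would need a separate check.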