Storage and data access for IBM Orchestration Pipelines

Learn where files and data are stored outside of IBM Orchestration Pipelines and how to share them between DataStage and Pipelines.

Getting started with storage volumes

A storage volume provides file storage for projects and deployment spaces. To use a file that is stored on a storage volume, specify the path to that file.

Specifying the pipeline scope

By default, the scope for a pipeline is the project that contains the pipeline. To locate an asset that the pipeline uses, you can explicitly specify a scope other than the default. The scope is the project, catalog, or space that contains the asset.

From the pipeline canvas, you can browse for the scope.

In a notebook, specify the scope as part of the path to an asset, as follows:

[cpd://]/(projects|spaces|catalogs)/<scope-id>/<resource-type>/<resource-ID>

Where:

  • cpd:// is the URL for the cluster or server where you access Cloud Pak for Data.
  • scope-id is the ID of the project, space, or catalog that contains the asset.
  • resource-type is the name of the resource type that you are using. For example, if you are using models as a resource, enter models.
  • resource-ID is the ID of the resource. To find the resource ID, open the resource in the project and view its information page.
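If you build these paths in a notebook, a small helper can assemble and check them. The following Python sketch reads the template literally: the bracketed cpd:// is an optional prefix, followed by /projects, /spaces, or /catalogs, the scope ID, the resource type, and the resource ID, so a full path looks like cpd:///projects/<scope-id>/models/<resource-ID>. The function names and the IDs in the example are hypothetical; this is not an IBM API, and whether your environment expects the cpd:// prefix is an assumption to verify.

```python
import re

# Scope kinds allowed by the path template.
SCOPE_KINDS = ("projects", "spaces", "catalogs")

# Optional literal "cpd://" prefix, then /<kind>/<scope-id>/<resource-type>/<resource-ID>.
_PATH_RE = re.compile(
    r"^(?:cpd://)?/(projects|spaces|catalogs)/([^/]+)/([^/]+)/([^/]+)$"
)


def build_asset_path(scope_kind, scope_id, resource_type, resource_id, with_prefix=True):
    """Assemble a scoped asset path (hypothetical helper, not an IBM API)."""
    if scope_kind not in SCOPE_KINDS:
        raise ValueError(f"scope_kind must be one of {SCOPE_KINDS}, got {scope_kind!r}")
    for segment in (scope_id, resource_type, resource_id):
        # Each segment must be non-empty and must not contain a path separator.
        if not segment or "/" in segment:
            raise ValueError(f"invalid path segment: {segment!r}")
    path = f"/{scope_kind}/{scope_id}/{resource_type}/{resource_id}"
    return "cpd://" + path if with_prefix else path


def parse_asset_path(path):
    """Split a scoped asset path into its named parts; raise ValueError if it does not match."""
    match = _PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a scoped asset path: {path!r}")
    scope_kind, scope_id, resource_type, resource_id = match.groups()
    return {
        "scope_kind": scope_kind,
        "scope_id": scope_id,
        "resource_type": resource_type,
        "resource_id": resource_id,
    }


if __name__ == "__main__":
    # "abc123" and "m-1" are made-up IDs for illustration only.
    path = build_asset_path("projects", "abc123", "models", "m-1")
    print(path)
    print(parse_asset_path(path))
```

Validating the scope kind up front catches a common typo (for example, project instead of projects) before the path reaches the pipeline.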

Sharing files between DataStage and Pipelines

Pipelines can read files that DataStage extracts, archive them, or run SCP to move them to another location.

Before you begin

  • The persistent volume claims (PVCs) that you mount in DataStage must have a large storage size (1-10 TB).

    Note:

    If you do not have a PVC, you can ask your admin to create one. See Managing storage volumes for more details.

  • Expandable storage is recommended.

  • Storage must be isolated.

    • If you mount a persistent volume (PV) on a Network File System (NFS), you can use a common persistent volume with different supplemental groups.
    • Otherwise, use multiple DataStage parallel engine (PX runtime) instances, each mounted to a separate PV. To learn more about DataStage environments, see DataStage environments.

Accessing PVCs

DataStage can provide a REST endpoint that returns PVCs from the PX runtime instances that the user has access to.

Pipelines mounts the file systems automatically through a data connection. A Cloud Pak for Data system administrator must set up the storage volume and ensure that the user who runs the pipeline has access to the required PVCs.

Mounting and connecting to the PVC

Here are the required steps for mounting and connecting to a PVC:

Step 1: Mount the PVC (admin)

The Cloud Pak for Data system administrator must complete these steps to mount the storage volume on the same cluster where DataStage is installed.

  1. Open the storage volume to view the PVC name and mount path.
  2. Log in to the DataStage cluster.
  3. Edit the PX runtime custom resource to add the mount path and PVC name to the additional_storage field. To open the resource for editing, run the following command:

     oc edit pxruntimes.ds.cpd.ibm.com ds-px-default

  4. Add an additional_storage section under spec. For example:

     spec:
       additional_storage:
       - mount_path: /mnts/ds-data
         pvc_name: volumes-ds-data-pvc

  5. Save your changes and close the editor. Updating the resource restarts the PX runtime pods.
  6. Check that the pods restarted:

     oc get pods | grep px-default

  7. Verify that the mount_path was added to the pod:

     oc get pods ds-px-default-ibm-datastage-px-compute-1 -o yaml | grep ds-data
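When more than one PVC must be mounted, the additional_storage section in the steps above gets one entry per volume. The following Python sketch renders that fragment from (mount path, PVC name) pairs so that it can be pasted into the oc edit session. It assumes, as the leading dash in the example suggests, that additional_storage accepts a list of entries; the function name is hypothetical, and the PVC name check is a simplified version of the Kubernetes object-name rules, not the operator's own validation.

```python
import re

# Simplified Kubernetes object-name check: lowercase alphanumerics, '-' and '.',
# starting and ending with an alphanumeric character.
_PVC_NAME_RE = re.compile(r"^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$")


def render_additional_storage(volumes):
    """Render a spec.additional_storage YAML fragment from (mount_path, pvc_name) pairs."""
    if not volumes:
        raise ValueError("at least one (mount_path, pvc_name) pair is required")
    lines = ["spec:", "  additional_storage:"]
    for mount_path, pvc_name in volumes:
        # The mount path is a directory inside the pod, so it must be absolute.
        if not mount_path.startswith("/"):
            raise ValueError(f"mount_path must be absolute: {mount_path!r}")
        if len(pvc_name) > 253 or not _PVC_NAME_RE.match(pvc_name):
            raise ValueError(f"invalid PVC name: {pvc_name!r}")
        lines.append(f"  - mount_path: {mount_path}")
        lines.append(f"    pvc_name: {pvc_name}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Values taken from the documented example.
    print(render_additional_storage([("/mnts/ds-data", "volumes-ds-data-pvc")]))
```

Generating the fragment rather than typing it by hand keeps the indentation consistent, which matters because a misplaced space changes how YAML nests the entries.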

Step 2: Connect to the PVC (user)

When the storage volume is available, follow these steps to connect to the volume and share assets with DataStage:

  1. Create a storage volume connection in your project by clicking New asset > Connect to a data source > Storage volume. See Storage volume connection.
  2. To access a file from DataStage, you can use various nodes in Pipelines:
    • Copy asset: Click Source assets, select the storage volume and the file.
    • Import asset: Click Archive file to import, select the storage volume and the file.
    • Send email: Attach the file to an email. Click Attachments, select the storage volume and the file.
    • Create data asset: Click File, select the storage volume and the file.
    • Wait for file: Click File location, select the storage volume and the file.
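As its name suggests, the Wait for file node holds the pipeline until the selected file exists. For scripts that run where the storage volume is mounted, the same idea can be sketched as a polling loop with a timeout. This Python sketch is only an illustration of the technique; it is not how the node is implemented, and wait_for_file, its timeout, and its poll interval are hypothetical names and defaults.

```python
import os
import time


def wait_for_file(path, timeout_s=60.0, poll_interval_s=1.0):
    """Poll until `path` exists as a regular file.

    Returns True as soon as the file is found, or False if `timeout_s`
    seconds pass first. Hypothetical helper, not the Pipelines node.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.isfile(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_s, remaining))


if __name__ == "__main__":
    # /mnts/ds-data is the example mount path; the file name is made up.
    found = wait_for_file("/mnts/ds-data/extract.csv", timeout_s=5, poll_interval_s=0.5)
    print("file ready" if found else "timed out")
```

Using time.monotonic() rather than wall-clock time keeps the timeout correct even if the system clock is adjusted while waiting.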

Tip: To list all the files in the storage volume connection, you can run a bash script with the following command: `ls -al <mount_path>`, where <mount_path> is the mount path of the storage volume (for example, /mnts/ds-data).
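If you work in Python rather than bash, a rough equivalent of ls -al is sketched here. It assumes the code runs somewhere the storage volume is mounted; the mount path /mnts/ds-data comes from the example in Step 1 and is only illustrative, and the function name list_volume is hypothetical. Unlike ls -al, it does not report the . and .. entries, but it does include hidden files.

```python
import os
import stat
from datetime import datetime


def list_volume(mount_path):
    """Return (mode, size_bytes, modified, name) for every entry in mount_path, sorted by name.

    Hidden entries are included, as with `ls -a`; symbolic links are
    described themselves rather than followed, as `ls -l` does.
    """
    rows = []
    with os.scandir(mount_path) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            rows.append((
                stat.filemode(st.st_mode),  # e.g. "-rw-r--r--" or "drwxr-xr-x"
                st.st_size,
                datetime.fromtimestamp(st.st_mtime).isoformat(timespec="seconds"),
                entry.name,
            ))
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    for mode, size, modified, name in list_volume("/mnts/ds-data"):
        print(f"{mode} {size:>12} {modified} {name}")
```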