Planning a pipeline
Review these considerations as you plan how you will connect to resources, add assets, and manage resources in your pipeline.
Accessing the components in your pipeline
When you use a pipeline to automate a flow, you must have access to all of the elements in the pipeline. Make sure that you create and run pipelines with the proper access to all assets, projects, and spaces used in the pipeline. Collaborators who run the pipeline must also be able to access the pipeline components.
Managing pipeline credentials
To run a job, the pipeline must have access to Cloud Pak for Data credentials. Typically, a pipeline uses your personal API key to execute long-running operations in the pipeline without disruption. If credentials are not available when you create the job, you are prompted to supply an API key or create a new one.
To generate an API key from your IBM Cloud Pak for Data user account:
- Go to your user profile.
- Click API keys > Generate new token.
- Create or select an API key for your user account.
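As a sketch of how the generated key is then used programmatically, an API key is typically exchanged for a bearer token through the platform's authorization endpoint. The host, username, and key values here are placeholders, and the endpoint path is an assumption based on the standard Cloud Pak for Data REST API:

```shell
# Hypothetical example: exchange an API key for a bearer token.
# <cpd-host>, <username>, and <api-key> are placeholders for your values.
curl -k -X POST "https://<cpd-host>/icp4d-api/v1/authorize" \
  -H "Content-Type: application/json" \
  -d '{"username": "<username>", "api_key": "<api-key>"}'
# The JSON response contains a token that authorizes subsequent API calls.
```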
Adding assets to a pipeline
When you create a pipeline, you add assets, such as data, notebooks, deployment jobs, or Data Refinery jobs to the pipeline to orchestrate a sequential process. The strongly recommended method for adding assets to a pipeline is to collect the assets in the project containing the pipeline and use the asset browser to select project assets for the pipeline.
Connection with DataStage
For separation of duties (SOD) clusters, create a storage volume where DataStage resides; otherwise, Bash scripts will not run in Pipelines. See Storage and data access for more details.
Manage resources by setting memory limits
Set the Redis memory limit for your Cloud Pak for Data instance to avoid memory overconsumption. The recommended memory size is a multiple of the maximum number of parallel runs and the user variable size limit. For example, to accommodate 1000 parallel pipelines with a user variable size limit of 256Ki, consider setting your memory limit to 256Mi.
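The sizing rule above can be sketched as a quick calculation; the variable names are illustrative:

```shell
# Illustrative sizing: recommended Redis memory limit is at least
# (max parallel runs) x (user variable size limit).
PARALLEL_RUNS=1000   # maximum concurrent pipeline runs you expect
VAR_LIMIT_KI=256     # user variable size limit in Ki (256Ki)

TOTAL_KI=$((PARALLEL_RUNS * VAR_LIMIT_KI))
TOTAL_MI=$(( (TOTAL_KI + 1023) / 1024 ))   # round up to Mi (1 Mi = 1024 Ki)
echo "Minimum Redis memory: ${TOTAL_MI}Mi"
# prints "Minimum Redis memory: 250Mi"
```

Rounding the result up to a convenient value such as 256Mi gives the limit used in the example above.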
- Check your Redis memory limit with the following command:
oc get StatefulSet redis-ha-server -o yaml
- Set your user variable limit.
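One way to set the user variable limit is a ConfigMap patch; this sketch assumes the limit is stored in the `user_variables_size_limit` key of the `watson-pipelines-config` ConfigMap, and uses `cpd-instance` as an example namespace:

```shell
# Sketch: raise the user variable size limit to 256Ki by patching the
# watson-pipelines-config ConfigMap (cpd-instance is an example namespace).
oc -n cpd-instance patch cm watson-pipelines-config \
  --type merge -p '{"data":{"user_variables_size_limit":"256Ki"}}'
```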
Update default runtime type
You can update the default runtime type for nodes by updating your ConfigMap.
Open the ConfigMap watson-pipelines-config and update the value default_runtime_type with one of:
- shared: defaults nodes to use shared runtimes.
- standalone: defaults nodes to use standalone runtimes.
An example is as follows:
oc -n cpd-instance get cm watson-pipelines-config -o yaml
apiVersion: v1
data:
  default_runtime_type: shared
  shutdown: "false"
  user_variables_size_limit: 64Ki
kind: ConfigMap
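To change the value, you might patch the ConfigMap rather than editing it interactively; this is a sketch using the example namespace above:

```shell
# Sketch: switch the default runtime type for new nodes to standalone.
oc -n cpd-instance patch cm watson-pipelines-config \
  --type merge -p '{"data":{"default_runtime_type":"standalone"}}'
```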
Updates to the ConfigMap affect new nodes only. Existing nodes are unaffected.
Parent topic: Getting started with Orchestration Pipelines