IBM StreamSets as Client-Managed Software
With IBM StreamSets as client-managed software, you install and manage the software yourself.
You install IBM Software Hub on a supported cloud deployment environment and then install IBM StreamSets on IBM Software Hub.
After installation, you can administer IBM StreamSets on an ongoing basis.
Installing
An IBM Software Hub instance administrator and a Red Hat OpenShift Container Platform cluster administrator can work together to install IBM StreamSets on IBM Software Hub.
5.1.1 and later: Install IBM StreamSets in the same way that you install any other service. For more information, see the IBM Software Hub documentation.
Prerequisites
- IBM Software Hub version 5.1.0 is installed.
For more information about installing IBM Software Hub, see the IBM Software Hub documentation.
- The Red Hat® OpenShift® Container Platform cluster meets the minimum requirements for installing IBM StreamSets.
For more information about system requirements, see the IBM Software Hub documentation.
- The Red Hat OpenShift Container Platform cluster includes a default storage class.
For more information about designating a default storage class, see the Red Hat OpenShift documentation.
- The Red Hat OpenShift Container Platform cluster includes namespaces for the IBM StreamSets operator and operands.
Important: For version 5.1.0, the name of the operand namespace cannot include a hyphen (-).
- The workstation from which you run the installation is set up as a client workstation and includes the following command-line interfaces:
  - Red Hat OpenShift CLI, oc.
  - Kubernetes command-line tool, kubectl. The tool must be configured to access your cluster.
  - Istio command-line tool, istioctl. For more information about installing the tool, see the Istioctl documentation.
- You have an environment variables script to use with the installation commands.
The IBM StreamSets installation commands use the following environment variables so that you can run the commands exactly as written:
  - ${OC_LOGIN} is an alias for the oc login command
  - ${PROJECT_CPD_INST_OPERATORS} refers to the operators project
  - ${PROJECT_CPD_INST_OPERANDS} refers to the operands project
If you don't have a script that defines the environment variables, see the IBM Software Hub documentation.
To use the environment variables from the script, you must source the environment variables before you run the installation commands. For example, run:
source ./cpd_vars.sh
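The variable and namespace prerequisites above can be checked before you run any installation command. The following is a minimal sketch, not part of the product: the function names require_vars and operand_ns_ok are hypothetical, and the sketch assumes a POSIX-compatible shell in which the environment variables script has already been sourced.

```shell
#!/bin/sh
# Hypothetical preflight helpers; these names are illustrative only and
# are not provided by IBM StreamSets or IBM Software Hub.

# Print the name of every listed variable that is unset or empty.
# Returns 0 only when all listed variables have a value.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}"
      missing=1
    fi
  done
  return "${missing}"
}

# Succeed only when the given namespace name contains no hyphen,
# which is the restriction stated above for version 5.1.0.
operand_ns_ok() {
  case "$1" in
    *-*) return 1 ;;
    *)   return 0 ;;
  esac
}
```

After sourcing the script, you might call require_vars OC_LOGIN PROJECT_CPD_INST_OPERATORS PROJECT_CPD_INST_OPERANDS and then operand_ns_ok "${PROJECT_CPD_INST_OPERANDS}", and continue only if both succeed.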
Installing IBM StreamSets on IBM Software Hub
A Red Hat OpenShift Container Platform cluster administrator uses the command line to install IBM StreamSets on IBM Software Hub version 5.1.0.
Post-installation Tasks
After a cluster administrator installs IBM StreamSets, other administrators work together to complete the post-installation tasks.
Administrators must complete the post-installation tasks before users can begin building data pipelines. Complete the same post-installation tasks for all supported versions of IBM Software Hub.
- Creating an organization
- Enabling the Transformer engine (requires the IBM StreamSets Cartridge for Apache Spark Add On license)
Verifying User Email Addresses
IBM StreamSets requires an email address for each user account.
An IBM Software Hub instance administrator must verify that each user account that requires access to IBM StreamSets has an email address.
Granting Users Access to the Service
An IBM Software Hub instance administrator must grant users access to the IBM StreamSets service.
Grant access to the following types of IBM StreamSets users:
- System administrator: The system administrator manages all organizations across IBM StreamSets. Grant this user account the Admin role.
- Organization administrators and users: Organization administrators and users work within a single IBM StreamSets organization. Grant all organization administrators and users the User role.
Creating an Organization
The IBM StreamSets system administrator must create an organization before users can log in to IBM StreamSets.
An organization is a secure space that is provided to a set of IBM StreamSets users. All Data Collector engines, pipelines, jobs, topologies, and other objects added by any user in the organization belong to that organization. A user logs in to IBM StreamSets as a member of an organization and can access data that belongs to that organization only.
As the system administrator, you can create a single organization for all users. Or you can create multiple organizations for different groups of users.
Enabling the Transformer Engine
If you have the IBM StreamSets Cartridge for Apache Spark Add On license, the IBM StreamSets system administrator must enable the Transformer engine and accept the license agreement.
After the Transformer engine is enabled, an organization administrator deploys the engine so that users can run pipelines on Apache Spark.
Inviting Users to the Organization
The organization administrator must invite users to the organization.
Deploying an Engine
Before users can begin building pipelines, the organization administrator must use IBM StreamSets Control Hub to deploy a Data Collector engine, and then grant users access to the engine.
Engines process data. As an organization administrator, you deploy engines to the location where data resides, which can be on-premises or on a protected cloud computing platform.
To get started with IBM StreamSets, create a self-managed deployment to deploy an engine. When you create the deployment, share the deployment with all users invited to your organization, granting them full access to the deployment. When users build a pipeline, they select this deployed engine. For more information, see Self-Managed Deployments.
After you get started, you might consider using Kubernetes environments and deployments. With the Kubernetes integration, Control Hub automatically provisions the resources that are needed to run an engine in your Kubernetes cluster, and then deploys engine instances to those resources.
Upgrading
An IBM Software Hub instance administrator and a Red Hat OpenShift Container Platform cluster administrator can work together to upgrade IBM StreamSets on IBM Software Hub.
5.1.1 and later: To upgrade IBM StreamSets from version 5.1.1 or later, use the same upgrade process that you use for any other service. For more information, see the IBM Software Hub documentation.
Prerequisites
Before you upgrade IBM StreamSets from version 5.1.0, complete the following prerequisites:
- The workstation from which you run the upgrade is set up as a client workstation and includes the following command-line interfaces:
  - Red Hat OpenShift CLI, oc.
  - Kubernetes command-line tool, kubectl. The tool must be configured to access your cluster.
  - Helm command-line tool. For more information about installing Helm, see the Helm documentation.
  - Istio command-line tool, istioctl. For more information about installing the tool, see the Istioctl documentation.
- The IBM Software Hub control plane is upgraded.
For more information about upgrading IBM Software Hub, see the IBM Software Hub documentation.
- You have an environment variables script to use with the upgrade commands.
The IBM StreamSets upgrade commands use the following environment variables so that you can run the commands exactly as written:
  - ${OC_LOGIN} is an alias for the oc login command
  - ${PROJECT_CPD_INST_OPERATORS} refers to the operators project
  - ${PROJECT_CPD_INST_OPERANDS} refers to the operands project
If you don't have a script that defines the environment variables, see the IBM Software Hub documentation.
To use the environment variables from the script, you must source the environment variables before you run the upgrade commands. For example, run:
source ./cpd_vars.sh
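One way to confirm that the workstation has the command-line interfaces listed above is to look each one up on the PATH. The following sketch assumes a POSIX-compatible shell and assumes that the Helm executable is named helm. The function name missing_tools is hypothetical. The check only confirms that each tool is present, not that kubectl is configured to access your cluster.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.

# Print the name of each listed tool that cannot be found on the PATH.
# Prints nothing when every tool is available.
missing_tools() {
  for tool in "$@"; do
    command -v "${tool}" >/dev/null 2>&1 || echo "${tool}"
  done
}
```

For the upgrade prerequisites, you might run missing_tools oc kubectl helm istioctl and install any tool that the function prints.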
Upgrading IBM StreamSets from Version 5.1.0 to 5.1.1
A Red Hat OpenShift Container Platform cluster administrator uses the command line to upgrade IBM StreamSets from version 5.1.0 to version 5.1.1.
Upgrading IBM StreamSets from Version 5.1.1 to a Later 5.1 Refresh
After a Red Hat OpenShift Container Platform cluster administrator upgrades IBM StreamSets from version 5.1.0 to version 5.1.1, an instance administrator can upgrade IBM StreamSets from version 5.1.1 to a later 5.1 refresh in the same way that any other service is upgraded.
For more information, see the IBM Software Hub documentation.
Administering Organizations
As the system administrator for IBM StreamSets as client-managed software, you can complete full administrative tasks across all organizations.
An organization is a secure space that is provided to a set of users. All environments, deployments, pipelines, jobs, and other objects added by any user in the organization belong to that organization. A user logs in to IBM StreamSets Control Hub as a member of an organization and can access data that belongs to that organization only.
When you create an organization, you create an organization administrator that can complete administrative tasks for that organization.
You can create a single organization for all users. Or you can create multiple organizations for different sets of users. For example, you might create one organization for the Northern Office and another organization for the Southern Office. Users in the Northern Office organization cannot access any data that belongs to the Southern Office organization. For more information, see Comparing Organizations and Groups.
Comparing Organizations and Groups
You can use both organizations and groups to create sets of users. However, be aware of the following differences between organizations and groups:
- Organizations: Only the system administrator can create organizations.
- Groups: An organization administrator can create groups within the organization.
- To create a multitenant environment with organizations, the system administrator creates multiple organizations and then organization administrators add the appropriate users to each organization.
- To create a multitenant environment with multiple groups in a single organization, an organization administrator creates groups of users. The organization administrator shares objects within the groups to grant each group access to the appropriate objects.
For more information about using groups and permissions to create a multitenant environment, see Users and Groups.
Changing a Primary Organization Administrator
An organization can include multiple organization administrators, but only one primary organization administrator.
The IBM StreamSets system administrator configures the primary organization administrator when the organization is created.
The current primary organization administrator can change the primary administrator for the organization. However, as the system administrator, you can also change the primary administrator for any organization.
Activating or Deactivating an Organization
An organization must be active so that users can log in as members of that organization.
As the system administrator, you might temporarily deactivate an organization to disable access to IBM StreamSets.
Deleting an Organization
As the system administrator, you can delete an organization. Deleting an organization permanently removes the organization, including all objects created for the organization.
Configuring Global Organization Properties
As the system administrator, you can configure organization properties at a global level to affect all organizations or at an organization level to affect a specific organization.
The organization administrator can override some properties for each organization.
Increasing Default System Limits
Control Hub sets default system limits on the number of objects that can exist in each organization. The limits protect the system from runaway scripts or unintended automation usage.
These limits are sufficient for most organizations. However, as the system administrator, you can increase the limits globally for all organizations or for a specific organization.
For more information about the default values, see Organization Default System Limits.
Uninstalling
An IBM Software Hub instance administrator and a Red Hat OpenShift Container Platform cluster administrator can work together to uninstall IBM StreamSets from an instance of IBM Software Hub.
5.1.3 and later: Uninstall IBM StreamSets in the same way that you uninstall any other service. For more information, see the IBM Software Hub documentation.
Prerequisites
Before you uninstall IBM StreamSets from IBM Software Hub version 5.1.0 to 5.1.2, complete the following prerequisites:
- The workstation from which you run the uninstallation is set up as a client workstation and includes the following command-line interfaces:
  - Red Hat OpenShift CLI, oc.
  - Helm command-line tool. For more information about installing Helm, see the Helm documentation.
- You have an environment variables script to use with the uninstallation commands.
The IBM StreamSets uninstallation commands use the following environment variables so that you can run the commands exactly as written:
  - ${OC_LOGIN} is an alias for the oc login command
  - ${PROJECT_CPD_INST_OPERATORS} refers to the operators project
  - ${PROJECT_CPD_INST_OPERANDS} refers to the operands project
If you don't have a script that defines the environment variables, see the IBM Software Hub documentation.
To use the environment variables from the script, you must source the environment variables before you run the uninstallation commands. For example, run:
source ./cpd_vars.sh
Deleting the Service Instance
An instance administrator can delete the service instance that is associated with IBM StreamSets from IBM Software Hub version 5.1.0 to 5.1.2.
Delete the service instance to ensure that the instance releases the resources that it reserved.
- Log in to IBM Software Hub.
- From the navigation menu, select .
- Locate the streamsets instance.
- From the action menu, select Delete.
Uninstalling the Service
A Red Hat OpenShift Container Platform cluster administrator can uninstall the IBM StreamSets service from IBM Software Hub version 5.1.0 to 5.1.2.
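The version ranges in this section determine which uninstallation procedure applies. The following sketch shows one way to express that decision in a POSIX-compatible shell. The function names version_key and uninstall_path and the labels standard, manual, and unknown are hypothetical. The sketch assumes a three-part numeric version such as 5.1.2 with no leading zeros, and it reports versions earlier than 5.1.0 as unknown because this section does not cover them.

```shell
#!/bin/sh
# Hypothetical helpers; the names and labels are illustrative only.

# Convert a three-part version such as 5.1.2 into a comparable integer.
version_key() {
  major="${1%%.*}"
  rest="${1#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"
  echo $(( major * 10000 + minor * 100 + patch ))
}

# Print which uninstallation procedure applies to an
# IBM Software Hub version, based on the ranges in this section.
uninstall_path() {
  key="$(version_key "$1")"
  if [ "${key}" -ge "$(version_key 5.1.3)" ]; then
    echo "standard"   # 5.1.3 and later: same as any other service
  elif [ "${key}" -ge "$(version_key 5.1.0)" ]; then
    echo "manual"     # 5.1.0 to 5.1.2: command-line procedure
  else
    echo "unknown"    # earlier versions are not covered here
  fi
}
```

For example, uninstall_path 5.1.2 selects the command-line procedure, while uninstall_path 5.1.3 selects the standard service uninstallation.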