IBM StreamSets as Client-Managed Software
With IBM StreamSets as client-managed software, you install and manage the software yourself.
You must install IBM Software Hub on a supported cloud deployment environment and then install IBM StreamSets on IBM Software Hub.
After installation, you can administer IBM StreamSets on an ongoing basis.
Installing
- Verifying the prerequisites
Verify that the prerequisites have been completed before beginning the installation.
- Installing IBM StreamSets on IBM Software Hub
Use the command line to install IBM StreamSets on IBM Software Hub.
- Completing post-installation tasks
Complete the post-installation tasks before users can begin building streaming data pipelines.
Prerequisites
- IBM Software Hub version 5.1.0 is installed.
For more information about installing IBM Software Hub, see the IBM Software Hub documentation.
- The Red Hat® OpenShift® Container Platform cluster meets the minimum requirements for installing IBM StreamSets.
For more information about system requirements, see the IBM Software Hub documentation.
- The Red Hat OpenShift Container Platform cluster includes a default storage class.
For more information about designating a default storage class, see the Red Hat OpenShift documentation.
- The Red Hat OpenShift Container Platform cluster includes namespaces for the IBM StreamSets operator and operands.
Important: The name of the operand namespace cannot include a hyphen (-).
- The workstation from which you run the installation is set up as a client workstation and includes the following command-line interfaces:
  - OpenShift CLI, oc.
  - Kubernetes command-line tool, kubectl. The tool must be configured to access your cluster.
  - Istio command-line tool, istioctl. For more information about installing the tool, see the Istioctl documentation.
- You have an environment variables script to use with installation commands.
The IBM StreamSets installation commands use the following environment variables so that you can run the commands exactly as written:
  - ${OC_LOGIN} is an alias for the oc login command
  - ${PROJECT_CPD_INST_OPERATORS} refers to the operators project
  - ${PROJECT_CPD_INST_OPERANDS} refers to the operands project
If you don't have a script that defines the environment variables, see the IBM Software Hub documentation.
To use the environment variables from the script, you must source the environment variables before you run the installation commands. For example, run:
source ./cpd_vars.sh
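Several of the prerequisites above can be checked from the client workstation before you run any installation command. The following is a minimal sketch, not part of the IBM tooling: the function names are hypothetical, and the storage class helper assumes that the listing from oc get storageclass marks the default class with "(default)" after its name.

```shell
#!/bin/sh
# Preflight sketch for the client workstation. The function names are
# illustrative (assumptions), not part of any IBM or Red Hat tooling.

# Succeeds only if every named command-line tool is found on PATH.
check_clis() {
  clis_missing=0
  for cli in "$@"; do
    if ! command -v "$cli" >/dev/null 2>&1; then
      echo "missing CLI: $cli" >&2
      clis_missing=1
    fi
  done
  return "$clis_missing"
}

# Succeeds only if every named environment variable is set and non-empty.
check_vars() {
  vars_missing=0
  for var_name in "$@"; do
    eval "var_value=\${$var_name:-}"
    if [ -z "$var_value" ]; then
      echo "unset variable: $var_name" >&2
      vars_missing=1
    fi
  done
  return "$vars_missing"
}

# Succeeds only if the operand namespace name contains no hyphen.
check_operand_name() {
  case "$1" in
    *-*) echo "operand namespace contains a hyphen: $1" >&2; return 1 ;;
    *) return 0 ;;
  esac
}

# Reads a storage class listing on stdin and prints the name of each
# class marked "(default)". Assumes the marker follows the class name.
default_storage_class() {
  awk '$2 == "(default)" { print $1 }'
}

# Example use after sourcing the environment variables script:
#   . ./cpd_vars.sh
#   check_clis oc kubectl istioctl
#   check_vars OC_LOGIN PROJECT_CPD_INST_OPERATORS PROJECT_CPD_INST_OPERANDS
#   check_operand_name "${PROJECT_CPD_INST_OPERANDS}"
#   oc get storageclass | default_storage_class
```

The sketch only defines functions, so loading it has no side effects. Each check returns a non-zero status on failure and prints the reason to standard error, which lets you chain the checks with && ahead of the installation commands.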
Installing IBM StreamSets on IBM Software Hub
After completing the prerequisite tasks, a Red Hat® OpenShift® Container Platform cluster administrator uses the command line to install IBM StreamSets on IBM Software Hub.
Post-installation Tasks
After a cluster administrator installs IBM StreamSets, an IBM Software Hub instance administrator, an IBM StreamSets system administrator, and an IBM StreamSets organization administrator work together to complete the post-installation tasks.
Administrators must complete the post-installation tasks before users can begin building data pipelines.
Verifying User Email Addresses
IBM StreamSets requires that each user account have an email address.
An IBM Software Hub instance administrator must verify that each user account that requires access to IBM StreamSets has an email address.
Granting Users Access to the Service
An IBM Software Hub instance administrator must grant users access to the IBM StreamSets service.
Grant access to the following types of IBM StreamSets users:
- System administrator
  The system administrator manages all organizations across IBM StreamSets. Grant this user account the Admin role.
- Organization administrators and users
  Organization administrators and users work within a single IBM StreamSets organization. Grant all organization administrators and users the User role.
Creating an Organization
The IBM StreamSets system administrator must create an organization before users can log in to IBM StreamSets.
An organization is a secure space provided to a set of IBM StreamSets users. All Data Collector engines, pipelines, jobs, topologies, and other objects added by any user in the organization belong to that organization. A user logs in to IBM StreamSets as a member of an organization and can access data that belongs to that organization only.
As the system administrator, you can create a single organization for all users. Or you can create multiple organizations for different groups of users.
Inviting Users to the Organization
The organization administrator must invite users to the organization.
Deploying a Data Collector Engine
Before users can begin building pipelines, the organization administrator must use IBM StreamSets Control Hub to deploy a Data Collector engine and then grant users access to the engine.
Data Collector is an engine that processes data. As an organization administrator, you deploy Data Collector engines to the location where data resides, which can be on-premises or on a protected cloud computing platform.
To get started with IBM StreamSets, create a self-managed deployment to deploy a Data Collector engine. When you create the deployment, share the deployment with all users invited to your organization, granting them full access to the deployment. When users build a pipeline, they select this deployed engine. For more information, see Self-Managed Deployments.
After getting started, you might consider using Kubernetes environments and deployments. With the Kubernetes integration, Control Hub automatically provisions the resources needed to run a Data Collector engine in your Kubernetes cluster, and then deploys engine instances to those resources.
Administering Organizations
As the system administrator for IBM StreamSets as client-managed software, you can perform all administrative tasks across all organizations.
An organization is a secure space provided to a set of users. All environments, deployments, pipelines, jobs, and other objects added by any user in the organization belong to that organization. A user logs in to IBM StreamSets Control Hub as a member of an organization and can access data that belongs to that organization only.
When you create an organization, you create an organization administrator who can perform administrative tasks for that organization.
You can create a single organization for all users. Or you can create multiple organizations for different sets of users. For example, you might create one organization for the Northern Office and another organization for the Southern Office. Users in the Northern Office organization cannot access any data that belongs to the Southern Office organization. For more information, see Comparing Organizations and Groups.
Comparing Organizations and Groups
You can use both organizations and groups to create sets of users. However, there are important differences between the two:
- Organizations
  Only the system administrator can create organizations.
  To create a multitenant environment with organizations, the system administrator creates multiple organizations and then organization administrators add the appropriate users to each organization.
- Groups
  An organization administrator can create groups within the organization.
  To create a multitenant environment with multiple groups in a single organization, an organization administrator creates groups of users, and then shares objects within the groups to grant each group access to the appropriate objects.
For more information about using groups and permissions to create a multitenant environment, see Users and Groups.
Changing a Primary Organization Administrator
An organization can include multiple organization administrators, but only one primary organization administrator.
The IBM StreamSets system administrator configures the primary organization administrator when creating the organization.
The current primary organization administrator can change the primary administrator for the organization. However, as the system administrator, you can also change the primary administrator for any organization.
Activating or Deactivating an Organization
An organization must be active so that users can log in as members of that organization.
As the system administrator, you might temporarily deactivate an organization to disable access to IBM StreamSets.
Deleting an Organization
As the system administrator, you can delete an organization. Deleting an organization permanently removes the organization, including all objects created for the organization.
Configuring Global Organization Properties
As the system administrator, you can configure organization properties at a global level to affect all organizations or at an organization level to affect a specific organization.
Some properties can be overridden by the organization administrator for each organization.
Increasing Default System Limits
Control Hub sets default system limits on the number of objects that can exist in each organization. The limits protect the system from runaway scripts or unintended automation usage.
These limits are sufficient for most organizations. However, as the system administrator, you can increase the limits globally for all organizations or for a specific organization.
For more information about the default values, see Organization Default System Limits.
Uninstalling
An IBM Software Hub instance administrator and a Red Hat OpenShift Container Platform cluster administrator can work together to uninstall IBM StreamSets from an instance of IBM Software Hub.
Deleting the Service Instance
An instance administrator can delete the service instance associated with IBM StreamSets.
Delete the service instance to ensure that the instance releases the resources that it reserved.
- Log in to IBM Software Hub.
- From the navigation menu, select .
- Locate the streamsets instance.
- From the action menu, select Delete.
Uninstalling the Service
A Red Hat OpenShift Container Platform cluster administrator can uninstall the IBM StreamSets service.
The commands use the following environment variables:
- ${OC_LOGIN} is an alias for the oc login command
- ${PROJECT_CPD_INST_OPERATORS} refers to the operators project
- ${PROJECT_CPD_INST_OPERANDS} refers to the operands project
If you don't have a script that defines the environment variables, see the IBM Software Hub documentation.
Source the environment variables before you run the commands. For example, run:
source ./cpd_vars.sh