Installing Control Hub
You can install Control Hub on the same machine as the required databases or on a remote machine. For best performance, we recommend installing on a remote machine.
During the Control Hub installation process, you can optionally install and configure a system Data Collector. When installed, administrators can enable or disable the system Data Collector for use as the default authoring Data Collector in Control Hub.
After starting Control Hub, log in to the instance and create the required organizations.
Control Hub also includes a separate Admin tool to monitor and troubleshoot Control Hub issues. For example, if Control Hub becomes inaccessible, the Admin tool remains running, so you can still log in to the Admin tool to troubleshoot the Control Hub issues.
Step 1. Install the System Data Collector (Optional)
You can optionally install and configure a Data Collector that functions as the system Data Collector for Control Hub. For more information about how Control Hub uses the system Data Collector, see System Data Collector.
If you do not install the system Data Collector, then organization administrators must register Data Collectors for the organization before users can design pipelines and fragments.
Requirements
The system Data Collector must meet all of the following requirements:
- Version
- StreamSets recommends using the latest version of Data Collector.
The minimum supported Data Collector version is 3.0.0.0. To design pipeline fragments, the minimum supported Data Collector version is 3.2.0.0.
- Installation type
- Use any of the supported installation methods for the system Data Collector - including a tarball, RPM, Cloudera Manager, or Docker installation.
- Installation location
- In a development environment, you can install the system Data Collector on the same machine as the Control Hub instance as long as the machine has enough resources.
For best performance in a production environment, we recommend installing on a remote machine within the same internal network as the Control Hub instance. The Control Hub instance must be able to access the system Data Collector URL.
- Authentication
- Configure the system Data Collector to use the default file-based authentication and the form authentication type. By default, a new installation uses file-based authentication with the form type. If you choose to use an existing Data Collector, verify that the http.authentication property is set to form in the $SDC_CONF/sdc.properties file.
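If you reuse an existing Data Collector, you can check the authentication type from the command line. The following is a minimal sketch, not an official tool: the function name sdc_auth_type is made up for illustration, and it assumes that http.authentication appears as a plain key=value line in the properties file.

```shell
#!/bin/sh
# Print the value of http.authentication from a Data Collector properties file.
# The property name and the $SDC_CONF/sdc.properties path come from the text above;
# the function name is hypothetical.
sdc_auth_type() {
    props_file="$1"
    # Keep only uncommented assignments, use the last one, and trim trailing spaces.
    sed -n 's/^[[:space:]]*http\.authentication[[:space:]]*=[[:space:]]*//p' "$props_file" \
        | tail -n 1 | sed 's/[[:space:]]*$//'
}

# Usage: report the authentication type when SDC_CONF points at an installation.
if [ -n "${SDC_CONF:-}" ] && [ -f "$SDC_CONF/sdc.properties" ]; then
    auth_type=$(sdc_auth_type "$SDC_CONF/sdc.properties")
    if [ "$auth_type" = "form" ]; then
        echo "http.authentication is form"
    else
        echo "http.authentication is '$auth_type'; set it to form" >&2
    fi
fi
```

The check skips itself when SDC_CONF is not set, so it is safe to run on a machine that does not host the Data Collector.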
Installing and Configuring the System Data Collector
Step 2. Set Up Time Synchronization
When you install the databases and Control Hub on separate machines, you must set up time synchronization using Network Time Protocol (NTP).
NTP synchronizes all participating machines to within a few milliseconds of Coordinated Universal Time (UTC). To use NTP, install and set up an NTP server as described in your operating system documentation.
If you do not set up time synchronization, Control Hub might stop processing tasks because of out-of-order timestamps among the machines.
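As a coarse sanity check after you set up NTP, you can compare the clocks of the Control Hub and database machines. The sketch below only compares timestamps at one-second resolution, so it can catch a badly drifting clock but cannot confirm millisecond-level synchronization. The function name clock_skew_ok, the host name db-host, and the two-second threshold are assumptions chosen for illustration.

```shell
#!/bin/sh
# Succeed when two epoch timestamps (in seconds) differ by at most max_skew seconds.
# Hypothetical helper; it does not replace the tools that ship with your NTP service.
clock_skew_ok() {
    local_epoch="$1"
    remote_epoch="$2"
    max_skew="$3"
    skew=$((local_epoch - remote_epoch))
    # Take the absolute value of the difference.
    if [ "$skew" -lt 0 ]; then
        skew=$((0 - skew))
    fi
    [ "$skew" -le "$max_skew" ]
}

# Example usage, commented out because db-host is a placeholder host name:
# remote=$(ssh db-host date +%s)
# if clock_skew_ok "$(date +%s)" "$remote" 2; then
#     echo "clocks agree to within 2 seconds"
# else
#     echo "clock skew detected; check the NTP setup" >&2
# fi
```

The SSH round trip adds its own delay, which is why the example threshold is seconds rather than milliseconds.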
Step 3. Install from the Tarball or RPM Package
You can install Control Hub from the tarball and start it manually. Or, you can install Control Hub from the tarball or RPM package and run it as a service.
Tarball for a Manual Start
You can install the Control Hub tarball and start it manually on all supported operating systems.
Install the tarball on a machine that meets the installation requirements. The tarball does not include Java; you must manually install Java on the machine.
When you start Control Hub manually from the command line, Control Hub runs as the system user account logged into the command prompt when you run the start command. You can alternatively impersonate another user account when you run the command.
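Because the tarball does not include Java, it can help to confirm that Java is on the PATH before you continue. This is a minimal sketch with a hypothetical helper named have_command. It only reports what is installed and does not verify that the version is one that Control Hub supports.

```shell
#!/bin/sh
# Succeed when the named command can be found on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command java; then
    # java -version writes to stderr, so redirect it and show the first line only.
    java -version 2>&1 | head -n 1
else
    echo "Java not found on PATH; install a supported Java version first" >&2
fi
```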
Tarball for a Service Start for SysV Init Systems
You can install the Control Hub tarball and start it as a service for supported operating systems that use the SysV init system - including CentOS 6.x, Oracle Linux 6.x, Red Hat Enterprise Linux 6.x, or Ubuntu 14.04.
Install the tarball on a machine that meets the installation requirements. The tarball does not include Java; you must manually install Java on the machine.
Installing Control Hub as a service requires sudo privileges on the root directory.
Tarball for a Service Start for Systemd Init Systems
You can install the Control Hub tarball and start it as a service for supported operating systems that use the systemd init system - including CentOS 7.x, Oracle Linux 7.x - 8.x, or Red Hat Enterprise Linux 7.x - 8.x.
Install the tarball on a machine that meets the installation requirements. The tarball does not include Java; you must manually install Java on the machine.
Installing Control Hub as a service requires sudo privileges on the root directory.
RPM Package for a Service Start
You can install the Control Hub RPM package on CentOS, Oracle Linux, or Red Hat Enterprise Linux.
Install the RPM package on a machine that meets the installation requirements. Each RPM package includes a supported version of Java. The installation process automatically installs the selected Java version if it is not already installed on the machine.
When you install from the RPM package, Control Hub runs as a service using the default system user account and group named dpm. If a dpm user and a dpm group do not exist on the machine, the installation creates the user and group for you and assigns them the next available user ID and group ID.
Installing Control Hub as a service requires sudo privileges on the root directory.
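To see whether the dpm user and group already exist before you install the RPM package, you can query the account databases. The sketch below assumes a Linux system that provides the getent command. The helper name account_exists is made up for illustration.

```shell
#!/bin/sh
# Succeed when the given name exists in the given account database.
# $1 is the database to query (passwd for users, group for groups); $2 is the name.
account_exists() {
    getent "$1" "$2" >/dev/null 2>&1
}

for kind in passwd group; do
    if account_exists "$kind" dpm; then
        echo "dpm already exists in $kind"
    else
        echo "dpm not found in $kind; the RPM installation creates it"
    fi
done
```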
Step 4. Download the JDBC Driver
Control Hub requires a JDBC driver to connect to the relational database.
Step 5. Set Environment Variables
Before you run the Control Hub installation scripts, you must set the DPM_HOME and DPM_CONF environment variables on the command line.
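For example, the variables might be set and then verified as in the following sketch. The two directory paths are placeholders, not documented defaults, so substitute the locations used by your installation. The require_env helper is also hypothetical.

```shell
#!/bin/sh
# Placeholder locations; replace them with the directories of your installation.
export DPM_HOME="/opt/streamsets-dpm"
export DPM_CONF="/etc/dpm"

# Fail with a message if any of the named environment variables is unset or empty.
require_env() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            return 1
        fi
    done
}

require_env DPM_HOME DPM_CONF && echo "DPM_HOME=$DPM_HOME DPM_CONF=$DPM_CONF"
```

Variables set with export apply only to the current shell session and its child processes, so set them in the same session that runs the installation scripts.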
Step 6. Set Up Control Hub
- Install the dialog command line utility.
For CentOS, Oracle Linux, or Red Hat Enterprise Linux, use the following command:
yum install dialog
For Ubuntu, use the following commands:
apt-get update
apt-get install dialog
- If using PuTTY as the SSH client to install Control Hub on a remote machine, configure PuTTY to use linux as the terminal emulation mode.
By default, PuTTY uses xterm emulation, which does not correctly display the dialog command line utility.
In the PuTTY Configuration dialog box, set Terminal-type string to linux.
- Use the following command to run the Control Hub setup script from the $DPM_HOME directory:
dev/setup.sh
When you run the script for the first time, configure all of the properties. If necessary, you can run the script again to change a few properties, navigating to the appropriate configuration group.
See the sections below for a description of each property.
Navigation Tips
The Control Hub setup script contains multiple configuration groups that you navigate through to configure the required properties. The initial dialog box displays the configuration groups:
Use the arrow keys, the numbers assigned to each section, and the OK, Cancel, and Back options to navigate through the dialog boxes. Type a number to jump to that section rather than using the arrow keys to cycle through each section.
In dialog boxes that offer a selection of two options, use the space bar to select another option. Let's look at the Mail Transport Protocol dialog box:
In this example, the SMTP protocol is currently set, as displayed by the asterisk (*) next to the option. To change to the SMTPS protocol, press the down arrow or the number 2 to highlight SMTPS. Then press the space bar to switch the selection - the screen displays the asterisk next to the SMTPS option. Then press Enter with the OK option highlighted to save your selection.
Control Hub Configuration
Control Hub Configuration Property | Description |
---|---|
Control Hub Base URL | URL to access Control Hub based on your installation type: |
Admin Tool 'admin' Password | For the Control Hub Admin tool, enter a password for the default "admin" user account. To protect the password, store the password in an external location and then use a function to retrieve the password. |
Mail Transport Protocol | Protocol to use for the SMTP account used for emails. Use the space bar to select SMTP or SMTPS - the asterisk (*) shows your selection - and then press Enter with the OK option highlighted. Default is SMTP. Note: In a development environment, you can choose not to use an SMTP server and instead configure Control Hub to use the user ID for each user’s initial password. |
Mail Server Host | Host name of the mail server. |
Mail Server Port | Port number of the mail server. |
Mail 'From' Address | Email address to use to send email. |
Mail Server Authentication | Whether the mail server host uses authentication. Use the space bar to select enabled or disabled - the asterisk (*) shows your selection - and then press Enter with the OK option highlighted. Default is disabled. |
Mail Server Username | If the mail server host uses authentication, user name for the email account to send email. |
Mail Server Password | If the mail server host uses authentication, password for the email account. To protect the password, store the password in an external location and then use a function to retrieve the password. |
Relational Database Configuration
The Relational Database configuration group includes the connection details for the databases created for each application in MariaDB, MySQL, or PostgreSQL.
Select each application, and then enter the following database connection details:
Relational Database Property | Description |
---|---|
Driver Class | Name of the JDBC driver class used by the relational database. |
JDBC Connection String | Connection string to use to connect to the database. |
Username | User name for the JDBC connection. The user account must have all privileges on the database. |
Password | Password for the user account. To protect the password, store the password in an external location and then use a function to retrieve the password. |
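For orientation, the connection strings for the three supported databases follow the general JDBC pattern jdbc:<type>://<host>:<port>/<database>. The following sketch builds such a string. The jdbc_url helper, the host name, and the database name are hypothetical, and the exact string that your installation needs may include additional parameters, so treat the output as a starting point only.

```shell
#!/bin/sh
# Build a basic JDBC connection string for MariaDB, MySQL, or PostgreSQL.
# Arguments: database type, host, port, database name.
jdbc_url() {
    db_type="$1"
    host="$2"
    port="$3"
    database="$4"
    case "$db_type" in
        mariadb|mysql|postgresql)
            printf 'jdbc:%s://%s:%s/%s\n' "$db_type" "$host" "$port" "$database"
            ;;
        *)
            echo "unsupported database type: $db_type" >&2
            return 1
            ;;
    esac
}

# Example with placeholder values:
jdbc_url postgresql db.example.com 5432 appdb
# prints jdbc:postgresql://db.example.com:5432/appdb
```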
System Data Collector Configuration
System Data Collector Property | Description |
---|---|
System Data Collector URL | URL of the system Data Collector. |
System Data Collector Username | Data Collector user account with the admin or creator role. The system Data Collector uses Data Collector authentication unlike registered Data Collectors, which use Control Hub authentication. Default is |
System Data Collector Password | Password for the Data Collector user account. To protect the password, store the password in an external location and then use a function to retrieve the password. Default is |
Time Series Database Configuration
Time Series Database Property | Description |
---|---|
Metrics Database URL | Metrics database URL using the following format: For example: |
Metrics Database Name | Name of the Metrics database. For example, sch. |
Metrics Database Username | User name for the database. The user account must have all privileges on the database. |
Metrics Database Password | Password for the database user account. To protect the password, store the password in an external location and then use a function to retrieve the password. |
Application Metrics Database URL | Application Metrics database URL using the following format: For example: |
Application Metrics Database Name | Name of the Application Metrics database. For example, sch_app. |
Application Metrics Database Username | User name for the database. The user account must have all privileges on the database. |
Application Metrics Database Password | Password for the database user account. To protect the password, store the password in an external location and then use a function to retrieve the password. |
Step 7. Enable PostgreSQL for the Scheduler Application
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
Step 8. Build Schemas in the Relational Databases
Run the Control Hub database initialization script to create the required tables for each database in the relational database instance. Then, run a command to add the required indexes to the LATEST_METRICS table in the Time Series database.
Step 9. Generate Authentication Tokens for Applications
Run the security script to generate a unique authentication token for each Control Hub application.
Control Hub uses authentication tokens to authenticate each message or request sent by an application. The application includes the authentication token when it issues authenticated messages or requests to other applications.
dev/02-initsecurity.sh
Step 10. Activate the Control Hub License
Each Control Hub system requires an active license before you can start Control Hub.
Step 11. Start Control Hub
Start Control Hub from the command prompt, using the required command for your installation type.
Tarball Installation for a Manual Start
When you install Control Hub from the tarball for a manual start, you start Control Hub manually from the command line. Control Hub runs as the system user account logged into the command prompt when you run the start command. You can alternatively impersonate another user account when you run the command.
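To start Control Hub, use the following command:
bin/streamsets dpm
To start Control Hub in the background so that it keeps running after you log out, use the following command:
nohup bin/streamsets dpm &
To start Control Hub as another user account, use the following command:
sudo -u <user> bin/streamsets dpm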
Tarball or RPM Installation for a Service Start
When you run Control Hub as a service, Control Hub runs as the system user account and group defined in environment variables. The default system user and group are named dpm.
- For CentOS 6.x, Oracle Linux 6.x, Red Hat Enterprise Linux 6.x, or Ubuntu 14.04, use:
service dpm start
- For CentOS 7.x, Oracle Linux 7.x - 8.x, or Red Hat Enterprise Linux 7.x - 8.x, use:
systemctl start dpm
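If you are not sure which init system a machine uses, the following sketch prints the matching start command. It relies on the convention that the /run/systemd/system directory exists only when systemd is the running init system, and it treats everything else as SysV init, which is a simplification. The start_command helper is hypothetical, and the optional argument exists only so that the directory can be overridden when testing.

```shell
#!/bin/sh
# Print the command that starts the dpm service for the detected init system.
# $1 (optional) overrides the directory used to detect systemd.
start_command() {
    systemd_dir="${1:-/run/systemd/system}"
    if [ -d "$systemd_dir" ]; then
        echo "systemctl start dpm"
    else
        echo "service dpm start"
    fi
}

# Show the command for this machine without running it.
start_command
```

The sketch only prints the command, so you can review it and then run it with the required privileges.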
Step 12. Log Into Control Hub
After launching Control Hub, log in to Control Hub using the default system administrator account.
Step 13. Create a Backup System Administrator
Log in to Control Hub and create a backup system administrator for the system organization, in case you lose the password for the default system administrator.
Step 14. Create Organizations
An organization is a secure space provided to a set of Control Hub users. All Data Collectors, pipelines, jobs, topologies, and other objects added by any user in the organization belong to that organization. A user logs in to Control Hub as a member of an organization and can access data that belongs to that organization only.
When you create an organization, you create an organization administrator that can perform administrative tasks for that organization only.
Control Hub includes a system organization with an ID of admin that includes the default system administrator user account. The system administrator can complete tasks across all Control Hub organizations. We recommend creating at least one backup system administrator account, as described in the previous step.
You can add additional non-admin users to the system organization. However, as a best practice, create one or more organizations for your enterprise separate from the system organization.
For example, you might create a single organization named My Company for your enterprise where you add all user accounts. When you log in to Control Hub as the system administrator, you can see both the system and the My Company organizations:
When the organization administrator for the My Company organization logs in, the organization administrator can see the My Company organization only.
You can create multiple organizations for your enterprise. For example, you might create one organization for the Northern Office and another organization for the Southern Office. Users in the Northern Office organization cannot access any data that belongs to the Southern Office organization.
For more details about the system organization and creating multiple organizations, see Organizations.
Creating Organizations
Create organizations before creating additional user accounts or registering Data Collectors. When you create an organization, you also create an organization administrator that can perform administrative tasks for that organization only.