Architecture
StreamSets Control Hub architecture includes applications that manage user requests and databases that store application metadata and time series data.
Control Hub applications manage different user requests, such as designing and publishing a pipeline, starting a job, or measuring topology performance. The applications are independent and isolated from each other. They communicate through REST APIs, and no application can access the database data owned by another application. The applications are internal to Control Hub: when you install Control Hub, all of the applications are installed with it.
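The isolation rule can be pictured with a small in-process Python simulation. This is a hypothetical sketch, not Control Hub code: each `Application` owns a private store, and a peer can reach that data only by calling the owner's `handle_get` method, which stands in for a REST endpoint. The class, its methods, and the record keys are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    """Hypothetical stand-in for one Control Hub application."""
    name: str
    # Private store: by convention, only this application touches it directly.
    _store: dict = field(default_factory=dict, repr=False)

    def save(self, key: str, value: object) -> None:
        """Write to the application's own database."""
        self._store[key] = value

    def handle_get(self, key: str) -> object:
        """Stand-in for a REST endpoint exposed by this application."""
        if key not in self._store:
            raise KeyError(f"{self.name} has no record {key!r}")
        return self._store[key]

    def request(self, owner: "Application", key: str) -> object:
        """Ask another application for its data through its API, never its store."""
        return owner.handle_get(key)


pipeline_store = Application("Pipeline Store")
job_runner = Application("Job Runner")
pipeline_store.save("pipeline-1", {"version": 3})

# Job Runner reads pipeline metadata by calling Pipeline Store's "API".
print(job_runner.request(pipeline_store, "pipeline-1"))  # {'version': 3}
```

Python does not enforce the leading-underscore convention; in Control Hub itself the boundary comes from each application owning a separate database.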
Control Hub uses authentication tokens to authenticate each message or request sent by an application. When you install Control Hub, you generate a unique authentication token for each application. The application includes the authentication token when it issues authenticated messages or requests to other applications.
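A minimal sketch of per-application token authentication, assuming a scheme in which the receiving side checks the token against the one issued to the claimed sender. The header names `X-App-Name` and `X-App-Auth-Token`, the token length, and the in-memory registry are hypothetical; the sketch only illustrates the idea of a unique token per application.

```python
import hmac
import secrets

APPLICATIONS = ["Job Runner", "Pipeline Store", "Notification"]  # subset, for illustration

# "Install time": generate one unique token per application (hypothetical scheme).
TOKENS = {app: secrets.token_urlsafe(32) for app in APPLICATIONS}

NAME_HEADER = "X-App-Name"        # hypothetical header name
AUTH_HEADER = "X-App-Auth-Token"  # hypothetical header name


def build_request(sender: str, path: str) -> dict:
    """Attach the sender's token to an outgoing request."""
    return {"path": path, "headers": {NAME_HEADER: sender, AUTH_HEADER: TOKENS[sender]}}


def authenticate(request: dict) -> bool:
    """Accept the request only if its token matches the one issued to the sender."""
    headers = request["headers"]
    expected = TOKENS.get(headers.get(NAME_HEADER, ""))
    if expected is None:
        return False  # unknown application
    supplied = headers.get(AUTH_HEADER, "")
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(expected, supplied)


ok = build_request("Job Runner", "/pipelines/p1")
print(authenticate(ok))  # True

# Presenting another application's token under the Job Runner name fails.
spoofed = {"path": "/pipelines/p1",
           "headers": {NAME_HEADER: "Job Runner", AUTH_HEADER: TOKENS["Pipeline Store"]}}
print(authenticate(spoofed))  # False
```

Tying each token to a single application means a leaked token lets an attacker impersonate only that application, not every component.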
Control Hub applications store data in relational and time series databases. Before you install Control Hub, you must set up the required database instances.
The following image displays the architecture of Control Hub:
[Image: Control Hub architecture]
Applications
Application | Description |
---|---|
Classification | Not used at this time. |
Connection | Manages the creation of connections. Also handles requests from execution engines for the connection properties needed by running pipelines that use connections. |
Dynamic Preview | Supports dynamically generated pipelines. |
Job Runner | Manages job creation and requests to start, stop, and synchronize jobs. |
Messaging | Manages the messages sent between Control Hub, registered Data Collectors, registered Transformers, and Provisioning Agents. |
Notification | Manages and triggers subscriptions that listen for Control Hub events. Manages the display and acknowledgement of data SLA alerts configured for topologies in Control Hub. Also manages the display of all metric, data, and data drift alerts configured for pipelines. |
Pipeline Store | Manages pipeline designing, publishing, versioning, and history. |
Policy | Not used at this time. |
Provisioning | Manages the automatic provisioning of Data Collectors to a container orchestration framework, such as Kubernetes. Provisioning includes deploying, registering, starting, stopping, and scaling Data Collector containers to work with Control Hub. |
Reporting | Manages data delivery reports, including creating report definitions, generating reports, and displaying generated reports. |
Scheduler | Manages the scheduling of jobs and data delivery reports. |
Security | Ensures the integrity of your data by managing the following components: |
SLA | Manages the creation and triggering of data SLA alerts for topologies. |
Time Series | Reads and stores statistics collected by registered execution engines running remote pipeline instances. Displays the statistics as metrics when you monitor jobs and topologies. |
Topology | Manages the creation and history of topologies. |
Databases
- Relational: Control Hub applications store metadata in a relational database instance. Control Hub currently supports MariaDB, MySQL, or PostgreSQL for the relational database instance.
  Important: Each application requires a unique database in the relational database instance. Create a database for each application, even if you do not plan to use the functionality offered by that application.
- Time Series: Control Hub applications store metric data in a time series database instance. Control Hub currently supports InfluxDB for the time series database instance.
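Because every application needs its own relational database, including Classification and Policy, which are not used at this time, it can help to generate the setup statements from the application list rather than typing them by hand. The sketch below takes the application names from the Applications table; the naming rule (`Job Runner` becomes `job_runner`) is a hypothetical convention, so check the Control Hub installation instructions for the database names your installation actually expects. Plain `CREATE DATABASE name;` is accepted by MariaDB, MySQL, and PostgreSQL; character sets, owners, and grants are omitted.

```python
import re

# Application names from the Applications table.
APPLICATIONS = [
    "Classification", "Connection", "Dynamic Preview", "Job Runner",
    "Messaging", "Notification", "Pipeline Store", "Policy",
    "Provisioning", "Reporting", "Scheduler", "Security",
    "SLA", "Time Series", "Topology",
]


def database_name(app: str) -> str:
    """Derive a hypothetical database name, e.g. 'Job Runner' -> 'job_runner'."""
    return re.sub(r"[^a-z0-9]+", "_", app.lower()).strip("_")


def create_statements(apps: list[str]) -> list[str]:
    """Return one CREATE DATABASE statement per application, including unused ones."""
    names = [database_name(a) for a in apps]
    # Each application needs its own database, so derived names must not collide.
    if len(set(names)) != len(names):
        raise ValueError("database names must be unique per application")
    return [f"CREATE DATABASE {n};" for n in names]


for stmt in create_statements(APPLICATIONS):
    print(stmt)
```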