Getting started with IBM StreamSets
Use IBM StreamSets to build, run, and monitor streaming data pipelines. A streaming data pipeline runs continuously to read, process, and write data as soon as the data becomes available. With streaming pipelines, you can act on time-sensitive data, rather than waiting to process data on an intermittent or scheduled basis.
With IBM StreamSets, you can:
- Access data from multiple types of external data sources that are located in the cloud or on premises.
- Detect and correct unexpected data drift.
- Collaboratively build pipelines as a team.
- Design reusable fragments to add the same processing logic to multiple pipelines.
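To illustrate what "data drift" means, here is a minimal, tool-independent Python sketch. It assumes that drift is defined as fields appearing, disappearing, or changing type relative to an expected schema. It is not how IBM StreamSets detects or corrects drift, and the `detect_drift` function and the schema format (field name mapped to a Python type) are hypothetical.

```python
from typing import Any, Mapping


def detect_drift(record: Mapping[str, Any], expected: Mapping[str, type]) -> dict[str, list[str]]:
    """Report how one record differs from an expected schema.

    Hypothetical helper, not part of IBM StreamSets: `expected` maps each
    field name to the Python type the field is supposed to have.
    """
    added = sorted(set(record) - set(expected))      # fields that were not expected
    missing = sorted(set(expected) - set(record))    # expected fields that are absent
    type_changed = sorted(
        name
        for name, expected_type in expected.items()
        if name in record and not isinstance(record[name], expected_type)
    )
    return {"added": added, "missing": missing, "type_changed": type_changed}


expected_schema = {"id": int, "amount": float, "email": str}
incoming = {"id": "42", "amount": 9.99, "country": "DE"}
print(detect_drift(incoming, expected_schema))
# {'added': ['country'], 'missing': ['email'], 'type_changed': ['id']}
```

In a real pipeline, a non-empty report could trigger an alert or send the record to a processor that corrects it. How IBM StreamSets itself handles drift is configured in the product and is not shown by this sketch.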
Checking whether the service is installed
An administrator must install IBM StreamSets.
To check whether the service is installed:
- From the navigation menu, select .
- Search for IBM StreamSets.
If the service is installed and ready to use, the tile in the catalog shows Ready to use.
If the service is installed but no service instances have been created, the tile in the catalog shows Ready to provision.
Accessing the service
IBM StreamSets is a pop-out service. You can access the service from the page.
Checking whether an engine is deployed
Data Collector is an engine that processes data. An IBM StreamSets organization administrator must deploy a Data Collector engine and grant you access to the engine before you can begin building a streaming pipeline.
To check whether you have access to a deployed Data Collector engine:
- Open IBM StreamSets from the page.
- From the IBM StreamSets navigation menu, select .
If you have access to a deployed Data Collector engine, the engine URL is listed.
Building and running a streaming pipeline
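To build a pipeline, you add origins, processors, and destinations to the graphical pipeline canvas. An origin reads data from an external system, processors transform the data, and a destination writes the data to an external system.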
- Create a pipeline.
  - Open IBM StreamSets, and then click .
  - Select the Data Collector engine that your organization administrator deployed, and then click Next.
- Choose the origin.
  - In the pipeline canvas, click Add Origin and select the external system that you want to read from.
  - Configure the origin properties.
    For more information, see Origins in the IBM StreamSets documentation.
- Specify how to transform the data.
  - Click and select a processor. For example, you might want to mask sensitive data, remove unnecessary fields, or perform calculations on the data.
  - Configure the processor properties.
    For more information, see Processors in the IBM StreamSets documentation.
  - Optionally, add more processors to transform the data in other ways.
- Choose the destination.
  - Click and select the external system that you want to write to.
  - Configure the destination properties.
    For more information, see Destinations in the IBM StreamSets documentation.
- Run the pipeline.
  Click . As the pipeline runs, you can view statistics and error information about the data as it flows from the origin to the destination system.
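To make the flow from origin to processors to destination concrete, here is a minimal Python sketch of the same idea outside IBM StreamSets. It does not use any StreamSets API: `csv_origin`, `mask_email`, `remove_name`, `add_total`, and `run_pipeline` are hypothetical names, and the comma-separated input and record layout are assumptions. The three processors mirror the examples in the steps above: masking sensitive data, removing an unnecessary field, and performing a calculation.

```python
from typing import Any, Callable, Iterable, Iterator

# Hypothetical record type: one flat record per event, field name -> value.
Record = dict[str, Any]
Processor = Callable[[Record], Record]
Destination = Callable[[Record], None]


def csv_origin(lines: Iterable[str]) -> Iterator[Record]:
    """Hypothetical origin: turns each "name,email,price,quantity" line into a record.

    Because this is a generator, each record is produced only when the
    pipeline asks for it, so records flow through one at a time.
    """
    for line in lines:
        name, email, price, quantity = line.strip().split(",")
        yield {"name": name, "email": email, "price": float(price), "quantity": int(quantity)}


def mask_email(record: Record) -> Record:
    """Processor: mask the part of the email address before "@", keep the domain."""
    _, _, domain = str(record["email"]).partition("@")
    return {**record, "email": "****@" + domain}


def remove_name(record: Record) -> Record:
    """Processor: drop a field that the destination does not need."""
    return {key: value for key, value in record.items() if key != "name"}


def add_total(record: Record) -> Record:
    """Processor: add a calculated field."""
    return {**record, "total": record["price"] * record["quantity"]}


def run_pipeline(origin: Iterable[Record], processors: list[Processor], destination: Destination) -> None:
    """Pass each record through every processor in order, then write it."""
    for record in origin:
        for process in processors:
            record = process(record)
        destination(record)


# Usage: a Python list stands in for the external destination system.
written: list[Record] = []
run_pipeline(
    csv_origin(["Ana,ana@example.com,2.5,4"]),
    [mask_email, remove_name, add_total],
    written.append,
)
print(written)
# [{'email': '****@example.com', 'price': 2.5, 'quantity': 4, 'total': 10.0}]
```

Keeping the processors in a plain list means the same processing logic can be passed to several pipelines, which is loosely similar in spirit to the reusable fragments mentioned earlier, although StreamSets fragments are built in the pipeline canvas rather than in code.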
Learn more
To learn more about IBM StreamSets, see the following topics in the IBM StreamSets documentation: