StreamSets Resource Configuration
Before you configure your scanner, make sure you meet the prerequisites. Read our guide on StreamSets integration requirements to double-check.
Source System Properties
This configuration can be set up by creating a new connection in the Admin UI > Connections tab, or by editing an existing connection in Admin UI / Connections / Data Integration Tools / Streamsets / specific connection. A new connection can also be created via the Manta Orchestration API.
One IBM Automatic Data Lineage connection for StreamSets corresponds to one StreamSets server that will be analyzed.

| Property name | Description | Example |
|---|---|---|
| streamsets.extractor.server | Custom name used to identify this StreamSets connection in Automatic Data Lineage | template |
| streamsets.extractor.service | Type of StreamSets service used for the extraction; values to pick from are: If not set, the default value is “Data Collector” | Data Collector |
| streamsets.extractor.address | Address of the server used for the actual connection to the StreamSets repository; only considered for the extraction from Data Collector or Control Hub On-Premises | 192.168.0.16 |
| streamsets.extractor.port | Port of the server used for the actual connection to the StreamSets repository; only considered for the extraction from Data Collector or Control Hub On-Premises | 80 |
| streamsets.extractor.scheme | Scheme of the server used for the actual connection to the StreamSets repository; only considered for the extraction from Data Collector or Control Hub On-Premises | http |
| streamsets.extractor.user | Name of the user used for the connection to SDC or SCH | guest |
| streamsets.extractor.password | Password of the user used for the connection to SDC or SCH | guest |
| streamsets.extractor.use.pipeline.extraction | Whether pipelines should be extracted as a part of the extraction process; if true, the following four properties should be set | true |
| streamsets.extractor.include.pipelines | Comma-separated list of pipeline IDs to include in the extraction. Note that if both the | pipelineId01,pipelineId02,pipelineId03 |
| streamsets.extractor.include.labels | Comma-separated list of pipeline labels to include in the extraction. Note that if both the | label01,label02,label03 |
| streamsets.extractor.exclude.pipelines | Comma-separated list of pipeline IDs to exclude from the extraction | pipelineId01,pipelineId04 |
| streamsets.extractor.exclude.labels | Comma-separated list of pipeline labels to exclude from the extraction | label01,label04 |
| streamsets.extractor.use.job.extraction | Whether jobs should be extracted as part of the extraction process; job extraction is only supported for SCH; if true, the following four properties should be set | true |
| streamsets.extractor.include.jobs | Comma-separated list of job IDs to include in the extraction. Note that if both the | jobId01,jobId02,jobId03 |
| streamsets.extractor.include.job.tags | Comma-separated list of job tags to include in the extraction. Note that if both the | tag01,tag02,tag03 |
| streamsets.extractor.exclude.jobs | Comma-separated list of job IDs to exclude from the extraction | jobId01,jobId04 |
| streamsets.extractor.exclude.job.tags | Comma-separated list of job tags to exclude from the extraction | tag01,tag04 |
| streamsets.input.encoding | Encoding of extracted pipelines. See Encodings for applicable values. | UTF-8 |
| streamsets.extractor.verifyHostname | When using HTTPS, whether the hostname of the server's certificate should be validated to match the hostname of the server | true |

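
To make the connection properties above more concrete, here is a minimal Python sketch of how a set of these values might be read. It is illustrative only and does not describe how the scanner itself processes them: the helper names `base_url` and `split_list` are hypothetical, and the way include and exclude lists interact is deliberately not modeled because it is not spelled out here.

```python
from typing import Dict, List, Optional


def base_url(props: Dict[str, str]) -> str:
    """Combine scheme, address, and port into one base URL (illustrative helper)."""
    scheme = props["streamsets.extractor.scheme"]
    address = props["streamsets.extractor.address"]
    port = props["streamsets.extractor.port"]
    return f"{scheme}://{address}:{port}"


def split_list(value: Optional[str]) -> List[str]:
    """Split a comma-separated property value, dropping blanks and extra spaces."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


# Example values taken from the "Example" column of the table.
connection = {
    "streamsets.extractor.scheme": "http",
    "streamsets.extractor.address": "192.168.0.16",
    "streamsets.extractor.port": "80",
    "streamsets.extractor.include.pipelines": "pipelineId01,pipelineId02,pipelineId03",
}

print(base_url(connection))
print(split_list(connection.get("streamsets.extractor.include.pipelines")))
```

Trimming whitespace and ignoring empty entries is a design choice of this sketch, made so that a value such as `label01, label02,` still yields two labels.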
Common Scanner Properties
This configuration is common to all StreamSets source systems and all StreamSets scenarios, and is configured in Admin UI > Configuration > CLI > Streamsets > Streamsets Common. It can be overridden at the individual connection level.

| Property name | Description | Example |
|---|---|---|
| streamsets.input.dir | Directory with pipelines extracted from the StreamSets server | ${manta.dir.temp}/streamsets/${streamsets.extractor.server} |
| streamsets.runtime.values.dir | Directory with manually provided runtime values that are used in SDC pipelines; the properties file and TXT files used in the pipelines should be stored here | ${manta.dir.input}/streamsets/${streamsets.extractor.server} |
| filepath.lowercase | Whether paths to files should be lowercase (false for case-sensitive file systems, true otherwise) | false |
| streamsets.data.collectors.settings.file | Path to the automatically generated file with Data Collector settings | ${manta.dir.temp}/streamsets/${streamsets.extractor.server}/dataCollectors.csv |
| streamsets.data.collectors.manual.settings.file | Path to the optional file with manual Data Collector settings | ${manta.dir.input}/streamsets/${streamsets.extractor.server}/dataCollectors.csv |
| streamsets.data.collectors.manual.settings.encoding | Encoding of the manual Data Collector settings file. See Encodings for applicable values. | UTF-8 |
| streamsets.extractor.itemsPerRequest | Number of items (pipelines/jobs) extracted per HTTP request | 100 |
| streamsets.extraction.method | Extraction method to use. Set to Agent:default for the default Manta Extractor Agent, to Agent:{remote_agent_name} for a remote Agent, or to Git:{git.dictionary.id} for the Git ingest method. For more information on setting up a remote extractor Agent, refer to the Manta Flow Agent Configuration for Extraction documentation. For additional details on configuring a Git ingest method, refer to the Manta Flow Agent Configuration for Extraction:Git Source documentation. | default Git agent |
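
Several example values above contain `${...}` placeholders that refer to other properties, and the extraction method is described as a `Kind:identifier` pair. The Python sketch below shows one way such values could be interpreted. It is an assumption-laden illustration, not the product's actual resolution logic: the function names `expand` and `parse_extraction_method`, the directory `/opt/manta/temp`, and the remote agent name `my_remote_agent` are all hypothetical.

```python
import re
from typing import Dict, Tuple

# Matches ${property.name} placeholders.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(value: str, props: Dict[str, str], _depth: int = 0) -> str:
    """Replace each ${name} with props[name]; unknown names are left untouched."""
    if _depth > 10:
        raise ValueError("placeholder nesting too deep (possible cycle)")

    def replace(match):
        name = match.group(1)
        if name not in props:
            return match.group(0)
        # Resolve placeholders nested inside the referenced value as well.
        return expand(props[name], props, _depth + 1)

    return _PLACEHOLDER.sub(replace, value)


def parse_extraction_method(value: str) -> Tuple[str, str]:
    """Split 'Agent:default' or 'Git:<id>' into (kind, identifier)."""
    kind, separator, identifier = value.partition(":")
    if not separator or kind not in ("Agent", "Git") or not identifier:
        raise ValueError(f"unexpected extraction method: {value!r}")
    return kind, identifier


# Hypothetical values; only the property names come from the tables.
props = {
    "manta.dir.temp": "/opt/manta/temp",
    "streamsets.extractor.server": "template",
}

settings_file = "${manta.dir.temp}/streamsets/${streamsets.extractor.server}/dataCollectors.csv"
print(expand(settings_file, props))
print(parse_extraction_method("Agent:my_remote_agent"))
```

Leaving unknown placeholders as they are, rather than raising an error, keeps this sketch usable when only some of the referenced properties are available.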