StreamSets Resource Configuration

Before you configure your scanner, make sure you meet the prerequisites. Read our guide on StreamSets integration requirements to double-check.

Source System Properties

You can set up this configuration by creating a new connection in Admin UI > Connections, or by editing an existing connection in Admin UI > Connections > Data Integration Tools > StreamSets > specific connection. A new connection can also be created via the Manta Orchestration API.

One IBM Automatic Data Lineage connection for StreamSets corresponds to one StreamSets server that will be analyzed.

Property name	Description	Example
streamsets.extractor.server	Custom name used to identify this StreamSets connection in Automatic Data Lineage	template
streamsets.extractor.service	Type of StreamSets service used for the extraction; the allowed values are “Data Collector”, “Control Hub Cloud”, and “Control Hub On-Premises”. If not set, the default value is “Data Collector”.	Data Collector
streamsets.extractor.address	Address of the server used for the actual connection to the StreamSets repository; only considered for the extraction from Data Collector or Control Hub On-Premises	192.168.0.16 or prod.getmanta.com
streamsets.extractor.port	Port of the server used for the actual connection to the StreamSets repository; only considered for the extraction from Data Collector or Control Hub On-Premises	80 or 443
streamsets.extractor.scheme	Scheme of the server used for the actual connection to the StreamSets repository; only considered for the extraction from Data Collector or Control Hub On-Premises	http
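streamsets.extractor.user	Name of the user used for the connection to StreamSets Data Collector (SDC) or StreamSets Control Hub (SCH)	guest
streamsets.extractor.password	Password of the user used for the connection to StreamSets Data Collector (SDC) or StreamSets Control Hub (SCH)	guest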
streamsets.extractor.use.pipeline.extraction	Whether pipelines should be extracted as a part of the extraction process; if true, the following four properties should be set	true
streamsets.extractor.include.pipelines	Comma-separated list of pipeline IDs to include in the extraction. Note that if both the `streamsets.extractor.include.pipelines` and `streamsets.extractor.include.labels` properties are left empty, all pipelines will be extracted (except those that have been excluded).	pipelineId01,pipelineId02,pipelineId03
streamsets.extractor.include.labels	Comma-separated list of pipeline labels to include in the extraction. Note that if both the `streamsets.extractor.include.pipelines` and `streamsets.extractor.include.labels` properties are left empty, all pipelines will be extracted (except those that have been excluded).	label01,label02,label03
streamsets.extractor.exclude.pipelines	Comma-separated list of pipeline IDs to exclude from the extraction	pipelineId01,pipelineId04
streamsets.extractor.exclude.labels	Comma-separated list of pipeline labels to exclude from the extraction	label01,label04
streamsets.extractor.use.job.extraction	Whether jobs should be extracted as part of the extraction process; job extraction is only supported for SCH; if true, the following four properties should be set	true
streamsets.extractor.include.jobs	Comma-separated list of job IDs to include in the extraction. Note that if both the `streamsets.extractor.include.jobs` and `streamsets.extractor.include.job.tags` properties are left empty, all jobs will be extracted (except those that have been excluded).	jobId01,jobId02,jobId03
streamsets.extractor.include.job.tags	Comma-separated list of job tags to include in the extraction. Note that if both the `streamsets.extractor.include.jobs` and `streamsets.extractor.include.job.tags` properties are left empty, all jobs will be extracted (except those that have been excluded).	tag01,tag02,tag03
streamsets.extractor.exclude.jobs	Comma-separated list of job IDs to exclude from the extraction	jobId01,jobId04
streamsets.extractor.exclude.job.tags	Comma-separated list of job tags to exclude from the extraction	tag01,tag04
streamsets.input.encoding	Encoding of extracted pipelines. See Encodings for applicable values.	UTF-8
streamsets.extractor.verifyHostname	When using HTTPS, whether the hostname of the server's certificate should be validated to match the hostname of the server	true
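
The include and exclude properties in the table above can be read as a small filter. The following Python sketch is only an illustration and is not part of Automatic Data Lineage: `parse_list`, `select_pipelines`, and the shape of the `pipelines` mapping are hypothetical. It assumes that a pipeline matching either include list is included and that an exclusion always overrides an inclusion; the table itself only states that leaving both include lists empty selects everything that is not excluded, so the rest should be checked against the scanner's actual behavior.

```python
def parse_list(value):
    """Split a comma-separated property value into a set of trimmed, non-empty items."""
    return {item.strip() for item in (value or "").split(",") if item.strip()}


def select_pipelines(pipelines, props):
    """Return the IDs of pipelines that pass the include/exclude filters.

    `pipelines` maps a pipeline ID to a set of its labels (hypothetical shape).
    `props` maps property names from the table to their string values.
    """
    include_ids = parse_list(props.get("streamsets.extractor.include.pipelines"))
    include_labels = parse_list(props.get("streamsets.extractor.include.labels"))
    exclude_ids = parse_list(props.get("streamsets.extractor.exclude.pipelines"))
    exclude_labels = parse_list(props.get("streamsets.extractor.exclude.labels"))

    selected = []
    for pipeline_id, labels in pipelines.items():
        # Both include lists empty means "include everything" (stated in the table).
        included = (
            (not include_ids and not include_labels)
            or pipeline_id in include_ids
            or bool(labels & include_labels)
        )
        # Assumption: an exclusion always overrides an inclusion.
        excluded = pipeline_id in exclude_ids or bool(labels & exclude_labels)
        if included and not excluded:
            selected.append(pipeline_id)
    return selected
```

The same shape would apply to jobs by swapping in the `streamsets.extractor.include.jobs`, `include.job.tags`, `exclude.jobs`, and `exclude.job.tags` properties, under the same assumptions.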

Common Scanner Properties

This configuration is common to all StreamSets source systems and all StreamSets scenarios, and is configured in Admin UI > Configuration > CLI > StreamSets > StreamSets Common. It can be overridden at the individual connection level.

Property name	Description	Example
streamsets.input.dir	Directory with pipelines extracted from the StreamSets server	${manta.dir.temp}/streamsets/${streamsets.extractor.server}
streamsets.runtime.values.dir	Directory with manually provided runtime values that are used in SDC pipelines; the properties file and TXT files used in the pipelines should be stored here	${manta.dir.input}/streamsets/${streamsets.extractor.server}
filepath.lowercase	Whether paths to files should be lowercase (false for case-sensitive file systems, true otherwise)	false or true
streamsets.data.collectors.settings.file	Path to the automatically generated file with Data Collector settings	${manta.dir.temp}/streamsets/${streamsets.extractor.server}/dataCollectors.csv
streamsets.data.collectors.manual.settings.file	Path to the optional file with manual Data Collector settings	${manta.dir.input}/streamsets/${streamsets.extractor.server}/dataCollectors.csv
streamsets.data.collectors.manual.settings.encoding	Encoding of the manual Data Collector settings file. See Encodings for applicable values.	UTF-8
streamsets.extractor.itemsPerRequest	Number of items (pipelines/jobs) extracted per HTTP request	100
streamsets.extraction.method	Extraction method to use. Set to Agent:default to use the default Manta Extractor Agent, to Agent:{remote_agent_name} to use a remote Agent, or to Git:{git.dictionary.id} to use the Git ingest method. For more information on setting up a remote extractor Agent, refer to the Manta Flow Agent Configuration for Extraction documentation. For details on configuring the Git ingest method, refer to the Manta Flow Agent Configuration for Extraction:Git Source documentation.	default Git agent
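
Several example values in the table above use `${...}` placeholders that refer to other properties, and `streamsets.extraction.method` uses a `Kind:target` form. The Python sketch below shows one way such values could be interpreted. It is not the product's implementation: the function names `expand` and `parse_extraction_method`, the nesting limit, and the choice to leave unknown placeholders untouched are all assumptions made for illustration.

```python
import re

# Matches ${property.name} placeholders as they appear in the example values.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(value, props, _depth=0):
    """Replace each ${name} in `value` with props[name], resolving nested placeholders."""
    if _depth > 10:  # assumed limit, guards against circular references
        raise ValueError("placeholder nesting too deep (possible cycle)")

    def substitute(match):
        name = match.group(1)
        if name not in props:
            return match.group(0)  # assumption: unknown placeholders are left as-is
        return expand(props[name], props, _depth + 1)

    return _PLACEHOLDER.sub(substitute, value)


def parse_extraction_method(value):
    """Split a value such as 'Agent:default' into a (kind, target) pair."""
    kind, sep, target = value.partition(":")
    if not sep or kind not in ("Agent", "Git") or not target:
        raise ValueError(f"unsupported extraction method: {value!r}")
    return kind, target
```

For example, with hypothetical values `manta.dir.temp=/opt/manta/temp` and `streamsets.extractor.server=template`, expanding `${manta.dir.temp}/streamsets/${streamsets.extractor.server}` yields `/opt/manta/temp/streamsets/template`.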