Shell
The Shell executor executes a shell script every time it receives an event. Use the Shell executor as part of an event stream.
When you configure the executor, you define the shell script that you want to execute and environment variables to propagate configuration for the script. You also specify the maximum amount of time for the shell script to run. After the specified time elapses, the executor stops the script.
For more information about dataflow triggers and the event framework, see Dataflow triggers overview.
Data Collector shell impersonation mode
Enable the Data Collector shell impersonation mode to use shell scripts securely. You enable the impersonation mode by configuring the shell impersonation mode property in the Data Collector configuration properties. Enabling the impersonation mode is not required, but is strongly recommended. You can also configure related shell and sudo properties as needed.
The Shell executor runs a user-defined shell script each time the stage receives an event. By default, Data Collector executes the script as the operating system user who starts Data Collector. With the default configuration, the shell script can therefore stop Data Collector and perform any other task that this user has the rights to perform.
When you enable shell impersonation mode, the scripts are executed by the user who starts the flow. To use this option, the Data Collector user who starts the flow must have a corresponding operating system user account, and sudo must be configured to allow passwordless use. For greater security, you can also limit the permissions for the operating system user account to restrict its access.
- For each user who starts Shell executor flows, create a matching user account in the operating system and configure permissions as needed.
For example, if Data Collector users Ops1 and Ops2 start all flows, create Ops1 and Ops2 user accounts in the operating system and grant them limited permissions.
- Ensure that each of the operating system users has passwordless sudo for Data Collector.
- On the Manage tab of your project, edit the StreamSets environment. Open the Advanced Configuration dialog box, then add the following Data Collector property as a key and value:
stage.conf_com.streamsets.pipeline.stage.executor.shell.impersonation_mode=CURRENT_USER
- Save the changes to the StreamSets environment and restart all engine instances.
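Before restarting the engines, it can help to confirm that passwordless sudo works for each matching account. The following is a minimal sketch, not part of the product: it assumes that Data Collector runs the script through sudo as the matching operating system user, and that the check is run as the operating system user who starts Data Collector. The function name check_passwordless_sudo is hypothetical; the user names come from the example above.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current operating system user
# can run a command as each listed user through sudo without a password.
check_passwordless_sudo() {
  rc=0
  for user in "$@"; do
    # -n makes sudo fail immediately instead of prompting for a password.
    if sudo -n -u "$user" true 2>/dev/null; then
      echo "$user: ok"
    else
      echo "$user: passwordless sudo not available"
      rc=1
    fi
  done
  # Non-zero when at least one user failed the check.
  return "$rc"
}
```

For example, calling check_passwordless_sudo Ops1 Ops2 prints one line per user and returns a non-zero status if either account still requires a password.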
Script configuration
- When you configure the shell script, ensure that the script returns zero (0) to indicate successful execution. For example, in a bash script, you can use "exit 0" to return the required zero value.
A script that does not return zero might run successfully when tested on the command line, but will generate errors when used by the Shell executor in a flow.
- You cannot use expressions directly in the shell script. To use Data Collector expressions in a script:
- Use the Environment Variables property in the stage to declare environment variables for the script. Create an environment variable for each expression that you want to use.
- Use the environment variables as needed in the script.
For example, say you want to perform an action on a file that was closed by a Local FS target, and you want to use the filepath field in the event record to specify the absolute path to the closed file. You can define a filepath environment variable using the following expression, then use the filepath environment variable in the script:
${record:value('/filepath')}
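A script that follows both guidelines might look like the sketch below. It is an illustration only: it assumes that the Environment Variables property maps the name filepath to the expression above, and the archive action, the archive_closed_file function, the ARCHIVE_DIR variable, and its default path are all hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumes the stage declares an environment variable named
# "filepath" whose value is ${record:value('/filepath')}.
# ARCHIVE_DIR and its default are hypothetical.

# Move one closed file into an archive directory.
# Returns non-zero if the file is missing or the move fails.
archive_closed_file() {
  src="$1"
  dest_dir="$2"
  if [ ! -f "$src" ]; then
    echo "not a regular file: $src" >&2
    return 1
  fi
  mkdir -p "$dest_dir" && mv "$src" "$dest_dir/"
}

# Run only when the executor has supplied the variable.
if [ -n "${filepath:-}" ]; then
  if archive_closed_file "$filepath" "${ARCHIVE_DIR:-/tmp/archive}"; then
    exit 0   # zero tells the executor that the script succeeded
  fi
  exit 1     # any non-zero value is reported as an error
fi
```

The script reads the path from the environment rather than embedding the expression, and it ends with an explicit exit status so that a failed move surfaces as an error in the flow.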