Try Transformer for Snowflake
This tutorial covers the steps needed to try Transformer for Snowflake. You will learn how to work with the Control Hub user interface, build a basic Transformer for Snowflake pipeline, preview pipeline activity, and run a job.
Although the tutorial provides a simple use case, keep in mind that IBM StreamSets is a powerful platform that enables you to build and run large numbers of complex pipelines.
To complete this tutorial, you must have an existing IBM StreamSets account. If you do not have one, use the following URL to sign up for a free trial:
https://cloud.login.streamsets.com/signup
When you sign up, your user account receives all of the roles required to complete the tasks in this tutorial. If you are invited to join an existing organization, your user account requires the Pipeline Editor and Job Operator roles to complete tutorial tasks.
Complete Prerequisite Tasks
- Verify Snowflake requirement for network policies
  - If you are just starting to use Transformer for Snowflake and your Snowflake account uses network policies, complete the Snowflake requirement.
- Verify user permissions
  - Make sure that the user account for the tutorial has the following permissions on the database where you create the tables:
    - Read
    - Write
    - Create Table
- Create source tables
  - Use SQL statements like the following to create and populate two tables in Snowflake.
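As a sketch of the source-table setup, the following statements create and populate two small tables. The table and column names (TOOLS_EAST, TOOLS_WEST, TOOL, BIN, INVENTORY) are illustrative assumptions rather than the tutorial's exact definitions; adjust them to your own database and schema:

```sql
-- Illustrative example only: table and column names are assumptions.
-- Run these in the database and schema where the tutorial user has
-- Read, Write, and Create Table permissions.
CREATE TABLE TOOLS_EAST (
    TOOL      VARCHAR,
    BIN       VARCHAR,
    INVENTORY NUMBER
);

CREATE TABLE TOOLS_WEST (
    TOOL      VARCHAR,
    BIN       VARCHAR,
    INVENTORY NUMBER
);

INSERT INTO TOOLS_EAST (TOOL, BIN, INVENTORY) VALUES
    ('hammer',    'A1', 25),
    ('multitool', 'B2', 10),
    ('wrench',    'C3', 40);

INSERT INTO TOOLS_WEST (TOOL, BIN, INVENTORY) VALUES
    ('saw',       'D4', 15),
    ('multitool', 'E5',  8),
    ('pliers',    'F6', 30);
```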
Build a Snowflake Pipeline
With these steps, you build a Transformer for Snowflake pipeline that uses two Snowflake Table origins to read from the two source tables that you created, a Union processor to merge the data, and a Snowflake Table destination to write to a new output table.
You also preview the pipeline to verify how the stages process data.
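Conceptually, the finished pipeline performs the rough equivalent of the following Snowflake SQL. The source table names reuse the illustrative tables from the prerequisite step, and the output table name (TOOLS_ALL) is also an assumption:

```sql
-- Rough SQL equivalent of the tutorial pipeline (illustrative names):
-- two Snowflake Table origins read the source tables, a Union processor
-- merges the rows, and a Snowflake Table destination writes a new table.
CREATE OR REPLACE TABLE TOOLS_ALL AS
SELECT TOOL, BIN, INVENTORY FROM TOOLS_EAST
UNION ALL
SELECT TOOL, BIN, INVENTORY FROM TOOLS_WEST;
```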
Run a Job
A job is the execution of the dataflow represented in a pipeline.
When pipeline development is complete, you check in the pipeline to indicate that the pipeline is ready to be added to a job and run. When you check in a pipeline, you enter a commit message. Control Hub maintains the commit history of each pipeline.
When the Transformer for Snowflake engine is hosted, job configuration is simple. You can use the default values when creating the job.
Next Steps
- Modify the tutorial pipeline
  - Add a couple of processors to the tutorial pipeline to see how easily you can add a wide range of processing to the pipeline (a rough SQL sketch of these changes appears after this list):
    - Add a Filter processor to remove the multitool rows from the data set.
    - Use a Column Transformer processor to double the inventory values, and overwrite the existing values.
    - To see how the data drift feature works, change the destination Overwrite Mode property from Drop Table to Truncate Table. Then, add a Column Renamer processor to rename the Bin column.
- Create a new pipeline
  - Create a new pipeline using your own Snowflake data or the Snowflake sample data. You might explore some of the following functionality:
    - If you have a Snowflake query that you want to enhance, use the Snowflake Query origin to generate data for the pipeline, then add processors to perform additional processing.
    - Use the Join processor to join data from two Snowflake tables or views.
    - As you develop the pipeline, use the Trash destination with data preview to see if the pipeline processes data as expected. For more information about data preview, see the Control Hub documentation.
- Explore useful features
  - Use a pipeline to create a Snowflake view.
  - Try using an existing user-defined function (UDF) or define one in the pipeline.
  - If you have an entire pipeline that you want to run with small changes, use runtime parameters to easily reuse and adapt pipeline logic.
  - If you have a series of stages that you want to reuse in multiple pipelines, try creating a pipeline fragment.
  - Use a Snowflake Notification executor to send an email after a pipeline run completes.
  - Configure Snowflake pipeline defaults to make configuring pipelines easier when you use the same Snowflake details in all or most of your pipelines.
- Learn more about Control Hub
  - Become more familiar with the Control Hub pipeline canvas.
  - Learn how Control Hub tracks pipeline version history and gives you full control of the evolving development process.
  - Need to import data to Snowflake? You can use pipelines that run on Data Collector or Transformer engines to make that happen.
- Add users to your organization
  - Invite other users to join your organization and collaboratively manage pipelines as a team.
  - To create a multitenant environment within your organization, create groups of users. Grant roles to these groups and share objects within the groups to grant each group access to the appropriate objects.
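For reference, the modifications suggested under Modify the tutorial pipeline correspond roughly to the following Snowflake SQL, again using the illustrative names from the prerequisite step; the renamed column name (STORAGE_BIN) is also an assumption:

```sql
-- Rough SQL equivalent of the suggested processors (illustrative names):
SELECT
    TOOL,
    BIN           AS STORAGE_BIN,   -- Column Renamer: rename the Bin column
    INVENTORY * 2 AS INVENTORY      -- Column Transformer: double and overwrite the values
FROM (
    SELECT TOOL, BIN, INVENTORY FROM TOOLS_EAST
    UNION ALL
    SELECT TOOL, BIN, INVENTORY FROM TOOLS_WEST
)
WHERE TOOL <> 'multitool';          -- Filter processor: remove the multitool rows
```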