Comparing Transformer for Snowflake and Other Engines
For users already familiar with Data Collector or Transformer, here's how working with Transformer for Snowflake is similar... and different.
Transformer for Snowflake pipelines are configured on the pipeline canvas, just like Data Collector and Transformer pipelines. The difference lies in the functionality available within the pipelines and in how the pipelines run.
As described in How It Works, Transformer for Snowflake does not perform pipeline processing itself, as Data Collector does. Instead, it follows the Transformer model. Just as Transformer passes pipeline configuration to Spark for processing, Transformer for Snowflake generates a SQL query based on the pipeline configuration and passes the query to Snowflake for processing. This structural similarity explains how Transformer for Snowflake got its name.
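As a rough illustration of this push-down model (the database, table, and column names below are hypothetical, not taken from the product documentation), a simple pipeline that reads from one Snowflake table, filters rows, and writes to another might be translated into a single query along these lines:

```sql
-- Hypothetical translation of an origin -> Filter -> destination pipeline.
-- Snowflake executes the entire query; the data never leaves Snowflake.
INSERT INTO ANALYTICS.PUBLIC.LARGE_ORDERS (ORDER_ID, CUSTOMER_ID, AMOUNT)
SELECT ORDER_ID, CUSTOMER_ID, AMOUNT
FROM ANALYTICS.PUBLIC.ORDERS
WHERE AMOUNT > 1000;
```

The exact SQL that a pipeline generates depends on its stages and their configuration.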
With Data Collector and Transformer, you can use heterogeneous origins and destinations to read from and write to a wide range of systems. Transformer for Snowflake pipelines process Snowflake data only – all origins and destinations read from and write to Snowflake.
However, many concepts and behaviors remain exactly the same. For example, you use origin, processor, and destination stages to define processing in all pipeline types. You create jobs to run pipelines. You can use runtime parameters in all pipelines.
- Similarities
- Since you design and run Transformer for Snowflake pipelines in Control Hub, some basic concepts remain the same:
- Create pipelines in the pipeline canvas.
- Preview pipelines to help with pipeline development. For more information, see the documentation.
- Use origin, processor, destination, and executor stages to design the pipeline data flow.
- Processors with the same names as Data Collector and Transformer stages generally do what you expect at a high level, but they might have subtle differences or additional features because the processing occurs in Snowflake. For example, Transformer for Snowflake supports using the Snowflake SQL query language for data processing. See the stage documentation for details, such as the Filter processor.
- Like Transformer, you use the expression language in properties that are evaluated only once, before pipeline processing begins, such as runtime parameters in pipeline properties.
- Create jobs to run pipelines.
- Use Control Hub team-based features, such as version control and user management.
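One of the subtle differences noted above is that processor logic is expressed in Snowflake SQL. For instance, a Filter processor condition is written as a Snowflake SQL predicate rather than in the expression language (the column names below are hypothetical):

```sql
-- Hypothetical Filter processor condition, written as a Snowflake SQL
-- predicate; only rows matching the condition continue through the pipeline.
STATUS = 'ACTIVE' AND AMOUNT > 0
```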
- Differences
- Unlike Data Collector and Transformer, Transformer for Snowflake:
- Provides a hosted engine that most organizations use to avoid installing and maintaining engines. You can deploy engines based on the account agreement for your organization. For more information, see Hosted or Deployed Engines.
- Includes stages based on Snowflake functionality, such as the Cube processor, which applies the Snowflake GROUP BY CUBE clause.
- Uses the terms "column" and "row" to align with Snowflake terminology. Data Collector and Transformer use the terms "field" and "record" to refer to the same concepts.
- Like Data Collector, Transformer for Snowflake includes executor stages that perform tasks, such as sending an email notification.
  Transformer for Snowflake executors perform tasks using Snowflake integrations after all pipeline writes complete, when triggered by the data. These executors can be placed anywhere in the data flow.
  In contrast, Data Collector executors expect to be triggered by event records, which only certain stages generate, so those executors must be placed downstream of event-generating stages.
- You can monitor Snowflake jobs as you would any other job. However, the Snowflake job summary displays the following information, which differs from other job summaries:
  - Input and output row count
  - Log messages
  - Snowflake queries run for the job
  You cannot view an error row count, row throughput, or runtime statistics as you can for other jobs.