Google BigQuery
The Google BigQuery executor runs one or more SQL queries on Google BigQuery each time it receives an event record. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
Use the executor as part of an event stream in the pipeline. For example, you might use the Google BigQuery executor to execute a stored procedure in the database when the pipeline generates a pipeline stop event.
When you configure the executor, you specify authentication information for Google BigQuery. You can optionally configure the executor to connect to BigQuery through a proxy server. You can also use a connection to configure the executor.
You specify one or more SQL queries to run and how to submit the queries. You can also configure the executor to generate events for another event stream and to include a query result count in generated event records. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Before you use the Google BigQuery executor, you must complete a prerequisite task.
Prerequisite
Executing a Google BigQuery SQL query requires that the user specified in the Google BigQuery executor has the following BigQuery permission:
- bigquery.jobs.create
The user might require additional permissions depending on the types of SQL queries specified in the executor. For information about permissions needed for different types of queries, see the Google BigQuery documentation.
Credentials
When the Google BigQuery executor connects to BigQuery, the executor must pass credentials to Google Cloud Storage and then to Google BigQuery. You can provide credentials using one of the following options:
- Google Cloud default credentials
- Credentials in a file
- Credentials in a stage property
For details on how to configure each option, see Security in Google Cloud Stages.
SQL Queries
You can specify one or more queries to perform each time that the Google BigQuery executor receives an event record.
When a query fails, the Google BigQuery executor treats the event record that triggered the query like an error record. If you specify more than one query and multiple queries fail for an event record, the executor creates an error record for each failed query, and includes query failure details in the header attributes for the error record.
- Query submission
- Configure the Query Submission property to define how the executor submits your queries:
- Sequential - For each incoming event record, the executor submits one query at a time, and waits until the previous query is complete before submitting the next query. Use when the run order for the queries is important. Queries are submitted in the order that they appear in the executor.
- Parallel - For each incoming event record, the executor submits all queries at the same time. Use when the run order for the queries is not important.
- Expressions in queries
- You can include a subset of the functions provided with the StreamSets expression language in a SQL query. These expressions are evaluated before the executor passes the query to BigQuery.
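The two submission modes described above can be pictured with a minimal Python sketch. This is not the executor's implementation: `submit_queries` and the `run_query` callable are hypothetical names, and the sketch assumes that later queries still run after an earlier one fails, which the text above suggests for multiple failures but does not state for every mode.

```python
from concurrent.futures import ThreadPoolExecutor


def submit_queries(queries, run_query, mode="sequential"):
    """Run every query for one event record and collect per-query outcomes.

    run_query is a hypothetical callable that sends one SQL string to
    BigQuery and raises an exception when the query fails.
    """
    def attempt(sql):
        try:
            return {"query": sql, "ok": True, "result": run_query(sql)}
        except Exception as exc:
            # Assumption: a failed query is recorded and the others still run.
            return {"query": sql, "ok": False, "error": str(exc)}

    if mode == "sequential":
        # One query at a time, in list order; each finishes before the next starts.
        return [attempt(sql) for sql in queries]
    if mode == "parallel":
        # All queries are handed to the pool together; run order is not guaranteed,
        # but map() returns the outcomes in input order.
        with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
            return list(pool.map(attempt, queries))
    raise ValueError(f"unknown mode: {mode}")
```

Each failed entry in the returned list corresponds to one error record in the description above, and each entry overall corresponds to one query that could produce an event.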
Event Generation
The Google BigQuery executor can generate events that you can use in an event stream. When you enable event generation, the executor generates an event for each successful or failed query.
You can use Google BigQuery executor events in any logical way. For example:
- With the Email executor to send a custom email after receiving an event.
For an example, see Sending Email During Pipeline Processing.
- With a destination to store event information.
For an example, see Preserving an Audit Trail of Events.
For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Event Records
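Event records generated by the Google BigQuery executor have the following event-related record header attributes:
Record Header Attribute | Description |
---|---|
sdc.event.type | Event type. Distinguishes events for successful queries from events for failed queries. |
sdc.event.version | Integer that indicates the version of the event record type. |
sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |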
The executor can generate the following types of event records:
- Successful query
- The executor generates a successful-query event record after successfully completing a query.
Successful-query event records have the sdc.event.type record header attribute set to sucessful-query and include the following fields:
Event Field Name | Description |
---|---|
query | Query completed. |
query-result | Number of rows affected by query. Included if the Query Result Count in Events property is selected. |
- Failed query
- The executor generates a failed-query event record after failing to complete a query.
Failed-query event records have the sdc.event.type record header attribute set to failed-query and include the following field:
Event Field Name | Description |
---|---|
query | Query attempted. |
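As an illustration of how a downstream consumer might read these event records, here is a small Python sketch. It is a hedged example rather than Data Collector code: `describe_event` is a hypothetical helper, and plain dicts stand in for the record header attributes and record fields. Only the `failed-query` type is matched explicitly; any other type is assumed to be a successful-query event.

```python
def describe_event(header, fields):
    """Turn one executor event record into a short log line.

    header and fields are plain dicts standing in for the record header
    attributes and the record fields (a hypothetical representation).
    """
    event_type = header["sdc.event.type"]
    query = fields["query"]
    if event_type == "failed-query":
        return f"FAILED: {query}"
    # Assumption: any other event type is a successful-query event.
    # query-result is present only when Query Result Count in Events is selected.
    rows = fields.get("query-result")
    if rows is None:
        return f"OK: {query}"
    return f"OK: {query} ({rows} rows)"
```

A line like this could feed the body of a custom email or an audit-trail record, the two uses mentioned under Event Generation.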