You can configure the BigQuery Connector stage to read the rows returned by a select statement through Google Cloud Storage, which is used as a temporary staging area to improve read performance.
Procedure
- From the job design canvas, double-click the BigQuery Connector stage.
- Set Generate SQL at runtime to No, and then specify your query in the Select statement property.
- Set Use GCS staging to Yes, and then complete the following steps:
- Specify the schema in the Schema name property. The temporary staging table is created under this schema.
- Optionally, in the Database name property, specify the Google project ID under which the staging table is created. If you do not specify it, the project ID from the BigQuery connection is used.
- In the Google cloud storage bucket property, specify the bucket to use as the temporary staging area during this read.
- Optionally, provide a File name prefix for the temporary files that are created under the Google Cloud Storage bucket. These files are deleted at the end of the job.
- Optionally, provide an integer value in the File part size property, which specifies the size in MB at which a staged file is split into parts. The default value is 50. You can increase this value to improve performance for larger data sets. Adjust the Heap size property to match: if you specify a larger file part size, increase the heap size accordingly.
- Click OK, and then save the job.
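The staging flow that the steps above configure can be sketched as a local simulation: query rows are written to prefixed part files in a staging area, read back by the consumer, and cleaned up at the end of the job. The function names, the row-based part size, and the use of a local directory in place of a GCS bucket are illustrative assumptions, not the connector's internals.

```python
import os

def stage_rows(rows, part_size, prefix, staging_dir):
    """Write rows into part files of at most part_size rows each.

    Mirrors the connector's staging step: each part is a temporary
    file named with the configured prefix. (Local files stand in
    for objects in the Google Cloud Storage bucket.)
    """
    paths = []
    for i in range(0, len(rows), part_size):
        path = os.path.join(
            staging_dir, f"{prefix}-part-{i // part_size:05d}.csv"
        )
        with open(path, "w") as f:
            f.writelines(r + "\n" for r in rows[i:i + part_size])
        paths.append(path)
    return paths

def read_and_cleanup(paths):
    """Consume the staged parts, then delete them, as happens at
    the end of the job."""
    rows = []
    for p in paths:
        with open(p) as f:
            rows.extend(line.rstrip("\n") for line in f)
        os.remove(p)
    return rows
```

Staging seven rows with a part size of three produces three part files, and the rows read back match the original query result.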
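When you tune the File part size property, a quick way to reason about the trade-off is the number of parts a given export produces: fewer, larger parts mean less per-file overhead but a larger heap. This helper is a hypothetical illustration of that arithmetic, not part of the connector.

```python
import math

def estimate_parts(dataset_mb: float, part_size_mb: int = 50) -> int:
    """Estimate how many part files a staged export produces.

    part_size_mb mirrors the File part size property (default 50 MB);
    dataset_mb is the approximate size of the exported data.
    """
    return math.ceil(dataset_mb / part_size_mb)

# A 1.2 GB export with the default 50 MB part size yields 24 parts;
# raising the part size to 200 MB reduces that to 6 parts.
parts_default = estimate_parts(1200)       # 24
parts_larger = estimate_parts(1200, 200)   # 6
```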