Optional: Read using GCS Staging

You can configure the BigQuery Connector stage to read the rows that are returned by the select statement through Google Cloud Storage, which is used as a temporary staging area to improve read performance.

Procedure

  1. From the job design canvas, double-click the BigQuery Connector stage.
  2. Set Generate SQL at runtime to No, and then specify the query in the Select statement property.
  3. Set Use GCS staging to Yes, and then set the following properties. A client-side sketch of the equivalent staged-read flow is shown after this procedure.
    • Set the name of the schema in the Schema name property. The temporary staging table is created under this schema.
    • Optionally, in the Database name property, specify the Google project ID under which the staging table is created. If it is not specified, the project ID that is used for the BigQuery connection is used.
    • In the Google cloud storage bucket property, specify the name of the bucket to use as the temporary staging area during the read.
    • Optionally, in the File name prefix property, provide a prefix for the temporary files that are created in the Google Cloud Storage bucket. These files are deleted at the end of the job.
    • Optionally, in the File part size property, provide an integer value that specifies the size in MB at which a staged file is split into parts. The default value is 50. You can increase this value to improve performance for larger data sets. Adjust the Heap size property to match the part size: if you specify a larger file part size, increase the heap size accordingly.
  4. Click OK, and then save the job.
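
The connector performs the staging internally; the following Python sketch only illustrates the equivalent staged-read flow with the Google Cloud client libraries, under assumed placeholder names for the project, dataset, bucket, prefix, and query (none of these values come from the connector). It shows the same sequence the properties above describe: the select statement is written to a temporary staging table, the table is exported to files in the bucket under the file name prefix, the files are read back, and the staged objects are deleted at the end.

  # Minimal sketch of a staged read: query -> temporary table -> export to GCS -> read -> clean up.
  # All names (project, dataset, bucket, prefix, query) are hypothetical placeholders.
  from google.cloud import bigquery, storage

  project = "my-project"             # placeholder project ID (Database name)
  dataset = "staging_dataset"        # placeholder schema (Schema name)
  bucket_name = "my-staging-bucket"  # placeholder bucket (Google cloud storage bucket)
  prefix = "bqread_tmp"              # placeholder file name prefix

  bq = bigquery.Client(project=project)
  gcs = storage.Client(project=project)

  # 1. Run the select statement and write the result to a temporary staging table.
  staging_table = f"{project}.{dataset}.bqread_staging"
  job_config = bigquery.QueryJobConfig(
      destination=staging_table, write_disposition="WRITE_TRUNCATE"
  )
  bq.query("SELECT * FROM `my-project.my_dataset.my_table`", job_config=job_config).result()

  # 2. Export the staging table to the bucket. The wildcard lets BigQuery split the
  #    export into multiple file parts (analogous to the File part size property).
  bq.extract_table(staging_table, f"gs://{bucket_name}/{prefix}-*.csv").result()

  # 3. Read the staged files back.
  bucket = gcs.bucket(bucket_name)
  for blob in bucket.list_blobs(prefix=prefix):
      rows = blob.download_as_text()
      # ... process rows ...

  # 4. Clean up: delete the staged files and the temporary staging table.
  for blob in bucket.list_blobs(prefix=prefix):
      blob.delete()
  bq.delete_table(staging_table, not_found_ok=True)

Splitting the export into parts is what lets larger results be staged and read in parallel, which is why a larger File part size can improve throughput but also requires more heap per part.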