Snowflake Extraction Error: JDBC Driver Internal Error "Timeout waiting for the download of #chunk0"

Problem

The following message appears in the Manta Flow Snowflake extractor log.

2021-03-16 18:14:08.950 [pool-2-thread-1] 0 ERROR eu.profinit.manta.platform.master.MasterScenario$TaskLauncher [Context: /apps/mantaflow/cli/platform/bin/../../temp/snowflake/QAT SNF/ddl] Exception during execution of task: ScenarioExecutionTask and input: qat_SNF.properties.
org.springframework.jdbc.UncategorizedSQLException: 
### Error querying database.  Cause: net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver internal error: Timeout waiting for the download of #chunk0(Total chunks: 1) retry=0.
### The error may exist in URL [jar:file:/apps/mantaflow/cli/scenarios/manta-dataflow-cli/lib/manta-connector-snowflake-dictionary-extractor-1.31.1.jar!/eu/profinit/manta/connector/snowflake/dictionary/extractor/mappers/information_schema/TableMapper.xml]
### The error may involve eu.profinit.manta.connector.snowflake.dictionary.extractor.mappers.TableMapper.getAllSchemaTables
### The error occurred while handling results
### SQL: SELECT tbls.table_name       AS table_name,          tbls.table_catalog    AS table_catalog,          tbls.table_schema     AS table_schema,                cols.column_name      AS column_name,                cols.data_type        AS data_type,                cols.is_nullable      AS is_nullable,                cols.ordinal_position AS ordinal_pos,                false                 AS is_external         FROM "CTC_CMN_MRT_EDW_TS_DB".information_schema.tables AS tbls         LEFT OUTER JOIN "CTC_CMN_MRT_EDW_TS_DB".information_schema.columns AS cols ON tbls.table_name = cols.table_name         WHERE tbls.table_schema = ? AND cols.table_schema = ? AND tbls.table_type = 'BASE TABLE'         ORDER BY table_name, ordinal_pos;
### Cause: net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver internal error: Timeout waiting for the download of #chunk0(Total chunks: 1) retry=0.
; uncategorized SQLException; SQL state [XX000]; error code [200001]; JDBC driver internal error: Timeout waiting for the download of #chunk0(Total chunks: 1) retry=0.; nested exception is net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver internal error: Timeout waiting for the download of #chunk0(Total chunks: 1) retry=0.
        ...
Caused by: net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver internal error: Timeout waiting for the download of #chunk0(Total chunks: 1) retry=0.
    ... 22 more

More Details

Snowflake SQL driver stores SQL query results in the so-called “user stage”, which is a Snowflake-internal logical object over a blob storage in the current cloud provider’s infrastructure (AWS S3, Azure, GCS). The "Timeout waiting for the download of #chunk0" error message suggests that the JDBC client had problems reaching the “user stage” to read the query results.

Solution

  1. First, execute the following query on Snowflake deployment.
    SELECT t.value:type::varchar as type,
           t.value:host::varchar as host,
           t.value:port as portfrom 
    FROM TABLE(flatten(input => parse_json(system$whitelist()))) as t;

If you are using the Snowflake PrivateLink feature, use SYSTEM$WHITELIST_PRIVATELINK() instead of the SYSTEM$WHITELIST() function.

    SELECT t.value:type::varchar as type,
           t.value:host::varchar as host,
           t.value:port as portfrom 
    FROM TABLE(flatten(input => parse_json(SYSTEM$WHITELIST_PRIVATELINK()))) as t;
  1. Look for the STAGE-type resources.

    No alt text provided

To obtain only the host addresses, modify the previous query.

    select 'https://' || host as stage_address from 
    (
        select t.value:type::varchar as type,
               t.value:host::varchar as host,
               t.value:port as port 
        from table(flatten(input => parse_json(system$whitelist()))) as t
    )
    WHERE type = 'STAGE';

No alt text provided

(In this example, there is only one STAGE-type resource present, but there can be more.)

  1. Make sure that all the addresses from step #2 are reachable from the host that is running the IBM Manta Data Lineage software and the Snowflake JDBC driver. For example, if:
    select 'https://' || host as stage_address from 
    (
        select t.value:type::varchar as type,
               t.value:host::varchar as host,
               t.value:port as port 
        from table(flatten(input => parse_json(system$whitelist()))) as t
    )
    WHERE type = 'STAGE';

outputs:

    STAGE_ADDRESS
    https://sfc-prod2-ds1-15-customer-stage.s3.us-west-2.amazonaws.com
    https://sfc-prod2-ds1-15-customer-stage.s3-us-west-2.amazonaws.com
    https://sfc-prod2-ds1-15-customer-stage.s3.amazonaws.com

then you will need to run:

    curl -v https://sfc-prod2-ds1-15-customer-stage.s3.us-west-2.amazonaws.com
    curl -v https://sfc-prod2-ds1-15-customer-stage.s3-us-west-2.amazonaws.com
    curl -v https://sfc-prod2-ds1-15-customer-stage.s3.amazonaws.com

from the host trying to connect to Snowflake (where Manta Data Lineage is installed) to verify that it is actually capable of connecting to the Snowflake stage, which hosts the query result chunks. Use the actual stage addresses output by the selected run on your environment.

The expected result from the above test commands is an HTTP 403 Access Denied, as we are just testing the HTTP(S) connection. If the above cURL commands time out or return with “Connection refused”, then you need to work with the internal teams responsible for configuring firewalls and proxies to allow the connection to succeed.

  1. If the connection succeeds and the cURL commands were able to reach the cloud blob storage but the problem still persists, assistance from the Snowflake support team is needed. Please raise a service request with Snowflake and provide:

    • The full output of the above cURL commands—to verify if there's any proxy in between

    • The full output of echo | openssl s_client -connect stage.host.name:443 -servername stage.host.name -showcerts for each STAGE returned by SYSTEM$WHITELIST() — to verify that there's no certificate interception

    • Set the JDBC driver to log at verbose loglevel as described in https://community.snowflake.com/s/article/How-to-generate-log-file-on-Snowflake-connectors, reproduce the problem, and attach the JDBC logfile generated with the traces of the problem, now in verbose loglevel for analysis