Installing and using dbt-watsonx-spark
This section covers the steps to install and use
dbt-watsonx-spark.
Before you begin
- Install watsonx.data.
- Provision native Spark engine in watsonx.data.
- Install DBT core.
Create a Spark query server
For the Spark engine to integrate with dbt tool and work as a query engine, you must create a Spark query server. See Create a Spark query server.
Retrieve the query server connection details
To configure the profile file in dbt tool, you must save the query server connection details. See Retrieve the query server connection details.
Set up profiles.yaml for dbt tool
- Go to the profiles.yml file that is located in .dbt of your home directory.
- Paste the connection details by modifying the parameter values.
- Set up the
profiles.ymlfile. For more information, see Configuration (setting up your profile).
Install the dbt tool and verify the connection
- Run the following command on your system to install
dbt-watsonx-spark.pip install dbt-watsonx-spark - Run the following command to verify the dbt version.
dbt –version - If you want to create a dbt project, provide a
<project_name>and run the following command .dbt init <project_name>- The system prompts to select the database to be used. Select
watsonx_spark. - Provide watsonx.data host, URI, and schema.
- The system prompts to select the database to be used. Select
- To verify the connection, run the following command
dbt debugfrom the dbt project path.Note: For more information on how to use the dbt tool, see About dbt projects. - Run the seeds by using the following command to create a table and insert the
data.
cd <project_name> dbt seed - In <project_name>/models, you have the models that perform the
operations. By default, dbt sets the operations as
view. You can create the tables or views by one of the following methods:- Specify inside the models (applicable for that model only)
-
{{ config(materialized='table/view') }}Note: If this statement is commented out using(--), dbt still uses the configuration. To disable it, remove it entirely or comment it in Jinja style ({# … #}). - Specify in dbt_project.yml (applicable for all models)
Example:models: <project_name>: <model_folders>: +materialized: table/viewmodels: demo: example: +materialized: tableNote: Onlyselectstatements are supported within models.
Important: The semicolon (;) character is restricted in models. - Run the models by using the following command to create the tables or
views.
You can also specify the tests that you want:cd <project_name> dbt run
For example:models: - name: <model_name> description: "some description" columns: - name: <col_name> description: "some description" data_tests: - <test_name_1> - <test_name_2>models: - name: my_first_dbt_model description: "A starter dbt model" columns: - name: id description: "The primary key for this table" data_tests: - unique - not_nullImportant: Connectors must support Create Table as Select (CTAS) for dbt runs to work. - To generate the documents about the actions performed, run:
cd <project_name> dbt docs generate dbt docs serveNote: By default, it runs onlocalhost:8080. To change the port, run:dbt docs serve –-port <port_number>