Copying data from source to target (Data Refinery)

You typically use Data Refinery to read data from a source location, refine that data, and then load the analytics-ready data to a target location. However, you can also use Data Refinery to securely copy data from a source to a target.

These instructions describe how to copy data from a data asset in a project to a target data source that is defined by a connection. You can also copy data in the other direction, from a source that is defined by a connection to a data asset in the project, or between two connections. You can also copy data from one data asset in a project to another data asset in the same project.

To copy data:

  1. From within a project, add the data that you want to copy. This creates a data asset in the project.

  2. Create a connection for the target, if it doesn't already exist. Be sure to use credentials that have Write permission.

      Important: Not all connection types can be used as targets. Review the restrictions in Supported data sources.

  3. From the project's Assets page, select Refine from the data asset's menu. Alternatively, click the data asset to see a preview, and then click the Refine link.

  4. In the Information side pane, on the Details tab, click the Edit button.

  5. In the DATA REFINERY FLOW OUTPUT pane, click Edit output.

  6. Click the edit icon for the Location field.

  7. Click Connections, and then drill down to the target location.

  8. Click Save Location.

  9. If you select an existing relational database table or view, or a connected relational data asset, as the target for the Data Refinery flow output, use the IMPACT TO EXISTING DATA SET drop-down to select what happens if the data set already exists in the target location:

    • Overwrite - Overwrite the rows in the existing data set with those in the Data Refinery flow output
    • Recreate - Delete the rows in the existing data set and replace them with the rows in the Data Refinery flow output
    • Insert - Append all rows of the Data Refinery flow output to the existing data set
    • Update - Update matching rows in the existing data set with the Data Refinery flow output; don’t insert any new rows
    • Upsert - Update matching rows in the existing data set and append the remaining rows of the Data Refinery flow output to it

      For the Update and Upsert options, you must select the columns in the output data set to compare to columns in the existing data set. The output and target data sets must have the same number of columns, and the columns must have the same names and data types in both data sets.

      If you select a file in a connection as the target for your Data Refinery flow output, you can save that file in one of the following formats:

    • Avro
    • CSV
    • JSON
    • Parquet

  10. Optional: Change the target data set name.

  11. In the Edit output pane, click the Save check mark.

  12. Click Done.

  13. To run the Data Refinery flow, create a job for it. On the Data Refinery flow toolbar, click either Save and create a job or Save and view jobs.
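The IMPACT TO EXISTING DATA SET options are easier to compare side by side. The following is a minimal, hypothetical Python sketch. It is not Data Refinery's implementation or API. It models a data set as a list of row dictionaries, and the names check_compatible, key_of, and apply_output are invented for illustration. The sketch assumes that key values are unique within each data set. It treats Overwrite and Recreate identically at the row level, because both options, as described in the drop-down list, leave the target holding only the output rows.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical model: a data set is a list of rows, each row a dict of column -> value.
Row = Dict[str, Any]
# Hypothetical schema: ordered (column name, Python type) pairs.
Schema = List[Tuple[str, type]]


def check_compatible(output_schema: Schema, target_schema: Schema) -> None:
    """Raise ValueError unless both schemas have the same column count, names, and types."""
    if len(output_schema) != len(target_schema):
        raise ValueError("output and target have a different number of columns")
    if dict(output_schema) != dict(target_schema):
        raise ValueError("column names or data types differ between output and target")


def key_of(row: Row, keys: Sequence[str]) -> Tuple[Any, ...]:
    """Build the comparison key from the selected key columns."""
    return tuple(row[k] for k in keys)


def apply_output(existing: List[Row], output: List[Row], mode: str,
                 keys: Sequence[str] = ()) -> List[Row]:
    """Return the target's rows after writing `output` with the given mode."""
    mode = mode.lower()
    if mode in ("overwrite", "recreate"):
        # Both modes leave only the output rows; any difference in how the
        # target table itself is handled is not modeled here.
        return [dict(r) for r in output]
    if mode == "insert":
        # Append every output row, even if an equal row already exists.
        return [dict(r) for r in existing] + [dict(r) for r in output]
    if mode not in ("update", "upsert"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not keys:
        raise ValueError("update and upsert need at least one key column to compare")

    # Index the output by key (assumes key values are unique within the output).
    incoming = {key_of(r, keys): r for r in output}
    matched = set()
    result: List[Row] = []
    for row in existing:
        k = key_of(row, keys)
        if k in incoming:
            result.append(dict(incoming[k]))  # replace the matching row
            matched.add(k)
        else:
            result.append(dict(row))  # keep rows that have no match
    if mode == "upsert":
        # Append output rows that matched nothing in the existing data set.
        result.extend(dict(r) for r in output if key_of(r, keys) not in matched)
    return result


if __name__ == "__main__":
    schema = [("id", int), ("qty", int)]
    check_compatible(schema, schema)  # same count, names, and types: no error
    existing = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
    output = [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}]
    for m in ("overwrite", "insert", "update", "upsert"):
        print(m, apply_output(existing, output, m, keys=["id"]))
```

Running the file prints the resulting target rows for each mode. With id as the key column, update replaces row 2 and leaves row 1 unchanged. Upsert does the same and also appends row 3.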
