How do I Join Two Previously Uploaded Data Sets in #WatsonAnalytics

Question:

I have two separate data sets which I’ve uploaded into Watson Analytics for further analysis. Is it possible for me to join these two data sets so that they may be used in conjunction to create Discovery and Display Assets?

Solution:

Below you’ll find the steps required to join two previously separate data sets. In this use case, the two data sets being joined are titled ‘Game of Thrones – House of’ and ‘Game of Thrones – Character Description’. In the final step, we see the combined data set, titled ‘Game of Thrones – Joined’.

Ensure you have both data assets uploaded.

Pic1

On the tile for the first data asset, click the ellipsis and select ‘Refine’.

Pic2

You are then brought to the Refine portion of Watson Analytics, where you may initiate the join process.

Pic3

Add the second data asset to the first by tapping the ‘+’ icon at the top of your screen.

Pic4

A dialog box will then be presented. Navigate to and select the second data asset, which is to be joined with the first. Once selected, click ‘OK’.

Pic5

Both data assets will now be available within the refinement section.  You may view either by switching tabs.

Pic6

Note: ‘Sheet1’ is the default name given to the data asset.  This is determined by the name of the sheet within the spreadsheet program (such as Microsoft Excel).   To override this name, you must assign the sheet name desired from within the spreadsheet program prior to uploading into Watson Analytics.

Pic7

Click the ‘Join’ button.

Pic8

A new tab will be opened, displaying the join configuration parameters.  You will notice the columns from the first data asset are listed at the top while the columns from the second data asset are listed at the bottom, each with its own color label.

Pic9

You must now map columns to specify which ones the join will match on. To map a pair of columns, drag an arrow from one column to the other.

Pic10

Note: The color under each column name is indicative of its originating data asset.
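To make the idea of a column mapping concrete, here is a minimal Python sketch of what a mapped pair of columns means. It is only an illustration of the concept, not how Watson Analytics implements the join. The sample rows and the column names (`Character`, `House`, `Name`, `Description`) are assumptions made up for this example and are not taken from the actual ‘Game of Thrones’ files.

```python
# Hypothetical sample rows; column names are assumptions for illustration only.
houses = [
    {"Character": "Jon Snow", "House": "Stark"},
    {"Character": "Tyrion Lannister", "House": "Lannister"},
]
descriptions = [
    {"Name": "Jon Snow", "Description": "Lord Commander of the Night's Watch"},
    {"Name": "Tyrion Lannister", "Description": "Hand of the Queen"},
]

# One mapped pair (one "arrow"): "Character" in the first asset
# corresponds to "Name" in the second asset.
column_mapping = [("Character", "Name")]

def key_for(row, columns):
    """Build the comparison key for a row from its mapped columns."""
    return tuple(row[c] for c in columns)

first_cols = [a for a, _ in column_mapping]
second_cols = [b for _, b in column_mapping]

# Index the second asset by its mapped columns, then look up each row of the first.
index = {key_for(row, second_cols): row for row in descriptions}
matched = [
    {**row, **index[key_for(row, first_cols)]}
    for row in houses
    if key_for(row, first_cols) in index
]

for row in matched:
    print(row)
```

Each entry in `column_mapping` plays the role of one arrow drawn between two columns. The mapped columns do not need to share a name; only their values need to line up.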

The type of join may be selected by clicking the text in the bottom bar (in this case ‘matching rows’).

Pic11

Pic12

Matching rows: This is an inner join.  Only the rows whose joined column values match in both data assets are included.

A + matching rows: This is a left join.  All rows from data asset ‘A’ are included along with rows from data asset ‘B’ that match the joined column in data asset ‘A’.

B + matching rows: This is a right join.  All rows from data asset ‘B’ are included along with rows from data asset ‘A’ that match the joined column in data asset ‘B’.
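The difference between the three join types can be sketched in a few lines of plain Python. This is a rough illustration of the semantics described above, not Watson Analytics code. The function `join`, its `how` values (`matching`, `a_plus`, `b_plus`), and the sample rows are all hypothetical names and data chosen for this example.

```python
def join(a_rows, b_rows, a_key, b_key, how="matching"):
    """Join two lists of dict rows.

    how: "matching" (inner join), "a_plus" (left join), "b_plus" (right join).
    """
    if how == "b_plus":
        # A right join is a left join with the two assets swapped.
        return join(b_rows, a_rows, b_key, a_key, how="a_plus")

    b_columns = {col for row in b_rows for col in row}
    b_index = {}
    for row in b_rows:
        b_index.setdefault(row[b_key], []).append(row)

    result = []
    for a in a_rows:
        matches = b_index.get(a[a_key], [])
        for b in matches:
            result.append({**a, **b})
        if not matches and how == "a_plus":
            # Keep the unmatched row and leave the other asset's columns empty.
            result.append({**{col: None for col in b_columns}, **a})
    return result


# Hypothetical sample data: only Jon Snow appears in both assets.
asset_a = [
    {"Character": "Jon Snow", "House": "Stark"},
    {"Character": "Daenerys Targaryen", "House": "Targaryen"},
]
asset_b = [
    {"Name": "Jon Snow", "Description": "Lord Commander"},
    {"Name": "Tyrion Lannister", "Description": "Hand of the Queen"},
]

inner = join(asset_a, asset_b, "Character", "Name", how="matching")
left = join(asset_a, asset_b, "Character", "Name", how="a_plus")
right = join(asset_a, asset_b, "Character", "Name", how="b_plus")

print(len(inner), len(left), len(right))
```

With this sample data, ‘matching rows’ keeps only Jon Snow, ‘A + matching rows’ also keeps Daenerys Targaryen with an empty description, and ‘B + matching rows’ also keeps Tyrion Lannister with an empty house.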

The join is now complete. You must save the newly joined data asset.  ‘Save’ will overwrite the original data asset while ‘Save as’ will create a new data asset without overwriting the original.

Pic13

The joined data asset then appears as a data asset tile, as shown below.

Pic14

More Information

Please review the following documentation for more information on joining data assets.