I have two separate data sets which I’ve uploaded into Watson Analytics for further analysis. Is it possible for me to join these two data sets so that they may be used in conjunction to create Discovery and Display Assets?
Below you’ll find the steps required to join two previously separate data sets. In this use case, the two data sets being joined are titled ‘Game of Thrones – House of’ and ‘Game of Thrones – Character Description’. In the final step, we see the combined data set, titled ‘Game of Thrones – Joined’.
Ensure you have both data assets uploaded.
Click on the ellipsis on the data tile and select ‘Refine’ for the first data asset.
You are then brought to the Refine portion of WA where you may initiate the join process.
Add the second data asset to the first by tapping the ‘+’ icon at the top of your screen.
A dialog box will then be presented. Navigate to and select the second data asset, which is to be joined with the first. Once it is selected, click ‘OK’.
Both data assets will now be available within the refinement section. You may view either by switching tabs.
Note: ‘Sheet1’ is the default name given to the data asset. This is determined by the name of the sheet within the spreadsheet program (such as Microsoft Excel). To override this name, you must assign the sheet name desired from within the spreadsheet program prior to uploading into Watson Analytics.
Click the ‘Join’ button.
A new tab will be opened, displaying the join configuration parameters. You will notice the columns from the first data asset are listed at the top while the columns from the second data asset are listed at the bottom, each with its own color label.
Columns must now be mapped in order to specify which columns are to be joined. Mapping between columns can be specified by dragging an arrow from one column to another.
Note: The color under each column name is indicative of its originating data asset.
The type of join may be selected by clicking on the text in the bottom bar (in this case ‘matching rows’).
Matching rows: This is an inner join. Only the rows whose joined column values match in both data assets are included.
A + matching rows: This is a left join. All rows from data asset ‘A’ are included along with rows from data asset ‘B’ that match the joined column in data asset ‘A’.
B + matching rows: This is a right join. All rows from data asset ‘B’ are included along with rows from data asset ‘A’ that match the joined column in data asset ‘B’.
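To make the difference between the three join types concrete, here is a minimal Python sketch. It is only an illustration of inner, left, and right join semantics, not how Watson Analytics performs the join internally. The sample rows, the column names (‘Character’, ‘House’, ‘Description’), the `join` helper, and the `how` values are all hypothetical and chosen for this example.

```python
# Hypothetical sample rows; column names and values are illustrative only.
houses = [  # plays the role of data asset 'A'
    {"Character": "Jon", "House": "Stark"},
    {"Character": "Tyrion", "House": "Lannister"},
    {"Character": "Daenerys", "House": "Targaryen"},
]
descriptions = [  # plays the role of data asset 'B'
    {"Character": "Jon", "Description": "Lord Commander"},
    {"Character": "Tyrion", "Description": "Hand of the Queen"},
    {"Character": "Bronn", "Description": "Sellsword"},
]


def join(a_rows, b_rows, key, how="matching"):
    """Join two lists of dicts on `key`.

    how: "matching" (inner join), "a_plus" (left join), "b_plus" (right join).
    """
    if how == "b_plus":
        # A right join is a left join with the two inputs swapped.
        return join(b_rows, a_rows, key, how="a_plus")
    if how not in ("matching", "a_plus"):
        raise ValueError(f"unknown join type: {how}")

    # Index the rows of B by their key value for fast lookup.
    b_index = {}
    for row in b_rows:
        b_index.setdefault(row[key], []).append(row)

    # Columns that B contributes; filled with None when a row of A has no match.
    b_columns = {col for row in b_rows for col in row if col != key}

    joined = []
    for a_row in a_rows:
        matches = b_index.get(a_row[key], [])
        if matches:
            for b_row in matches:
                joined.append({**a_row, **b_row})
        elif how == "a_plus":
            # Keep the unmatched row from A and leave B's columns empty.
            joined.append({**a_row, **{col: None for col in b_columns}})
    return joined


inner = join(houses, descriptions, "Character")                 # matching rows
left = join(houses, descriptions, "Character", how="a_plus")    # A + matching rows
right = join(houses, descriptions, "Character", how="b_plus")   # B + matching rows

for name, rows in (("matching", inner), ("A + matching", left), ("B + matching", right)):
    print(name, [row["Character"] for row in rows])
```

With this sample data, ‘matching rows’ keeps only Jon and Tyrion, ‘A + matching rows’ also keeps Daenerys with an empty description, and ‘B + matching rows’ also keeps Bronn with an empty house.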
The join is now complete. You must save the newly joined data asset. ‘Save’ will overwrite the original data asset while ‘Save as’ will create a new data asset without overwriting the original.
The joined data asset will then appear as the data asset tile shown below.
Please review the following documentation for more information on joining data assets.