Db2 Big SQL complex data types and JSON data
Db2® Big SQL supports complex types, such as array-of-rows and map-of-rows, and you can use these complex data types with JSON data.
JSON data can contain nested data in both array-of-rows and map-of-rows contexts. For
example:
Array-of-Rows:
"my_friends":
[
{"id":1,"name":"Michael Strong","gender":"male"},
{"id":2,"name":"John Smith","gender":"male"},
{"id":3,"name":"Pat Brown","gender":"female"}
]
Map-of-Rows:
"my_users":{
"user1":{"id":1,"firstname":"Michael ","surname":"Strong","age":48},
"user2":{"id":2,"firstname":"John ","surname":"Smith","age":72},
"user3":{"id":3,"firstname":"Pat ","surname":"Brown","age":71}
}
When you use a JSON SerDe to work with JSON data, the table field names must match the JSON field names, and it's important that you be familiar with the structure and layout of the data being used for the table, because the column types and names determine the table that must be created.
The following example shows you how to use Db2 Big SQL complex data types with JSON data. The example is based on the following table:
CREATE EXTERNAL HADOOP TABLE cdtj
(
id BIGINT,
this_user ROW (
id INT,
name VARCHAR(20),
screen_name VARCHAR(100)
),
myvalues DECIMAL(10,3) array[3],
available MAP<VARCHAR(10), BOOLEAN>,
my_friends ROW (
id BIGINT,
name VARCHAR(20),
gender VARCHAR(6)
) array[3],
my_users MAP<VARCHAR(10), STRUCT<id : BIGINT, firstname :
VARCHAR(10), surname : VARCHAR(10), age : INT > >
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/tmp/myjsondata';
Copy your data files to the appropriate location in the HDFS. In this example, the JSON data
files are copied to /tmp/myjsondata. The following snippet shows some example
data.
Important: Because
JSON data is typically used with the TEXTFILE format, this snippet represents a single row of data
that must be formatted as a single line of
text.
The following snippet shows the
same example data formatted for readability:{"id":1,"this_user":{"id":1,"name":"Steve Mike","screen_name":"smk22"}, "myvalues":1248347.424,3774492.115,7029108.14],"available":{"Michael":true,"John":false,"Pat":false},"my_friends":[{"id":11,"name":"Michael Strong","gender":"male"},{"id":2,"name":"John Smith","gender":"male"},{"id":3,"name":"Pat Brown","gender":"female"}],"my_users":{"user1":{"id":100,"firstname":"Peter","surname":"Jones","age":20},"user2":"id":200,"firstname":"Jess","surname":"Carre","age":29},"user3":{"id":300,"firstname":"Tom","surname":"Mitch","age":32}}}
{"id":1,"this_user":
{"id":1,"name":"Steve Mike","screen_name":"smk22"},
"myvalues":[1248347.424,3774492.115,7029108.14],
"available":{"Michael":true,"John":false,"Pat":false},
"my_friends":[
{"id":11,"name":"Michael Strong","gender":"male"},
{"id":2,"name":"John Smith","gender":"male"},
{"id":3,"name":"Pat Brown","gender":"female"}],
"my_users":{
"user1":{"id":100,"firstname":"Peter","surname":"Jones","age":20},
"user2":{"id":200,"firstname":"Jess","surname":"Carre","age":29},
"user3":{"id":300,"firstname":"Tom","surname":"Mitch","age":32}}}
Use a SELECT statement to query both simple and complex data. For
example:
SELECT id, this_user.name, myvalues[1],available['Pat'],
my_friends[1].id,my_users['user1'].id
FROM cdtj
The output might look like the following
example:
ID 2 3 4 5 6
-------------------- -------------------- ------------ ------ -------------------- --------------------
1 Steve Mike 1248347.424 0 11 100
1 record(s) selected.