Start of change

Generating JSON data

By using SQL functions, you can generate formatted JSON data from relational tables.

We are going to work with relational data to generate JSON. We will start with the data in two tables. One contains basic employee data. The other one contains account information. They are created and populated as shown here.


create table empdata (id int,
                      last_name varchar(10),
                      first_name varchar(10), 
                      office_number varchar(10),
                      work_phone varchar(20),
                      home_phone varchar(20));
 
create table emp_account (id int,
                          account varchar(20));                      
                      
insert into empdata values (901, 'Doe', 'John', 'E-334', '555-7242', '555-3762');
insert into emp_account values (901, '36232'), (901, '73263');

insert into empdata values (902, 'Pan', 'Peter', 'E-216', '555-8925', null);
insert into emp_account values (902, '76232'), (902, '72963');

insert into empdata values (903, 'Jones', 'Mary', 'E-739', '555-4311', '555-6312');

insert into empdata values (904, 'Smith', 'Sally', null, null, null);
Our goal is to generate four pieces of employee information in JSON that look like the following JSON objects that were used as input for the JSON_TABLE examples.


{"id":901, "name" : { "first":"John", "last":"Doe" }, "office" : "E-334",   
      "phones" : [ { "type":"home", "number":"555-3762" }, 
                   { "type":"work", "number":"555-7242" } ],      
      "accounts" : [ { "number":"36232"}, { "number":"73263"}] } 

{"id":902, "name" : { "first":"Peter", "last":"Pan" }, "office" : "E-216", 
      "phones" : [ { "type":"work", "number":"555-8925" } ], 
      "accounts" : [ { "number":"76232"}, {"number":"72963"}] }

{"id":903, "name" : { "first":"Mary", "last":"Jones" }, "office" : "E-739", 
      "phones" : [ { "type":"home", "number":"555-6312" }, 
                   { "type":"work", "number":"555-4311" } ], }

{"id":904, "name" : { "first":"Sally", "last":"Smith" } }

Building a basic JSON object

Let's start with a simple example to see how easy it is to generate a basic JSON object.

select json_object ('id' value id, 
                    'name' value last_name, 
                    'office' value office_number) 
from empdata;
This example uses the JSON_OBJECT scalar function to generate a JSON object. It defines three key:value pairs using the ID, LAST_NAME, and OFFICE_NUMBER columns from the EMPDATA table for the values. Each key name is specified as a character string exactly as it will appear in the output. The result of this query is four rows, one for each row in the EMPDATA table.
{"id":901,"name":"Doe","office":"E-334"}
{"id":902,"name":"Pan","office":"E-216"}
{"id":903,"name":"Jones","office":"E-739"}
{"id":904,"name":"Smith","office":null}
That is a good start toward our final goal, but maybe we want to omit the office value when it is null rather than including it. That is easy to do by using the ABSENT ON NULL clause. This indicates that when any of the values for the JSON object are the null value, the null key:value pair should not be included in the result.
select json_object ('id' value id, 
                    'name' value last_name, 
                    'office' value office_number
                    absent on null) 
from empdata;
Now there is no office value for 904.
{"id":901,"name":"Doe","office":"E-334"}
{"id":902,"name":"Pan","office":"E-216"}
{"id":903,"name":"Jones","office":"E-739"}
{"id":904,"name":"Smith"}

Nesting JSON objects

Next, we will work to embed the compound name object. First, let's generate it on its own.
select json_object ('first' value first_name, 
                    'last' value last_name) 
from empdata;
The result of this query is:
{"first":"John","last":"Doe"}
{"first":"Peter","last":"Pan"}
{"first":"Mary","last":"Jones"}
{"first":"Sally","last":"Smith"}
Since that looks correct, we will include it in the master query. All we need to do is embed this name JSON object within the outer one.
select json_object ('id' value id, 
                    'name' value json_object ( 'first' value first_name, 
                                               'last' value last_name), 
                    'office' value office_number  
                    absent on null) 
from empdata;
Now our result looks like this, with a name object nested within the outer JSON object:
{"id":901,"name":{"first":"John","last":"Doe"},"office":"E-334"}
{"id":902,"name":{"first":"Peter","last":"Pan"},"office":"E-216"}
{"id":903,"name":{"first":"Mary","last":"Jones"},"office":"E-739"}
{"id":904,"name":{"first":"Sally","last":"Smith"}}

Building a JSON array

Next we will tackle the phone array. This needs to be a JSON array, not a JSON object. That means we will use the JSON_ARRAY scalar function. To avoid generating array entries for phone numbers that are not provided, a CASE statement is used. If neither phone number exists, an empty JSON array will be generated. Since we know whether we are generating the home or work phone numbers, we specify type values as home and work.
select json_array 
        (case when home_phone is not null then 
            json_object('type' value 'home', 
                        'number' value home_phone) end, 
         case when work_phone is not null then 
            json_object('type' value 'work', 
                        'number' value work_phone) end 
         )
from empdata;
Here is the result. The output for each entry is broken across two lines to show the complete result.
["{\"type\":\"home\",\"number\":\"555-3762\"}",
    "{\"type\":\"work\",\"number\":\"555-7242\"}"]

["{\"type\":\"work\",\"number\":\"555-8925\"}"]

["{\"type\":\"home\",\"number\":\"555-6312\"}",
    "{\"type\":\"work\",\"number\":\"555-4311\"}"]

[]

This is a bit unexpected. Why are all those extra \ characters in the result? This demonstrates the difference between processing normal character data and character data that has already been formatted as JSON. When JSON_ARRAY (or any of the JSON publishing functions) is looking at its arguments, it recognizes when the argument is the direct result of another JSON function. If it is, the string is interpreted as already formatted JSON data. That means that the function will not escape any of the special characters in the string. If the argument is not the direct result of a JSON function, it is interpreted as non-JSON character data and escape processing is performed on the value. Since this example embedded the JSON_OBJECT in a CASE expression, the fact that the string was already formatted JSON is not known. To avoid these unwanted escape sequences, you need to explicitly tell JSON_ARRAY that the argument is already JSON by using the FORMAT JSON clause.

With that knowledge, let's try it again.
select json_array 
  (case when home_phone is not null then 
      json_object('type' value 'home', 
                  'number' value home_phone) end 
      format json,
   case when work_phone is not null then 
      json_object('type' value 'work', 
                  'number' value work_phone) end 
      format json)  
from empdata;
Now the result is:
[{"type":"home","number":"555-3762"},{"type":"work","number":"555-7242"}]
[{"type":"work","number":"555-8925"}]
[{"type":"home","number":"555-6312"},{"type":"work","number":"555-4311"}]
[]
Now this information is ready to include as the next piece of our larger JSON object.
select json_object ('id' value id, 
                    'name' value json_object ( 'first' value first_name, 
                                               'last' value last_name), 
                    'office' value office_number, 
                    'phones' value json_array 
                                 (case when home_phone is not null then 
                                     json_object('type' value 'home', 
                                                 'number' value home_phone 
                                                 ) end format json,
                                  case when work_phone is not null then 
                                     json_object('type' value 'work', 
                                                  'number' value work_phone 
                                                  ) end format json)
                     absent on null) 
from empdata;
This returns:
{"id":901,"name":{"first":"John","last":"Doe"},"office":"E-334",
     "phones":[{"type":"home","number":"555-3762"},{"type":"work","number":"555-7242"}]}

{"id":902,"name":{"first":"Peter","last":"Pan"},"office":"E-216",
     "phones":[{"type":"work","number":"555-8925"}]}

{"id":903,"name":{"first":"Mary","last":"Jones"},"office":"E-739",
     "phones":[{"type":"home","number":"555-6312"},{"type":"work","number":"555-4311"}]}

{"id":904,"name":{"first":"Sally","last":"Smith"},
     "phones":[]}

Building an array from multiple rows of data

We are almost done. We need to pick up the account information which is stored in a separate table and put it in an array. Let's generate an array of objects that contain account numbers.
select json_array(json_object('number' value account)) 
from emp_account;
The result of this query is:
[{"number":"36232"}]
[{"number":"73263"}]
[{"number":"76232"}]
[{"number":"72963"}]
That generated one array for each account number. What we need is one array for each ID value. That requires aggregating all the number JSON objects for a single ID value into a single array.
select json_arrayagg(json_object('number' value account)) 
from emp_account group by id;
JSON_ARRAYAGG is an aggregate function that works on groups of data. In this case, we are grouping on the ID column. Each account for an ID generates an object, then all of those objects are aggregated into a single array. This query returns only two rows, one for each ID value which is exactly what we were looking for.
[{"number":"36232"},{"number":"73263"}]
[{"number":"76232"},{"number":"72963"}]
This piece can be added to what we have so far to complete the generated JSON object. To return only the accounts from the EMP_ACCOUNT table for the current ID, a WHERE clause is needed. Also note that since the account array is being returned from a query, a FORMAT JSON clause needs to be added so the outer JSON_OBJECT correctly identifies the value as JSON that is already formatted.
select json_object ('id' value id, 
                    'name' value json_object ( 'first' value first_name, 
                                               'last' value last_name), 
                    'office' value office_number, 
                    'phones' value json_array 
                                 (case when home_phone is not null then 
                                     json_object('type' value 'home', 
                                                 'number' value home_phone 
                                                 ) end format json,
                                  case when work_phone is not null then 
                                     json_object('type' value 'work', 
                                                  'number' value work_phone 
                                                  ) end format json),
                    'accounts' value (select json_arrayagg(
                                                json_object('number' value account)) 
                                      from emp_account a 
                                      where a.id = e.id group by id) format json
                    absent on null) 
from empdata e;
Now we have arrived at our final result.
{"id":901,"name":{"first":"John","last":"Doe"},"office":"E-334",
      "phones":[{"type":"home","number":"555-3762"},
                {"type":"work","number":"555-7242"}],
      "accounts":[{"number":"36232"},{"number":"73263"}]}

{"id":902,"name":{"first":"Peter","last":"Pan"},"office":"E-216",
      "phones":[{"type":"work","number":"555-8925"}],
      "accounts":[{"number":"76232"},{"number":"72963"}]}

{"id":903,"name":{"first":"Mary","last":"Jones"},"office":"E-739",
      "phones":[{"type":"home","number":"555-6312"},
                {"type":"work","number":"555-4311"}]}

{"id":904,"name":{"first":"Sally","last":"Smith"},"phones":[]}

Combining JSON into one object

Let's take this one step further. Sometimes you need a single JSON document containing all the information. We will take the individual results and bundle them together.

select json_object('employees' value json_arrayagg(

       json_object ('id' value id, 
                    'name' value json_object ( 'first' value first_name, 
                                               'last' value last_name), 
                    'office' value office_number, 
                    'phones' value json_array 
                                 (case when home_phone is not null then 
                                     json_object('type' value 'home', 
                                                 'number' value home_phone 
                                                 ) end format json,
                                  case when work_phone is not null then 
                                     json_object('type' value 'work', 
                                                  'number' value work_phone 
                                                  ) end format json),
                    'accounts' value (select json_arrayagg(
                                                json_object('number' value account)) 
                                      from emp_account a 
                                      where a.id = e.id group by id) format json
                    absent on null))) 
from empdata e;
First we wrap the results with a JSON_ARRAYAGG to create an array of employee information. Then we wrap the array with a JSON_OBJECT so the final result is a single JSON object. Now the result is one JSON object.
{"employees":[
{"id":901,"name":{"first":"John","last":"Doe"},"office":"E-334",
      "phones":[{"type":"home","number":"555-3762"},
                {"type":"work","number":"555-7242"}],
      "accounts":[{"number":"36232"},{"number":"73263"}]},
{"id":902,"name":{"first":"Peter","last":"Pan"},"office":"E-216",
      "phones":[{"type":"work","number":"555-8925"}],
      "accounts":[{"number":"76232"},{"number":"72963"}]},
{"id":903,"name":{"first":"Mary","last":"Jones"},"office":"E-739",
      "phones":[{"type":"home","number":"555-6312"},
                {"type":"work","number":"555-4311"}]},
{"id":904,"name":{"first":"Sally","last":"Smith"},"phones":[]}
]}

Setting the result data type

Each of the publishing functions allows you to specify the data type and format of the function's result. If you omit the RETURNING clause, the result will be a CLOB with a length of 2G and a CCSID of 1208. It will be formatted as FORMAT JSON. If you require a different result such as BSON in a VARBINARY(2000) column, add the RETURNING clause as shown here:
select json_object ('id' value id, 
                    'name' value last_name, 
                    'office' value office_number
                    returning varbinary(2000) format bson) 
from empdata;

Using a common table expression to aggregate distinct values

Suppose we want to aggregate the sales information for each sales person, listing each region in which a sale was made. In the SALES table there is an entry for every sale that was made. We need a list of the unique sales people and a list of the regions for each sales person. These are generated by using a common table expression for each one. These temporary results are used for generating the JSON objects, one for each sales person.
with sales_tmp(sales_person) as (
    select distinct(sales_person) from sales),
  region_tmp(region,sales_person) as ( 
    select distinct region, sales_person from sales)

  select json_object( key sales_person 
                      value (select JSON_ARRAYAGG(region order by region) 
                             from region_tmp r where r.sales_person = s.sales_person) 
                      format json)
FROM sales_tmp s order by sales_person;
From the sample table, the following three rows are returned:
{"GOUNOT":["Manitoba","Ontario-North","Ontario-South","Quebec"]}
{"LEE":["Manitoba","Ontario-North","Ontario-South","Quebec"]}
{"LUCCHESSI":["Manitoba","Ontario-South","Quebec"]}
If you want to return a single JSON object containing all the results, the following query using JSON_OBJECTAGG will generate it for you.
with sales_tmp(sales_person) as (
    select distinct(sales_person) from sales),
  region_tmp(region,sales_person) as ( 
    select distinct region, sales_person from sales)

  select json_objectagg( key sales_person 
                         value (select JSON_ARRAYAGG(region order by region) 
                               from region_tmp r where r.sales_person = s.sales_person) 
                         format json)
FROM sales_tmp s;
This returns one row containing one JSON object. It is broken into several lines here to make it easy to read.
{"LEE":["Manitoba","Ontario-North","Ontario-South","Quebec"],
"GOUNOT":["Manitoba","Ontario-North","Ontario-South","Quebec"],
"LUCCHESSI":["Manitoba","Ontario-South","Quebec"]}
End of change