Differences between CSV and JSON

CSV and JSON are common data exchange formats. Inherent differences between these formats affect, and in some cases dictate, differences in the CSV and JSON output by Transaction Analysis Workbench.

Inherent differences

The following table lists some differences between CSV and JSON that are unrelated to Transaction Analysis Workbench.

Table 1. CSV versus JSON
Fields and field order

CSV: All records (lines) in a CSV file should have the same fields, in the same order.

According to the Internet Engineering Task Force (IETF) RFC 4180, “Each line [in a CSV file] should contain the same number of fields throughout the file.”

However, in practice, if each record in a CSV file or stream contains, in the same column, a field that identifies the record type, then records might be heterogeneous: different column layouts in each record. Applications that interpret the records, such as analytics platforms, can use that field to detect the record type, and then apply a corresponding column layout definition. (How an application detects the record type and applies a column layout definition depends on the application.)
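For example, a consuming application might dispatch on a record-type column like this. This is a minimal Python sketch; the record types and column layouts shown are hypothetical, not actual Transaction Analysis Workbench output:

```python
import csv
import io

# Hypothetical heterogeneous CSV: column 1 identifies the record type.
data = """CKPT,2015-12-31T13:59:00Z,5
TRAN,2015-12-31T13:59:01Z,A,0.25
TRAN,2015-12-31T13:59:02Z,B,0.75
"""

# Column layout definition for each record type (assumed, for illustration).
layouts = {
    "CKPT": ["type", "time", "count"],
    "TRAN": ["type", "time", "tran", "resp"],
}

records = []
for row in csv.reader(io.StringIO(data)):
    layout = layouts[row[0]]                # detect the record type...
    records.append(dict(zip(layout, row)))  # ...then apply its column layout

print(records[1]["tran"])  # → A
```

How a real analytics platform performs this dispatch depends on the platform; the principle is the same: read the type field first, then interpret the remaining columns.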

JSON: Each JSON object can have different fields. Field order is not significant.

More precisely, a JSON object is an unordered set of name/value pairs; the order of name/value pairs is not significant.

Name/value pairs are also known as key-value (KV) pairs.

Header

CSV: If the data records in a CSV file are homogeneous (all data records have the same column layout), then the first record might be a header that contains field names.

The following example shows a header followed by two data records:

tran,count
A,5
B,10

In this context, field names are also known as labels.

JSON: No header. Each JSON object contains field names.

The following example shows two JSON objects separated by a newline character. This format is known as JSON Lines:

{"tran": "A", "count": 5}
{"tran": "B", "optfld": "X", "count": 10}
Structured fields

CSV: CSV is strictly a two-dimensional, tabular format, with no nested structures.
Tip: To handle repeating sections in CSV output, Transaction Analysis Workbench offers a method known as vertical separation. For details, see Log records that contain repeating sections.
JSON: JSON can contain nested structures. In JSON, a value can be one of several types, including an array or an object.

Some z/OS®-based log record types contain nested structures, such as repeating sections. If you want to preserve such structures in a common data exchange format, JSON is a better fit than CSV.

For example, in the following JSON object, the value of the steps field is an array; each element of the array is an object. This is a simplified example to illustrate nesting, not actual log data.

{
  "job": "A",
  "rc": 8,
  "steps": [
    {"name": "A1", "rc": 0},
    {"name": "A2", "rc": 0},
    {"name": "A3", "rc": 8}
  ]
}
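Reading such a nested object with a standard JSON parser makes the repeating section directly accessible as a list. A small Python sketch using the example above:

```python
import json

doc = """
{
  "job": "A",
  "rc": 8,
  "steps": [
    {"name": "A1", "rc": 0},
    {"name": "A2", "rc": 0},
    {"name": "A3", "rc": 8}
  ]
}
"""

obj = json.loads(doc)
# The repeating section arrives as an ordinary list of dicts,
# so per-element logic needs no column-layout metadata.
failed = [step["name"] for step in obj["steps"] if step["rc"] != 0]
print(failed)  # → ['A3']
```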
Data types

CSV: CSV does not identify data types.

CSV needs external metadata to identify data types, so some analytics platforms might need more configuration to ingest CSV than JSON.

Tip: Transaction Analysis Workbench creates metadata for DB2®, Elastic (Logstash), and Hadoop. For details, see the SCHEMA, LOGSTASHCONFIGURATION, and HCATALOG parameters of the CSV command.
JSON: JSON values are typed; the types derive from JavaScript™. For example, strings are enclosed in double quotation marks, while numbers are unquoted.

If you forward logs to an analytics platform that needs to distinguish between strings and numbers, JSON already does this.

Regardless of whether you use CSV or JSON, you might need to configure the analytics platform to recognize the format of date values, such as event time stamps.
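The difference is easy to demonstrate in Python: without external metadata, a CSV reader returns every field as a string, while a JSON parser distinguishes numbers from strings. A minimal sketch using the earlier tran/count example:

```python
import csv
import io
import json

csv_row = next(csv.reader(io.StringIO("A,5")))
json_obj = json.loads('{"tran": "A", "count": 5}')

# CSV: every field is text until external metadata says otherwise.
print(type(csv_row[1]))         # → <class 'str'>
# JSON: the unquoted 5 is parsed as a number.
print(type(json_obj["count"]))  # → <class 'int'>
```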

Verbosity

CSV: CSV is significantly more concise than JSON.

JSON: JSON is significantly more verbose than CSV. Each value is accompanied by a field name (also known as a key).

Disadvantage: If you forward logs to an analytics platform with a licensing model that is based on the volume of data ingested, and that calculation includes keys, then you might pay more to ingest keys than values.

Advantage: Reduced risk of data corruption due to incorrect field mapping, such as when record formats change over time, because each value in each record is explicitly paired with the correct key.

Differences specific to Transaction Analysis Workbench

The first field of each CSV record or JSON object output by Transaction Analysis Workbench is the event time stamp.

In JSON output, the second field of each JSON object is the event type. Here, the terms first and second refer to the order in which Transaction Analysis Workbench writes these fields to an output file or stream. If you use string-based methods, such as regular expressions, to parse JSON, then you can rely on this order. However, you should not rely on a JSON parser to return fields in this order, because JSON defines objects as an unordered set of name/value pairs.
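For example, a string-based check can rely on write order, but a parsed object should always be accessed by key. A Python sketch; the time and type key names follow the example JSON object shown later in this topic:

```python
import json

line = '{"time": "2015-12-31T13:59:00.123456Z", "type": "cmf-6e13", "tran": "A"}'

# String-based parsing can rely on write order: "time" is written first.
assert line.lstrip().startswith('{"time"')

# A JSON parser should be treated as order-agnostic: access fields by key,
# never by position.
obj = json.loads(line)
print(obj["type"])  # → cmf-6e13
```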

In CSV output, the second field (column) is the event type only if the CSV command that created the data specified the TYPECOLUMN parameter. By default, CSV records output by Transaction Analysis Workbench do not contain this event type field because, typically, all data records in a CSV file have the same event type.

By default, the event type refers to the log type and code that Transaction Analysis Workbench uses to identify the input log record. However, you can use the TYPE parameter of the CSV or JSON command to set your own event type.

Subsequent fields are selected from the input log record.

Here is the start of an example CSV data record, showing the event time stamp and the event type:

2015-12-31T13:59:00.123456Z,cmf-6e13,…

Here is the start of an equivalent JSON object (shown with line breaks, spaces, and indenting for readability):

{
  "time": "2015-12-31T13:59:00.123456Z",
  "type": "cmf-6e13",
  …
In JSON output, how the remaining fields are represented depends on the following parameters of the JSON command:
FLAT(YES|NO)
FLAT(YES) writes flat JSON: each JSON object is a sequence of key-value pairs without nested structures. This is the default behavior. You can abbreviate FLAT(YES) as FLAT.

FLAT(NO) writes JSON objects that reflect the structure of the original log records. Fields are nested in sections.

For details, see “Flat” JSON versus JSON with nested structures.

MISSING(EXCLUDE|INCLUDE)
Specifies whether to exclude or include fields that are selected for output but are missing from the input log record.

MISSING(EXCLUDE) excludes such missing fields from the output. This is the default behavior.

MISSING(INCLUDE) includes such missing fields in the output, with the JSON value null.
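A consumer can handle both MISSING settings the same way: in Python, an absent key and an explicit null both come back as None when read defensively. A sketch; optfld is a hypothetical optional field, and the two literals below illustrate the two settings rather than reproduce actual product output:

```python
import json

excluded = json.loads('{"tran": "A", "count": 5}')                  # MISSING(EXCLUDE)
included = json.loads('{"tran": "A", "optfld": null, "count": 5}')  # MISSING(INCLUDE)

# dict.get returns None for an absent key and for an explicit null alike.
print(excluded.get("optfld"))  # → None
print(included.get("optfld"))  # → None
```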

Finally, for both CSV and JSON output, you can optionally append correlation tokens.