CSV and JSON commands
The CSV command converts logs to comma-separated values (CSV) format. The JSON command converts logs to JavaScript Object Notation (JSON) format. The CSV and JSON commands have similar syntax with many common parameters. Both commands can write to a ddname, a z/OS® UNIX file path, or a TCP socket.
Except for a few parameters that apply only to CSV or only to JSON, the difference between a command that converts log data to CSV and a command that converts the same log data to JSON is the command keyword, CSV or JSON. If you already have a CSV command, but you want to output JSON instead, then, as a starting point, simply replace the CSV command keyword with JSON.
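For example, the following pair of commands (a minimal sketch; the ddnames CSVOUT and JSONOUT are placeholders) converts the same log data to CSV and to JSON. Only the command keyword differs:
CSV  CODE(CMF:6E13) OUTPUT(CSVOUT)
JSON CODE(CMF:6E13) OUTPUT(JSONOUT)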
To create a CSV file specifically for the Mobile Workload Reporting Tool (MWRT), use the MWP command instead of the CSV command.
To convert logs to a delimiter-separated values (DSV) format that uses a delimiter other than a comma, use the CSV command with a DELIMITER parameter.
Each CSV or JSON command processes log records of one specific combination of log type and code. To specify the log type and code, you must use either a CODE parameter or a FORM parameter.
A SYSIN data set can contain multiple CSV and JSON commands for the same log type and code or any combination of different log types and codes.
A CSV or JSON command that writes to a ddname cannot write to the same ddname as another command. However, multiple commands can write to the same network stream or to stdout.
To filter the input log records that are processed by a CSV or JSON command, use one of the following methods:
- Immediately after the CSV or JSON command, specify filter conditions in CODE qualifying commands
- In the CSV or JSON command, specify a FILTER parameter that refers to a filter in a control repository
You can use the CSV or JSON command to produce some outputs without any input logs. For details, see the SCHEMAONLY global command.
Format
- 1 Section parameters specify which sections of a log record to include in the output, and how to handle repeating sections.
- 2 The FIELDS parameter specifies which fields of a log record to include in the output. The FIELDS parameter has the same syntax as the FIELDS qualifying command.
CSV-only parameters
The following parameters apply only to the CSV command. If you specify these parameters on a JSON command, they are ignored.
There is one exception: the FIXEDWIDTH parameter applies to both the CSV command and the JSON command. FIXEDWIDTH is presented here as a CSV-only parameter because, although you can use it with the JSON command, its behavior is much more likely to be useful for CSV output than for JSON output.
- DELIMITER: The field delimiter, also known as the column delimiter: the character that separates field values. The default delimiter is a comma, with one exception: if the TIMEFORMAT parameter specifies a comma as the decimal sign (the character that separates the integer number of seconds from the decimal fraction), the default delimiter is a semicolon.
Strictly, the output is comma-separated values (CSV) only if the delimiter is a comma. Otherwise, the output is more correctly known as delimiter-separated values (DSV). CSV is a specific type of DSV that uses a comma as the delimiter.
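For example, the following sketch writes semicolon-separated values; the exact form of the delimiter value shown here is an assumption, so check the syntax diagram before using it:
CSV CODE(CMF:6E13) DELIMITER(;) OUTPUT(DSVOUT)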
- LABELS includes a header row with field labels. The default NOLABELS omits the header row.
- FIXEDWIDTH: Right-pads field values, including numeric field values, with blanks to the maximum width of each field.
The maximum output width of numeric fields is 9 characters, including any decimal point. For example, the floating-point value 12.456, which has an output width of 6 characters (2-digit integer part, a decimal point, and a 3-digit decimal fraction), will be right-padded with 3 blanks. The 4-digit integer value 1234 will be right-padded with 5 blanks.
By default, the maximum output width of a character field depends on the original log record definition. To set a different maximum output width, use the WIDTH parameter of the FIELDS qualifying command. For example, if a character field has a maximum width of 8 characters, then the 4-character value MYZ1 will be right-padded with 4 blanks.
Specifying FIXEDWIDTH causes the DB2LOAD parameter to create field specifications that use the POSITION option to determine column boundaries, rather than relying on delimiters.
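As an illustration, a sketch with hypothetical field values: the command
CSV CODE(CMF:6E13) FIXEDWIDTH OUTPUT(CSVOUT)
might write rows such as the following, in which the numeric column is padded to 9 characters and an 8-character character field is padded to 8:
12.456   ,MYZ1    
1234     ,ABCD    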
- TYPECOLUMN: Inserts the TYPE parameter value as the second column of each output CSV data row.
Some analytics platforms can use data in each incoming event to detect the event type, and then interpret the data accordingly. Using TYPECOLUMN to include such data as a column in each CSV row enables analytics platforms to interpret a stream of heterogeneous CSV rows.
If the LABELS parameter is also specified, then the corresponding header row label is Type, TYPE, or type, depending on whether the FIELDCASE parameter specifies ASIS, UPPER, or the default value LOWER. However, if you are writing heterogeneous CSV rows (rows with different column layouts) from multiple CSV commands to the same destination, such as a stream, then it is unlikely that you want to include header rows in the output.
- DB2LOAD: Writes a DB2® load utility LOAD control statement that loads the CSV data output by this CSV command into a DB2 table.
To create JCL that runs the DB2 load utility with this generated LOAD statement, see Transaction Analysis Workbench ISPF dialog option 5.
By default, DB2LOAD writes a LOAD control statement that uses FORMAT DELIMITED to determine column boundaries. To write a LOAD control statement that uses POSITION instead, specify DB2LOAD and FIXEDWIDTH.
To specify the DB2 table name in the LOAD statement, use the TABLE parameter. If you omit the TABLE parameter, the LOAD statement uses the value of the OUTPUT parameter as a placeholder for you to replace later with an actual table name.
To write a corresponding DB2 DDL CREATE TABLE statement, specify the SCHEMA parameter.
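For example, a sketch that writes the CSV data, a fixed-width LOAD statement, and a corresponding CREATE TABLE schema (the ddname is a placeholder, and whether DB2LOAD and SCHEMA take their own ddname values is not shown here):
CSV CODE(CMF:6E13) OUTPUT(CSVOUT) FIXEDWIDTH DB2LOAD SCHEMA TABLE(MYSCHEMA.CICSPERF)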
- HCATALOG: Writes a Hive DDL CREATE TABLE statement (also referred to as an HCatalog table schema) that corresponds to the output CSV data, for use with Hadoop and Hadoop-based software such as IBM® BigInsights® and Cloudera. The CREATE TABLE statement creates a catalog table; specifically, an external table that uses data in the CSV files stored in the HDFS directory referred to by the LOCATION parameter.
Abbreviation: HCAT
To specify the table name, use the TABLE parameter. If you omit TABLE, the default table name is the ddname specified by the HCATALOG parameter.
To specify the value of the LOCATION keyword of the CREATE TABLE statement, use the LOCATION parameter:
- LOCATION: The path of the HDFS directory that contains, or will contain, one or more CSV files that match the HCatalog table schema.
If you omit the LOCATION parameter from the CSV command, then the LOCATION keyword in the CREATE TABLE statement specifies the placeholder value HDFS_PATH for you to replace later with an actual HDFS path.
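For example, a sketch (the ddnames and the HDFS path are placeholders):
CSV CODE(CMF:6E13) OUTPUT(CSVOUT) HCATALOG(HCATDD) TABLE(CICSPERF) LOCATION(/user/analytics/cicsperf)
Because TABLE is specified, the generated CREATE TABLE statement names the table CICSPERF rather than defaulting to the ddname HCATDD.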
JSON-only parameters
The following parameters apply only to the JSON command. If you specify these parameters on a CSV command, they are ignored, with one exception: you cannot specify HORIZONTALSECTIONS on a CSV command.
- LINES writes log data in JSON Lines format. Each
output line is a JSON object that represents a log record. This is the default
behavior.
ARRAY writes log data in a single JSON array. Each element of the array is a JSON object that represents a log record.
FLAT(YES) writes "flat" JSON: each JSON object is a sequence of key-value pairs without nested structures. This is the default behavior. You can abbreviate FLAT(YES) as FLAT. FLAT(NO) writes JSON objects that reflect the structure of the original log records: fields are nested in sections. For details, see "Flat" JSON versus JSON with nested structures.
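To illustrate the difference with hypothetical field and section names, FLAT(YES) might write:
{"trancode":"MYZ1","cputime":12.456}
whereas FLAT(NO) might nest the same fields in their sections:
{"header":{"trancode":"MYZ1"},"performance":{"cputime":12.456}}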
- MISSING: Specifies whether to exclude or include fields that are selected for output but are missing from the input log record. MISSING(EXCLUDE) excludes such missing fields from the output. This is the default behavior. MISSING(INCLUDE) includes such missing fields in the output, with the JavaScript value null. For an illustration, see the sketch after this list.
- HORIZONTALSECTIONS: Only relevant to log records that contain repeating sections. Writes instances of the specified sections as array elements, keeping the values of each instance separate, rather than merging them into a single set of values for the section. For details, see Section parameters.
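To illustrate the MISSING parameter with hypothetical fields: if the field userid is selected but absent from an input record, MISSING(INCLUDE) writes
{"trancode":"MYZ1","userid":null}
whereas the default MISSING(EXCLUDE) omits the userid key entirely.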
Common parameters
The following parameters apply to the CSV command and the JSON command.
- CODE: Specifies the log type and code of the records to write to the CSV or JSON output. You can limit the fields included in the output by specifying a FIELDS parameter, section parameters, or both. If you do not specify CODE, you must specify FORM. CODE(CMF) is equivalent to CODE(CMF:6E13).
- SECTIONS is one of several parameters that specify which sections of the
input log records to include in the output, and how to handle repeating sections.
By default, the output includes all sections.
For details, see Section parameters.
- FIELDS: Specifies which fields of the input log records to include in the output.
By default, the output includes all fields.
You can specify FIELDS either as a parameter of a CSV or JSON command, or as a qualifying command immediately following a CSV or JSON command.
For details, see the FIELDS command.
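For example, the following sketches select two hypothetical fields, first as a parameter, then as a qualifying command (the comma-separated field list shown here is an assumption; see the FIELDS command for the exact syntax):
CSV CODE(CMF:6E13) FIELDS(CPUTIME,TRANCODE) OUTPUT(CSVOUT)

CSV CODE(CMF:6E13) OUTPUT(CSVOUT)
FIELDS(CPUTIME,TRANCODE)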
- FORM: Specifies the name of a form that identifies the log type and code of the records to write to the CSV or JSON output, and which fields to write.
The form must be defined in the control repository that is specified by the FUWCDS ddname.
If you do not specify FORM, you must specify CODE.
- Selects all numeric fields from the input log records.
To select additional, non-numeric, fields by name, use the FIELDS or FORM parameter.
- FIELDCASE: Specifies whether to write field names or labels as is, in all uppercase, or in all lowercase (the default):
  - ASIS: The case specified by the FIELDS parameter. For example, FIELDS(CPUTIME:cpuTIME) and FIELDS(cpuTIME) both result in cpuTIME. Otherwise, if the FIELDS parameter is omitted, the case of the field name as defined by Transaction Analysis Workbench.
  - UPPER: All uppercase. For example: CPUTIME
  - LOWER: All lowercase. For example: cputime
- OUTPUT: Writes the CSV or JSON output to the specified ddname. For the CSV command, the default ddname is CSV. For the JSON command, the default ddname is JSON.
To include the current date or time in the output MVS data set name or z/OS UNIX file path, see the NEWNAME and OUTPATH parameters.
- NEWNAME: Renames the CSV or JSON output after it has been saved to the MVS data set specified by the OUTPUT parameter.
Specify NEWNAME if you want the output MVS data set name to contain the current date or time.
The NEWNAME parameter value specifies an MVS data set name that can contain the following substitution variables:
- +DATE: Substituted with the current date in the format Dyymmdd (the letter D followed by 2-digit year, month, and day)
- +TIME: Substituted with the current time in the format Thhmmss (the letter T followed by 2-digit hour in 24-hour format, minutes, and seconds)
For example, if the date is 30 November 2015 and the time is 14:45:57, then the following parameter:
NEWNAME(SMFX.+DATE.+TIME.CSV)
renames the MVS data set that is specified by the OUTPUT parameter to:
SMFX.D151130.T144557.CSV
To append the date and time to the data set name that is specified by the OUTPUT parameter, specify two consecutive asterisks (**) as the high-level qualifier in NEWNAME. For example, if the output data set name is BIGDATA.CSV, then the following parameter:
NEWNAME(**.+DATE.+TIME)
renames the data set to:
BIGDATA.CSV.D151130.T144557
The NEWNAME parameter uses IDCAMS (the z/OS DFSMS Access Method Services IDCAMS utility) to rename the data set. IDCAMS messages are written to the FUWPRINT data set. Message IDC0531I indicates that the rename was successful.
- OUTPATH: Writes the CSV or JSON output to the specified z/OS UNIX file path. The OUTPATH parameter value can contain the same +DATE and +TIME substitution variables as the NEWNAME parameter. For example, if the date is 30 November 2015 and the time is 14:45:57, then the following parameter:
OUTPATH(/u/analytics/+DATE+TIME.csv)
creates the following file:
/u/analytics/D151130.T144557.csv
See also the PATHMODE and PATHOPTS parameters.
- Writes the CSV or JSON output to stdout, the standard output stream. Only applicable if the report and extract utility program, FUWBATCH, is running in a z/OS UNIX environment, such as a z/OS UNIX shell script.
- STREAM: Writes the CSV or JSON output to the network stream that is defined by the named STREAM command. Specifying STREAM sets the following default parameters: ASCII EOL(LF). You can override these defaults by explicitly specifying different parameters.
- FILTER: Refers to the name of a filter that selects the input log records to write to the CSV or JSON
output. The filter must be defined in the control repository specified by the FUWCDS ddname.
The filter must, as a minimum, select the log type and code specified by the FORM or CODE parameter. Otherwise, there will be no CSV or JSON output.
If the filter refers to forms, the forms are ignored.
A filter is optional, but is recommended to reduce output. If you do not specify a filter, all log records of the log type and code specified by the FORM parameter or CODE parameter are selected.
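For example, a sketch that writes JSON to a named network stream and applies a repository filter (the stream name ELASTIC, which must be defined by a separate STREAM command, and the filter name DTRPROD are placeholders):
JSON CODE(DTR:001) STREAM(ELASTIC) FILTER(DTRPROD)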
- METADATA: Writes a JSON object containing metadata that describes the data in the CSV or JSON output. For each field in the output, the metadata includes the data type, such as numeric or string, and a short description. Applications can use this metadata to interpret and format the data.
- SCHEMA: Writes a DB2 DDL CREATE TABLE statement (sometimes referred to as a table schema) that corresponds to the data in the CSV or JSON output. You can use the schema to create a DB2 table into which you can import the data.
To specify the table name, use the TABLE parameter. If you omit the TABLE parameter, the CREATE TABLE statement uses the value of the OUTPUT parameter as a placeholder for you to replace later with an actual table name.
- TABLE: Specifies the table name used by the DB2LOAD, SCHEMA, and HCATALOG parameters. For example:
TABLE(MYSCHEMA.CICSPERF)
- TYPE: Specifies a string that characterizes the type of data in the CSV or JSON output.
The effect of the TYPE parameter depends on whether it is specified on a CSV command or a JSON command:
- On a CSV command, the TYPE parameter has the following
effects:
- If the LOGSTASHCONFIG parameter is also specified, TYPE sets the value of the document_type setting in the Logstash config.
- If the TYPECOLUMN parameter is also specified, TYPE sets the value of the type column in the CSV data.
Otherwise, if a CSV command specifies neither LOGSTASHCONFIG nor TYPECOLUMN, the TYPE parameter has no effect.
- On a JSON command, the TYPE parameter sets the value of the type property in each output event.
The default value is the log type and code of the input log records in lowercase, separated by a hyphen. If the VERTICALSECTION parameter is specified, the default value ends with the section name in lowercase, separated from the log code by another hyphen.
Examples:
- The following JSON command, without a TYPE
parameter:
JSON … CODE(DTR:001) …
writes events that contain the following field:
"type":"dtr-001"
- The following JSON command, with a VERTICALSECTION
parameter:
JSON … CODE(DTR:001) SECTIONS(HEADER) VERTICALSECTION(QWSA) …
writes events that contain the following field:
"type":"dtr-001-qwsa"
- The following JSON command, with an explicit TYPE
parameter:
JSON … TYPE(DB2-stats-control) …
writes events that contain the following field:
"type":"DB2-stats-control"
- LOGSTASHCONFIGURATION: Writes a Logstash configuration file (config) that corresponds to the CSV or JSON output data, for use with Elasticsearch.
Abbreviations: LOGSTASHCONFIG, LSCONFIG
The Logstash config for JSON is the same for all log record types. If you forward logs as JSON (more specifically, JSON Lines), then you can use a single Logstash config to ingest all log record types from Transaction Analysis Workbench.
However, Logstash configs for CSV are specific not only to each log record type, but also to the particular set of fields that you select to forward with each CSV command. Logstash configs for CSV contain a convert option that identifies numeric fields. Without the convert option, Logstash would forward all values as strings. The Logstash config for JSON does not require a convert option, because JSON inherently distinguishes between numeric and string values.
- The value of the hosts option in the output section of the Logstash config. The hosts option refers to the system running Elasticsearch. The default value is localhost, the system running Logstash.
- The value of the index property in the output section of the Logstash
config. The index property specifies the name of the Elasticsearch
index.
Abbreviation: LSINDEX
Default:
index => "fuw-%{type}-%{+YYYY.MM.dd}"
The Logstash config created by the LOGSTASHCONFIGURATION parameter contains the following date filter:
date { match => ["time", "ISO8601"] }
The date filter uses the value of the time (event time stamp) field in the CSV or JSON data to set the value of the Logstash @timestamp field. The date filter assumes that the value of the time field matches the pattern specified by the Logstash format literal ISO8601. That is true only if the CSV or JSON command specifies a corresponding TIMEFORMAT parameter, such as TIMEFORMAT(ISO8601). If the CSV or JSON command outputs time stamps in a different format, such as the default JDBC format, then you must edit the Logstash config and specify an appropriate pattern in the date filter.
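For example, you might replace the ISO8601 format literal with an explicit pattern. The pattern shown here is an assumption; match it to the actual TIMEFORMAT of your output:
date { match => ["time", "yyyy-MM-dd HH:mm:ss.SSS"] }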
- TOKENS: Appends fields known as correlation tokens to the end of each output record:

| TOKENS parameter value | Field name inserted in output | Description |
|---|---|---|
| ALL | All token fields: ACCTOKEN, LUWID, IMSTOKEN | A synonym for TOKENS(CICS,DB2,IMS). All correlation tokens are appended to the record. ALL is recommended when you need to correlate CSV or JSON outputs that contain data from different subsystems. For example, the DB2 accounting record will contain the IMS recovery token when the connection to DB2 is IMS (MASS). Tip: If you want all correlation tokens, but with the LUWID value as generated by TOKENS(DB2+) (that is, decremented), specify TOKENS(ALL,DB2+). |
| CICS | ACCTOKEN | CICS® accounting token. |
| DB2 | LUWID | DB2 logical unit of work ID (LUWID). |
| DB2+ | LUWID | Same as TOKENS(DB2) except that the LUWID commit count in the log record is decremented by 1. For some records that DB2 writes at commit time, DB2 increments the LUWID commit count before writing the record, so the LUWID commit count in the record is incorrect. For these records, such as DB2 accounting records (log type DB2, log code 003), specify DB2+. The adjusted LUWID reflects the correct unit of work, and can be used for correlation between subsystems. For example, the LUWID FTS3/DBA6LU/CC17F01EF3DA/2 will be converted to FTS3/DBA6LU/CC17F01EF3DA/1, which is the actual LUWID of the DB2 thread. |
| IMS | IMSTOKEN | IMS recovery token. |

For more information about these tokens, see Token fields to correlate activity within and between subsystems.
- The output encoding: ASCII or EBCDIC. The default is EBCDIC, with one exception: if the STREAM parameter is specified, the default is ASCII.
- CODEPAGE: Only applies if the EBCDIC parameter is specified. CODEPAGE specifies the EBCDIC code page that is used to encode square brackets in the following outputs:
- Log data converted to JSON by the JSON command
- JSON-format metadata created by the METADATA parameter
- Logstash configuration file created by the LOGSTASHCONFIGURATION parameter of the CSV command
You must specify the code page identifiers exactly as shown in the syntax. The default is 1047.
You can specify CODEPAGE either as a parameter of a CSV or JSON command, or as a separate command. The CODEPAGE command applies to all subsequent CSV and JSON commands. Specifying CODEPAGE as a parameter overrides the CODEPAGE command, but only for the current CSV or JSON command.
If you specify the ASCII parameter, CODEPAGE is ignored for this CSV or JSON command.
- EOL: The end-of-line (EOL) delimiter. Appends one or more bytes to the end of each output record. The default value is EOL(NONE) (no delimiter), with one exception: if STREAM is specified, the default is EOL(LF).
The actual bytes appended depend on which EOL parameter value you specify, and whether you specify the ASCII or EBCDIC parameter.

| EOL parameter value | Description | ASCII | EBCDIC |
|---|---|---|---|
| CR | Carriage return (\r) | X'0D' | X'0D' |
| CRLF | Carriage return and line feed (\r\n): the Windows EOL character pair | X'0D0A' | X'0D25' |
| LF | Line feed (\n): the default UNIX EOL character | X'0A' | X'25' |
| NL | Newline | Not applicable | X'15' |

When writing to a z/OS UNIX file, the appropriate EOL delimiter depends on whether you want to write output encoded in ASCII or EBCDIC:
- To write ASCII-encoded output to a z/OS UNIX file:
  - On the DD statement for the file, use the default FILEDATA parameter value, FILEDATA=BINARY. Do not specify FILEDATA=TEXT. If you specify FILEDATA=TEXT, the output file will contain an EBCDIC newline character (X'15') at the end of each line, which is not appropriate for an ASCII-encoded file.
  - On the CSV or JSON command, specify the ASCII parameter and your choice of EOL parameter value: CR, CRLF, or LF.
- To write EBCDIC-encoded output to a z/OS UNIX file, either:
  - On the DD statement for the file, use the default FILEDATA parameter value, FILEDATA=BINARY, and on the CSV or JSON command, specify EOL(NL).
  - On the DD statement for the file, specify FILEDATA=TEXT, and on the CSV or JSON command, specify EOL(NONE) (the default value).
Do not specify the combination of FILEDATA=TEXT on the DD statement and an end-of-line delimiter on the CSV or JSON command; if you do, each output record will have two EOL delimiters: the delimiter specified by the CSV or JSON command followed by the X'15' appended by FILEDATA=TEXT.
The default output EBCDIC code page is 1047. For details, see the CODEPAGE parameter.
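For example, a hedged JCL sketch for ASCII-encoded output to a z/OS UNIX file; the ddname, path, and path options are placeholders:
//CSVOUT   DD PATH='/u/analytics/output.csv',FILEDATA=BINARY,
//            PATHOPTS=(OWRONLY,OCREAT,OTRUNC),PATHMODE=SIRWXU
with the corresponding command:
CSV CODE(CMF:6E13) OUTPUT(CSVOUT) ASCII EOL(LF)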
- TIMEFORMAT: Specifies the format of time stamp field values in output CSV or JSON data.
For more details, see Time stamps in CSV and JSON.
- OUTLIM: Limits the number of output CSV or JSON records. The default is no limit.
The OUTLIM value must be a positive integer up to 99999999.
The OUTLIM value refers to the number of output records containing data; it does not include the header row in a CSV file, if requested, or records at the start and end of a JSON file that ensure valid JSON syntax.
If you specify the VERTICALSECTION parameter, then a single input log record containing multiple instances of the specified section will generate multiple output records. OUTLIM refers to the multiple output records, not the single input log record.
- MB: Changes the behavior of the OUTLIM(n) parameter to limit output to n megabytes rather than n output records. Output ends at the last complete record that fits inside the limit.