CSV and JSON commands
The CSV command converts logs to comma-separated values (CSV) format. The JSON command converts logs to JavaScript Object Notation (JSON) format. The CSV and JSON commands have similar syntax with many common parameters. Both commands can write to a ddname, a z/OS® UNIX file path, or a TCP socket.
Except for a few parameters that apply only to CSV or only to JSON, the difference between a command that converts log data to CSV and a command that converts the same log data to JSON is the command keyword, CSV or JSON. If you already have a CSV command, but you want to output JSON instead, then, as a starting point, simply replace the CSV command keyword with JSON.
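For example, the following pair of commands (a minimal sketch; the ddnames CSVOUT and JSONOUT are placeholders) converts the same log data to CSV and to JSON. Only the command keyword differs:
CSV  CODE(CMF:6E13) OUTPUT(CSVOUT)
JSON CODE(CMF:6E13) OUTPUT(JSONOUT)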
To create a CSV file specifically for the Mobile Workload Reporting Tool (MWRT), use the MWP command instead of the CSV command.
To convert logs to a delimiter-separated values (DSV) format that uses a delimiter other than a comma, use the CSV command with a DELIMITER parameter.
Each CSV or JSON command processes log records of one specific combination of log type and code. To specify the log type and code, you must use either a CODE parameter or a FORM parameter.
A SYSIN data set can contain multiple CSV and JSON commands for the same log type and code or any combination of different log types and codes.
A CSV or JSON command that writes to a ddname cannot write to the same ddname as another command. However, multiple commands can write to the same network stream or to stdout.
To filter the input log records that are processed by a CSV or JSON command, use one of the following methods:
- Immediately after the CSV or JSON command, specify filter conditions in CODE qualifying commands
- In the CSV or JSON command, specify a FILTER parameter that refers to a filter in a control repository
You can use the CSV or JSON command to produce some outputs without any input logs. For details, see the SCHEMAONLY global command.
Format
- 1 Section parameters specify which sections of a log record to include in the output, and how to handle repeating sections.
- 2 The FIELDS parameter specifies which fields of a log record to include in the output. The FIELDS parameter has the same syntax as the FIELDS qualifying command.
CSV-only parameters
The following parameters apply only to the CSV command. If you specify these parameters on a JSON command, they are ignored.
There is one exception: the FIXEDWIDTH parameter applies to both the CSV command and the JSON command. FIXEDWIDTH is presented here as a CSV-only parameter because, although you can use it with the JSON command, its behavior is much more likely to be useful for CSV output than for JSON output.
- DELIMITER: The field delimiter, also known as the column delimiter: the character that separates field values. The default delimiter is a comma, with one exception: if the TIMEFORMAT parameter specifies a comma as the decimal sign (the character that separates the integer number of seconds from the decimal fraction), the default delimiter is a semicolon.
Strictly, the output is comma-separated values (CSV) only if the delimiter is a comma. Otherwise, the output is more correctly known as delimiter-separated values (DSV). CSV is a specific type of DSV that uses a comma as the delimiter.
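For example, the following sketch writes semicolon-separated values; the exact form of the delimiter value shown here is an assumption, so check the syntax diagram before using it:
CSV CODE(CMF:6E13) DELIMITER(;) OUTPUT(DSVOUT)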
- LABELS includes a header row with field labels. The default NOLABELS omits the header row.
- FIXEDWIDTH: Right-pads field values, including numeric field values, with blanks to the maximum width of each field.
The maximum output width of numeric fields is 9 characters, including any decimal point. For example, the floating-point value 12.456, which has an output width of 6 characters (2-digit integer part, a decimal point, and a 3-digit decimal fraction), will be right-padded with 3 blanks. The 4-digit integer value 1234 will be right-padded with 5 blanks.
By default, the maximum output width of a character field depends on the original log record definition. To set a different maximum output width, use the WIDTH parameter of the FIELDS qualifying command. For example, if a character field has a maximum width of 8 characters, then the 4-character value MYZ1 will be right-padded with 4 blanks.
Specifying FIXEDWIDTH causes the DB2LOAD parameter to create field specifications that use the POSITION option to determine column boundaries, rather than relying on delimiters.
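As an illustration, a sketch with hypothetical field values: the command
CSV CODE(CMF:6E13) FIXEDWIDTH OUTPUT(CSVOUT)
might write rows such as the following, in which the numeric column is padded to 9 characters and an 8-character character field is padded to 8:
12.456   ,MYZ1    
1234     ,ABCD    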
- TYPECOLUMN: Inserts the TYPE parameter value as the second column of each output CSV data row.
Some analytics platforms can use data in each incoming event to detect the event type, and then interpret the data accordingly. Using TYPECOLUMN to include such data as a column in each CSV row enables analytics platforms to interpret a stream of heterogeneous CSV rows.
If the LABELS parameter is also specified, then the corresponding header row label is Type, TYPE, or type, depending on whether the FIELDCASE parameter specifies ASIS, UPPER, or the default value LOWER. However, if you are writing heterogeneous CSV rows (rows with different column layouts) from multiple CSV commands to the same destination, such as a stream, then it is unlikely that you want to include header rows in the output.
- DB2LOAD: Writes a DB2® load utility LOAD control statement that loads the CSV data output by this CSV command into a DB2 table.
To create JCL that runs the DB2 load utility with this generated LOAD statement, see Transaction Analysis Workbench ISPF dialog option 5.
By default, DB2LOAD writes a LOAD control statement that uses FORMAT DELIMITED to determine column boundaries. To write a LOAD control statement that uses POSITION instead, specify DB2LOAD and FIXEDWIDTH.
To specify the DB2 table name in the LOAD statement, use the TABLE parameter. If you omit the TABLE parameter, the LOAD statement uses the value of the OUTPUT parameter as a placeholder for you to replace later with an actual table name.
To write a corresponding DB2 DDL CREATE TABLE statement, specify the SCHEMA parameter.
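For example, a sketch that writes the CSV data, a fixed-width LOAD statement, and a corresponding CREATE TABLE schema (the ddname is a placeholder, and whether DB2LOAD and SCHEMA take their own ddname values is not shown here):
CSV CODE(CMF:6E13) OUTPUT(CSVOUT) FIXEDWIDTH DB2LOAD SCHEMA TABLE(MYSCHEMA.CICSPERF)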
- HCATALOG: Writes a Hive DDL CREATE TABLE statement (also referred to as an HCatalog table schema) that corresponds to the output CSV data, for use with Hadoop and Hadoop-based software such as IBM® BigInsights® and Cloudera. The CREATE TABLE statement creates a catalog table; specifically, an external table that uses data in the CSV files stored in the HDFS directory referred to by the LOCATION parameter.
Abbreviation: HCAT
To specify the table name, use the TABLE parameter. If you omit TABLE, the default table name is the ddname specified by the HCATALOG parameter.
To specify the value of the LOCATION keyword of the CREATE TABLE statement, use the LOCATION parameter:
- LOCATION: The path of the HDFS directory that contains, or will contain, one or more CSV files that match the HCatalog table schema.
If you omit the LOCATION parameter from the CSV command, then the LOCATION keyword in the CREATE TABLE statement specifies the placeholder value HDFS_PATH for you to replace later with an actual HDFS path.
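For example, a sketch (the ddnames and the HDFS path are placeholders):
CSV CODE(CMF:6E13) OUTPUT(CSVOUT) HCATALOG(HCATDD) TABLE(CICSPERF) LOCATION(/user/analytics/cicsperf)
Because TABLE is specified, the generated CREATE TABLE statement names the table CICSPERF rather than defaulting to the ddname HCATDD.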
JSON-only parameters
The following parameters apply only to the JSON command. If you specify these parameters on a CSV command, they are ignored, with one exception: you cannot specify HORIZONTALSECTIONS on a CSV command.
- LINES writes log data in JSON Lines format. Each
output line is a JSON object that represents a log record. This is the default
behavior.
ARRAY writes log data in a single JSON array. Each element of the array is a JSON object that represents a log record.
FLAT(YES) writes "flat" JSON: each JSON object is a sequence of key-value pairs without nested structures. This is the default behavior. You can abbreviate FLAT(YES) as FLAT. FLAT(NO) writes JSON objects that reflect the structure of the original log records: fields are nested in sections. For details, see "Flat" JSON versus JSON with nested structures.
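To illustrate the difference with hypothetical field and section names, FLAT(YES) might write:
{"trancode":"MYZ1","cputime":12.456}
whereas FLAT(NO) might nest the same fields in their sections:
{"header":{"trancode":"MYZ1"},"performance":{"cputime":12.456}}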
- MISSING: Specifies whether to exclude or include fields that are selected for output but are missing from the input log record. MISSING(EXCLUDE) excludes such missing fields from the output. This is the default behavior. MISSING(INCLUDE) includes such missing fields in the output, with the JavaScript value null. For an illustration, see the sketch after this list.
- HORIZONTALSECTIONS: Only relevant to log records that contain repeating sections. Writes instances of the specified sections as array elements, keeping the values of each instance separate, rather than merging them into a single set of values for the section. For details, see Section parameters.
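To illustrate the MISSING parameter with hypothetical fields: if the field userid is selected but absent from an input record, MISSING(INCLUDE) writes
{"trancode":"MYZ1","userid":null}
whereas the default MISSING(EXCLUDE) omits the userid key entirely.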
Common parameters
The following parameters apply to the CSV command and the JSON command.
- CODE: Specifies the log type and code of the records to write to the CSV or JSON output. You can limit the fields included in the output by specifying a FIELDS parameter, section parameters, or both. If you do not specify CODE, you must specify FORM. CODE(CMF) is equivalent to CODE(CMF:6E13).
- SECTIONS is one of several parameters that specify which sections of the
input log records to include in the output, and how to handle repeating sections.
By default, the output includes all sections.
For details, see Section parameters.
- FIELDS: Specifies which fields of the input log records to include in the output.
By default, the output includes all fields.
You can specify FIELDS either as a parameter of a CSV or JSON command, or as a qualifying command immediately following a CSV or JSON command.
For details, see the FIELDS command.
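For example, the following sketches select two hypothetical fields, first as a parameter, then as a qualifying command (the comma-separated field list shown here is an assumption; see the FIELDS command for the exact syntax):
CSV CODE(CMF:6E13) FIELDS(CPUTIME,TRANCODE) OUTPUT(CSVOUT)

CSV CODE(CMF:6E13) OUTPUT(CSVOUT)
FIELDS(CPUTIME,TRANCODE)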
- FORM: Specifies the name of a form that identifies the log type and code of the records to write to the CSV or JSON output, and which fields to write.
The form must be defined in the control repository that is specified by the FUWCDS ddname.
If you do not specify FORM, you must specify CODE.
- Selects all numeric fields from the input log records.
To select additional, non-numeric, fields by name, use the FIELDS or FORM parameter.
- FIELDCASE: Specifies whether to write field names or labels as is, in all uppercase, or in all lowercase (the default):
  - ASIS: The case specified by the FIELDS parameter. For example, FIELDS(CPUTIME:cpuTIME) and FIELDS(cpuTIME) both result in cpuTIME. Otherwise, if the FIELDS parameter is omitted, the case of the field name as defined by Transaction Analysis Workbench.
  - UPPER: All uppercase. For example: CPUTIME
  - LOWER: All lowercase. For example: cputime
- OUTPUT: Writes the CSV or JSON output to the specified ddname. For the CSV command, the default ddname is CSV. For the JSON command, the default ddname is JSON.
To include the current date or time in the output MVS data set name or z/OS UNIX file path, see the NEWNAME and OUTPATH parameters.
- NEWNAME: Renames the CSV or JSON output after it has been saved to the MVS data set specified by the OUTPUT parameter.
Specify NEWNAME if you want the output MVS data set name to contain the current date or time.
The NEWNAME parameter value specifies an MVS data set name that can contain the following substitution variables:
- +DATE: Substituted with the current date in the format Dyymmdd (the letter D followed by 2-digit year, month, and day)
- +TIME: Substituted with the current time in the format Thhmmss (the letter T followed by 2-digit hour in 24-hour format, minutes, and seconds)
For example, if the date is 30 November 2015 and the time is 14:45:57, then the following parameter:
NEWNAME(SMFX.+DATE.+TIME.CSV)
renames the MVS data set that is specified by the OUTPUT parameter to:
SMFX.D151130.T144557.CSV
To append the date and time to the data set name that is specified by the OUTPUT parameter, specify two consecutive asterisks (**) as the high-level qualifier in NEWNAME. For example, if the output data set name is BIGDATA.CSV, then the following parameter:
NEWNAME(**.+DATE.+TIME)
renames the data set to:
BIGDATA.CSV.D151130.T144557
The NEWNAME parameter uses IDCAMS (the z/OS DFSMS Access Method Services IDCAMS utility) to rename the data set. IDCAMS messages are written to the FUWPRINT data set. Message IDC0531I indicates that the rename was successful.
- OUTPATH: Writes the CSV or JSON output to the specified z/OS UNIX file path. The OUTPATH parameter value can contain the same +DATE and +TIME substitution variables as the NEWNAME parameter. For example, if the date is 30 November 2015 and the time is 14:45:57, then the following parameter:
OUTPATH(/u/analytics/+DATE+TIME.csv)
creates the following file:
/u/analytics/D151130.T144557.csv
See also the PATHMODE and PATHOPTS parameters.
- Writes the CSV or JSON output to stdout, the standard output stream. Only applicable if the report and extract utility program, FUWBATCH, is running in a z/OS UNIX environment, such as a z/OS UNIX shell script.
- STREAM: Writes the CSV or JSON output to the network stream that is defined by the named STREAM command. Specifying STREAM sets the following default parameters: ASCII EOL(LF). You can override these defaults by explicitly specifying different parameters.
- FILTER: Refers to the name of a filter that selects the input log records to write to the CSV or JSON
output. The filter must be defined in the control repository specified by the FUWCDS ddname.
The filter must, as a minimum, select the log type and code specified by the FORM or CODE parameter. Otherwise, there will be no CSV or JSON output.
If the filter refers to forms, the forms are ignored.
A filter is optional, but is recommended to reduce output. If you do not specify a filter, all log records of the log type and code specified by the FORM parameter or CODE parameter are selected.
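For example, a sketch that writes JSON to a named network stream and applies a repository filter (the stream name ELASTIC, which must be defined by a separate STREAM command, and the filter name DTRPROD are placeholders):
JSON CODE(DTR:001) STREAM(ELASTIC) FILTER(DTRPROD)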
- METADATA: Writes a JSON object containing metadata that describes the data in the CSV or JSON output. For each field in the output, the metadata includes the data type, such as numeric or string, and a short description. Applications can use this metadata to interpret and format the data.
- SCHEMA: Writes a DB2 DDL CREATE TABLE statement (sometimes referred to as a table schema) that corresponds to the data in the CSV or JSON output. You can use the schema to create a DB2 table into which you can import the data.
To specify the table name, use the TABLE parameter. If you omit the TABLE parameter, the CREATE TABLE statement uses the value of the OUTPUT parameter as a placeholder for you to replace later with an actual table name.
- TABLE: Specifies the table name used by the DB2LOAD, SCHEMA, and HCATALOG parameters. For example:
TABLE(MYSCHEMA.CICSPERF)
- TYPE: Specifies a string that characterizes the type of data in the CSV or JSON output.
The effect of the TYPE parameter depends on whether it is specified on a CSV command or a JSON command:
- On a CSV command, the TYPE parameter has the following
effects:
- If the LOGSTASHCONFIG parameter is also specified, TYPE sets the value of the document_type setting in the Logstash config.
- If the TYPECOLUMN parameter is also specified, TYPE sets the value of the type column in the CSV data.
Otherwise, if a CSV command specifies neither LOGSTASHCONFIG nor TYPECOLUMN, the TYPE parameter has no effect.
- On a JSON command, the TYPE parameter sets the value of the type property in each output event.
The default value is the log type and code of the input log records in lowercase, separated by a hyphen. If the VERTICALSECTION parameter is specified, the default value ends with the section name in lowercase, separated from the log code by another hyphen.
Examples:
- The following JSON command, without a TYPE
parameter:
JSON … CODE(DTR:001) …
writes events that contain the following field:
"type":"dtr-001"
- The following JSON command, with a VERTICALSECTION
parameter:
JSON … CODE(DTR:001) SECTIONS(HEADER) VERTICALSECTION(QWSA) …
writes events that contain the following field:
"type":"dtr-001-qwsa"
- The following JSON command, with an explicit TYPE
parameter:
JSON … TYPE(DB2-stats-control) …
writes events that contain the following field:
"type":"DB2-stats-control"
- LOGSTASHCONFIGURATION: Writes a Logstash configuration file (config) that corresponds to the CSV or JSON output data, for use with Elasticsearch.
Abbreviations: LOGSTASHCONFIG, LSCONFIG
The Logstash config for JSON is the same for all log record types. If you forward logs as JSON (more specifically, JSON Lines), then you can use a single Logstash config to ingest all log record types from Transaction Analysis Workbench.
However, Logstash configs for CSV are specific not only to each log record type, but also to the particular set of fields that you select to forward with each CSV command. Logstash configs for CSV contain a convert option that identifies numeric fields. Without the convert option, Logstash would forward all values as strings. The Logstash config for JSON does not require a convert option, because JSON inherently distinguishes between numeric and string values.
- The value of the hosts option in the output section of the Logstash config. The hosts option refers to the system running Elasticsearch. The default value is localhost, the system running Logstash.
- The value of the index property in the output section of the Logstash
config. The index property specifies the name of the Elasticsearch
index.
Abbreviation: LSINDEX
Default:
index => "fuw-%{type}-%{+YYYY.MM.dd}"
The Logstash config created by the LOGSTASHCONFIGURATION parameter contains the following date filter:
date { match => ["time", "ISO8601"] }
The date filter uses the value of the time (event time stamp) field in the CSV or JSON data to set the value of the Logstash @timestamp field. The date filter assumes that the value of the time field matches the pattern specified by the Logstash format literal ISO8601. That is true only if the CSV or JSON command specifies a corresponding TIMEFORMAT parameter, such as TIMEFORMAT(ISO8601). If the CSV or JSON command outputs time stamps in a different format, such as the default JDBC format, then you must edit the Logstash config and specify an appropriate pattern in the date filter.
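For example, you might replace the ISO8601 format literal with an explicit pattern. The pattern shown here is an assumption; match it to the actual TIMEFORMAT of your output:
date { match => ["time", "yyyy-MM-dd HH:mm:ss.SSS"] }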
- TOKENS: Appends fields known as correlation tokens to the end of each output record:

| TOKENS parameter value | Field name inserted in output | Description |
|---|---|---|
| ALL | All token fields: ACCTOKEN, LUWID, IMSTOKEN | A synonym for TOKENS(CICS,DB2,IMS). All correlation tokens are appended to the record. ALL is recommended when you need to correlate CSV or JSON outputs that contain data from different subsystems. For example, the DB2 accounting record will contain the IMS recovery token when the connection to DB2 is IMS (MASS). Tip: If you want all correlation tokens, but with the LUWID value as generated by TOKENS(DB2+) (that is, decremented), specify TOKENS(ALL,DB2+). |
| CICS | ACCTOKEN | CICS® accounting token. |
| DB2 | LUWID | DB2 logical unit of work ID (LUWID). |
| DB2+ | LUWID | Same as TOKENS(DB2) except that the LUWID commit count in the log record is decremented by 1. For some records that DB2 writes at commit time, DB2 increments the LUWID commit count before writing the record, so the LUWID commit count in the record is incorrect. For these records, such as DB2 accounting records (log type DB2, log code 003), specify DB2+. The adjusted LUWID reflects the correct unit of work, and can be used for correlation between subsystems. For example, the LUWID FTS3/DBA6LU/CC17F01EF3DA/2 will be converted to FTS3/DBA6LU/CC17F01EF3DA/1, which is the actual LUWID of the DB2 thread. |
| IMS | IMSTOKEN | IMS recovery token. |

For more information about these tokens, see Token fields to correlate activity within and between subsystems.
- The output encoding: ASCII or EBCDIC. The default is EBCDIC, with one exception: if the STREAM parameter is specified, the default is ASCII.
- CODEPAGE: Only applies if the EBCDIC parameter is specified. CODEPAGE specifies the EBCDIC code page that is used to encode square brackets in the following outputs:
- Log data converted to JSON by the JSON command
- JSON-format metadata created by the METADATA parameter
- Logstash configuration file created by the LOGSTASHCONFIGURATION parameter of the CSV command
You must specify the code page identifiers exactly as shown in the syntax. The default is 1047.
You can specify CODEPAGE either as a parameter of a CSV or JSON command, or as a separate command. The CODEPAGE command applies to all subsequent CSV and JSON commands. Specifying CODEPAGE as a parameter overrides the CODEPAGE command, but only for the current CSV or JSON command.
If you specify the ASCII parameter, CODEPAGE is ignored for this CSV or JSON command.
- EOL: The end-of-line (EOL) delimiter. Appends one or more bytes to the end of each output record. The default value is EOL(NONE) (no delimiter), with one exception: if STREAM is specified, the default is EOL(LF).
The actual bytes appended depend on which EOL parameter value you specify, and whether you specify the ASCII or EBCDIC parameter.

| EOL parameter value | Description | ASCII | EBCDIC |
|---|---|---|---|
| CR | Carriage return (\r) | X'0D' | X'0D' |
| CRLF | Carriage return and line feed (\r\n): the Windows EOL character pair | X'0D0A' | X'0D25' |
| LF | Line feed (\n): the default UNIX EOL character | X'0A' | X'25' |
| NL | Newline | Not applicable | X'15' |

When writing to a z/OS UNIX file, the appropriate EOL delimiter depends on whether you want to write output encoded in ASCII or EBCDIC:
- To write ASCII-encoded output to a z/OS UNIX file:
  - On the DD statement for the file, use the default FILEDATA parameter value, FILEDATA=BINARY. Do not specify FILEDATA=TEXT. If you specify FILEDATA=TEXT, the output file will contain an EBCDIC newline character (X'15') at the end of each line, which is not appropriate for an ASCII-encoded file.
  - On the CSV or JSON command, specify the ASCII parameter and your choice of EOL parameter value: CR, CRLF, or LF.
- To write EBCDIC-encoded output to a z/OS UNIX file, either:
  - On the DD statement for the file, use the default FILEDATA parameter value, FILEDATA=BINARY, and on the CSV or JSON command, specify EOL(NL).
  - On the DD statement for the file, specify FILEDATA=TEXT, and on the CSV or JSON command, specify EOL(NONE) (the default value).
Do not specify the combination of FILEDATA=TEXT on the DD statement and an end-of-line delimiter on the CSV or JSON command; if you do, each output record will have two EOL delimiters: the delimiter specified by the CSV or JSON command followed by the X'15' appended by FILEDATA=TEXT.
The default output EBCDIC code page is 1047. For details, see the CODEPAGE parameter.
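For example, a hedged JCL sketch for ASCII-encoded output to a z/OS UNIX file; the ddname, path, and path options are placeholders:
//CSVOUT   DD PATH='/u/analytics/output.csv',FILEDATA=BINARY,
//            PATHOPTS=(OWRONLY,OCREAT,OTRUNC),PATHMODE=SIRWXU
with the corresponding command:
CSV CODE(CMF:6E13) OUTPUT(CSVOUT) ASCII EOL(LF)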
- TIMEFORMAT: Specifies the format of time stamp field values in output CSV or JSON data.
For more details, see Time stamps in CSV and JSON.
- OUTLIM: Limits the number of output CSV or JSON records. The default is no limit.
The OUTLIM value must be a positive integer up to 99999999.
The OUTLIM value refers to the number of output records containing data; it does not include the header row in a CSV file, if requested, or records at the start and end of a JSON file that ensure valid JSON syntax.
If you specify the VERTICALSECTION parameter, then a single input log record containing multiple instances of the specified section will generate multiple output records. OUTLIM refers to the multiple output records, not the single input log record.
- MB: Changes the behavior of the OUTLIM(n) parameter to limit output to n megabytes rather than n output records. Output ends at the last complete record that fits inside the limit.