IBM InfoSphere Streams Version 4.1.1

Operator `CSVParse`

Toolkit: com.ibm.streams.teda 1.0.2 (SPL standard and specialized toolkits), namespace com.ibm.streams.teda.parser.text

The CSVParse operator parses an input line of comma-separated values (CSV) that is received in an rstring input tuple attribute, splits the line into fields (also called items), and assigns the field values to attributes in the output port schema. This functionality is similar to that of the spl.utility::Parse operator in CSV mode, but CSVParse is more flexible in assigning CSV fields to output attributes. The mapping of CSV input fields to output tuple attributes is defined in a separate XML document, whose location is specified by the mappingDocument parameter. Input lines that cannot be parsed or converted successfully are sent to the error port (the second output port, port 1).

The operator supports the following features:

Assignment of CSV fields to output attributes by index
For example, if an input line contains 5 fields, you can select the fields with indices 0, 2, and 4 and assign them to 3 named attributes in the output port.
Multiple mappings in the XML document

You can specify multiple mappings in the XML document. To determine which mapping applies to a given CSV line at runtime, the mappings can have filter conditions. A filter consists of an index (the position, or column number, of a CSV field) and a fixed string. For each input line, the value of the field at that index is compared with the fixed string. If they match, the filter applies and the mapping is used for this line. If the filters of several mappings match a given input line, the first matching mapping is applied.
Data validation

For each mapping, you can specify the minimum and maximum number of CSV fields expected in the input line. If at runtime a CSV input line for the mapping contains fewer or more fields than expected, the tuple is forwarded to the error port.
Conversion to SPL primitive types

For example, you can assign CSV field 3 to an int8 type SPL output attribute. If the conversion from string to int8 fails during runtime, the tuple is sent to the error port.
Input record type unification with a common output port schema

For example, you receive two types of input lines, one with four CSV fields and one with eight CSV fields. The first field is an ID that uniquely identifies the line type. Your output stream contains the ID, the second and fourth fields for the first line type and the fifth and seventh fields for the second line type.

The output stream has the following type:
```
tuple<rstring id, rstring A, rstring B>
```
The incoming data has two line types. For example:
```
TypeX,Hello,My,World
TypeY,Second,Third,Fourth,Fifth,Sixth,Seventh,Eighth
TypeX,MyNameIs,Peter,Smith
TypeY,A,B,C,D,E,F,G
```
With a filter on the first field and a mapping for each line type, the output data is:
```
TypeX,Hello,World
TypeY,Fifth,Seventh
TypeX,MyNameIs,Smith
TypeY,D,F
```
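The following Python sketch models how filters select a mapping for the TypeX and TypeY data. It illustrates the rule under stated assumptions and is not the operator's implementation. The mapping table (a filter on field 0, fields 1 and 3 for TypeX, fields 4 and 6 for TypeY), the helper name parse_line, and the plain comma split (non-quoted mode) are all assumptions.

```python
# Hypothetical model of filter-based mapping selection (not CSVParse's code).
# Each mapping: (name, filterIndex, filterValue, {output attribute: CSV index}).
MAPPINGS = [
    ("typeX", 0, "TypeX", {"id": 0, "A": 1, "B": 3}),
    ("typeY", 0, "TypeY", {"id": 0, "A": 4, "B": 6}),
]
OUTPUT_ATTRIBUTES = ["id", "A", "B"]


def parse_line(line):
    """Return (mapping name, output tuple), or None if no filter matches."""
    fields = line.split(",")  # non-quoted mode, comma separator
    for name, filter_index, filter_value, assign in MAPPINGS:
        if filter_index < len(fields) and fields[filter_index] == filter_value:
            # The first matching mapping wins. A missing field keeps the
            # rstring default value "".
            values = {attr: fields[i] if i < len(fields) else ""
                      for attr, i in assign.items()}
            return name, tuple(values[a] for a in OUTPUT_ATTRIBUTES)
    return None  # no mapping matches: the line is dropped


for line in ["TypeX,Hello,My,World",
             "TypeY,Second,Third,Fourth,Fifth,Sixth,Seventh,Eighth",
             "TypeX,MyNameIs,Peter,Smith",
             "TypeY,A,B,C,D,E,F,G"]:
    print(",".join(parse_line(line)[1]))
# TypeX,Hello,World
# TypeY,Fifth,Seventh
# TypeX,MyNameIs,Smith
# TypeY,D,F
```

The filters on field 0 are mutually exclusive here, so the order of the mappings does not matter. Order matters only when several filters can match the same line.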

Example 1: Basic use case

Suppose you want to read a CSV file with records for different colors and their red-green-blue (RGB) values:

```
COLOR,Red,255,0,0
COLOR,Green,0,255,0
```

The lines contain 5 fields: record type, color name, and the 3 RGB values. Suppose you are only interested in the RGB values and want to generate an output tuple for each record with the following schema:

```
ColorTuple = tuple
<
	uint8 R,
	uint8 G,
	uint8 B
>;
```

To ignore the first two fields, use the following XML mapping document as specification for the CSVParse operator:

```
<?xml version="1.0" encoding="UTF-8"?>
<mappings xmlns="http://www.ibm.com/software/data/infosphere/streams/csvparser">
	<mapping name="colors">
		<assign attribute="R" index="2"/>
		<assign attribute="G" index="3"/>
		<assign attribute="B" index="4"/>
	</mapping>
</mappings>
```

The root element in the XML document is mappings. It can have an arbitrary number of mapping element children. Each mapping element must have a name attribute that uniquely identifies the mapping in the XML document. Within a mapping, you can use the assign element to specify the name of an SPL output attribute and the index of a CSV field. The value of this CSV field is then assigned to the SPL attribute.

If an input line cannot be successfully converted to the output tuple, the input tuple is forwarded to the error port. The following line will be rejected because the value 1024 cannot be converted to the uint8 type SPL attribute B:

```
COLOR,Blue,0,0,1024
```

If you want to ensure that only COLOR records generate output tuples, you can set up a filter for this mapping. The following example causes the operator to compare the CSV field with index 0 against the string COLOR:

```
<mapping name="colors" filterIndex="0" filterValue="COLOR">
```

If the comparison fails, the mapping is not applied. With this filter, an input line like the following is therefore dropped:

```
SHAPE,Rectangle,0,0,100,100
```

To ensure that each COLOR input record contains enough fields to assign the R, G, and B output tuple attributes, you can add the itemCountMin XML attribute to the mapping, for example:

```
<mapping name="colors" itemCountMin="5">
```

If at runtime a COLOR record contains fewer than 5 fields, the input tuple is sent to the error port. Similarly, you can specify a maximum number of allowed fields by using the itemCountMax XML attribute. To check for an exact number of fields, set itemCountMin and itemCountMax to the same value. If you do not validate the number of CSV fields in input records and an input line contains fewer fields than needed to assign the specified output tuple attributes, the remaining SPL attributes get their default values.
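The Python sketch below restates the validation and conversion rules from this example. It assumes non-quoted input and models only the colors mapping with its COLOR filter. The helper apply_colors_mapping is hypothetical: it reports where a line would go and is not the operator's implementation.

```python
# Hypothetical model of the "colors" mapping from Example 1 (not CSVParse's code):
# filterIndex="0" filterValue="COLOR", R/G/B assigned from CSV fields 2, 3, and 4.


def to_uint8(text):
    """Convert a CSV field to uint8; raise ValueError if that is not possible."""
    value = int(text)  # empty or non-numeric text raises ValueError
    if not 0 <= value <= 255:
        raise ValueError(f"{value} does not fit into uint8")
    return value


def apply_colors_mapping(line, item_count_min=None, item_count_max=None):
    """Return ("ok", (R, G, B)), ("error", line), or ("dropped", line)."""
    fields = line.split(",")
    if fields[0] != "COLOR":
        return ("dropped", line)  # filter does not match: no output tuple
    if item_count_min is not None and len(fields) < item_count_min:
        return ("error", line)  # too few fields: error port
    if item_count_max is not None and len(fields) > item_count_max:
        return ("error", line)  # too many fields: error port
    try:
        # A missing field leaves the attribute at its SPL default value (0).
        rgb = tuple(to_uint8(fields[i]) if i < len(fields) else 0 for i in (2, 3, 4))
    except ValueError:
        return ("error", line)  # conversion failed: error port
    return ("ok", rgb)


print(apply_colors_mapping("COLOR,Red,255,0,0", item_count_min=5))  # ('ok', (255, 0, 0))
print(apply_colors_mapping("COLOR,Blue,0,0,1024"))  # ('error', 'COLOR,Blue,0,0,1024')
print(apply_colors_mapping("SHAPE,Rectangle,0,0,100,100"))  # ('dropped', 'SHAPE,Rectangle,0,0,100,100')
print(apply_colors_mapping("COLOR,Black,0"))  # ('ok', (0, 0, 0))
print(apply_colors_mapping("COLOR,Black,0", item_count_min=5))  # ('error', 'COLOR,Black,0')
```

The last two calls show the difference that itemCountMin makes. Without it, the short line produces a tuple in which G and B silently keep their defaults. With it, the line goes to the error port.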

Example 2: Multiple mappings

Suppose you want to read the color specifications from Example 1, but you also want to get tuples for the SHAPE records. The CSV input could look like the following example:

```
COLOR,Red,255,0,0
SHAPE,Rectangle,0,0,100,100
COLOR,Green,0,255,0
SHAPE,Circle,0,0,100
```

Some lines contain COLOR records and have 5 fields. Other lines contain SHAPE records and have 4 or more fields. These input records could be mapped to the following unified SPL schema. Each input line generates one output tuple. For COLOR type lines, the attributes for the shape (name and coordinates) get their SPL default values. For SHAPE type lines, the color attributes get their defaults. The recordType attribute gets the string from CSV field 0 for all records.

```
ColorsAndShapes = tuple
<
	rstring recordType,
	uint8 R,
	uint8 G,
	uint8 B,
	rstring shapeName,
	float64 xCoordinate,
	float64 yCoordinate
>;
```

A mapping document for this setup could look like the following example:

```
<?xml version="1.0" encoding="UTF-8"?>
<mappings xmlns="http://www.ibm.com/software/data/infosphere/streams/csvparser">
	<mapping name="colors" filterIndex="0" filterValue="COLOR" itemCountMin="5">
		<assign attribute="recordType" index="0"/>
		<assign attribute="R" index="2"/>
		<assign attribute="G" index="3"/>
		<assign attribute="B" index="4"/>
	</mapping>
	<mapping name="shapes" filterIndex="0" filterValue="SHAPE" itemCountMin="4">
		<assign attribute="recordType" index="0"/>
		<assign attribute="shapeName" index="1"/>
		<assign attribute="xCoordinate" index="2"/>
		<assign attribute="yCoordinate" index="3"/>
	</mapping>
</mappings>
```
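The Python sketch below shows how the two mappings fill one unified tuple. Every attribute that the selected mapping does not assign keeps its SPL default value. This is a simplified model and not the operator's code. The uint8 range check is omitted, and the names parse_record, DEFAULTS, and CONVERTERS are hypothetical.

```python
# Hypothetical model of Example 2 (not the operator's implementation).
# SPL default values: "" for rstring, 0 for uint8, 0.0 for float64.
DEFAULTS = {"recordType": "", "R": 0, "G": 0, "B": 0,
            "shapeName": "", "xCoordinate": 0.0, "yCoordinate": 0.0}
CONVERTERS = {"recordType": str, "R": int, "G": int, "B": int,
              "shapeName": str, "xCoordinate": float, "yCoordinate": float}

# mapping name -> (filterValue for field 0, itemCountMin, {attribute: index})
MAPPINGS = {
    "colors": ("COLOR", 5, {"recordType": 0, "R": 2, "G": 3, "B": 4}),
    "shapes": ("SHAPE", 4, {"recordType": 0, "shapeName": 1,
                            "xCoordinate": 2, "yCoordinate": 3}),
}


def parse_record(line):
    """Return ("ok", tuple_as_dict), ("error", line), or ("dropped", line)."""
    fields = line.split(",")
    for filter_value, item_count_min, assign in MAPPINGS.values():
        if fields[0] != filter_value:
            continue
        if len(fields) < item_count_min:
            return ("error", line)  # itemCountMin violated
        record = dict(DEFAULTS)  # start from the SPL default values
        try:
            for attribute, index in assign.items():
                # uint8 range check omitted for brevity
                record[attribute] = CONVERTERS[attribute](fields[index])
        except ValueError:
            return ("error", line)  # e.g. "abc" for a float64 attribute
        return ("ok", record)
    return ("dropped", line)  # no mapping matched


status, record = parse_record("SHAPE,Circle,0,0,100")
print(record["shapeName"], record["xCoordinate"], record["R"])  # Circle 0.0 0
```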

The mapping document

The XML mapping document specifies the mapping between CSV fields and output attributes. By default, the mapping document is located in the SPL application directory. The operator validates it against the etc/xsd/CSVParseMapping.xsd XML schema definition (XSD) file in the toolkit directory. This XSD file also contains reference documentation for the mapping document format. The root element in the XML document is mappings. It can have an arbitrary number of mapping element children. The mappings element can have one attribute, useDefaults, which can also be set on individual mapping elements. If it is set on the mappings element, it applies to all mapping elements. The attribute is described in the next section.

Mapping Element

A mapping specifies the assignment of CSV fields from the input to SPL attributes in the output. A mapping should be defined for each CSV line format. Each mapping has a unique name and can define a filter that is matched against a CSV input line to determine whether the mapping applies to that line. The content of a mapping is either a manual or an automatic assignment specification. The mapping element can have the following 6 attributes:

name
A unique name for the mapping. The name is also used for the record statistics custom output functions.
filterIndex

The index of the CSV field that is used for the filter comparison. If you do not specify this attribute but the filterValue attribute is present, the index defaults to zero.
filterValue

The string that is compared against the CSV field at filterIndex in the input line. If the values match, this mapping is applied to the CSV line. This attribute is mandatory if the filterIndex attribute is present.
itemCountMin

If this attribute is present and the mapping applies to a given CSV line, the number of fields in the CSV line is checked against this value. If the line has fewer fields than specified, an error occurs.
itemCountMax

If this attribute is present and the mapping applies to a given CSV line, the number of fields in the CSV line is checked against this value. If the line has more fields than specified, an error occurs.
useDefaults

If this attribute is set to true, empty CSV fields (strings of length zero) that are mapped to SPL numeric attributes (integers and floats) are set to zero. If the attribute is set to false, the conversion of such a field is treated as failed, and the operator produces an error tuple instead. The setting applies to all attributes in the current mapping. If this attribute is set on both the mappings element and a mapping element, the setting on the mapping element takes precedence. The default value is false.
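The Python functions below sketch these rules. They resolve the effective useDefaults value and convert an empty numeric field. The function names are hypothetical, and the model simplifies the documented behavior; it is not the operator's code.

```python
# Hypothetical model of the useDefaults XML attribute (not CSVParse's code).


def effective_use_defaults(mappings_value=None, mapping_value=None):
    """A value on <mapping> overrides one on <mappings>; the default is false."""
    if mapping_value is not None:
        return mapping_value
    if mappings_value is not None:
        return mappings_value
    return False


def convert_numeric(text, use_defaults, target=int):
    """Convert a CSV field for a numeric attribute (int or float here).

    Raises ValueError, which stands for "send the line to the error port".
    """
    if text == "" and use_defaults:
        return target(0)  # an empty field is set to zero
    return target(text)  # int("") and float("") raise ValueError


print(convert_numeric("", effective_use_defaults(mappings_value=True)))  # 0
print(convert_numeric("", True, float))  # 0.0
```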

Manual Assignment

A manual assignment consists of a list of assignments. For each assignment, you can specify the index in the CSV field list and the name of the target SPL attribute. In a manual assignment, the mapping element can contain only assign elements. The assign element has two optional attributes:

index
The index of the CSV field in the input line, starting with zero. If this attribute is omitted, the index of the preceding assignment is incremented by one, so the next CSV field is used.
attribute

The name of the SPL output attribute to which to assign the value. If this attribute is omitted, the next unassigned output attribute is used.

Consequently, the color mapping from Example 1 can also be written as:

```
<mapping name="colors">
	<assign attribute="R" index="2"/>
	<assign/>
	<assign/>
</mapping>
```

In this case, the two assign elements without attributes select the G and B SPL attributes, because these are the next attributes in the output port schema. They are assigned CSV fields 3 and 4, because the first assign element starts at index 2.
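The resolution of omitted index and attribute values can be sketched in Python as follows. The model relies on two assumptions that the text above does not spell out. First, an assign element that comes first and has no index starts at index 0. Second, "next unassigned output attribute" means the first attribute in schema order that has not been assigned yet. The function name resolve_assignments is hypothetical.

```python
# Hypothetical resolver for manual <assign> elements (not CSVParse's code).
# Assumption: a leading assign without an index starts at index 0.


def resolve_assignments(schema, assigns):
    """schema: output attribute names in schema order.
    assigns: (attribute or None, index or None) per <assign> element.
    Returns the resolved (attribute, index) pairs."""
    resolved, used, previous_index = [], set(), -1
    for attribute, index in assigns:
        if index is None:
            index = previous_index + 1  # the next CSV field
        if attribute is None:
            # Assumption: first attribute in schema order not yet assigned.
            attribute = next(a for a in schema if a not in used)
        resolved.append((attribute, index))
        used.add(attribute)
        previous_index = index
    return resolved


# The abbreviated colors mapping: <assign attribute="R" index="2"/><assign/><assign/>
print(resolve_assignments(["R", "G", "B"], [("R", 2), (None, None), (None, None)]))
# [('R', 2), ('G', 3), ('B', 4)]
```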

Automatic Assignment

An automatic assignment can consist of a list of included or excluded SPL attribute names. Included attributes participate in the mapping; excluded ones do not. Included and excluded attributes cannot be mixed. The automatic assignment can also be empty, in which case all attributes participate in the mapping. In the following example, the mapping tries to assign the attributes from Example 1 as follows: R from CSV field 0, G from CSV field 1, and B from CSV field 2:

```
<mapping name="colors">
	<auto/>
</mapping>
```

The next two mappings are equivalent. Each tries to assign the attributes from Example 1 as follows: R from CSV field 0 and B from CSV field 1:

```
<mapping name="colors">
	<auto>
		<include>R</include>
		<include>B</include>
	</auto>
</mapping>
```

```
<mapping name="colors">
	<auto>
		<exclude>G</exclude>
	</auto>
</mapping>
```
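The following Python sketch shows how these automatic assignments resolve for the Example 1 schema. It assumes that participating attributes are numbered in output schema order, starting at CSV field 0, which matches the three examples above. The function name auto_assign is hypothetical.

```python
# Hypothetical model of <auto> assignment (not CSVParse's code).


def auto_assign(schema, include=None, exclude=None):
    """Map participating attributes, in schema order, to CSV fields 0, 1, 2, ..."""
    if include and exclude:
        raise ValueError("included and excluded attributes cannot be mixed")
    if include:
        participating = [a for a in schema if a in include]
    elif exclude:
        participating = [a for a in schema if a not in exclude]
    else:
        participating = list(schema)  # empty <auto/>: all attributes
    return {attribute: index for index, attribute in enumerate(participating)}


schema = ["R", "G", "B"]
print(auto_assign(schema))  # {'R': 0, 'G': 1, 'B': 2}
print(auto_assign(schema, include=["R", "B"]))  # {'R': 0, 'B': 1}
print(auto_assign(schema, exclude=["G"]))  # {'R': 0, 'B': 1}
```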

Behavior in a consistent region and checkpointing

The CSVParse operator can be an operator within the reachability graph of a consistent region. It cannot be the start of a consistent region.

The CSVParse operator also supports periodic checkpointing that is enabled with the checkpoint configuration clause, for example:

```
config
	checkpoint : periodic(5.0);
	restartable : true;
```

Summary

Ports

This operator has 1 input port and 3 output ports.

Windowing

This operator does not accept any windowing configurations.

Parameters

This operator supports 8 parameters.

Required: mappingDocument

Optional: payloadAttribute, ignoreEmptyLines, ignoreHeaderLines, separator, quoted, metricsMode, metricsModeThreshold

Metrics

This operator reports 9 metrics.

Properties

Implementation: C++
Threading: Always - Operator always provides a single threaded execution context.

Input Ports

Ports (0)

The input port that contains the CSV data to parse. It must have at least one rstring attribute that holds the CSV-formatted input line, or payload. If the input tuple schema contains only one rstring attribute, that attribute is automatically used as the data source. If it contains more than one rstring attribute, you must specify the attribute to use in the payloadAttribute parameter. Any input schema is allowed. You can forward input attributes to output port attributes by assigning the fromInput(...) custom output function to the output port attributes.

The payload string can have one of two formats: quoted or not quoted. In quoted mode, CSV fields can be enclosed in double quotation marks ("). Quoting allows the separator character to be part of a CSV field, which is not possible in non-quoted mode. In non-quoted mode, the quotation mark is treated as a normal character. The mode is controlled by the quoted parameter of the operator.

Example 1

In quoted mode, the input string aa,"bb,cc",dd results in the following three fields:

aa
bb,cc
dd

In non-quoted mode, the same input string results in the following four fields:

aa
"bb
cc"
dd

Example 2

In quoted mode, the input string aa,bb"cc,dd results in an error, because a quotation mark cannot occur inside an unquoted field. In non-quoted mode, this string results in the following three fields:

aa
bb"cc
dd

Example 3

To include a quotation mark inside a field, escape it with another quotation mark (as specified in RFC 4180). In quoted mode, the input string aa,"bb""cc",dd results in the following three fields:

aa
bb"cc
dd

In non-quoted mode, this input string results in the following three fields:

aa
"bb""cc"
dd

In quoted mode, the opening quotation mark must be the first character of the field, that is, it must occur at the start of the line or right after the separator character. Otherwise, the field is treated as non-quoted, and a quotation mark cannot be contained in it. In addition, the closing quotation mark of a field must be followed by a separator if more fields follow.

Leading and trailing spaces in fields are not trimmed. Fields with zero length result in empty fields.

Example 4

In quoted mode, the input string ,,"", results in four empty fields. In non-quoted mode, the third field contains the two quotation marks ("").

For more information on the quote handling, see RFC 4180.
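The following Python function sketches the splitting rules above. It was written for this description and is not taken from the operator. The name split_fields is hypothetical. The function raises ValueError where the operator would send the line to the error port, and it accepts a multi-character separator, as the separator parameter allows.

```python
# Sketch of the documented splitting rules (not the operator's code).
# Quoted mode follows RFC 4180: a field that starts with a quotation mark ends
# at the matching closing quotation mark, "" inside it stands for one ",
# and the closing mark must be followed by the separator or the end of line.


def split_fields(line, separator=",", quoted=False):
    """Return the list of fields; raise ValueError for malformed quoted input."""
    if not quoted:
        return line.split(separator)  # quotation marks are ordinary characters
    fields, pos, n = [], 0, len(line)
    while True:
        if pos < n and line[pos] == '"':
            value, pos = [], pos + 1  # skip the opening quotation mark
            while True:
                if pos >= n:
                    raise ValueError("missing closing quotation mark")
                if line.startswith('""', pos):
                    value.append('"')  # escaped quotation mark
                    pos += 2
                elif line[pos] == '"':
                    pos += 1  # closing quotation mark
                    break
                else:
                    value.append(line[pos])
                    pos += 1
            if pos < n and not line.startswith(separator, pos):
                raise ValueError("closing quotation mark not followed by separator")
            field = "".join(value)
        else:
            end = line.find(separator, pos)
            end = n if end == -1 else end
            field = line[pos:end]
            if '"' in field:
                raise ValueError("quotation mark inside an unquoted field")
            pos = end
        fields.append(field)
        if pos >= n:
            return fields
        pos += len(separator)  # skip the separator; another field follows


print(split_fields('aa,"bb,cc",dd', quoted=True))  # ['aa', 'bb,cc', 'dd']
print(split_fields('aa,"bb,cc",dd'))  # ['aa', '"bb', 'cc"', 'dd']
```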

Restrictions on the input string

The input attribute with the CSV data is of type rstring and can hold a string in any encoding. However, this operator expects either plain ASCII (7-bit) or UTF-8 encoding. If any other encoding is used, the results are undefined. To process data in other encodings, first convert it to UTF-8, for example by using the encoding parameter of the spl.adapter::FileSource operator (or of the TCPSource or UDPSource operator).

Properties

Optional: false

Output Ports

Assignments: This operator allows any SPL expression of the correct type to be assigned to output attributes.

Output Functions

DataAssignmentFunctions

<any T> T fromInput()

Takes the value from the input tuple. An input attribute of the same name and type as the output attribute must exist.

<any T> T fromInput(T attr)

Takes the value from the attribute of the input tuple of the given name. An input attribute of the given name and same type as the output attribute must exist.

boolean IsPresent(rstring attributeName)

Use this function to determine if a given output port attribute has been assigned from an input field or from a default value. The parameter of type rstring is the name of the output attribute you want to get the presence status for.

The function evaluates to true if the named attribute was assigned from a field of the input CSV line. It is false in the following cases:

The attribute does not correspond to a field of the input CSV line, as defined in the mapping XML document.
The field for the attribute is missing from the input line, because the line has fewer fields than needed.
The attribute is of a numeric type and the field is empty.

For example, if the output port type is:

```
stream<rstring aString, int32 anInt, rstring anotherString>
```

and the mapping includes all 3 attributes, then the following input lines produce the listed results.

```
Input line         Result
"cats",0,"dogs"    isPresent("aString"), isPresent("anInt"), and isPresent("anotherString") are all true.
,0,                All three are true.
,0                 isPresent("aString") and isPresent("anInt") are true; isPresent("anotherString") is false.
,,                 isPresent("anInt") is false; the other two are true.
```

blob PresenceMask()

Use this function to determine the presence status of all output port attributes at once. The function returns a blob that contains at least as many bits as there are attributes in the output port. Each output port attribute is represented by a bit in the blob. The leftmost bit in the first byte of the blob (bit 8) represents the first attribute in the output port tuple. The next bit (bit 7 of the first byte) represents the second attribute, and so on.

You can use the following SPL function to check the presence status of a given attribute. The first parameter is the blob returned by the presenceMask function, the second parameter is the position of the output port attribute you want to check (starting with 0 for the first attribute).

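```
// Returns true if the presence bit of the output attribute at position
// index (0-based) is set in the blob returned by PresenceMask().
boolean checkAttribute(blob mask, uint32 index)
{
	uint32 offset = index / 8u;                   // byte that holds the bit
	uint8 bitpos = (1ub << (7u - (index % 8u)));  // leftmost bit first
	return (mask[offset] & bitpos) != 0ub;
}
```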

Note that this code does not check for out of bounds errors.

The conditions that determine if a presence bit is set to true or false are the same as for the IsPresent(...) custom output function. See the details in the description of that function.
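The Python sketch below models the presence rules of IsPresent(...) and the bit layout of the PresenceMask() blob, using the example schema and input lines from above. It is an illustration, not the operator's implementation. The function names are hypothetical, and the input lines are assumed to be already split into fields.

```python
# Hypothetical model of the presence rules and the presence-mask layout.
SCHEMA = ["aString", "anInt", "anotherString"]
MAPPING = {"aString": 0, "anInt": 1, "anotherString": 2}  # attribute -> CSV index
NUMERIC = {"anInt"}


def is_present(attribute, fields):
    index = MAPPING.get(attribute)
    if index is None or index >= len(fields):
        return False  # not mapped, or the line has too few fields
    if attribute in NUMERIC and fields[index] == "":
        return False  # empty field for a numeric attribute
    return True


def presence_mask(fields):
    """One bit per attribute, most significant bit of byte 0 first."""
    mask = bytearray((len(SCHEMA) + 7) // 8)
    for i, attribute in enumerate(SCHEMA):
        if is_present(attribute, fields):
            mask[i // 8] |= 1 << (7 - i % 8)
    return bytes(mask)


def check_attribute(mask, index):
    # Same computation as the SPL checkAttribute function (no bounds check).
    return (mask[index // 8] & (1 << (7 - index % 8))) != 0


mask = presence_mask(["", "0"])  # the input line ",0"
print(mask.hex(), check_attribute(mask, 2))  # c0 False
```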

ErrorAssignmentFunctions

<any T> T fromInput()

Takes the value from the input tuple. An input attribute of the same name and type as the output attribute must exist.

<any T> T fromInput(T attr)

Takes the value from the attribute of the input tuple of the given name. An input attribute of the given name and same type as the output attribute must exist.

MetricsFunctions

<any T> T fromInput()

Takes the value from the latest input tuple. An input attribute of the same name and type as the output attribute must exist.

<any T> T fromInput(rstring attributeName)

Takes the value from the attribute of the latest input tuple of the given name. An input attribute of the given name and same type as the output attribute must exist.

uint64 getRecordCount(rstring mappingName)

Returns the number of records on a per-CSV line type, or mapping, basis.

uint64 getErrorCount(rstring mappingName)

Returns the number of errors on a per-CSV line type (mapping) basis.

map<rstring,uint64> getRecordCounts()

Returns the number of successfully processed records for all specified mappings.

map<rstring,uint64> getErrorCounts()

Returns the number of errors for all specified mappings.

map<rstring,map<rstring,uint64>> getRecordStats()

Returns the number of successfully parsed records and the number of errors for all specified mappings.

uint64 nTuplesReceivedTotal()

Returns the number of received tuples.

uint64 nTuplesSentTotal()

Returns the number of sent tuples on output port 0.

uint64 nTuplesFailedTotal()

Returns the number of sent error tuples on output port 1.

uint64 nTuplesDroppedTotal()

Returns the number of tuples dropped because their CSV lines were ignored.

uint64 nTuplesReceived()

Returns the number of received tuples since the last sent metrics tuple.

uint64 nTuplesSent()

Returns the number of sent tuples since the last sent metrics tuple.

uint64 nTuplesFailed()

Returns the number of error tuples sent on output port 1 since the last sent metrics tuple.

uint64 nTuplesDropped()

Returns the number of tuples dropped because their CSV lines were ignored, counted since the last sent metrics tuple.

uint64 latestPunctuation()

Returns the time of the latest occurrence of a window punctuation (in seconds) since the Epoch (00:00:00 UTC, January 1, 1970).

Ports (0)

The first output port contains the fields from a successfully parsed CSV input line and is mandatory. It supports any schema. Each attribute in the output port schema is assigned by one of the following methods, in order of decreasing precedence:

If a mapping for this attribute is specified in the mappingDocument, the attribute is assigned the value of the CSV field with the given index. Conversions to the type of the output attribute are completed automatically. If an input CSV line does not have enough fields for all output attributes to be assigned, the attributes that cannot be filled get their default values.
If the attribute has a custom output function assigned, the value is determined by that function. The allowed function set is DataAssignmentFunctions. If the attribute is also assigned in the mapping document, a compile-time error occurs.
If the attribute is not assigned by the previous methods and an attribute with the same name and type is present in the input port, the value of the input attribute is copied to this output attribute.
If none of the previous methods can be used, the SPL attribute gets its default value.
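The precedence rules can be summarized in a small Python sketch. This is a hypothetical model. Type matching is ignored, and the function and sentinel names are invented for illustration. The conflict between a mapping and a custom output function, which the SPL compiler reports, is modeled here as a ValueError.

```python
# Hypothetical model of how an output port 0 attribute gets its value.
TOO_SHORT = object()  # marks a mapped field that is missing from the line


def output_value(name, mapped, custom, input_tuple, default):
    """mapped: attribute -> converted CSV value (or TOO_SHORT);
    custom: attribute -> value of its custom output function;
    input_tuple: input attribute values by name; default: SPL default."""
    if name in mapped and name in custom:
        raise ValueError(f"{name} is assigned twice")  # compile-time error in SPL
    if name in mapped:  # 1. mapping document
        return default if mapped[name] is TOO_SHORT else mapped[name]
    if name in custom:  # 2. custom output function
        return custom[name]
    if name in input_tuple:  # 3. input attribute with the same name
        return input_tuple[name]
    return default  # 4. SPL default value
```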

For attributes that are assigned from a CSV field, the port supports rstring, ustring, float32, float64 and all integral types.

Properties

Optional: false

TupleMutationAllowed: true
WindowPunctuationOutputMode: Preserving

Ports (1)

The second output port receives error tuples. This port is mandatory and can have any schema. For each input tuple that cannot be processed, one tuple is sent on this port. Custom output functions can be used to control the values of attributes in the schema of this output port; the allowed function set is ErrorAssignmentFunctions. For attributes that have no custom output function, the value of an input attribute with the same name and type is assigned automatically. The simplest use of this port is to give it the same schema as the input port, which forwards the erroneous input tuple to this port for further analysis.

If one of the following conditions is true, an input tuple is considered erroneous:

The number of fields in the CSV line is less than the configured minimum number of fields for the mapping (itemCountMin XML attribute).
The number of fields in the CSV line is greater than the configured maximum number of fields for the mapping (itemCountMax XML attribute).
The conversion of a CSV field to the type of the SPL output attribute fails, for example if the input field contains the value "strange" and the output attribute type is int32. By default, empty input fields cannot be converted to numeric types, so the input line is forwarded to the error port. You can change this behavior with the useDefaults XML attribute in the mapping definition.
The ignoreEmptyLines parameter is set to false. In this case, empty lines produce error tuples if either multiple mappings are defined or if the single mapping requires one or more CSV fields.

Properties

Optional: false

TupleMutationAllowed: true
WindowPunctuationOutputMode: Preserving

Ports (2)

The third output port contains processing statistics. This port is optional and can have any schema. By default, statistic tuples are sent when window punctuations are received. You can change this behavior with the metricsMode and metricsModeThreshold parameters. If the metricsMode parameter is set to tuples, the operator sends a statistic tuple on this port after every metricsModeThreshold input tuples, and no statistic tuple is sent when window punctuations are received. Each time a statistic tuple is sent, the following internal counters are reset to zero:

nTuplesReceived
nTuplesSent
nTuplesFailed
nTuplesDropped

Custom output functions can control attribute values in the output port schema. The allowed function set is MetricsFunctions. For attributes that do not have custom output functions assigned, the value of an input attribute with the same name and type is assigned automatically. Typically, statistic tuples are not sent for each input tuple, so the operator uses the attribute values of the latest input tuple while sending a statistic tuple. You can use this feature to, for example, send a filename that is constant for all tuples of a file, with the statistic tuple.

In addition to the metrics that are independent from the mappings, the supported custom output functions for this port can deliver statistics on a per-mapping basis. For example, if you configure two mappings named colors and cars, you can receive detailed statistics about the number of failed and successful tuples for each mapping by using the getRecordStats() custom output function. This function produces an output like the following:

```
{recordStats={"colors":{"errors":1,"records":2},"cars":{"errors":0,"records":1}}}
```

In this example, 3 tuples matched the colors mapping: 2 were processed successfully and 1 produced an error. One tuple matched the cars mapping and was processed without errors.

Properties

Optional: true

TupleMutationAllowed: true
WindowPunctuationOutputMode: Preserving

Parameters

payloadAttribute

Specifies the input port rstring attribute, which holds the payload to be parsed. If more than one rstring attribute exists in the input port schema, this parameter is mandatory. Otherwise, the single rstring attribute is automatically selected.

Properties

Type: rstring
Cardinality: 1
Optional: true
ExpressionMode: Attribute

mappingDocument

Specifies the path name of the mapping definition document, the XML document that describes the mapping of the CSV fields to the SPL output attributes.

The mapping definition document is used at SPL compile time. If you modify the document, you must recompile the SPL application for the changes to take effect. After the application is compiled, the mapping definition document is not required for job submission.

A relative path is resolved against the SPL application directory, that is, the current working directory in which the sc command is run. For example, if you specify the relative path "etc/MappingDefinition.xml" and run the sc command from the /home/myapp directory, the compiler looks for the MappingDefinition.xml document in the /home/myapp/etc directory.

Properties

Type: rstring
Cardinality: 1
Optional: false
ExpressionMode: Constant

ignoreEmptyLines

Specifies how to handle empty CSV lines. If this parameter is set to true, empty lines are dropped. Otherwise, empty lines may produce a default or error tuple, depending on the mapping configuration. The default value is true.

Properties

Type: boolean
Cardinality: 1
Optional: true
ExpressionMode: Constant

ignoreHeaderLines

Specifies how to handle header lines in the CSV input. A header line is the first line after punctuation. If this parameter is set to true, the header is dropped. If false, the header is handled like a normal line. The default value is false.

Properties

Type: boolean
Cardinality: 1
Optional: true
ExpressionMode: Constant

separator

Specifies the separator character or characters that delimit the fields in the CSV input. The separator must be a single- or multi-character string constant. The default value is a comma (,).

Properties

Type: rstring
Cardinality: 1
Optional: true
ExpressionMode: Constant

quoted

Specifies the quoting mode for the CSV input. Supported values are off and on. The off value means that no CSV field is quoted, and on means that some or all fields are quoted. The default value is off. If you are sure that no input field is quoted, specify off to improve performance.

Properties

Type: QuotingMode (on, off)
Cardinality: 1
Optional: true
ExpressionMode: CustomLiteral

metricsMode

Specifies what triggers sending a metrics tuple on the optional third output port. The supported values are punctuation and tuples. The default value is punctuation. For tuples mode, the metricsModeThreshold parameter is mandatory. After each metrics tuple is sent, a subset of the online metrics is reset.

For example, if punctuation is specified, a metrics tuple is sent when window punctuation is received. If tuples is specified and metricsModeThreshold is set to 100, a metrics tuple is sent for every 100 input tuples.

Properties

Type: MetricsModeTypes (punctuation, tuples)
Cardinality: 1
Optional: true
ExpressionMode: CustomLiteral

metricsModeThreshold

Specifies the number of received tuples after which a metrics tuple is sent on the optional third output port. After each metrics tuple is sent, a subset of the online metrics is reset.

For example, if the metricsMode parameter is set to tuples and this parameter is set to 100, a metrics tuple is sent for every 100 input tuples.

This parameter is allowed only if the metricsMode parameter is set to tuples.
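The counting behavior of metricsMode: tuples can be sketched in Python as follows. The class name and the short counter keys are hypothetical. The keys stand for nTuplesReceived, nTuplesSent, nTuplesFailed, and nTuplesDropped and their ...Total counterparts. The sketch models only when a metrics tuple is emitted and which counters are reset, as described for the third output port.

```python
# Hypothetical model of metricsMode: tuples with metricsModeThreshold = threshold.


class MetricsCounters:
    def __init__(self, threshold):
        self.threshold = threshold
        keys = ("received", "sent", "failed", "dropped")
        self.totals = dict.fromkeys(keys, 0)  # nTuples...Total, never reset
        self.current = dict.fromkeys(keys, 0)  # nTuples..., reset per metrics tuple
        self.metrics_tuples = []  # what would be sent on the third output port

    def on_input(self, outcome):
        """outcome: "sent", "failed", or "dropped" for one input tuple."""
        for counters in (self.totals, self.current):
            counters["received"] += 1
            counters[outcome] += 1
        if self.current["received"] == self.threshold:
            self.metrics_tuples.append(dict(self.current))
            self.current = dict.fromkeys(self.current, 0)


m = MetricsCounters(threshold=3)
for outcome in ["sent", "sent", "failed", "dropped", "sent"]:
    m.on_input(outcome)
print(m.metrics_tuples)  # [{'received': 3, 'sent': 2, 'failed': 1, 'dropped': 0}]
print(m.totals["received"], m.current["received"])  # 5 2
```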

Properties

Type: uint64
Cardinality: 1
Optional: true
ExpressionMode: Expression

Code Templates

CSVParse

```
(stream<${outputSchema}> ${outputStream} as O; stream<${errorSchema}> ${errorStream} as E) = CSVParse(${inputStream} as I)
{
	param
		payloadAttribute: ${attributeName};
		mappingDocument: "${mappingDefinitionFile}";
}
```

CSVParseWithStatistics

```
(stream<${outputSchema}> ${outputStream} as O; stream<${errorSchema}> ${errorStream} as E; stream<${statSchema}> ${statStream} as S) = CSVParse(${inputStream} as I)
{
	param
		payloadAttribute: ${attributeName};
		mappingDocument: "${mappingDefinitionFile}";
}
```

Metrics

nTuplesReceivedTotal - Counter

The number of received tuples since the start of the operator.

nTuplesSentTotal - Counter

The number of tuples sent on output port 0. In other words, the number of matched and converted tuples that did not generate an error.

nTuplesFailedTotal - Counter

The number of error tuples sent on output port 1. In other words, the number of tuples that failed conversion or itemCountMin/itemCountMax validation.

nTuplesDroppedTotal - Counter

The number of tuples dropped because their CSV lines could not be matched to a mapping.

nTuplesReceived - Gauge

The number of received tuples since the last statistics tuple or, if the statistic output port is not present, window punctuation.

nTuplesSent - Gauge

The number of sent tuples on output port 0 since the last statistics tuple or, if the statistic output port is not present, window punctuation.

nTuplesFailed - Gauge

The number of error tuples sent on output port 1 since the last statistics tuple or, if the statistic output port is not present, window punctuation.

nTuplesDropped - Gauge

The number of tuples dropped because their CSV lines were ignored, counted since the last statistics tuple or, if the statistic output port is not present, since the last window punctuation.

latestPunctuation - Time

The time of the latest window punctuation since the Epoch (00:00:00 UTC, January 1, 1970).

Libraries

Common Headers: Include Path: ../../../../impl/include/parser.binary
