MapR Streams Consumer
The MapR Streams Consumer origin reads messages from MapR Streams.
MapR is now HPE Ezmeral Data Fabric. At times, this documentation uses "MapR" to refer to both MapR and HPE Ezmeral Data Fabric. For information about supported versions, see Supported Systems and Versions.
When you configure a MapR Streams Consumer, you configure the topic, consumer group, and other general properties. You configure the data type and related properties, and you can optionally add additional MapR Streams properties and supported Kafka properties.
Before you use any MapR stage in a pipeline, you must perform additional steps to enable Data Collector to process MapR data. For more information, see MapR Prerequisites.
Processing All Unread Data
When you start a pipeline for the first time, the MapR Streams Consumer becomes a new consumer group for the topic. By default, the origin reads only incoming data from all partitions, ignoring any existing data in the topic.
To read all unread data in the topic, add the auto.offset.reset Kafka configuration property and set it to earliest. For more information about this property, see the MapR Streams documentation.
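For example, assuming the additional Kafka configuration properties accept standard name/value pairs, the entry would look like this:

```
auto.offset.reset = earliest
```

With this property set, the first run of the pipeline starts reading from the earliest available offset in each partition instead of only new messages.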
Data Formats
The MapR Streams Consumer processes data differently based on the data format. MapR Streams Consumer can process the following types of data:
- Avro
- Generates a record for every message. Includes a precision and scale field attribute for each Decimal field.
- Binary
- Generates a record with a single byte array field at the root of the record.
- Delimited
- Generates a record for each delimited line.
- JSON
- Generates a record for each JSON object. You can process JSON files that include multiple JSON objects or a single JSON array.
- Log
- Generates a record for every log line.
- Protobuf
- Generates a record for every protobuf message. By default, the origin assumes messages contain multiple protobuf messages.
- SDC Record
- Generates a record for every record. Use to process records generated by a Data Collector pipeline using the SDC Record data format.
- Text
- Generates a record for each line of text or for each section of text based on a custom delimiter.
- XML
- Generates records based on a user-defined delimiter element. Use an XML element directly under the root element or define a simplified XPath expression. If you do not define a delimiter element, the origin treats the XML file as a single record.
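For example, given a hypothetical XML message like the following, setting the delimiter element to msg would generate one record per msg element, two records in this case:

```xml
<root>
  <msg><id>1</id><text>first</text></msg>
  <msg><id>2</id><text>second</text></msg>
</root>
```

Without a delimiter element, the same message would be read as a single record.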
Additional Properties
You can add custom configuration properties to MapR Streams Consumer. You can use any MapR or Kafka property supported by MapR Streams. For more information, see the MapR Streams documentation.
You can add any valid configuration property. When you add a property, enter the exact property name and the value. MapR Streams Consumer does not validate the property names or values.
If custom configurations conflict with other stage properties, the stage generates an error unless you select the Override Stage Configurations check box. With the check box selected, the custom configurations override other stage properties. For information about the necessary properties, see the MapR documentation.
Record Header Attributes
The MapR Streams Consumer origin creates record header attributes that include information about the message that each record originated from. When the origin processes Avro data, it includes the Avro schema in an avroSchema record header attribute.
You can use the record:attribute or record:attributeOrDefault functions to access the information in the attributes. For more information about working with record header attributes, see Working with Header Attributes.
- avroSchema - When processing Avro data, provides the Avro schema.
- offset - The offset where the record originated.
- partition - The partition where the record originated.
- topic - The topic where the record originated.
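For example, in any stage property that accepts the expression language, expressions like the following read these attributes; the default value in the second expression is illustrative:

```
${record:attribute('topic')}
${record:attributeOrDefault('partition', '0')}
```

The first expression returns the topic that the record was read from; the second returns the partition, or the string '0' if the attribute is not set.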