Introducing direct Splunk ingestion with IBM Common Data Provider for z Systems
The IBM Common Data Provider for z Systems team provides a new method for getting your business-critical data into the Splunk platform. Normally, to make Z data easily ingestible by the Splunk platform, customers must configure additional components, such as the IBM Common Data Provider for z Systems Data Receiver and the Buffered Splunk Ingestion App. With the new ingestion method, these additional components are no longer necessary. Although the new method provides quicker configuration, it is not meant to supersede previous methods; it simply provides another option for data ingestion, depending on the needs and requirements of your IT environment. This paper describes the new method in more detail and explains when you might want to choose it.
HTTP Event Collector (HEC): the Splunk component that makes direct ingestion possible
First, let's talk about the component that makes this new method possible. Splunk 6.3.0 introduced a feature known as the HTTP Event Collector (HEC), which enables an application to communicate with Splunk over HTTP or HTTPS. The HEC uses a token-based authentication model, which allows users to create an Event Collector token for an application, such as IBM Common Data Provider for z Systems. This token, a 32-character globally unique identifier (GUID), enables IBM Common Data Provider for z Systems to communicate securely with your Splunk instance over HTTP or HTTPS.
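To make the token-based model concrete, the following sketch builds the headers and JSON body that an application submits to the HEC event endpoint. The host name, port, token value, and event fields are placeholders for illustration, not values produced by IBM Common Data Provider for z Systems; only the request shape (the Authorization header and the event/sourcetype JSON envelope) reflects the HEC interface.

```python
import json

# Hypothetical endpoint and token; a real deployment supplies its own.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def build_hec_request(token, event, sourcetype):
    """Return the headers and JSON body for one HEC event submission."""
    headers = {
        # HEC authenticates each request with the Event Collector token.
        "Authorization": "Splunk " + token,
        "Content-Type": "application/json",
    }
    # The event payload is wrapped in a JSON envelope with metadata fields.
    body = json.dumps({"event": event, "sourcetype": sourcetype})
    return headers, body

headers, body = build_hec_request(HEC_TOKEN, {"jobname": "MYJOB"}, "SMF_030_KV")
```

Because the token travels in a request header rather than in Splunk server configuration, no component needs to be installed on the Splunk servers themselves.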
Why send data by using the HEC?
Here are some primary advantages of sending your data directly to Splunk by using the HEC:
- Your mainframe and Splunk teams can work together to quickly deploy IBM Common Data Provider for z Systems end-to-end, without needing a distributed team to configure the Data Receiver or update the Buffered Splunk Ingestion App. Deployment is faster because fewer teams are involved: simply configure your host-based components, create an Event Collector token, and send your data directly to Splunk.
- The HEC provides an easy interface to manage and control the Event Collector tokens that you create. You can set the indexers that you want to be associated with each token, and you can enable or disable any token to control what data is sent by using the HEC. Token-based inputs also enable you to segregate, and search on, the data from a specific input token.
Important if you have custom dashboards:
If you currently have any custom dashboards and want to display both CSV and key-value data, you must update the underlying queries. The new data format appends _KV to the end of the source type field, which enables both CSV and key-value data to coexist. For any search query that uses sourcetype in the query stanza, you must add an OR condition. For example, if the query code includes "sourcetype = SMF_030", you must now have the following query code: "sourcetype = SMF_030 OR sourcetype = SMF_030_KV". If you are collecting only key-value data, all you need in your custom queries is <sourcetype>_KV.
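As an illustration, a hypothetical dashboard panel query would change as follows. The index name and the trailing timechart clause are placeholders; only the sourcetype condition reflects the change described above.

```
Before:  index=zos sourcetype=SMF_030 | timechart count
After:   index=zos (sourcetype=SMF_030 OR sourcetype=SMF_030_KV) | timechart count
```

The parentheses ensure that the OR condition is evaluated against the sourcetype values before any later clauses in the search are applied.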
Why send data by using the Data Receiver and Buffered Splunk Ingestion App?
Now, you might ask: "Why would I want to use any method other than HEC to get my data into Splunk?" Data that IBM Common Data Provider for z Systems sends to the Data Receiver is in CSV format. CSV format is an efficient and concise way to move and store your data. This allows the Data Receiver to quickly get your data from Z into Splunk, but CSV does not describe the structure of your data. Therefore, IBM Common Data Provider for z Systems provides the Buffered Splunk Ingestion App as a Splunk add-on. This simple solution contains a set of rules that define the structure of the CSV data that is being sent from the Data Receiver.
Data that is sent by using the HEC is in key-value format because this format best matches the Splunk preferred data format and can aid search performance. However, CSV format uses less space for the same record than key-value format: CSV has a single row of field names followed by the data records, whereas key-value format repeats each field name beside every value. This formatting change can increase your overall data size by 2-8 times the original size, depending on the data type and contents.
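The size difference is easy to see with a small illustration. The field names and values below are hypothetical, not actual SMF fields; the point is only that key-value format carries every field name with every record, while CSV carries the names once in a header row.

```python
# Hypothetical record for illustration only.
fields = ["jobname", "jobid", "cputime", "elapsed"]
values = ["MYJOB", "JOB01234", "0.52", "13.8"]

# CSV: one header row of field names, then one concise row per record.
csv_header = ",".join(fields)
csv_record = ",".join(values)

# Key-value: each record repeats every field name beside its value.
kv_record = " ".join(f"{f}={v}" for f, v in zip(fields, values))

print(len(csv_record), len(kv_record))
```

Over many records, the one-time cost of the CSV header row is amortized, while the per-record overhead of key-value format grows with the data volume.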
Because key-value format requires more processing, you should expect increased CPU consumption compared to CSV. Therefore, if you foresee sending a very large volume of data to Splunk, you might choose to send data by using the Data Receiver and Buffered Splunk Ingestion App. CSV ingestion through the Data Receiver is the recommended best practice for most customers; however, based on your specific needs, HEC may be appropriate for a subset of customers.
You now have three options for sending data to Splunk:
- Use the Data Receiver with the Buffered Splunk Ingestion App.
- Ingest data directly into Splunk by using HEC.
- Use both methods, as both methods can coexist in the same environment.
CSV ingestion is still considered the best practice; however, HEC ingestion may be worth considering for customers who:
- Have standards preventing them from installing applications on the Splunk servers or for whom coordination with the Splunk server team is restricted in some way.
- Are not sensitive to Splunk ingestion costs.
- Are not sensitive to CPU usage on the host (or who have zIIP capacity available to offload Data Streamer CPU).
- Have lower data volumes, or are willing to set up more Splunk indexers and distribute data across them (possibly by using a load balancer).
Both methods have their advantages, so take care when deciding which ingestion method best meets your needs. HEC provides quick end-to-end implementation and removes the need for the Data Receiver and the Buffered Splunk Ingestion App, but at the cost of larger ingested data volume, higher ingestion costs, and increased CPU usage on the mainframe.
The Data Receiver and Buffered Splunk Ingestion App provide the least CPU usage and smallest ingestion data size, but at the cost of requiring more components to complete the end-to-end solution.