Apache NiFi is a popular open source visual ETL (extract, transform, load) tool that can consume event data from, and publish it to, many destinations, including Apache Kafka.

Once data is ingested, Apache NiFi can route, filter, enrich, and transform the payload. Apache Kafka is a distributed publish-subscribe messaging system that serves many use cases because of its high throughput, replication, and fault tolerance. It is a reliable way to move streaming event data into and out of your NiFi flows.

IBM Event Streams is a secure, fully managed Kafka-as-a-Service offering. It uses SASL_SSL to send and receive data securely; however, this makes connecting your NiFi Kafka processors to it less straightforward than connecting them to an unsecured Kafka cluster.

This guide can help you connect Apache NiFi to a hosted Apache Kafka offering such as IBM Event Streams. It assumes that you have already set up an Apache NiFi cluster or instance and provisioned an IBM Event Streams instance.

1. Create IBM Event Streams credentials

With an instance of IBM Event Streams created, you need to create connection credentials using either the IBM Cloud console or the IBM Cloud CLI. The credentials provide the user, password, and list of Kafka brokers used to connect to IBM Event Streams. To create credentials from the IBM Cloud console, navigate to the Service Details page of your Event Streams instance. This is the page you are redirected to after creating the instance.
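If you prefer the IBM Cloud CLI to the console, the two relevant commands can be wrapped in a small script like the sketch below. It assumes you have already run `ibmcloud login` and targeted the correct region and resource group. The instance name, key name, and the choice of the `Manager` role are illustrative assumptions, and the credential field names mentioned in the comments reflect the Event Streams format as we understand it, so check them against your own output.

```shell
#!/usr/bin/env bash
# Sketch: create and display Event Streams credentials with the IBM Cloud CLI.
# Assumes `ibmcloud login` has already been run. Names passed in are hypothetical.
set -euo pipefail

# The CLI binary can be overridden (for example with `echo`) for a dry run.
IBMCLOUD="${IBMCLOUD:-ibmcloud}"

# Create a service key (credentials) for the instance with the Manager role.
create_credentials() {
  local instance_name="$1" key_name="$2"
  "$IBMCLOUD" resource service-key-create "$key_name" Manager \
    --instance-name "$instance_name"
}

# Print the credentials as JSON. Look for fields such as kafka_brokers_sasl,
# user and password (assumed field names; verify against your own output).
show_credentials() {
  local key_name="$1"
  "$IBMCLOUD" resource service-key "$key_name" --output JSON
}
```

For example, `create_credentials my-event-streams nifi-credentials` followed by `show_credentials nifi-credentials`, where both names are placeholders for your own.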

2. Add certificates to the Apache NiFi trust store

To connect to Kafka over SASL_SSL, Apache NiFi requires a Truststore that includes the IBM Event Streams certificates. NiFi uses the Truststore to verify the SSL connection. A Keystore is not required to communicate with IBM Event Streams, but it is good practice to create one to secure Apache NiFi and enable user authentication for the UI. A Keystore is also required for communication between the nodes of an Apache NiFi cluster.

Apache NiFi includes the TLS Toolkit utility to help with generating the Keystore and Truststore. The following command can be used for a standalone server with hostnames *.nifi.svc.cluster.local and your desired Truststore and Keystore passwords.

/opt/nifi/nifi-toolkit-current/bin/tls-toolkit.sh standalone -n '*.nifi.svc.cluster.local' -f /opt/nifi/nifi-current/conf/nifi.properties --trustStorePassword "securePasswordOne" --keyStorePassword "securePasswordTwo"
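As a quick sanity check, the hypothetical helpers below confirm that the TLS Toolkit wrote both stores and list what the Truststore contains. They assume the toolkit's usual standalone layout (a directory named after the host that contains `keystore.jks` and `truststore.jks`) and that `keytool` from a JDK or JRE is on the `PATH`. Because that directory name starts with a literal `*`, always quote it.

```shell
#!/usr/bin/env bash
# Sketch: check the output of the TLS Toolkit standalone command.
set -euo pipefail

# Succeed only if the host directory contains both generated stores.
stores_present() {
  local dir="$1"
  [[ -f "$dir/keystore.jks" && -f "$dir/truststore.jks" ]]
}

# List the Truststore entries (requires keytool on the PATH).
list_truststore() {
  local dir="$1" storepass="$2"
  keytool -list -keystore "$dir/truststore.jks" -storepass "$storepass"
}
```

For example: `stores_present '/opt/nifi/nifi-current/*.nifi.svc.cluster.local' && list_truststore '/opt/nifi/nifi-current/*.nifi.svc.cluster.local' securePasswordOne`.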

The openssl and keytool command-line utilities can be used to download the Event Streams certificate and import it into truststore.jks. The Event Streams hostname can be obtained from the credentials created in Step 1. The <<nifi service - namespace>> placeholder is the namespace you specified when deploying Apache NiFi. Be sure to use the Truststore password that you passed to the tls-toolkit command when setting up your stores.

openssl s_client -connect broker-2-<<hostname>>.eventstreams.cloud.ibm.com:9093 -servername broker-2-<<hostname>>.eventstreams.cloud.ibm.com </dev/null \
        | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > broker-2.cert && \
    keytool -import -noprompt -trustcacerts \
        -alias kafka-broker -file broker-2.cert \
        -keystore /opt/nifi/nifi-current/\*.<<nifi service - namespace>>.svc.cluster.local/truststore.jks -storepass "securePasswordOne"
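The command above imports only broker-2's certificate, under a single alias. If connections to other brokers fail certificate verification, a sketch like the following repeats the same openssl and keytool steps for each broker index and gives each certificate its own alias. The `broker_host` and `import_broker_cert` helpers and the `kafka-broker-N` aliases are hypothetical; the hostname pattern mirrors the command above, and the indices you loop over should match the broker list from Step 1.

```shell
#!/usr/bin/env bash
# Sketch: import one certificate per Event Streams broker into the Truststore.
set -euo pipefail

# Build a broker hostname using the same pattern as the command above.
broker_host() {
  local hostname="$1" index="$2"
  printf 'broker-%s-%s.eventstreams.cloud.ibm.com' "$index" "$hostname"
}

# Fetch the certificate presented by one broker and import it under its own alias.
import_broker_cert() {
  local hostname="$1" index="$2" truststore="$3" storepass="$4"
  local host
  host="$(broker_host "$hostname" "$index")"
  openssl s_client -connect "${host}:9093" -servername "$host" </dev/null \
    | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > "broker-${index}.cert"
  keytool -import -noprompt -trustcacerts \
    -alias "kafka-broker-${index}" -file "broker-${index}.cert" \
    -keystore "$truststore" -storepass "$storepass"
}
```

For example: `for i in 0 1 2; do import_broker_cert "<<hostname>>" "$i" /path/to/truststore.jks securePasswordOne; done`, with the hostname, path, and index range replaced by your own values.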

For deployment, you can build a custom Docker image from the base NiFi Docker image that includes the configured Keystore and Truststore.
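One way to build such an image is sketched below: a hypothetical `dockerfile_for_stores` function prints a minimal Dockerfile that copies pre-generated stores onto a base image. The `apache/nifi` base image, the `nifi` user inside it, and the `stores` directory name are assumptions. Copy the toolkit output into a plainly named directory first, because Docker treats `*` in a `COPY` source as a wildcard, and point the SSL Context Service at the resulting paths inside the image.

```shell
#!/usr/bin/env bash
# Sketch: emit a minimal Dockerfile that layers Keystore/Truststore files onto a NiFi image.
set -euo pipefail

dockerfile_for_stores() {
  local base_image="$1" stores_dir="$2"
  cat <<EOF
FROM ${base_image}
# Copy keystore.jks and truststore.jks (assumed to be in ${stores_dir}/) into the image,
# owned by the nifi user (assumed to exist in the base image).
COPY --chown=nifi:nifi ${stores_dir}/ /opt/nifi/nifi-current/${stores_dir}/
EOF
}
```

For example, `dockerfile_for_stores apache/nifi:<tag> stores > Dockerfile` followed by `docker build -t my-nifi .`, where `<tag>` and `my-nifi` are placeholders.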

3. Configure an Apache NiFi Kafka consumer or producer

Apache NiFi should now have what it needs to connect to IBM Event Streams. To start consuming or publishing events, add a ConsumeKafkaRecord or PublishKafkaRecord processor and configure the following properties:

  1. Enter the comma-separated list of Kafka brokers from Step 1.
  2. Enter the names of the topics you want to consume from or publish to.
  3. Select and configure a Record Reader and Record Writer.
  4. Choose SASL_SSL as the Security Protocol.
  5. Enter the Username and Password retrieved in Step 1.
  6. Choose a Group ID (consumers only).
  7. Create a new SSL Context Service.

Once complete, click the arrow next to your SSL Context Service to open the Controller Services page and start configuring it.
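Before wiring everything up in NiFi, it can help to check the same connection settings with the standard Kafka command-line tools. The sketch below joins broker addresses into the comma-separated form that the Kafka Brokers property expects, and prints Kafka client properties that mirror the processor settings above. It assumes Event Streams accepts the SASL `PLAIN` mechanism with the user and password from Step 1; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the processor's connection settings for the Kafka CLI tools.
set -euo pipefail

# Join broker host:port arguments with commas (the Kafka Brokers property format).
join_brokers() {
  local IFS=','
  printf '%s' "$*"
}

# Print client properties equivalent to SASL_SSL + username/password + Truststore.
client_properties() {
  local user="$1" password="$2" truststore="$3" truststore_password="$4"
  cat <<EOF
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="${user}" password="${password}";
ssl.truststore.location=${truststore}
ssl.truststore.password=${truststore_password}
EOF
}
```

For example, write the output of `client_properties` (with your Step 1 user and password, Truststore path, and Truststore password) to `client.properties`, then run `kafka-console-consumer.sh --bootstrap-server "$(join_brokers <broker1>:9093 <broker2>:9093)" --topic <topic> --consumer.config client.properties`.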

4. Configure the connected SSL Context Service

The SSL Context Service connects your Truststore and Keystore to Apache NiFi. Enter the file locations of the Keystore and Truststore along with their corresponding passwords. The image below shows example configurations for the SSLContextService.
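The hypothetical helper below prints the SSL Context Service property values implied by the TLS Toolkit setup in Step 2, which can serve as a checklist when filling in the form. It assumes both stores are JKS files (the toolkit's default store type) located in a single host directory.

```shell
#!/usr/bin/env bash
# Sketch: list SSL Context Service properties for stores produced by the TLS Toolkit.
set -euo pipefail

ssl_context_settings() {
  local store_dir="$1" keystore_password="$2" truststore_password="$3"
  cat <<EOF
Keystore Filename=${store_dir}/keystore.jks
Keystore Password=${keystore_password}
Keystore Type=JKS
Truststore Filename=${store_dir}/truststore.jks
Truststore Password=${truststore_password}
Truststore Type=JKS
EOF
}
```

For example: `ssl_context_settings '/opt/nifi/nifi-current/*.nifi.svc.cluster.local' securePasswordTwo securePasswordOne`, passing the Keystore password before the Truststore password.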

Once complete, enable your SSL Context Service and start the Kafka NiFi processor. You should now be able to connect to Event Streams and publish or consume event data in Apache NiFi.

Next steps

With Apache NiFi connected to IBM Event Streams, you can now concentrate on creating ETL processes for your Event Streams data. After that, you might store your results in one of the many managed databases in IBM Cloud so that your application can consume them. Alternatively, you might send your data to IBM Streams or a cloud function to build a machine learning pipeline or perform other computationally intensive processing.

The choice is yours. Either way, you can spend more time building your application and less time managing Kafka or hand-coding ETL processes, while taking advantage of the many services within IBM Cloud.
