Modernizing Kafka Data Streaming with Confluent

Apache Kafka is a distributed event streaming platform that moves data between systems in real time. Event-driven architectures are the most commonly used strategy for streaming data, that is, the continuous generation and transmission of data from sources like sensors, websites and financial transactions.

However, enterprises often need more data governance, data security, data integration, observability and operational data management. Confluent packages all of these capabilities into a data platform that reduces the complexity of deploying and operating Kafka at scale.

Running Kafka clusters in production requires managing brokers, storage, networking, scaling, upgrades, monitoring and disaster recovery. Confluent offers managed cloud services and operational tooling that reduce the burden on infrastructure teams and allow developers to focus on building applications rather than maintaining clusters.

Rather than treating Kafka as a messaging system, Confluent promotes Kafka as a central nervous system for enterprise data. Events can be shared across applications, advanced analytics systems, artificial intelligence and machine learning platforms, and operational systems in real time.

This approach enables data architectures that are more responsive, depend less on batch processing and batch data ingestion, and speed up data-driven decision-making and business intelligence.

To see how this story plays out, imagine the use cases of a large health insurance provider, “Wellspring Health”, which already runs Apache Kafka on-premises to process healthcare and insurance events.

Their legacy system contains a Kafka environment that handles:

· Medical claims

· Pharmacy transactions

· Fraud detection

· Provider billing events

· Member eligibility updates

The system works, but data teams struggle with inconsistent schemas, unknown downstream consumers, custom ETL pipelines and slow data analytics onboarding. To improve operational efficiency, Wellspring plans to adopt Confluent Cloud in a data modernization initiative that upgrades their streaming platform while preserving their existing Kafka investments and knowledge.

The highest priorities on the roadmap for Wellspring are to:

1. Detect potentially fraudulent claims in real time

2. Provide access controls to govern sensitive data structures

3. Easily provide Iceberg snapshots for ingestion into a data lake or data warehouse

In this tutorial, you’ll see how Wellspring can modernize and optimize their data infrastructure and Kafka architectures with Confluent to easily create derived big data streams, improve data processing and create dashboards. This improvement will give them a more scalable and unified data infrastructure and analytics ecosystem.

The first step along this path is to take the Kafka streams that they’re already using in an on-premises environment and move them to Confluent Cloud.

You can find the code sections of this tutorial in our GitHub repository in the Confluent Modernization tutorial.

Create an account in Confluent Cloud. You can use GitHub, Google or simply use your own email address. Answer the initial questions however you’d like. When you reach the stage to create your own cluster, select “Explore other cluster types and pricing”.

Create the environment. For the purposes of this demo, call it “default”.

Next, create the cluster. For the purposes of this demo, this one is named “tutorial_cluster” but you can choose any name that you’d like. Use the Basic cluster for this tutorial because it has pricing compatible with a simple proof of concept.

Now you’re ready to create your first Kafka topic. A Kafka topic is the fundamental unit of organization in Apache Kafka. You can think of it as a feed name or logical channel where data records are published and stored.
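To make the idea of a topic concrete, here is a minimal in-memory sketch. It is only an illustration of the append-only log and offset concepts, not how Kafka stores data (real topics are partitioned and replicated across brokers). The `ToyTopic` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical in-memory stand-in for a topic: an append-only list of records."""
    name: str
    records: list = field(default_factory=list)

    def publish(self, record: dict) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset: int) -> list:
        """Return every record at or after the given offset."""
        return self.records[offset:]


topic = ToyTopic("medical_claims")
first_offset = topic.publish({"claim_id": 1, "claim_amount": 250.0})
second_offset = topic.publish({"claim_id": 2, "claim_amount": 12000.0})

# Each reader tracks its own offset, so reading never removes records.
analytics_view = topic.read_from(0)
fraud_view = topic.read_from(second_offset)
print(len(analytics_view), len(fraud_view))
```

The point of the sketch is that published records stay in the log, so several consumers can read the same data independently.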

After your cluster launches, create the first topic, called medical_claims, to store incoming medical claims.

Now, edit the data contract:

The JSON for this data contract is:

{
  "connect.name": "com.wellspring.claims.MedicalClaim",
  "connect.parameters": {
    "io.confluent.connect.avro.field.doc.event_time": "The string is a unicode character sequence.",
    "io.confluent.connect.avro.record.doc": "Schema for a medical procedure claim."
  },
  "doc": "Schema for a medical procedure claim.",
  "fields": [
    {
      "name": "member_id",
      "type": "int"
    },
    {
      "name": "claim_id",
      "type": "int"
    },
    {
      "name": "provider_id",
      "type": "int"
    },
    {
      "name": "procedure_code",
      "type": "string"
    },
    {
      "name": "diagnosis_code",
      "type": "string"
    },
    {
      "name": "claim_amount",
      "type": "float"
    },
    {
      "name": "claim_status",
      "type": "int"
    },
    {
      "doc": "The string is a unicode character sequence.",
      "name": "event_time",
      "type": "string"
    }
  ],
  "name": "MedicalClaim",
  "namespace": "com.wellspring.claims",
  "type": "record"
}
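As a rough illustration of what this contract enforces, the following sketch checks a Python dictionary against the field list of the schema. It uses only the standard library and a trimmed copy of the schema; the `find_problems` helper and the type mapping are assumptions made for this example, and a real producer would rely on the Avro serializer rather than a hand-written check.

```python
import json

# Trimmed copy of the data contract above; only the field list matters here.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "MedicalClaim",
  "namespace": "com.wellspring.claims",
  "fields": [
    {"name": "member_id", "type": "int"},
    {"name": "claim_id", "type": "int"},
    {"name": "provider_id", "type": "int"},
    {"name": "procedure_code", "type": "string"},
    {"name": "diagnosis_code", "type": "string"},
    {"name": "claim_amount", "type": "float"},
    {"name": "claim_status", "type": "int"},
    {"name": "event_time", "type": "string"}
  ]
}
"""

# Avro primitive type -> Python types accepted by this sketch (an assumption).
AVRO_TO_PYTHON = {"int": (int,), "float": (int, float), "string": (str,)}


def find_problems(record: dict, schema: dict) -> list:
    """Return human-readable problems; an empty list means the record fits."""
    problems = []
    expected = {f["name"]: f["type"] for f in schema["fields"]}
    for name, avro_type in expected.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, AVRO_TO_PYTHON[avro_type]):
            problems.append(f"wrong type for {name}: expected {avro_type}")
    for name in record:
        if name not in expected:
            problems.append(f"unexpected field: {name}")
    return problems


schema = json.loads(SCHEMA_JSON)

good_claim = {
    "member_id": 1000, "claim_id": 1001, "provider_id": 5000,
    "procedure_code": "7371", "diagnosis_code": "GHIJKLM",
    "claim_amount": 6000.0, "claim_status": 1,
    "event_time": "2024-01-01 00:00:00",
}
bad_claim = {**good_claim, "claim_id": "1001"}
del bad_claim["event_time"]

print(find_problems(good_claim, schema))
print(find_problems(bad_claim, schema))
```

The second record is reported because claim_id is a string instead of an int and event_time is missing.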

Now you’ll add the data contract for the topic to the schema registry. The schema registry provides a single point of governance so that all clients that read from or write to a topic know what to expect.

Open the schema registry from your cluster:

Then, select ‘Data contracts’ and then ‘Add data contract’:

Next, enter the data contract for medical_claims:

Now select the API endpoint and note the URL. This is how you can access the schema registry from any client. To see how it works in action, create a user account to access the schema registry remotely.

Select your username from the upper right menu and then navigate to the API keys view, then select add API key. Enter whatever name that you’d like and select the schema registry for the key scope and default for the environment.

This generates a new key. Download the key, open the downloaded text file and copy the values into your .env file. Save the API key as REGISTRY_API_USER, the API secret as REGISTRY_API_SECRET and the registry URL as REGISTRY_URL. Then save and close your .env file.

You can now test the schema registry with the following code. First, create a Python environment and install the libraries listed in the requirements.txt file in the project repository.

import json
import os

from confluent_kafka.schema_registry import SchemaRegistryClient
from dotenv import load_dotenv

# Load env variables from the .env file
load_dotenv()

# Configure the schema registry client with the URL and API key saved earlier
sr_conf = {
    'url': str(os.getenv("REGISTRY_URL")),
    'basic.auth.user.info': f'{os.getenv("REGISTRY_API_USER")}:{os.getenv("REGISTRY_API_SECRET")}'
}
schema_registry_client = SchemaRegistryClient(sr_conf)

# Fetch the most recent version of the value schema for the medical_claims topic
subject = "medical_claims-value"
latest_version = schema_registry_client.get_versions(subject)[-1]
schema_str = schema_registry_client.get_version(subject, latest_version).schema.schema_str

print(schema_str)

schema = json.loads(schema_str)  # type: ignore

# List the field names defined by the data contract
print(" ============ Schema Fields ============")
for f in schema['fields']:
    print(f['name'])

Now you can see how any client accessing the streams will be able to see the data contract for that stream and any changes that might have occurred in it before reading or writing to it.
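One way a client could notice such changes is to compare the field lists of two schema versions. The sketch below works on schema strings like the `schema_str` value returned by the registry; the two versions shown are invented for illustration, and the `diff_schemas` helper is an assumption of this example rather than part of the Confluent client.

```python
import json


def field_types(schema_str: str) -> dict:
    """Map each field name in an Avro record schema to its declared type."""
    schema = json.loads(schema_str)
    return {f["name"]: f["type"] for f in schema["fields"]}


def diff_schemas(old_schema_str: str, new_schema_str: str) -> dict:
    """Summarize added, removed and retyped fields between two versions."""
    old = field_types(old_schema_str)
    new = field_types(new_schema_str)
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(name for name in shared if old[name] != new[name]),
    }


# Hypothetical versions of a claims schema, invented for this example.
version_1 = json.dumps({
    "type": "record", "name": "MedicalClaim",
    "fields": [
        {"name": "claim_id", "type": "int"},
        {"name": "claim_amount", "type": "float"},
        {"name": "claim_status", "type": "int"},
    ],
})
version_2 = json.dumps({
    "type": "record", "name": "MedicalClaim",
    "fields": [
        {"name": "claim_id", "type": "int"},
        {"name": "claim_amount", "type": "double"},
        {"name": "payer_id", "type": "int"},
    ],
})

changes = diff_schemas(version_1, version_2)
print(changes)
```

In a real client, the two strings would come from two `get_version` calls on the same subject.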

Now you can create derived topics in Confluent Cloud through an automation that takes the form of a ksqlDB statement. These topics can check data quality, aggregate or process event streams, send notifications and many other data workflows. In Confluent architectures, topics are generated through:

· Stream processing

· Joins

· Aggregations

· Enrichment pipelines

Often using:

· ksqlDB

· Kafka Streams

· Apache Flink

You’ll see how Confluent ksqlDB can help Wellspring create two derived topics to streamline data access. The first is to track high-value claims of more than USD 10,000.

You’ll need a globally scoped key for the ksqlDB operations. Select your username from the upper right menu and then navigate to the API keys view, then select add API key. Enter whatever name that you’d like and select global for the key scope.

It generates a new key that you can use across Confluent Cloud. Download the key, open the text file and copy the values into your .env file. Save the API key as REQUESTS_API_KEY and the API secret as REQUESTS_API_SECRET.

Next, copy your user ID from your user settings:

Save this to your .env file as SERVICE_ACCOUNT_ID.

Open the environment details in the environment tab:

Copy the environment ID and save it to your .env file as ENV_ID.

Open the cluster details in the cluster tab:

Copy the cluster ID and save it to your .env file as CLUSTER_ID.
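Because the next steps depend on several values in the .env file, it can help to confirm that none of them is missing before calling any API. The following is a small sketch of such a check; the `missing_settings` helper is hypothetical, and the sample values are placeholders rather than real IDs. In a real script you would call `load_dotenv()` first and pass no argument so that the function reads `os.environ`.

```python
import os

# Settings that the ksqlDB creation step reads from the environment.
REQUIRED_VARS = [
    "REQUESTS_API_KEY",
    "REQUESTS_API_SECRET",
    "SERVICE_ACCOUNT_ID",
    "ENV_ID",
    "CLUSTER_ID",
]


def missing_settings(environ=None) -> list:
    """Return the names of required settings that are absent or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


# Placeholder values for illustration only; CLUSTER_ID is left empty on purpose.
sample_env = {
    "REQUESTS_API_KEY": "placeholder-key",
    "REQUESTS_API_SECRET": "placeholder-secret",
    "SERVICE_ACCOUNT_ID": "placeholder-user",
    "ENV_ID": "placeholder-env",
    "CLUSTER_ID": "",
}

print(missing_settings(sample_env))
```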

Now you’ll create the ksqlDB database through the Confluent Cloud REST API, by using the IDs that you saved in your .env file:

import os

import requests
from dotenv import load_dotenv
from requests.auth import HTTPBasicAuth

# Load env variables
load_dotenv()

def create_db():

    # Request body that describes the new ksqlDB cluster
    payload = {
        "spec": {
            "display_name": "medical-claims-ksqldb",
            "csu": 4,
            "environment": {
                "id": os.getenv("ENV_ID")  # the ID of your environment
            },
            "kafka_cluster": {
                "id": os.getenv("CLUSTER_ID")  # the ID of your cluster
            },
            "credential_identity": {
                "id": os.getenv("SERVICE_ACCOUNT_ID")  # the ID of your user profile in Confluent Cloud
            }
        }
    }

    response = requests.post(
        "https://api.confluent.cloud/ksqldbcm/v2/clusters",
        json=payload,
        auth=HTTPBasicAuth(str(os.getenv("REQUESTS_API_KEY")), str(os.getenv("REQUESTS_API_SECRET")))
    )

    print(response.status_code)
    print(response.json())

Now you’ve created a ksqlDB database, a purpose-built event streaming database that lets you process and analyze data in Apache Kafka by using standard, lightweight SQL syntax. The database might take a few minutes to provision and appear in the Confluent Cloud UI. When it is ready, go to the Confluent Cloud console and open the ksqlDB menu item.

Go to settings and copy the ‘REST endpoint URL’.

Save this to your .env file as KSQL_ENDPOINT.

Now you can create a base stream to work with ksqlDB:

def create_base_stream():

    # Declare a stream over the existing medical_claims topic
    sql = """
    CREATE STREAM medical_claims_stream (
        claim_id VARCHAR,
        member_id VARCHAR,
        provider_id VARCHAR,
        procedure_code VARCHAR,
        diagnosis_code VARCHAR,
        claim_amount DOUBLE,
        claim_status VARCHAR,
        event_time VARCHAR
    )
    WITH (
        KAFKA_TOPIC='medical_claims',
        VALUE_FORMAT='AVRO'
    );
    """

    # Statements are posted to the /ksql resource of the REST endpoint
    response = requests.post(
        str(os.getenv("KSQL_ENDPOINT")) + "/ksql",
        auth=HTTPBasicAuth(str(os.getenv("REQUESTS_API_KEY")), str(os.getenv("REQUESTS_API_SECRET"))),
        headers={
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"
        },
        json={
            "ksql": sql
        }
    )

    print(response.status_code)
    if response.status_code != 404:
        print(response.json())

Now with that base stream, you can use SQL statements to filter all the data coming from the base stream into a derived stream for real-time analytics:

def create_derived_stream():

    # Keep only claims above USD 10,000 in a new derived stream
    sql = """
    CREATE STREAM high_value_claims AS
    SELECT
        claim_id,
        member_id,
        provider_id,
        claim_amount,
        procedure_code
    FROM medical_claims_stream
    WHERE claim_amount > 10000
    EMIT CHANGES;
    """

    # Statements are posted to the /ksql resource of the REST endpoint
    response = requests.post(
        str(os.getenv("KSQL_ENDPOINT")) + "/ksql",
        auth=HTTPBasicAuth(str(os.getenv("REQUESTS_API_KEY")), str(os.getenv("REQUESTS_API_SECRET"))),
        headers={
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"
        },
        json={
            "ksql": sql
        }
    )

    print(response.status_code)
    if response.status_code != 404:
        print(response.json())
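The statement functions in this tutorial repeat the same request details, so one option is to build them in a single place. The sketch below assembles the URL, headers and body for the `/ksql` resource of the ksqlDB REST API without sending anything. The `build_ksql_request` helper and the example endpoint are hypothetical and exist only for this illustration.

```python
def build_ksql_request(endpoint: str, statement: str) -> dict:
    """Return keyword arguments for posting one ksqlDB statement."""
    return {
        # Statements go to the /ksql resource of the REST endpoint.
        "url": endpoint.rstrip("/") + "/ksql",
        "headers": {
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"
        },
        "json": {"ksql": statement.strip()},
    }


# Hypothetical endpoint; use the REST endpoint URL from your ksqlDB settings.
example_endpoint = "https://pksqlc-example.us-east-2.aws.confluent.cloud:443/"

request_kwargs = build_ksql_request(example_endpoint, """
    SHOW STREAMS;
""")

print(request_kwargs["url"])
print(request_kwargs["json"])
```

With the requests library, the result could then be passed as `requests.post(auth=..., **request_kwargs)`, which keeps the authentication separate from the statement itself.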

You can also create windowed aggregations. For instance, certain kinds of procedures should trigger an audit, and multiple auditable claims in a short period of time indicate that a provider’s systems might have been compromised in a data breach.

To help track this, you’ll create an auditor_stream of claims that are over USD 5,000 for procedures ‘7371’, ‘2710’ and ‘1831’. Then, you’ll create a table that counts the claims in the auditor stream for each provider in every 10-minute window and stores the result as provider_claim_spikes.

def create_auditor_stream():

    # Keep claims over USD 5,000 for the procedures that require an audit
    sql = """
    CREATE STREAM auditor_stream AS
    SELECT
        claim_id,
        member_id,
        provider_id,
        claim_amount,
        procedure_code,
        diagnosis_code
    FROM medical_claims_stream
    WHERE claim_amount > 5000
    AND procedure_code IN ('7371', '2710', '1831')
    EMIT CHANGES;
    """

    # Statements are posted to the /ksql resource of the REST endpoint
    response = requests.post(
        str(os.getenv("KSQL_ENDPOINT")) + "/ksql",
        auth=HTTPBasicAuth(str(os.getenv("REQUESTS_API_KEY")), str(os.getenv("REQUESTS_API_SECRET"))),
        headers={
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"
        },
        json={
            "ksql": sql
        }
    )

    print(response.status_code)
    if response.status_code != 404:
        print(response.json())

def create_provider_claim_spike():

    # Count and sum the audited claims for each provider in 10-minute windows
    sql = """
    CREATE TABLE provider_claim_spikes AS
    SELECT provider_id,
        COUNT(*) AS claim_count,
        SUM(claim_amount) AS total_claim_value
    FROM auditor_stream
    WINDOW TUMBLING (SIZE 10 MINUTES)
    GROUP BY provider_id
    EMIT CHANGES;
    """

    # Statements are posted to the /ksql resource of the REST endpoint
    response = requests.post(
        str(os.getenv("KSQL_ENDPOINT")) + "/ksql",
        auth=HTTPBasicAuth(str(os.getenv("REQUESTS_API_KEY")), str(os.getenv("REQUESTS_API_SECRET"))),
        headers={
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"
        },
        json={
            "ksql": sql
        }
    )

    print(response.status_code)
    if response.status_code != 404:
        print(response.json())
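To build intuition for what the tumbling window computes, here is a plain Python sketch of the same aggregation. It assumes fixed, non-overlapping 10-minute windows aligned to the epoch and groups invented events by provider; it illustrates the technique only and says nothing about how ksqlDB executes the query internally. The function name and the sample events are made up for this example.

```python
from collections import defaultdict

WINDOW_MS = 10 * 60 * 1000  # window size: 10 minutes in milliseconds


def tumbling_window_totals(events):
    """Aggregate (timestamp_ms, provider_id, claim_amount) tuples per window.

    Returns a dict keyed by (provider_id, window_start_ms).
    """
    totals = defaultdict(lambda: {"claim_count": 0, "total_claim_value": 0.0})
    for timestamp_ms, provider_id, claim_amount in events:
        # Each event belongs to exactly one window: round down to its start.
        window_start = timestamp_ms - (timestamp_ms % WINDOW_MS)
        bucket = totals[(provider_id, window_start)]
        bucket["claim_count"] += 1
        bucket["total_claim_value"] += claim_amount
    return dict(totals)


MINUTE = 60 * 1000

# Invented sample events: three claims in the first window for provider 5000,
# one in the next window, and one claim for provider 5001.
events = [
    (0 * MINUTE, 5000, 6000.0),
    (4 * MINUTE, 5000, 6100.0),
    (9 * MINUTE, 5000, 6200.0),
    (11 * MINUTE, 5000, 6300.0),
    (2 * MINUTE, 5001, 7000.0),
]

result = tumbling_window_totals(events)
for (provider_id, window_start), totals in sorted(result.items()):
    print(provider_id, window_start, totals)
```

Note that, like the table definition, this sketch produces a row for every provider and window, including windows with a single claim, so a consumer would still compare claim_count against a threshold to decide what counts as a spike.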

One important thing to note is that ksqlDB adds a generated prefix to the name of each derived topic. You can see the full name in the topics view for your cluster:

Now that you’ve created the derived topics, you can query them. Run the following code in a new terminal window so that you can see results come in as they are published to the base medical_claims topic.

def query_provider_spike():

    query = """
    SELECT *
    FROM provider_claim_spikes
    EMIT CHANGES;
    """

    # Push queries are posted to the /query-stream resource and stay open
    response = requests.post(
        str(os.getenv("KSQL_ENDPOINT")) + "/query-stream",
        auth=HTTPBasicAuth(str(os.getenv("REQUESTS_API_KEY")), str(os.getenv("REQUESTS_API_SECRET"))),
        headers={
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"
        },
        json={
            "sql": query
        },
        stream=True
    )

    # Print each result line as it arrives
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))

Because you haven’t yet created data that would trigger a provider spike, nothing appears. To create this data, write to the medical_claims topic.

This requires a key in the cluster itself. Navigate to your cluster and then to API Keys and create a new key. This key is scoped to the cluster and so can be used by a producer of events. Download the key and copy the API key and API secret into your .env file as SASL_USERNAME and SASL_PASSWORD. Then, copy the bootstrap server address as BOOTSTRAP_SERVERS.

Now, open a second terminal window and run the following Python code. This code block first gets the data contract from the schema registry and uses it to ensure that each message contains all the correct fields. The data then flows through the derived streams to demonstrate how data moves through real-time data processing systems.

This code creates 20 claims with amounts between USD 6,000 and USD 7,900, which appear in the base stream. Because each claim exceeds USD 5,000 and uses procedure code ‘7371’, the claims also appear in the auditor stream and then in the provider spike table, with 10 claims in a short time span for each of 2 providers. None of them exceeds the USD 10,000 threshold of the high_value_claims stream.

import datetime
import os

from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from dotenv import load_dotenv

# Load env variables
load_dotenv()

# Setup Registry and Serializer
sr_conf = {
    'url': str(os.getenv("REGISTRY_URL")),
    'basic.auth.user.info': f'{os.getenv("REGISTRY_API_USER")}:{os.getenv("REGISTRY_API_SECRET")}'
}
schema_registry_client = SchemaRegistryClient(sr_conf)

# Fetch the latest data contract for the topic
subject = "medical_claims-value"
latest_version = schema_registry_client.get_versions(subject)[-1]
schema_str = schema_registry_client.get_version(subject, latest_version).schema.schema_str

avro_serializer = AvroSerializer(schema_registry_client,  # type: ignore
                                 schema_str=schema_str)  # type: ignore

producer_conf = {
    "bootstrap.servers": os.getenv("BOOTSTRAP_SERVERS"),
    # Required for Confluent Cloud
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",

    # Confluent Cloud API credentials
    "sasl.username": os.getenv("SASL_USERNAME"),
    "sasl.password": os.getenv("SASL_PASSWORD")
}

producer = SerializingProducer({
    **producer_conf,
    "value.serializer": avro_serializer
})

# Send 20 claims: 10 for provider 5000, then 10 for provider 5001
for i in range(20):
    suspicious_claim = {
        "member_id": 1000,
        "claim_id": 1001,
        "provider_id": 5000 + int(i / 10),
        "procedure_code": "7371",
        "diagnosis_code": "GHIJKLM",
        "claim_amount": 6000.0 + (i * 100),
        "claim_status": 1,
        "event_time": str(datetime.datetime.now())
    }

    producer.produce(
        topic="medical_claims",
        value=suspicious_claim
    )

# Wait until all queued messages have been delivered
producer.flush()

You’ll see the terminal window with the derived topics capture the provider spike from the streamed medical claims and print that value to the terminal window.
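To see why these records reach the auditor stream and the provider spike table, you can reproduce the arithmetic of the producer loop without sending anything. This sketch rebuilds only the fields that the filters use and applies the same thresholds as the ksqlDB statements in plain Python; it does not connect to Kafka.

```python
from collections import Counter

AUDITED_PROCEDURES = {"7371", "2710", "1831"}

# Same provider and amount arithmetic as the producer loop.
claims = [
    {
        "provider_id": 5000 + i // 10,
        "procedure_code": "7371",
        "claim_amount": 6000.0 + (i * 100),
    }
    for i in range(20)
]

# How many claims each provider receives.
per_provider = Counter(claim["provider_id"] for claim in claims)

# Claims that satisfy the auditor_stream filter.
audited = [
    claim for claim in claims
    if claim["claim_amount"] > 5000
    and claim["procedure_code"] in AUDITED_PROCEDURES
]

# Claims that satisfy the high_value_claims filter.
high_value = [claim for claim in claims if claim["claim_amount"] > 10000]

print(dict(per_provider))
print(len(audited), len(high_value))
print(max(claim["claim_amount"] for claim in claims))
```

All 20 claims pass the audit filter and none passes the USD 10,000 filter, so the activity shows up in provider_claim_spikes rather than in high_value_claims.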

Another key offering of Confluent Cloud is Tableflow, which allows you to quickly create Iceberg tables that store metadata and snapshots of stored messages. Apache Iceberg is an open source table format that records the state of a dataset as a series of snapshots. These tables can be read by analytics engines like IBM’s watsonx.data® or other providers like Snowflake, and can be stored in object storage such as Amazon S3.

Go to the topics view in Confluent Cloud and select the generated high-value claims topic. The generated name from the ksqlDB operation will be something like pksqlc-xxxxxxx-HIGH_VALUE_CLAIMS.

Open the topic and click ‘Enable Tableflow’ in the upper right. Select ‘Iceberg’ and ‘Use Confluent Storage’.

This stores Iceberg snapshots of all events, along with the associated metadata, creating a widely compatible data asset for consumption by any analytics engine or storage in a data center.

To query the Iceberg tables that Tableflow creates, you’ll need your environment ID and your organization ID. Copy the environment ID from the environment > details view and save it to your .env file as ENVIRONMENT_FOR_ICEBERG.

Then, copy the organization ID from your organization tab:

Save this to your .env file as ORGANIZATION_FOR_ICEBERG.

You can test the Iceberg table creation with the following code:

from pyiceberg.catalog import load_catalog
from dotenv import load_dotenv
import os

# Load env variables
load_dotenv()

# Connect to the Tableflow REST catalog. Compare the URI with the REST catalog
# endpoint shown in your Tableflow settings, because the region can differ.
catalog = load_catalog(
    "confluent",
    type="rest",
    uri=(
        "https://tableflow.us-east-2.aws.confluent.cloud/"
        "iceberg/catalog/"
        f"organizations/{os.getenv('ORGANIZATION_FOR_ICEBERG')}/"
        f"environments/{os.getenv('ENVIRONMENT_FOR_ICEBERG')}"
    ),
    credential=f'{os.getenv("REQUESTS_API_KEY")}:{os.getenv("REQUESTS_API_SECRET")}',
    # Request headers are passed as catalog properties with a "header." prefix
    **{"header.X-Iceberg-Access-Delegation": "vended-credentials"}
)

ns = catalog.list_namespaces()
print(ns)

# Take the first table in the first namespace
ns_name, table_name = catalog.list_tables(ns[0])[0]

table = catalog.load_table(f"{ns_name}.{table_name}")

# Read up to 100 rows into a pandas DataFrame
df = table.scan(limit=100).to_pandas()

print(df.head())

This prints the first rows of the data that has been created and stored in the Iceberg table.

Finally, you might want to enable Wellspring Health to view a data visualization dashboard for the high-value claims and provider claim spike datasets. To build it, you’ll create an app by using the Streamlit framework that uses the data pipelines enabled by Confluent Cloud to provide a dashboard view to stakeholders.

First, some configuration fields for the user:

import datetime
import json
import os
import queue
import threading

import pandas as pd
import requests
import streamlit as st
from confluent_kafka.schema_registry import SchemaRegistryClient
from dotenv import load_dotenv
from requests.auth import HTTPBasicAuth
from streamlit_autorefresh import st_autorefresh

# Load env variables
load_dotenv()

st.set_page_config(page_title="Fraudulent Claims Dashboard", layout="wide")

st.markdown("""
    <style>
        html, body, [data-testid="stAppViewContainer"] {
            overflow-anchor: none !important;
        }
    </style>
""", unsafe_allow_html=True)

st.title("Claims Streaming Dashboard")
st.markdown("""This dashboard continuously queries ksqlDB streams and displays high value medical claims in real time.""")

# Rerun the script every 2 seconds so that new events are displayed
st_autorefresh(interval=2000, key="stream_refresh")

# -------------------------------------------------------------------
# Configuration
# -------------------------------------------------------------------

KSQL_ENDPOINT = st.sidebar.text_input(
    "ksqlDB Endpoint",
    value=os.getenv("KSQL_ENDPOINT")
)

KSQL_API_KEY = st.sidebar.text_input(
    "KSQL API Key",
    type="password"
)

KSQL_API_SECRET = st.sidebar.text_input(
    "KSQL API Secret",
    type="password"
)

MAX_ROWS = st.sidebar.slider(
    "Maximum Rows",
    min_value=10,
    max_value=500,
    value=100,
    step=10
)

START_STREAM = st.sidebar.button("Start Stream")

### Queries
PROVIDER_SPIKE_QUERY = """
    SELECT *
    FROM provider_claim_spikes
    EMIT CHANGES;
"""

HIGH_VALUE_QUERY = """
    SELECT *
    FROM high_value_claims
    EMIT CHANGES;
"""

# Initialize the session state once per browser session
if "claims" not in st.session_state:
    st.session_state.claims = []
    st.session_state.spikes = []
    st.session_state.stream_started = False
    st.session_state['event_queue'] = queue.Queue()


Next, retrieve the schemas from the schema registry. You’ll need to get the generated names of your derived topics from the topics view in Confluent Cloud.

# get all of our schemas once
if "high_val_cols" not in st.session_state:

    # Use the schema registry URL and API key saved earlier in the .env file
    sr_conf = {
        'url': str(os.getenv("REGISTRY_URL")),
        'basic.auth.user.info': f'{os.getenv("REGISTRY_API_USER")}:{os.getenv("REGISTRY_API_SECRET")}'
    }
    schema_registry_client = SchemaRegistryClient(sr_conf)

    # Replace this subject with the generated name of your high value claims topic
    high_val_subject = "pksqlc-4my2e5kHIGH_VALUE_CLAIMS-value"
    high_val_schema_str = schema_registry_client.get_version(high_val_subject, schema_registry_client.get_versions(high_val_subject)[-1]).schema.schema_str

    print(high_val_schema_str)

    schema = json.loads(high_val_schema_str)  # type: ignore
    st.session_state.high_val_cols = [f['name'] for f in schema['fields']]

    # Replace this subject with the generated name of your provider claim spikes topic
    spike_subject = "XXXX-XXXXXXPROVIDER_CLAIM_SPIKES-value"
    spike_schema_str = schema_registry_client.get_version(spike_subject, schema_registry_client.get_versions(spike_subject)[-1]).schema.schema_str

    schema = json.loads(spike_schema_str)  # type: ignore
    st.session_state.spike_cols = [f['name'] for f in schema['fields']]

    print(spike_schema_str)


Next, create a streaming function to grab updates from Confluent:

st.subheader("High Value Claims")
hv_metrics = st.container(height=500)
hv_table = st.empty()

st.subheader("Provider Claim Spikes")
ps_metrics = st.container(height=500)
ps_table = st.empty()

placeholder_status = st.empty()

def stream_query(query_name, sql, event_queue, data_columns):

    print('streaming')

    # Push queries are posted to the /query-stream resource and stay open
    response = requests.post(
        str(KSQL_ENDPOINT) + "/query-stream",
        auth=HTTPBasicAuth(str(os.getenv("REQUESTS_API_KEY")), str(os.getenv("REQUESTS_API_SECRET"))),
        headers={
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"
        },
        json={
            "sql": sql
        },
        stream=True
    )

    columns = None

    for line in response.iter_lines():

        if not line:
            continue

        decoded = line.decode("utf-8")

        # Skip any line that is not valid JSON
        try:
            record = json.loads(decoded)
            print(record)
        except json.JSONDecodeError:
            continue

        # First row is columns
        if columns is None and isinstance(record, dict):
            columns = record['columnNames']
            continue

        values = record

        # Pair each value with its column name
        row = dict(zip(columns, values))  # type: ignore

        # Tag the row so that the UI knows which query produced it
        row["stream"] = query_name  # type: ignore

        event_queue.put(row)
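The parsing logic of the streaming function can be tried out without a connection. The sketch below assumes the same response shape that the function expects: a first JSON object that carries a columnNames list, followed by one JSON array per row. The sample lines are invented, and the `rows_from_stream` generator is a hypothetical helper for this illustration.

```python
import json


def rows_from_stream(lines):
    """Yield one dict per data row from an iterable of response lines."""
    columns = None
    for raw in lines:
        if not raw:
            continue  # skip keep-alive blank lines
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            continue  # ignore anything that is not valid JSON
        if isinstance(record, dict) and "columnNames" in record:
            # Header object: remember the column names for later rows.
            columns = record["columnNames"]
        elif isinstance(record, list) and columns is not None:
            yield dict(zip(columns, record))


# Invented sample of what the response lines might look like.
sample_lines = [
    b'{"queryId": "example", "columnNames": ["PROVIDER_ID", "WINDOWSTART", '
    b'"WINDOWEND", "CLAIM_COUNT", "TOTAL_CLAIM_VALUE"]}',
    b"",
    b"[5000, 0, 600000, 3, 18300.0]",
    b"not json",
    b"[5001, 0, 600000, 1, 7000.0]",
]

rows = list(rows_from_stream(sample_lines))
for row in rows:
    print(row)
```

Keeping the parsing in a generator like this makes it possible to test the row handling separately from the network call.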


Finally, create the stream when the user enters credentials and display each record as it comes in as well as a sum of the number of claims and claim amounts.

if START_STREAM and not st.session_state.stream_started:

    st.session_state.stream_started = True

    print("starting")

    try:

        # One background thread per query; both write to the same queue
        threading.Thread(
            target=stream_query,
            args=("spike",
                  PROVIDER_SPIKE_QUERY,
                  st.session_state.event_queue,
                  st.session_state.spike_cols),
            daemon=True
        ).start()

        threading.Thread(
            target=stream_query,
            args=("high_val_claim",
                  HIGH_VALUE_QUERY,
                  st.session_state.event_queue,
                  st.session_state.high_val_cols),
            daemon=True
        ).start()

    except requests.exceptions.RequestException as e:
        placeholder_status.error(f"Connection error: {e}")

    except Exception as e:
        placeholder_status.error(f"Unexpected error: {e}")

## outside of block
# Drain every event that arrived since the last rerun
while not st.session_state.event_queue.empty():

    print("event")

    event = st.session_state.event_queue.get()

    if event['stream'] == 'spike':
        data_rows = {k: v for k, v in event.items() if k != 'stream'}
        # Window bounds arrive as epoch milliseconds
        data_rows['WINDOWSTART'] = datetime.datetime.fromtimestamp(data_rows['WINDOWSTART']/1000).strftime('%c')
        data_rows['WINDOWEND'] = datetime.datetime.fromtimestamp(data_rows['WINDOWEND']/1000).strftime('%c')
        st.session_state.spikes.insert(0, data_rows)

    if event['stream'] == 'high_val_claim':
        data_rows = {k: v for k, v in event.items() if k != 'stream'}
        st.session_state.claims.insert(0, data_rows)

claims_df = pd.DataFrame(st.session_state.claims, columns=st.session_state.high_val_cols)

with hv_metrics.container():

    if len(claims_df) > 0:

        col1, col2 = st.columns(2)

        col1.metric("High Value Claims", len(claims_df))
        col2.metric("Total High Claim Amount", f"${claims_df['CLAIM_AMOUNT'].sum():,.2f}")

        st.dataframe(
            claims_df,
            width='stretch',
            height=300
        )

spikes_df = pd.DataFrame(st.session_state.spikes, columns=st.session_state.spike_cols)

with ps_metrics.container():

    if len(spikes_df) > 0:
        ps_col1, ps_col2 = st.columns(2)

        ps_col1.metric("Claim Spikes Observed", len(spikes_df))
        ps_col2.metric("Total Claim Spike Value", f"${spikes_df['TOTAL_CLAIM_VALUE'].sum():,.2f}")

        st.dataframe(
            spikes_df,
            width='stretch',
            height=300
        )
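The dashboard relies on a common hand-off pattern: background threads put rows on a thread-safe queue, and the main script drains the queue on each rerun. The following standalone sketch shows that pattern with the standard library only. The `worker` and `drain` functions and the sample rows are invented for this illustration and stand in for the streaming threads and the Streamlit rerun.

```python
import queue
import threading


def worker(name, rows, event_queue):
    """Stand-in for a streaming thread: tag each row and enqueue it."""
    for row in rows:
        event_queue.put({**row, "stream": name})


def drain(event_queue):
    """Remove and return every event that is currently in the queue."""
    drained = []
    while True:
        try:
            drained.append(event_queue.get_nowait())
        except queue.Empty:
            return drained


event_queue = queue.Queue()

threads = [
    threading.Thread(
        target=worker,
        args=("spike", [{"PROVIDER_ID": 5000}], event_queue),
        daemon=True,
    ),
    threading.Thread(
        target=worker,
        args=("high_val_claim", [{"CLAIM_ID": 1}, {"CLAIM_ID": 2}], event_queue),
        daemon=True,
    ),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()  # the dashboard threads never finish; joining is for the demo

events = drain(event_queue)
print(len(events))
```

Using `get_nowait` with `queue.Empty` avoids the small race between checking `empty()` and calling `get()`, although with a single consumer, as in the dashboard, either approach works.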


You can run this Streamlit app by using the following command in your Python environment:

streamlit run dashboard_two_stream.py


This tutorial shows how a data streaming platform like Confluent Cloud can centralize and simplify Kafka deployments and streamline data collection, validation and data storage. Applying schemas consistently and automatically generating Iceberg tables requires no extra instrumentation or external storage. Together, these capabilities allow high volumes of data to be processed with low latency and deliver near real-time insights from various data sources.
