IBM Support

ServerBusyException and TimeoutException Error in Microsoft Azure Event Hub when Sending & Receiving

How To


Summary

Errors in Microsoft Azure Event Hubs when sending, receiving, or capturing events can stem from various causes, including connectivity issues, configuration errors, permission problems, or resource limitations.

Objective

The outline below covers common issues when sending, receiving, and capturing events, with troubleshooting steps for each scenario and practical solutions for resolving ServerBusyException and TimeoutException errors in Microsoft Azure Event Hubs.

Environment

Azure

Steps

 1. Errors When Sending Events

 Common Errors

- ServerBusyException: Indicates the Event Hubs namespace is being throttled due to insufficient throughput units or uneven partition distribution.

  - Resolution:

    - Increase throughput units in the Azure portal under the "Scale" or "Overview" page of the Event Hubs namespace.

    - Revise partition distribution strategy or use `EventHubClient.Send(eventDataWithoutPartitionKey)` to balance load across partitions.

- TimeoutException: Occurs when an operation takes longer than the specified timeout, often during maintenance or due to incorrect configuration.

 

  - Resolution:

    - Verify the connection string or fully qualified domain name (FQDN) is correct. Use the Azure portal, CLI, or PowerShell to retrieve the correct connection string.

    - Adjust timeout settings in the client code (e.g., via `ServiceBusConnectionStringBuilder`).

    - Check for service outages on the Azure service status site.

- Connection Timed Out (SocketException): Seen when sending high-frequency events (e.g., 100 requests per second).

 

  - Resolution:

    - Use AMQP over WebSockets (`transportType: AmqpWebSockets`), which runs over port 443, if port 5671 is blocked or unreliable on the network path.

    - Increase throughput units to handle high request rates.

    - File a support ticket if the issue persists for deeper analysis.

- EventHubError: ResourceLimitExceeded: Occurs when exceeding the maximum number of receivers per partition (limit: 5 per consumer group).

  - Resolution:

    - Reuse `EventHubConsumerClient` instances instead of creating new ones per request. Treat clients as singletons to avoid exceeding connection limits.

    - Use different consumer groups for multiple consumers to distribute load.
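
The retry and transport suggestions above can be combined when a client is constructed. The following is a minimal sketch, assuming the Python `azure-eventhub` v5 package; the helper names `build_retry_options` and `make_producer` are hypothetical, and the numeric values are illustrative placeholders rather than recommendations.

```python
def build_retry_options(retry_total=5, backoff_factor=0.8, backoff_max=60):
    """Collect retry keyword arguments for an Event Hubs client (hypothetical helper)."""
    return {
        "retry_total": retry_total,              # attempts before the error is raised
        "retry_backoff_factor": backoff_factor,  # base delay in seconds between attempts
        "retry_backoff_max": backoff_max,        # upper bound on the delay in seconds
        "retry_mode": "exponential",             # grow the delay after each failure
    }


def make_producer(connection_str, eventhub_name, use_websockets=False, **retry_options):
    """Create one producer client; keep it and reuse it instead of rebuilding per send."""
    # Imported lazily so build_retry_options() works without the SDK installed.
    from azure.eventhub import EventHubProducerClient, TransportType

    transport = (
        TransportType.AmqpOverWebsocket if use_websockets else TransportType.Amqp
    )
    return EventHubProducerClient.from_connection_string(
        connection_str,
        eventhub_name=eventhub_name,
        transport_type=transport,
        **retry_options,
    )
```

A caller might create the client once at startup with `make_producer(conn_str, hub, use_websockets=True, **build_retry_options())` and share that instance. Retries only smooth over transient throttling; sustained ServerBusyException errors still point to a capacity or partitioning problem.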

 General Troubleshooting for Sending

- Verify Connection String: Ensure the connection string includes the correct `EntityPath` for the specific Event Hub.

- Check Network Configuration: Confirm outbound traffic is allowed for Event Hubs (ports: 5671 for AMQP, 443 for HTTPS/AMQP over WebSockets). If using a virtual network, ensure the subnet has access to the namespace or add the application’s IP to the IP firewall.

- Protocol Configuration: Use AMQP for reliable, high-performance sending. For Kafka clients, verify `producer.config` settings.

- Authentication: Ensure the Shared Access Signature (SAS) token or Microsoft Entra ID credentials are valid and have send permissions.
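
A quick local check of the connection string can rule out the most common configuration mistakes before any network call is made. This is a minimal sketch using only the standard library; the function names `parse_connection_string` and `find_problems` are hypothetical, and the checks cover only the fields mentioned above.

```python
def parse_connection_string(connection_str):
    """Split 'Key=Value;Key=Value' segments into a dict."""
    parts = {}
    for segment in connection_str.strip().split(";"):
        if not segment:
            continue
        # Split on the first '=' only: key values often end in '=' padding.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


def find_problems(connection_str, expected_eventhub=None):
    """Return a list of human-readable problems; an empty list means none found."""
    parts = parse_connection_string(connection_str)
    problems = []
    if not parts.get("Endpoint", "").startswith("sb://"):
        problems.append("Endpoint is missing or does not start with sb://")
    has_key = "SharedAccessKeyName" in parts and "SharedAccessKey" in parts
    has_sas = "SharedAccessSignature" in parts
    if not (has_key or has_sas):
        problems.append("No shared access key or SAS token found")
    # A namespace-level string has no EntityPath; it then has to be supplied
    # separately (for example through an eventhub_name argument).
    if "EntityPath" not in parts:
        problems.append("EntityPath is missing")
    elif expected_eventhub and parts["EntityPath"] != expected_eventhub:
        problems.append("EntityPath does not match the expected Event Hub name")
    return problems
```

This only validates the shape of the string. Whether the key is still valid and carries send permissions can be confirmed only by an actual request.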

 2. Errors When Receiving Events

 Common Errors

- No Messages Received Despite Requests: Successful requests appear in metrics, but no messages are ingested.

  - Resolution:

    - Use the Event Hubs Data Explorer to verify ingested messages.

    - Check if the producer is sending messages correctly (e.g., ensure message body is not null).

    - Verify the consumer group exists and the receiver is subscribed to the correct partition.

    - Ensure the sender has proper permissions to send messages.

- ConnectionLostError: Occurs when the connection is idle, causing the service to disconnect the client.

  - Resolution:

    - This is normal behavior; clients automatically reconnect. Ensure retry policies are configured (e.g., `retry_total`, `retry_backoff_factor` in Python SDK).

    - Use `keep_alive` functionality in `EventHubProducerClient` for long-lived connections.

- Receiver Disconnected Error: Caused by multiple receivers with the same epoch or load balancing issues.

 

  - Resolution:

    - Ensure a higher epoch is used when recreating receivers.

    - Check if too many processors are configured for the same Event Hub and consumer group. Scale down or use different consumer groups.

    - Verify the checkpoint store (e.g., Blob Storage) does not have soft delete enabled.

- Messages Stop Being Delivered: No errors in logs, but messages stop arriving.

  - Resolution:

    - Check partition IDs and ensure receivers are set up for all partitions.

    - Monitor Azure Event Hubs metrics for spikes in message processing errors.

    - Increase logging on the client side to capture detailed errors.
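
When messages stop arriving without any logged error, it helps to know which partitions went quiet and when. The sketch below is one possible client-side approach, not an SDK feature: the class `StallDetector` and its threshold are hypothetical, and the application is assumed to call `record()` from its event callback.

```python
import time


class StallDetector:
    """Track the last time each partition delivered an event."""

    def __init__(self, threshold_seconds=300.0):
        self.threshold_seconds = threshold_seconds
        self._last_seen = {}

    def record(self, partition_id, now=None):
        # Call this for every received event.
        self._last_seen[partition_id] = time.monotonic() if now is None else now

    def stalled_partitions(self, expected_partitions, now=None):
        """Return partitions that never delivered or have been quiet too long."""
        now = time.monotonic() if now is None else now
        stalled = []
        for partition_id in expected_partitions:
            last = self._last_seen.get(partition_id)
            if last is None or now - last > self.threshold_seconds:
                stalled.append(partition_id)
        return stalled
```

In a Python consumer, `record(partition_context.partition_id)` could be called inside `on_event`, with `stalled_partitions()` checked periodically against the full list of partition IDs. A quiet partition is not necessarily a fault, since the producer may simply not be writing to it, so the result is best correlated with the incoming-message metrics.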

 General Troubleshooting for Receiving

- Consumer Configuration: Ensure `EventHubConsumerClient` is configured with a valid checkpoint store (e.g., BlobCheckpointStore) for load balancing and checkpointing.

- Partition Issues: If receiving from a specific partition, verify the `partition_id` is correct. For multiple partitions, ensure load balancing is enabled.

- Logging: Enable verbose logging in the client SDK to capture errors. For example, in Python, use `logging.getLogger("azure.eventhub")`.

- Diagnostics: Check Azure Monitor’s `RuntimeAuditLogs` for `Status: Failure` entries. Correlate with client-side logs to identify the root cause.
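
The logging step can be set up with the standard `logging` module. This is a minimal sketch; the helper name `enable_eventhub_logging` and the log format are assumptions chosen for illustration.

```python
import logging
import sys


def enable_eventhub_logging(level=logging.DEBUG, stream=sys.stdout):
    """Route Event Hubs SDK log records to a stream at the given level."""
    logger = logging.getLogger("azure.eventhub")
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Call the helper once at startup, before the client is created. The Python clients also accept a `logging_enable=True` keyword argument for network-level trace output, which is verbose and usually best limited to short troubleshooting sessions.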

 3. Errors When Capturing Events

 Common Errors

- MessagingGatewayBadRequest: Occurs when capturing events to a storage account due to missing permissions.

  - Resolution:

    - Assign the “Storage Blob Data Contributor” role to the user or application at the storage account scope. Steps:

      1. Navigate to the storage account in the Azure portal.

      2. Go to “Access Control (IAM)” > “Add role assignment.”

      3. Select “Storage Blob Data Contributor” and assign it to the relevant user or service principal.

    - Ensure the storage account supports block blobs and is not a premium account.

- Capture Failure in Diagnostics: Capture failures appear in diagnostic logs (e.g., backlog or failure metrics).

  - Resolution:

    - Check if the storage account is temporarily unavailable. Event Hubs retains data for the configured retention period and backfills once the storage account is available.

    - Verify the storage account is in the same subscription as the Event Hub and public access is enabled or trusted services are allowed.

    - Monitor capture metrics in Azure Monitor to identify failure patterns.

- Empty Capture Files: Files are written to storage, but the body is null.

  - Resolution:

    - Ensure the producer is sending valid event data.

    - Check capture settings (e.g., time window, size window) in the Azure portal to confirm they align with event frequency.

    - Verify the storage container is correctly configured in the Event Hubs capture settings.

 General Troubleshooting for Capturing

- Capture Configuration: Enable capture via the Azure portal with appropriate time (1–15 minutes) and size (10–500 MB) windows. Ensure the storage account is Azure Blob Storage or Data Lake Storage Gen 2.

- Storage Account Issues: Confirm the storage account is accessible and not deleted or misconfigured. Avoid using Azure Data Lake Storage Gen 1, as it is retired.

- Diagnostics: Enable diagnostic settings to monitor capture metrics (e.g., capture backlog, failures) in Azure Monitor.

- Retention Period: Ensure the Event Hub’s retention period is sufficient to handle temporary storage unavailability.
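
The capture window limits above can be checked before a configuration is deployed. The following sketch assumes the ranges stated in this section (1 to 15 minutes, 10 to 500 MB); the function names are hypothetical, and the keys `intervalInSeconds` and `sizeLimitInBytes` are assumed to match the capture properties used in ARM templates.

```python
def validate_capture_window(interval_minutes, size_mb):
    """Return a list of problems with the capture window; empty means valid."""
    problems = []
    if not 1 <= interval_minutes <= 15:
        problems.append("Time window must be between 1 and 15 minutes")
    if not 10 <= size_mb <= 500:
        problems.append("Size window must be between 10 and 500 MB")
    return problems


def to_capture_settings(interval_minutes, size_mb):
    """Convert the windows to seconds and bytes (key names are an assumption)."""
    return {
        "intervalInSeconds": interval_minutes * 60,
        "sizeLimitInBytes": size_mb * 1024 * 1024,
    }
```

Capture writes a file when either window is reached first, so a low-volume Event Hub with a short time window tends to produce many small files.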

 General Best Practices

- Monitor Azure Metrics: Use Azure Monitor to track metrics like `Incoming Messages`, `Outgoing Messages`, and `Capture Backlog`. Look for anomalies in request counts or errors.

- Enable Logging: Implement verbose logging in your client application (e.g., Azure SDK for .NET, Python, or Java) to capture detailed error information.

- Check Quotas: Ensure you haven’t exceeded Event Hubs limits, which vary by pricing tier (e.g., 20 consumer groups per Event Hub in the Standard tier, 5 concurrent receivers per partition per consumer group).

- Retry Policies: Configure retry options in the SDK to handle transient errors (e.g., `retry_total`, `retry_backoff_factor` in Python).

- Network Security: If using a virtual network or firewall, verify the `EventHub` service tag is allowed, and ports 5671 and 443 are open.

- Azure Support: If issues persist, file a support ticket via the Azure portal for deeper investigation, especially for timeout or throttling errors.
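
The port requirements above can be verified from the machine that runs the client. This is a minimal sketch using only the standard library; the function names are hypothetical, and the namespace host name passed in is a placeholder to be replaced with a real FQDN.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_eventhub_ports(namespace_fqdn, ports=(5671, 443)):
    """Check the AMQP (5671) and HTTPS/WebSockets (443) ports for a namespace."""
    return {port: can_connect(namespace_fqdn, port) for port in ports}
```

For example, `check_eventhub_ports("<namespace>.servicebus.windows.net")` returns a mapping of port to reachability. A successful TCP connection shows only that the path is open; it does not confirm that TLS, authentication, or IP firewall rules on the namespace will accept the client.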

 Code Example for Robust Sending/Receiving (Python)

```python
import asyncio

from azure.eventhub import EventData
from azure.eventhub.aio import EventHubConsumerClient, EventHubProducerClient


async def send_events(connection_str, eventhub_name):
    producer = EventHubProducerClient.from_connection_string(
        connection_str, eventhub_name=eventhub_name
    )
    async with producer:
        # A batch enforces the size limit; add() raises ValueError when it is full.
        event_data_batch = await producer.create_batch()
        event_data_batch.add(EventData("Test Event"))
        try:
            await producer.send_batch(event_data_batch)
            print("Events sent successfully")
        except Exception as e:
            print(f"Error sending events: {e}")


async def receive_events(connection_str, eventhub_name):
    consumer = EventHubConsumerClient.from_connection_string(
        connection_str, consumer_group="$Default", eventhub_name=eventhub_name
    )

    async def on_event(partition_context, event):
        # With max_wait_time set, the callback is invoked with event=None
        # when no event arrives within the wait window.
        if event is None:
            return
        print(f"Received event: {event.body_as_str()}")
        # Without a checkpoint store, the checkpoint is not persisted.
        await partition_context.update_checkpoint(event)

    async with consumer:
        # receive() runs until cancelled; starting_position="-1" reads from the
        # start of each partition so that events sent earlier are delivered.
        await consumer.receive(
            on_event=on_event, max_wait_time=5, starting_position="-1"
        )


# Run the async functions
connection_str = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<key>;SharedAccessKey=<value>;EntityPath=<eventhub>"
eventhub_name = "<eventhub>"

asyncio.run(send_events(connection_str, eventhub_name))
asyncio.run(receive_events(connection_str, eventhub_name))
```

- Notes: This code includes error handling and checkpointing. Adjust `max_wait_time` and retry policies as needed. Ensure the connection string is valid and includes `EntityPath`.

 Additional Resources

- Azure Event Hubs Documentation: [Troubleshoot connectivity issues]

- Capture Overview: [Event Hubs Capture]

- Diagnostics: [Monitor Event Hubs]

- SDK Troubleshooting: [.NET SDK]

- SDK Troubleshooting: [GitHub Python SDK]

Document Location

Worldwide


Document Information

Modified date:
11 August 2025

UID

ibm17241957