🚀 Stay Ahead with the IBM Sterling OMS Self-Service Alert Framework

In the dynamic world of production systems, unforeseen challenges are inevitable. What truly sets great operations apart is the speed and precision with which these challenges are detected and addressed, long before customers feel any impact.

Today, I’m incredibly excited to highlight the continuous evolution of the IBM Sterling Order Management (OMS) Self-Service Alert Framework.

Our journey began with essential MQ queue depth-based alerts, which provide early detection of message flow issues. Since then, we’ve introduced several powerful OMS functional alerts that detect critical production issues well before they escalate into full-blown incidents. And what excites me even more is what the future holds in this space ...

In this blog, I’d like to walk you through:

The different types of alerts we now have
How they work in real-world scenarios
Why these capabilities are so critical to stable, performant OMS environments

For detailed configuration steps, parameters, and available metrics, refer to the official IBM Sterling OMS Self-Service Alert Framework documentation.

🛠️ Exploring the Powerful Alerts in IBM Sterling OMS

The IBM Sterling OMS Self-Service Alert Framework offers a comprehensive set of alerts designed to keep your system healthy and your business running smoothly. These alerts fall into two main categories, each targeting a different layer of your environment:

📬 MQ Alerts — Keeping a Pulse on Your Messaging Backbone

Think of IBM MQ queues as the lifeblood of your asynchronous workloads. Our MQ alerts monitor these vital arteries by tracking message volumes, queue capacities, and flow rates. This early detection helps prevent bottlenecks, conserve resources, and keep messages flowing seamlessly, all of which are crucial for operational resilience.

You’ll find alerts like:

Queue Depth Alert 📥
Queue Full Percentage Alert 📊
Queue Messages In Alert ➕
Queue Messages Out Alert ➖

❤️ OMS Alerts — Watching Over Your Application’s Heartbeat

Beyond infrastructure, your OMS servers handle real-time processing and critical business transactions. OMS alerts focus on error rates, response times, and business process statistics, giving you instant insights into application health, transaction integrity, and overall business flow.

These include alerts such as:

Application Server and Agent/Integration Server Error Rate Alerts 🔥🔄
OMS Agent Not Processing Workload 🛑
OMS Response Time Statistics ⏱️
OMS Order and Shipment Statistics 📦🚚

Together, these alert types provide a powerful 360-degree view, enabling your team to detect issues early, respond fast, and keep your OMS environment running at peak performance.

📨 MQ Alerts

📥 Queue Depth Alert

Tracking the current depth of a queue provides one of the most direct indicators of integration health.

Take the Create Order flow as an example — order XMLs captured by a front-end system are dropped into a queue for OMS integration servers to process. If the integration server stalls or crashes, the queue depth will climb rapidly.

To proactively manage this, set up an alert that notifies your team once the queue depth crosses a critical threshold. This early warning allows for rapid response and keeps order processing on track.
→ Set an alert to notify if the createOrder queue depth exceeds 1000 messages for 10 minutes.
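
To make the "for 10 minutes" condition concrete, here is a minimal Python sketch of a sustained-threshold check. It is not the framework's implementation (alerts are configured in the framework itself); the function name `sustained_breach` and the sample data are hypothetical and only illustrate why a duration guards against one-off spikes.

```python
from datetime import datetime, timedelta

def sustained_breach(samples, threshold, duration):
    """Return True if every sample in the trailing `duration` exceeds `threshold`.

    `samples` is a list of (timestamp, queue_depth) tuples sorted by time.
    """
    if not samples:
        return False
    latest = samples[-1][0]
    window_start = latest - duration
    # Require history covering the full window, so a short burst cannot fire the alert.
    if samples[0][0] > window_start:
        return False
    window = [depth for ts, depth in samples if ts >= window_start]
    return all(depth > threshold for depth in window)

start = datetime(2024, 1, 1, 9, 0)
# Hypothetical one-minute samples: depth keeps climbing after the integration server stalls.
stalled = [(start + timedelta(minutes=i), 800 + 100 * i) for i in range(15)]
# Hypothetical one-minute samples: a single spike in the last minute only.
spike = [(start + timedelta(minutes=i), 1500 if i == 14 else 200) for i in range(15)]

print(sustained_breach(stalled, 1000, timedelta(minutes=10)))  # True
print(sustained_breach(spike, 1000, timedelta(minutes=10)))    # False
```

The stalled consumer trips the check because the depth stays above 1000 for the whole window, while the momentary spike does not.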

📊 Queue Full Percentage Alert

Sometimes, absolute thresholds aren’t practical because message volume can vary widely. Monitoring queue fullness as a percentage of maximum capacity provides a more flexible approach.

Imagine an inventory feed queue receiving updates from numerous stores and distribution centers throughout the day. Since volumes fluctuate, setting an alert based on fullness percentage can signal when resource pressure becomes risky.
→ Configure an alert if queue fullness remains above 75% for 3 hours.
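
The arithmetic behind a fullness percentage is simple, and a short Python sketch shows why it travels better across queues than an absolute depth. The function name and the queue capacities below are hypothetical; the framework's own calculation is not described here.

```python
def queue_full_percentage(current_depth, max_depth):
    """Express the current depth as a percentage of the queue's configured maximum."""
    if max_depth <= 0:
        raise ValueError("max_depth must be positive")
    return 100.0 * current_depth / max_depth

# The same absolute depth means very different things on differently sized queues.
print(queue_full_percentage(4000, 5000))    # 80.0, above a 75% threshold
print(queue_full_percentage(4000, 100000))  # 4.0, nowhere near it
```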

➕ Queue Messages In Alert

Monitoring the number of messages added to a queue during a time window is essential for spotting upstream issues.

Return to the Create Order queue: queue depth alerts track consumer performance, whereas message-in metrics confirm that orders are being successfully produced and pushed into the queue.

If the number of incoming messages drops unexpectedly, it could indicate upstream failures or issues with the external system feeding OMS.

Additionally, this metric helps identify surges in workload (e.g., promotional events), enabling teams to scale integration servers proactively.
→ Alert if no messages arrive in the createOrder queue for 30 minutes.

➖ Queue Messages Out Alert

Similarly, tracking messages consumed from a queue verifies that consumers are actively processing workloads.

In low-traffic queues, queue depth may remain below thresholds even if no messages are processed — a silent failure that could cause delays and bottlenecks.

An alert based on message consumption ensures such issues are caught quickly, no matter the queue size.
→ Notify the team if no messages have been consumed from a queue for 2 hours.
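
The silent-failure case can be illustrated with a small Python sketch. Both functions and the sample values are hypothetical, written only to show how a depth-based check and a consumption-based check behave on the same low-traffic queue.

```python
def depth_alert(depth_samples, threshold):
    """Fire when every depth sample in the window exceeds the threshold."""
    return bool(depth_samples) and all(d > threshold for d in depth_samples)

def messages_out_alert(consumed_per_interval):
    """Fire when nothing was consumed in any interval of the window."""
    return bool(consumed_per_interval) and sum(consumed_per_interval) == 0

# Hypothetical 2-hour window sampled every 10 minutes:
# 12 messages sit in the queue and the consumer never picks them up.
depth_samples = [12] * 12
consumed_per_interval = [0] * 12

print(depth_alert(depth_samples, threshold=1000))  # False: depth never gets close
print(messages_out_alert(consumed_per_interval))   # True: the stalled consumer is caught
```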

💡 Pro Tip:
To find the right threshold for these alerts, review the JMS Metrics dashboard in the Monitoring tab and base the value on the historical trend.
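
One possible way to turn a historical trend into a number is to take a high percentile of past readings and add some headroom. The Python sketch below assumes you have noted down a series of depth readings from the dashboard by hand; the function name, the 95th percentile, and the 20% headroom are all illustrative choices, not recommendations from the framework.

```python
from statistics import quantiles

def suggest_threshold(history, headroom=1.2):
    """Suggest an alert threshold: 95th percentile of past readings plus headroom."""
    if len(history) < 2:
        raise ValueError("need at least two historical readings")
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = quantiles(history, n=100, method="inclusive")[94]
    return round(p95 * headroom)

# Hypothetical peak-hour queue depths read off the dashboard.
history = [120, 180, 150, 400, 220, 310, 90, 260, 500, 340, 280, 200]
print(suggest_threshold(history))
```

A threshold derived this way sits above normal peaks, so the alert fires on genuinely unusual build-up rather than on everyday traffic.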

📦 OMS Alerts

While MQ alerts safeguard the messaging infrastructure, OMS alerts monitor application health and business process integrity — critical for smooth, error-free operations.

❗ Why Error Rates Matter:
When errors occur in OMS — whether in application servers or agent/integration servers — exceptions might get logged into the YFS_INBOX table. If the transaction is marked as reprocessable, it also gets added to the YFS_ERROR_REPROCESS table. Both of these tables are heavily accessed by OMS processes and by business users to track operational health and transaction exceptions. High error rates can flood these tables, increase database contention, degrade user experience, and impair overall system performance. That’s why proactively monitoring error rates is vital.

🔥 OMS Application Server Error Rate

Application servers handle a wide range of real-time workloads — from REST API calls to store and call center transactions. An uptick in error rates here signals immediate business impact and resource strain.

A high error rate on application servers can also tie up threads unnecessarily, potentially blocking other critical request flows and degrading overall server performance.

→ Alert if application server error rate exceeds 2% over 10 minutes.
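
As a rough illustration of the percentage in that example, the Python sketch below assumes the error rate is the number of failed requests divided by the total requests in the window. That definition, the function name, and the counts are assumptions for illustration; check the framework documentation for how the metric is actually derived.

```python
def error_rate_percent(error_count, total_count):
    """Errors as a percentage of all requests in the window (assumed definition)."""
    if total_count == 0:
        # No traffic in the window: report 0 rather than dividing by zero.
        return 0.0
    return 100.0 * error_count / total_count

# Hypothetical 10-minute window on an application server.
rate = error_rate_percent(error_count=45, total_count=1500)
print(rate)        # 3.0
print(rate > 2.0)  # True: the 2% threshold is exceeded
```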

🔄 Agent/Integration Server Error Rate

Agent and integration servers process asynchronous workloads. High error rates here can lead to repeated transaction reprocessing, database bottlenecks, and wasted resources.

Detecting sustained errors enables faster remediation, prevents cascading failures, and keeps database load under control.
→ Trigger an alert if the error rate exceeds 30% for 30 minutes.

Note: The following OMS alerts are powered by operational statistics collected within OMS — similar to the metrics captured in the YFS_STATISTICS_DETAIL table. These metrics provide vital insights into server performance, transaction processing, and business activity trends, enabling proactive issue detection and resolution.

🛑 OMS Agent Not Processing Workload

Sometimes agents appear healthy at the JVM level but fail to pull or process jobs. Without error logs, this condition can remain hidden and stall business processes.

By monitoring getJobs statistics for the agent server, you can confirm actual processing activity.
→ Alert if getJobs processed remains zero for 2 hours.

⏱️ OMS Response Time Statistics

API and service response times directly affect user experience and integration reliability. Slow responses in critical service calls, like the findInventory API, can cause checkout delays or failures.

Alerting on response times lets you fix performance degradations before users notice.
→ Notify if the findInventory API response time exceeds 1500 ms over 10 minutes.
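
A small Python sketch can show why evaluating response time over a window is more useful than reacting to a single slow call. It assumes the alert compares the mean response time in the window with the threshold; that aggregation, the function name, and the sample values are assumptions for illustration only.

```python
from statistics import mean

def slow_api(response_times_ms, threshold_ms):
    """True when the mean response time in the window exceeds the threshold."""
    return bool(response_times_ms) and mean(response_times_ms) > threshold_ms

# Hypothetical findInventory response times (ms) within a 10-minute window.
one_outlier = [300, 280, 2600, 310, 290]     # a single slow call
degraded = [900, 1200, 2100, 2400, 1800]     # consistently slow

print(slow_api(one_outlier, 1500))  # False: the mean stays well below 1500 ms
print(slow_api(degraded, 1500))     # True: the mean is 1680 ms
```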

📦🚚 OMS Order and Shipment Statistics

Business operations demand that orders and shipments move smoothly through the system. Tracking key lifecycle metrics signals bottlenecks early, enabling swift action.

Metrics include:

numOrderLinesCreated
numOrderLinesReleased
numOrderLinesScheduled
numOrdersCreated
numOrdersReleased
numOrdersScheduled
numShipmentInvoicesCreated

For instance, a sudden drop in order creation could indicate upstream issues or system outages.
→ Set an alert if numOrdersCreated falls below 1 in a 2-hour window.

🔍 From Detection to Resolution

While no system can guarantee perfection, operational excellence is defined by how quickly you identify and resolve issues before they impact end users.

The continuously evolving IBM Sterling OMS Self-Service Alert Framework equips your team to detect issues early, respond decisively, maintain system stability, and consistently meet business goals.

With these capabilities, your IBM Sterling OMS environment is well-positioned to deliver reliable, seamless customer experiences — every time.
