Cascading failure
A large application system has many working components, from physical disk drives, operating systems, and interfaces to external systems through to application servers and databases. All of these highly interconnected and interdependent components must work well for the system to perform.
A defense logistics officer once gruffly reminded a group of young pilots that their new fighter jet was nothing more than 50,000 parts flying in tight formation. This message has many parallels to any large computing system.
Let us assume that a Sterling™ Order Management System Software transaction calls out to an external system to check item availability. If that external system cannot scale or performs poorly, the Sterling Order Management System Software transaction waits, which blocks the thread that is running it. If there are many requests for that transaction, the system can stall once all the threads are blocked. As a result, a poorly tuned system can have a ripple effect on the systems that are integrated with it.
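The thread-blocking effect described above can be illustrated with a small, self-contained simulation. It is only a sketch under stated assumptions and does not use any Sterling Order Management System Software API: check_availability, handle_request, run_burst, the pool size, the request count, and the delays are all hypothetical, and the external system is simulated with a sleep. The sketch pushes a burst of requests through a fixed-size thread pool, first with no limit on the wait and then with a timeout on the external call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4    # hypothetical number of transaction threads
REQUESTS = 12    # hypothetical burst of incoming requests


def check_availability(response_delay_s, timeout_s=None):
    """Simulated external call: blocks the calling thread while it waits."""
    if timeout_s is not None and response_delay_s > timeout_s:
        time.sleep(timeout_s)        # give up once the timeout expires
        raise TimeoutError("availability check timed out")
    time.sleep(response_delay_s)     # wait for the full response
    return "available"


def handle_request(response_delay_s, timeout_s):
    """One transaction: it holds its thread for as long as the call blocks."""
    try:
        return check_availability(response_delay_s, timeout_s)
    except TimeoutError:
        return "unknown"             # degrade instead of waiting any longer


def run_burst(response_delay_s, timeout_s=None):
    """Run REQUESTS transactions on POOL_SIZE threads; return (results, seconds)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        futures = [pool.submit(handle_request, response_delay_s, timeout_s)
                   for _ in range(REQUESTS)]
        results = [f.result() for f in futures]
    return results, time.monotonic() - start


if __name__ == "__main__":
    _, blocked = run_burst(response_delay_s=0.2)
    _, bounded = run_burst(response_delay_s=0.2, timeout_s=0.05)
    print(f"no timeout: {blocked:.2f}s, with timeout: {bounded:.2f}s")
```

Without a timeout, every thread is held for the full response time of the slow system, so queued requests wait behind the blocked threads. With a timeout, each thread is released sooner, at the cost of returning an "unknown" availability that the caller must be prepared to handle. Whether that trade-off is acceptable depends on the transaction.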
This document presents some of these interdependencies along with approaches to monitoring them.