Human and operator errors
Many system outages are caused by human error in operating the system.
A Gartner report says that "an average of 80 percent of mission-critical application service downtime is directly caused by people or process failures. The other 20 percent is caused by technology failure, environmental failure or a disaster." (NSM: Often the Weakest Link in Business Availability, AV-13-9473, July 3, 2001)
The best prevention is strict change control, documented procedures, training, and supervision.
Recovery from a human-induced outage can range from restarting services to recovering a corrupted database.