The IBM Cloud Pak console is the user interface for monitoring your mission-critical applications.
Based on the award-winning design for IBM® Cloud Event Management, Monitoring is interactive and scalable:
- In the Resources dashboards, you can select other metrics that you'd like to see and compare, use the time slider to adjust the dates shown, and use the time selector to adjust the time range from the past 3 hours to the past month.
- You can quickly sort and filter lists and tables to show only what you're interested in.
Use the console to check the status of your applications and respond to incidents. The dashboards simplify problem identification by combining the incident management capability with dashboard navigation that takes you from a view of application status down to code-level detail. You have visibility into source code problems at the exact moment that an issue occurs.
Take a look at the usage scenarios to learn more about what you can do in the IBM Cloud Pak console.
Getting started: Manage dynamic application and infrastructure environments
As a developer or IT operator, you want to be able to quickly isolate and focus on issues affecting your application or the environment that is hosting your application. Follow this scenario to generate some sample incidents and learn about the incident queue in the IBM Cloud Pak console.
Getting started: Collaborate to rapidly resolve problems
You can assign incidents to other users or work on them yourself to keep the work evenly distributed. Follow this scenario to learn more about incident research and discovery in the IBM Cloud Pak console.
Getting started: Proactively manage the health of your application environment – regardless of size
You're the operations lead and want to automate some incident handling by adding a new policy. Follow this scenario to learn more about incident policies and user profiles and how they are manifested in the incident queue.
Getting started: Accelerate your transition to the cloud with DevOps
What do you do when you find out about a problem not from an incident but from a help ticket? Follow this scenario and learn some proactive measures you can take to avoid future problems.
Getting started: Performing SRE functions
How can Monitoring simplify your monitoring work and shorten the time to resolution? Let's look at two typical scenarios to see how:
Incident Resolution Flow for a Kubernetes App
This scenario walks through resolving an incident in a Kubernetes service in Monitoring. In this example, John (an SRE) is notified of an incident that was created because of high latency in the stock trader service. John restores the service by creating a hypothesis and following it through to determine whether he isolated the problem. If the problem is not resolved, John creates another hypothesis. This example shows how this process is accomplished in Monitoring.
Incident Resolution Flow for a traditional VM-based App
This scenario walks through resolving an incident in a traditional VM-based app in Monitoring. In this example, Todd (an IT operator, not an SRE) receives an alert from a threshold breach. Todd has thresholds set on the application's user experience metrics, such as slow synthetic transactions or slow real user transactions. Todd restores the service by creating a hypothesis and following it through to see whether he isolated the problem. If not, he creates another hypothesis. This example shows how this process is accomplished in Monitoring. Note: This incident is more difficult to resolve because Todd does not have a deep understanding of how the app works. He is looking at the infrastructure metrics to try to find the problem.
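To make the idea of a threshold breach on a user experience metric more concrete, the following is a minimal Python sketch of the kind of check that could raise the alert Todd receives. It is an illustration only and does not use or represent the Monitoring product's API. The `Threshold` class, the `breaches` function, the metric name, the 2000 ms limit, and the three-sample window are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Threshold:
    """Hypothetical threshold definition; not a Monitoring product type."""
    metric: str           # illustrative metric name
    limit_ms: float       # response time above which a transaction counts as slow
    min_samples: int = 3  # number of most recent samples to average


def breaches(threshold: Threshold, samples_ms: list[float]) -> bool:
    """Return True when the mean of the most recent samples exceeds the limit."""
    if len(samples_ms) < threshold.min_samples:
        return False  # not enough data to decide
    recent = samples_ms[-threshold.min_samples:]
    return mean(recent) > threshold.limit_ms


# Assumed example: synthetic transaction response times in milliseconds.
synthetic = Threshold(metric="synthetic_transaction_response_ms", limit_ms=2000.0)
print(breaches(synthetic, [850.0, 900.0, 2400.0, 2600.0, 3100.0]))  # True
print(breaches(synthetic, [850.0, 900.0, 950.0]))                   # False
```

Averaging over a small window rather than alerting on a single slow sample is one common way to reduce noisy alerts. How the product itself evaluates thresholds may differ.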