Getting started: Accelerate your transition to the cloud with DevOps
What do you do when you find out about a problem not from an incident but from a help ticket? Follow this scenario and learn some proactive measures you can take to avoid future problems.
Monitoring. Developers and IT Operations work together to deliver innovation at speed and scale, leveraging cloud native technologies such as containers, Kubernetes and ITOps. Successful enterprises are adopting these technologies for cloud-native applications and to modernize their existing ones to deliver business agility.
In the previous scenarios, you learned about the incident queue and the features for managing and handling incidents. In this scenario, the ITOps team notices some peculiar behavior that they'd like to monitor. You'll create a threshold to test for this behavior and use the Resources pages to help you fine-tune the threshold definition.
Open the Threshold page
Go to Administer > Monitoring > Thresholds on the IBM Cloud Pak console.
On the Thresholds page, you see a list of thresholds ordered by Permissions (Editable and Read-only), Assigned to (resource type, such as DB2 Instance and Linux Systems), and Name. For each threshold, you can see information such as the resource it monitors, the severity, and whether it is enabled or disabled.
Create a threshold
ITOps told you they were having disk file space issues on their Linux systems, so you create a threshold to monitor the percentage of time spent in read operations:
- Click Create to define a new threshold.
- In the Details section, name the threshold Linux\_disk\_reads\_80\_percent and add a helpful description such as "Warning event for disk read time of 80% or more".
- In the Threshold section, select the Linux Systems resource type and Warning severity. For the condition, specify Disk IO Ext Disk Read Percent greater than or equal to (>=) 80 percent.
- Scroll to the Assignments section, select the Individual instance option, click in Select instances to open a list of Linux system instances, and select an instance.
- Click Finish to save the threshold and return to the Thresholds page, which shows your new threshold.
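The condition defined in the steps above can be pictured as a small data structure plus a comparison. The following is a minimal sketch only: the `Threshold` class, its field names, and the sample values are hypothetical illustrations, not the product's API or real monitoring data.

```python
from dataclasses import dataclass
import operator

# Comparison operators a condition might use (illustrative subset).
OPERATORS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}


@dataclass
class Threshold:
    """Hypothetical model of a threshold definition; not the product's schema."""
    name: str
    resource_type: str
    severity: str
    metric: str
    comparison: str
    value: float
    enabled: bool = True

    def breached_by(self, sample: float) -> bool:
        """Return True when the threshold is enabled and the condition holds."""
        return self.enabled and OPERATORS[self.comparison](sample, self.value)


disk_reads = Threshold(
    name="Linux_disk_reads_80_percent",
    resource_type="Linux Systems",
    severity="Warning",
    metric="Disk Read Percent",
    comparison=">=",
    value=80.0,
)

# Made-up samples: only values of 80 percent or more satisfy the condition.
samples = [42.0, 79.9, 80.0, 93.5]
breaches = [s for s in samples if disk_reads.breached_by(s)]
print(breaches)
```

Because the condition uses greater than or equal to, a sample of exactly 80 percent counts as a breach, while 79.9 does not.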
Review the Resources dashboard
Find out if the threshold you created is opening events where they're needed. Some thresholds might need fine-tuning. Go to Monitor health > Infrastructure monitoring on the console. The resource types are displayed in alphabetical order.
All resources
Type linux in the search box to quickly find the Linux Systems resource type, and click the Linux Systems link.
Resource type
After you select Linux Systems, the resource instances are listed alphabetically.
Resource instance
Select the link for the same resource instance that you selected in the Assignments section when you defined the threshold. The instance dashboard is displayed with metrics from the past 12 hours. You can adjust the time span to show from the past 30 minutes up to one month at a time. If you have data retention configured, you can see up to one year of saved samples from the data provider.
Drag the dropped pin along the Events timeline to see the values displayed on every chart at that point in time. Any numbers along the timeline represent the number of events of the same type that are in close succession. Hover the mouse over an event marker to see when the event was opened and what triggered it. Check each marker, including the values before and after each event, to see if you can find a pattern.
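To picture how events "in close succession" might collapse into a single numbered marker, here is a minimal sketch. The five-minute window and the `cluster_events` helper are assumptions made for illustration; the text does not state the rule the console actually uses to group events.

```python
from datetime import datetime, timedelta


def cluster_events(timestamps, window=timedelta(minutes=5)):
    """Group timestamps so that each event is within `window` of the previous one."""
    clusters = []
    for ts in sorted(timestamps):
        if clusters and ts - clusters[-1][-1] <= window:
            clusters[-1].append(ts)   # close to the previous event: same marker
        else:
            clusters.append([ts])     # gap too large: start a new marker
    return clusters


# Made-up event times, expressed as minutes after 09:00.
base = datetime(2024, 1, 1, 9, 0)
events = [base + timedelta(minutes=m) for m in (0, 2, 4, 30, 31, 90)]

# Each count corresponds to the number shown on one timeline marker.
counts = [len(c) for c in cluster_events(events)]
print(counts)
```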
Operating system dashboards plot metrics for the system characteristics. Scroll through the dashboard sections:
- Click a point or drag along the x-axis of a chart to read the values at that time.
- Select a device name in the Disk Device table to see the transfers per second for that device in the corresponding line chart. You can sort the table by clicking a column, and filter it by entering a value in the filter box.
- Expand the Process section and select multiple process IDs to see how the CPU (%) and Resident memory (MB) charts aggregate the values.
You can use the Collapse and Expand twisties to show only what's of interest to you.
What if you want to see other metrics?
There's another metric you'd like to check: page outs, which might indicate memory issues:
- Scroll down to the Custom Metrics section and expand the view.
- Open the Filter metric drop-down list and select System Pages Paged Out Per Second.
The list shows the metrics that are available for selection from the Linux data provider. You can add other metrics, but this metric and your analysis of the metrics around the event times are enough to tell you that the threshold you created needs a minor adjustment.
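Checking the values before and after an event amounts to looking at the samples in a window around the event time. The sketch below assumes samples are (minute, value) pairs; the `samples_around` helper and the page-out numbers are hypothetical and serve only to illustrate the idea.

```python
def samples_around(samples, event_time, before, after):
    """Return the (time, value) samples within the window around event_time."""
    return [
        (t, v)
        for t, v in samples
        if event_time - before <= t <= event_time + after
    ]


# Made-up page-out samples as (minute, pages per second).
page_outs = [(0, 12.0), (5, 15.0), (10, 240.0), (15, 310.0), (20, 18.0)]

# Inspect the five minutes on either side of an event opened at minute 10.
window = samples_around(page_outs, event_time=10, before=5, after=5)
print(window)
```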
Edit the threshold definition
Return to Thresholds and edit the threshold that you created earlier:
- Go to Administer > Monitoring > Thresholds.
- Find the Linux\_disk\_reads\_80\_percent threshold in the list and click the link to edit the threshold.
- Change the Disk Read Percent value from 80 to 75.
- You tested the threshold on one resource and now want to apply it to all your Linux systems, so you change your selection in the Assign to resources section to Resource group.
- Select Finish to see the edited threshold assigned to Linux Systems.
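Lowering the value from 80 to 75 widens the range of samples that open an event. As a rough sketch of that effect, the following counts how many made-up Disk Read Percent samples satisfy the condition at each value; `count_breaches` and the sample list are hypothetical.

```python
def count_breaches(samples, limit):
    """Count samples that meet a greater-than-or-equal condition."""
    return sum(1 for s in samples if s >= limit)


# Made-up Disk Read Percent samples for one Linux system.
read_percent = [61.0, 74.5, 75.0, 77.2, 79.8, 80.0, 84.1]

at_80 = count_breaches(read_percent, 80)
at_75 = count_breaches(read_percent, 75)
print(at_80, at_75)
```

Running the same comparison against exported samples from several systems is one way to estimate how many more events a lower value would open before you assign the threshold to the whole resource group.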