SLA options for your VM in IBM SmartCloud Enterprise+ (Part 2)

Welcome back! In my earlier post, SLA options for your VM in IBM SmartCloud Enterprise+ (Part 1), we discussed the SLA types and how they define HA for VMware virtual machines. In this post I’ll discuss the SLA options and HA for LPARs on System p.

High availability (HA) of AIX LPARs in IBM SmartCloud Enterprise+ (SCE+) is achieved through settings made directly in the AIX operating system, which has well-established mechanisms for this purpose. As part of HA enablement, SCE+ configures priority hang detection, lost I/O hang detection and crash detection. Let’s discuss what these features are and how they enable the HA environment.

Priority hang detection

All processes (also known as threads) run at a priority. The priority scale is numerically inverted: it ranges from 0 to 126, where 0 is the highest priority and 126 the lowest. The default priority for all threads is 60. Any user can lower the priority of a process by using the nice command. Anyone with root authority can also raise a process’s priority.

The kernel scheduler always picks the highest-priority runnable thread to put on a CPU. It is therefore possible for a sufficient number of high-priority threads to completely tie up the machine such that low-priority threads can never run. If the running threads are at a priority higher than the default of 60, this can lock out all normal shells and logins to the point where the system appears hung. The system hang detection feature provides a mechanism to detect this situation and give the system administrator a means to recover from it. This feature is implemented as a daemon (shdaemon) that runs at the highest process priority. This daemon queries the kernel for the lowest-priority thread run over a specified interval. If the priority is above a configured threshold, the daemon can take one of several actions. Each of these actions can be independently enabled, and each can be configured to trigger at any priority and over any time interval.
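The decision the daemon makes can be illustrated with a short shell sketch. This is not shdaemon's code. The function name and its two inputs are hypothetical, and the sketch assumes that "above the threshold" means numerically smaller, since AIX priority values are inverted.

```shell
#!/bin/sh
# Illustrative only: a toy version of the priority hang check.
# Assumption: a priority "above" the threshold is a numerically smaller value,
# because 0 is the highest priority and 126 the lowest.

# is_priority_hang LOWEST_RUN THRESHOLD
#   LOWEST_RUN - numeric priority of the lowest-priority thread that ran
#                during the interval (hypothetical input)
#   THRESHOLD  - configured priority threshold (hypothetical input)
# Succeeds (exit status 0) when no thread at or below the threshold
# priority got to run, which is the suspected-hang case.
is_priority_hang() {
    [ "$1" -lt "$2" ]
}

# Example: only threads at priority 40 or better ran, threshold is 60.
if is_priority_hang 40 60; then
    echo "hang suspected: nothing at priority 60 or lower ran"
fi
```

With a threshold of 60, a result of 60 or higher means that default-priority work is still being scheduled, so the check does not fire.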

System hang detection is configured in IBM SmartCloud Enterprise+ with an action to reboot the system, via the shconf command:

$: shconf -l prio -a pp_reboot=enable -a pp_rto=<priority hang timeout based on SLA>

Lost I/O hang detection

AIX can also detect I/O hang conditions and try to recover from them, based on user-defined actions.

I/O errors can cause the I/O path to become blocked, further affecting I/O on that path. In these circumstances it is essential that the OS alert the user and execute user-defined actions. As part of lost I/O detection and notification, the shdaemon, with the help of the Logical Volume Manager, monitors the I/O buffers over a period of time and checks whether any I/O has been pending for too long. If the wait time exceeds the threshold wait time defined by the shconf file, a lost I/O is detected and further actions are taken. Information about the lost I/O is recorded in the error log. Depending on the settings in the shconf file, the system might also be rebooted to recover from the lost I/O situation.
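The threshold check described here can be sketched in a few lines of shell. This is a simplified illustration, not how shdaemon or the Logical Volume Manager is implemented. The function name, the 600-second threshold and the sample wait times are all assumptions made for the example.

```shell
#!/bin/sh
# Illustrative only: count pending I/O requests that have waited too long.

# count_lost_io THRESHOLD AGE...
#   THRESHOLD - maximum acceptable wait time in seconds (hypothetical value)
#   AGE...    - how long each pending I/O has been waiting, in seconds
# Prints the number of requests whose wait time exceeds THRESHOLD.
count_lost_io() {
    threshold=$1
    shift
    lost=0
    for age in "$@"; do
        if [ "$age" -gt "$threshold" ]; then
            lost=$((lost + 1))
        fi
    done
    echo "$lost"
}

# Four hypothetical pending I/O buffers checked against a 600-second threshold
count_lost_io 600 12 45 610 900
```

A nonzero count corresponds to the case where the configured action, such as an error log entry or a reboot, would be taken.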

Lost I/O hang detection is configured in IBM SmartCloud Enterprise+ with an action to reboot the system, via the shconf command:

$: shconf -l lio -a lio_reboot=enable -a lio_to=<lost I/O timeout based on SLA>

Crash detection

If the OS crashes, it should restart automatically so that service can continue. Crash detection in IBM SmartCloud Enterprise+ enables this reboot via the chdev command, which sets the autorestart attribute of the sys0 system device to true.

$: chdev -l sys0 -a autorestart=true
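To confirm the setting afterwards, the attribute can be read back with lsattr. The sketch below wraps that check in a small helper. The helper name is hypothetical, and the sample line only approximates the column layout of lsattr output (attribute, value, description, user-settable flag), so treat it as an assumption and not as captured output.

```shell
#!/bin/sh
# Illustrative only: check the autorestart value from lsattr-style output.

# autorestart_enabled
#   Reads lines in the form "attribute value description..." on stdin and
#   succeeds (exit status 0) when the autorestart value is "true".
autorestart_enabled() {
    awk '$1 == "autorestart" { found = ($2 == "true") } END { exit found ? 0 : 1 }'
}

# On an AIX LPAR the check would be:
#   lsattr -E -l sys0 -a autorestart | autorestart_enabled
# Here an assumed sample line stands in for the real command output.
if echo "autorestart true Automatically REBOOT OS after a crash True" | autorestart_enabled; then
    echo "autorestart is enabled"
fi
```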

The table below shows the values configured for these HA features in each SLA:

[Table: SLA settings for HA-enabling properties on System p]

IBM recommends these values. Vendors, cloud administrators, or users can adjust the timeout intervals, depending on the prevailing environment and workload conditions.

For more details about the above features refer to the AIX infocenter.
