Reliability, Availability, and Serviceability (RAS)
RAS areas for Linux on Power
Intro
The Reliability, Availability, and Serviceability (RAS) team at IBM’s Linux Technology Center (LTC) contributes broadly to the RAS areas of Linux and the OPAL firmware. The team works on static and dynamic tracing infrastructure for the Linux kernel and user space, along with associated tooling such as perf. Its primary focus is IBM’s Power platform and its capabilities, including infrastructure for First Failure Data Capture (FFDC) and tools to analyze the captured data. On the OPAL side, the team is responsible for making the OPAL-based Power platform robust against hardware errors, such as CPU, cache, memory, and clock errors, and for providing error logging and firmware-assisted RAS capabilities that make the system easier to service.
In keeping with the rich tradition of RAS on Power Systems, the team also maintains an extensive suite of tools exclusive to the platform. More recently, the team has expanded into the RAS aspects of cloud and containers, for example by making perf container-aware.
Technical resources
Service and productivity tools
The service and productivity tools are available in a YUM repository that you can use to download and install all recommended packages for your Red Hat Enterprise Linux (RHEL), CentOS, SUSE Linux Enterprise Server (SLES), or Fedora Linux distribution.
IBM POWER9 In-Memory Collection (IMC) counters
An IBM Developer article about nest PMU counters and their integration with Linux perf.
Perf annotate
Perf is a powerful performance analysis tool that combines two components: a user-space tool and kernel infrastructure.
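Because perf relies on kernel infrastructure, what the user-space tool can measure for an unprivileged user is decided on the kernel side, by the perf_event_paranoid sysctl. The Python sketch below is an illustration, not part of perf: it reads /proc/sys/kernel/perf_event_paranoid and lists the restrictions that the kernel's sysctl documentation associates with each level. The helper names (read_paranoid_level, restrictions_for) are hypothetical, and some distribution kernels define stricter levels that this sketch does not model.

```python
from __future__ import annotations

from pathlib import Path

# Kernel sysctl that governs perf use by processes without CAP_PERFMON
# (CAP_SYS_ADMIN on kernels older than 5.8).
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")

# (minimum level, restriction) pairs; restrictions accumulate as the level rises.
_RESTRICTIONS = [
    (0, "ftrace function and raw tracepoint access"),
    (1, "CPU event access"),
    (2, "kernel profiling"),
]


def restrictions_for(level: int) -> list[str]:
    """Return what unprivileged users may not do at the given level (hypothetical helper)."""
    return [what for threshold, what in _RESTRICTIONS if level >= threshold]


def read_paranoid_level(path: Path = PARANOID_PATH) -> int | None:
    """Read the current level, or return None if the file is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_paranoid_level()
    if level is None:
        print("perf_event_paranoid is not available on this system")
    else:
        blocked = restrictions_for(level)
        print(f"perf_event_paranoid = {level}")
        print("Disallowed without CAP_PERFMON:", ", ".join(blocked) if blocked else "none documented")
```

Run as a script, it prints the current setting; where the file is missing or unreadable, read_paranoid_level returns None instead of raising.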
Linux Diagnostic Tools
Tools for diagnosing Linux systems.
Perf wiki
Linux profiling with performance counters.
Perf tutorial
Read this tutorial for more information about perf.
BPF and XDP Reference Guide
Understand BPF and XDP in great technical depth.
Code repositories
The following table lists the tools and packages along with their code repositories.
| Tools | Repositories |
|---|---|
| Linux PowerPC tree | |
| Linus kernel repository | |
| BCC | |
| bpftrace | |
| SystemTap | |
| lsvpd | |
| libvpd | |
| servicelog | |
| libservicelog | |
| ppc64-diag | |
| sosreport | |
| supportutils | |
| ServiceReport | |
Download
Download various fixes and updates for your system’s software, hardware, and operating system.