By: Aravinda Prasad.
Problem determination is a key area for systems administrators
(sysadmins), who tend to spend hours debugging and trying to
find out what is wrong with a system. Engineers at the IBM Linux
Technology Center around the world are working on ways to simplify the
experience of sysadmins in managing IBM systems.
One outcome of this effort is a set of upcoming facilities, such as Light Path Diagnostics,
improved diagnostic tools, and other features in IBM PowerLinux, which
integrate well with the existing Reliability, Availability, and
Serviceability (RAS) capabilities. IBM believes that such facilities will help sysadmins perform administration tasks more easily.
In this article, we highlight the PowerLinux RAS advantage in
determining and resolving problems from the administrator's point of
view.
A PowerLinux sysadmin
receives a notification alert of a serviceable event. The sysadmin,
knowing that the service log infrastructure of RAS on PowerLinux is
capable of sending such notifications, logs into the system and
checks the event using the servicelog
tool to get more details. The detailed
log from servicelog shows
that one of the Ethernet cards has gone bad, giving additional
information such as the location code and device serial number.
[root@ras ~]# servicelog --dump
Servicelog ID: 27
Log Timestamp: Fri Nov 30 10:44:02 2012
Event Timestamp: Fri Nov 30 10:44:02 2012
Update Timestamp: Fri Nov 30 10:44:02 2012
Type: Operating System Event
Severity: 6 (ERROR)
Node Name: ras.ibm.com
Reference Code: BF778E00
Serviceable Event: Yes
Predictive Event: No
Disposition: 1 (Unrecoverable)
Call Home Status: 1 (Call Home Candidate)
Kernel Version: #1 SMP Wed Jun 13 18:19:27 EDT 2012
Message forwarded from syslog:
Fri Nov 30 10:44:02 ras kernel: e1000e 0001:00:01.0: Invalid MAC Address
Description: The MAC address read from the adapter's EEPROM is not a valid Ethernet address
Action: 1. Execute diagnostics on the adapter, using "ethtool -t".
2. Check the EEPROM level on the failing adapter.
3. Replace the adapter.
Procedure Id: see explain_syslog
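Following the Action text above, a first check might be the adapter self-test via ethtool; eth1 is assumed here from the lscfg output shown later, and the offline variant briefly takes the interface down:
# Run the adapter's built-in self-test, as suggested in the Action text
$ sudo ethtool -t eth1 offline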
The clever sysadmin,
knowing the RAS capabilities behind this entire setup, recalls that
the OS running on the PowerLinux server detected the bad
Ethernet card and logged an error in /var/log/messages,
which the syslog_to_svclog tool converted into a service log event
by logging it into the service log database. The service log
database, upon receiving a serviceable event, sent the
notification. The sysadmin also quickly recalls that serviceable
events are not restricted to Ethernet devices: they are also
supported for SCSI enclosures, whose events are logged to the service
log database by the diag_encl tool,
and for RTAS-related events, which are logged into the database by the rtas_errd daemon.
The sysadmin orders a new
Ethernet card after collecting the required information, such as the model
and serial number of the bad card, with the help of the new
-l flag to the lscfg
command, which takes the location code logged in the service log
as input and prints VPD (Vital Product Data) information.
[root@ras ~]# lscfg -vl U78A5.001.WIH8464-P1
0001:00:01.0 eth1 ethernet U78A5.001.WIH8464-P1
Port 2 - IBM 2 PORT 10/100/1000
Base-TX PCI-X Adapter (14107910)
Machine Type-Model........82546GB Gigabit Ethernet Controller
The sysadmin is also
happy to know that with the new light path diagnostics facility (coming in future RHEL and SLES service pack updates),
the service log notifier would have notified the light path diagnostics
subsystem lp_diag about the
bad Ethernet card, and the light path infrastructure would have
enabled the fault indicator for the bad card's slot, making it
easy to identify the physical location of the Ethernet card.
The sysadmin replaces the
bad part by identifying the slot with the help of the fault indicator
LEDs. The hot-plug facility automatically identifies and initializes
the newly plugged card. The sysadmin closes the serviceable event
using the log_repair_action tool,
and the service log facility, upon closure of the serviceable event,
notifies the light path infrastructure to turn off the fault
indicators. The sysadmin then updates the VPD using the vpdupdate
tool to reflect the changes in the hardware.
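The closing steps might look like the sketch below; the flag is an assumption, so check the servicelog and lsvpd man pages, and the location code comes from the event above:
# Log the repair action against the serviced location code (flag assumed)
$ sudo log_repair_action -l U78A5.001.WIH8464-P1
# Rebuild the VPD database so it reflects the new adapter
$ sudo vpdupdate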
The sysadmin appreciates
the RAS capabilities of PowerLinux and their seamless integration with
the OS, which helped in quickly identifying and resolving the
problem. The sysadmin checks for new notifications, knowing that
PowerLinux RAS is not just restricted to identifying faulty devices
but is capable of a lot more and provides many service and productivity tools.
For more information about service and productivity (aka RAS) tools for your PowerLinux system, see the related article
in the Linux Information Center.
By: Carlos Seo.
The IBM Advance Toolchain for PowerLinux is a set of open source
development tools and runtime libraries that allow users to take leading-edge
advantage of IBM's latest POWER hardware features on Linux. A new
update release is now available, and it includes the following:
- Time zone automatically set to the system's time zone.
- New advance-toolchain-at6.0-selinux rpm, which sets the correct labels for key files so the Advance Toolchain works when SELinux is enabled.
- New ldconfig wrapper (installed in /usr/sbin/) that wraps the system's ldconfig and that of every installed Advance Toolchain.
- GDB for remote debugging now included in the cross compiler.
- The following fixes are included in GCC:
- The following fixes are included in GDB:
- The following fixes are included in OProfile:
- Fix to address over-counting user context lost samples.
- Fix to record samples from forked processes.
Please let us know if you have any questions about this release.
By: Bill Buros.
A new IBM Redbooks publication is now available that provides great advice on optimizing application code for POWER7 and POWER7+ systems. The contents:
Chapter 1. Optimization and tuning on IBM POWER7 and IBM POWER7+
Chapter 2. The POWER7 processor
Chapter 3. The POWER Hypervisor
Chapter 4. AIX
Chapter 5. Linux
Chapter 6. Compilers and optimization tools for C, C++, and Fortran
Chapter 7. Java
Chapter 8. DB2
Chapter 9. WebSphere
Appendix A. Analyzing malloc usage under AIX
Appendix B. Performance tooling and empirical performance analysis
Appendix C. POWER7 optimization and tuning with third-party applications
The book provides practical guidelines for understanding the tools, approaches, and tuning tips for AIX and Linux operating system based products.
The material ranges from early-stage lightweight tuning, through deployment guidelines, to deep performance optimization.
From the Abstract: This publication provides advice and technical information about optimizing and tuning application code to run on systems that are based on the IBM POWER7® and POWER7+™ processors. This advice is drawn from application optimization efforts across many different types of code that runs under the IBM AIX® and Linux operating systems, focusing on the more pervasive performance opportunities that are identified, and how to capitalize on them. The technical information was developed by a set of domain experts at IBM.
The focus of this book is to gather the right technical information, and lay out simple guidance for optimizing code performance on the IBM POWER7 and POWER7+ systems that run the AIX or Linux operating systems. This book contains a large amount of straightforward performance optimization that can be performed with minimal effort and without previous experience or in-depth knowledge. This optimization work can:
- Improve the performance of the application that is being optimized for the POWER7 system
- Carry over improvements to systems that are based on related processor chips
- Improve performance on other platforms
By: Bill Buros.
In November 2012, IBM launched a new POWER7+ enabled IBM Flex System p260 compute node. For the marketing and system details, see the IBM PureFlex Systems page
. Here on the technical community, we wanted to take a minute to reference the "formal benchmark publishes" that are run and submitted to organizations like SPEC.org. Performance proof points like published benchmarks are a great way to keep score of performance improvements, not only with the hardware systems, but as in our case, the software stacks and enabling software pieces that are required for the best performance on any system.
For POWER systems, our performance teams focus on compilers, Java, and toolchain libraries to best optimize the performance. Best practice guides like the recently announced IBM Redbook on POWER7 and POWER7+ Optimization and Tuning Guide
are generally based on extensive work in understanding workloads like the SPEC workloads, and also of course working directly with many customer applications.
For this announce cycle of this new compute node, we took advantage of the latest SUSE Linux Enterprise 11 Service Pack 2
operating system, and published performance benchmarks using the IBM XL Compilers, the IBM Advance Toolchain, and IBM Java.
With each performance benchmark, there are straightforward ideas, flags, hints, and practical examples hidden in the benchmark disclosure information, which are great examples of the process of optimizing applications and workloads. These results demonstrate to our Power customers that Power systems continue to deliver leading performance, and that the technical means to achieve these results are simple to take advantage of.
In the coming weeks, we'll work to highlight some of these basic techniques, ideas, and tuning tips from the actual runs of each benchmark.
zswap" is discussed, with some initial performance data provided to demonstrate the potential benefits for a system (partition or guest) which has constrained memory and is beginning to swap memory pages to disk. The technique improves the throughput of a system, while significantly reducing the disk I/O activity normally associated with page swapping. We also explore how zswap works in conjunction with the new compression accelerator feature of the POWER7+ processor to potentially improve the system throughput even more than software compression alone.
This article is a good example of the ongoing collaboration that occurs in the Linux open-source community. New implementations are proposed, discussed, debated, refined and updated across developers, community members, interested customers, and performance teams. Here on the PowerLinux technical community, we are working to highlight more of these examples of work-in-progress from the broader Linux community. These proposals are applicable to both x86 systems and Power systems, so examples shown below cover both realms.
What is zswap?
Zswap is a new lightweight backend framework that takes pages that are in the process of being swapped out and attempts to compress them and store them in a RAM-based memory pool. Aside from a small reserved portion intended for very low-memory situations, this zswap pool is not pre-allocated: it grows on demand, and its maximum size is user-configurable. Zswap leverages an existing frontend already in mainline called frontswap. The zswap/frontswap process intercepts the normal swap path before the page is actually swapped out, so the existing swap page selection algorithms are unchanged. Zswap also introduces key functionality that automatically evicts pages from the zswap pool to a swap device when the zswap pool is full. This prevents stale pages from filling up the pool.
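Since zswap builds on the in-mainline frontswap frontend, one quick way to see whether a given kernel was built with the needed pieces is to inspect its build config; the config file path below is an assumption, so adjust it for your distro:
# Look for the frontswap/zswap options in the running kernel's config
$ grep -E 'FRONTSWAP|ZSWAP' /boot/config-$(uname -r)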
The zswap patches have been submitted to the Linux Kernel Mailing List (lkml) for review; you can view them in this post.
Instructions for building a zswap-enabled kernel on a system installed with Fedora 17 can be found on this wiki.
What are the benefits?
When a page is compressed and stored in a RAM-based memory pool instead of actually being swapped out to a swap device, the result is a significant I/O reduction, and in some cases workload performance can improve significantly. The same is true when a page is "swapped back in": retrieving the desired page from the in-memory zswap pool and decompressing it can result in performance improvements and I/O reductions compared to actually retrieving the page from a swap device.
Using the SPECjbb2005 workload for our engineering tests, we gathered some performance data to show the benefits of zswap. SPECjbb2005 uses a Java™ benchmark that evaluates server performance and calculates a throughput metric called "bops" (business operations per second). To find out more about this benchmark or see the latest official results, see the SPEC web site
. Note that the following results are not tuned for optimal performance and should not be considered official benchmark results for the system, but rather results obtained for research purposes. We liked this benchmark for this use case because we could more carefully control the amount of active memory being used in increments.
The SPECjbb2005 workload ramps up a specified number of "warehouses", or units of stored data, during the run. The number of warehouses is a user-controlled setting that is configured depending on the number of threads available to the JVM. As the benchmark increases the number of warehouses throughout the run, the system utilization level increases. A bops score is reported for each warehouse run. For this work, we focused on the bops score from the warehouse that keeps the system about 50% utilized. We also increased the default runtime for each warehouse to 5 minutes since swapping can be bursty and a longer runtime helps to achieve more consistent results.
For these results, the system was assigned 2 cores, 10 GB of memory, and a 20 GB swap device. A single JVM was created for the SPECjbb2005 runs, using IBM Java. First, a baseline measurement was taken where normal swapping activity occurred, then a run with zswap enabled was measured to show the benefits of zswap. We gathered results on both a Power7+ system and an x86 system to observe the performance impacts on different architecture types. The mpstat, vmstat, and iostat profilers from the sysstat package were used to record CPU utilization, memory usage, and I/O statistics. We would recommend taking advantage of the lpcpu
package to gather these data points.
To demonstrate the performance effects of swapping and compression, we started with a JVM heap size that could be covered by available memory, and then increased the JVM heap size in increments until we were well beyond the amount of free memory, which forced swapping and/or compression to occur. We recorded the throughput metric and swap rate at each data point to measure the impacts as the workload demanded more and more pages.
Setting up zswap
With the current implementation, zswap is enabled by a kernel boot parameter.
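In the zswap version that was eventually merged into the mainline kernel, the switch is the zswap.enabled module parameter; the patchset under review here may have used a different spelling, so treat this line as a sketch:
# Enable zswap at boot (mainline spelling; append to the kernel command line)
zswap.enabled=1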
We looked at several new in-kernel stats to determine the characteristics of compression during the run. The metrics used were as follows:
pool_pages - number of pages backing the compressed memory pool
reject_compress_poor - pages rejected due to the poor-compression policy (cumulative) (see the max_compressed_page_size sysfs attribute)
reject_zsmalloc_fail - pages rejected due to zsmalloc failure (cumulative)
reject_kmemcache_fail - pages rejected due to kmem failure (cumulative)
reject_tmppage_fail - pages rejected due to tmppage failure (cumulative)
reject_flush_attempted - reject flush attempted (cumulative)
reject_flush_fail - reject flush failed (cumulative)
stored_pages - number of compressed pages stored in zswap
outstanding_flushes - the number of pages queued to be written back
flushed_pages - the number of pages written back from zswap to the swap device (cumulative)
saved_by_flush - the number of stores that succeeded after an initial failure due to reclaim by flushing pages to the swap device
pool_limit_hit - the zswap pool limit has been reached
failed_stores - how many store attempts have failed (cumulative)
loads - how many loads were attempted (all should succeed) (cumulative)
succ_stores - how many store attempts have succeeded (cumulative)
invalidates - how many invalidates were attempted (cumulative)
There are two user-configurable zswap attributes:
max_pool_percent - the maximum percentage of memory that the compressed pool can occupy
max_compressed_page_size - the maximum size of an acceptable compressed page. Any page that does not compress to less than or equal to this size is rejected (i.e. sent to the actual swap device)
To observe performance and swapping behavior once the zswap pool becomes full, we set the max_pool_percent parameter to 20 - this means that zswap can use up to 20% of the 10GB of total memory.
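With the module-parameter layout used by the mainline version, that setting could be changed at runtime through sysfs; the path below is an assumption for the patchset being tested here:
# Allow the compressed pool to grow to 20% of RAM
$ echo 20 | sudo tee /sys/module/zswap/parameters/max_pool_percent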
The following graphs represent the SPECjbb2005 performance and swap rate for a run using the normal swapping mechanism.
Note that as "available" memory is used up around the 10 GB point, the performance falls off very quickly (the blue line) and normal page swapping to disk (the red line) increases. The behavior is consistent on both Power7+ and x86 systems.
Power7+ baseline results:
x86 baseline results:
As you can see, performance dramatically decreased once the system started swapping and continued to level off as the JVM heap was increased.
The following graphs represent the SPECjbb2005 performance and swap rate for a run when zswap is enabled. In these cases, memory is now being compressed, which significantly reduces the need to go to disk for swapped pages. Performance of the workload (the blue line) still drops off but not as sharply, but more importantly the system load on I/O drops dramatically.
Power7+ with zswap compression:
x86 with zswap compression:
As you can see, the swap (I/O) rate was dramatically reduced. This is because most pages were compressed and stored in the zswap pool instead of swapped to disk, and taken from the zswap pool and decompressed instead of swapped in from disk when the page was requested again. The small amount of "real" swapping that occurred is due to the fact that some pages compressed poorly - which means they did not meet a user-defined max compressed page size - and were therefore swapped out to the disk, and/or stale pages were evicted from the zswap pool.
Looking at the zswap metrics for each run, we can calculate some interesting statistics. Keep in mind that the base page size differs between Power (64 KB pages) and x86 (4 KB pages), which accounts for some of the different behaviour, and that we set the max zswap pool size to 20% of total memory for these runs, as mentioned above (this max setting can be adjusted as needed).
- On Power, the average zswap compression ratio was 4.3; on x86, it was 3.6.
- For the Power runs, we saw entries for "pool_limit_hit" starting at the 17 GB data point; for the x86 runs, the pool limit was hit earlier, starting at the 15.5 GB data point.
- For the Power runs, the zswap pool stored at most 139,759 pages; for the x86 runs, the maximum was 1,914,720 pages.
All of those pages were compressed and stored in the zswap pool rather than being swapped out to disk, which results in the performance improvements seen here.
POWER7+ hardware acceleration
The POWER7+ processor introduces new onboard hardware assist accelerators that offer memory compression and decompression capabilities, which can provide significant performance advantages over software compression. As an example, the system specifications for the IBM Flex System p260 and p460 Compute Nodes
mention the "Memory Expansion acceleration" feature of the processor.
The current zswap implementation is designed to work with these hardware accelerators when they are available, allowing for either software compression or hardware compression. When a user enables zswap and the hardware accelerator, zswap simply passes the pages to be compressed or decompressed off to the accelerator instead of performing the work in software. Here we demonstrate the performance advantages that can result from leveraging the POWER7+ on-chip memory compression accelerator.
POWER7+ hardware compression results
Because the hardware accelerator speeds up compression, looking at the zswap metrics we observed that there were more store and load requests in a given amount of time, which filled up the zswap pool faster than a software compression run. Because of this behavior, we set the max_pool_percent parameter to 30 for the hardware compression runs - this means that zswap can use up to 30% of the 10GB of total memory.
The following graph represents the SPECjbb2005 performance and swap rate for a run when zswap and the POWER7+ hardware accelerator are enabled. In this case, memory is now being compressed in hardware instead of software, and this results in a significant performance improvement. Performance of the workload (the blue line) still drops off, but even less sharply than the zswap software compression case, and the system load on I/O still remains very low.
Power7+ hardware compression:
As you can see, the swap (I/O) rate was dramatically reduced. This is because most pages were compressed using the hardware accelerator and stored in the zswap pool instead of swapped to disk, and taken from the zswap pool and decompressed in the hardware accelerator instead of swapped in from disk when the page was requested again. The small amount of "real" swapping that occurred is due to the fact that some pages compressed poorly - which means they did not meet a user-defined max compressed page size - and were therefore swapped out to the disk, and/or stale pages were evicted from the zswap pool.
The following graphs show the performance comparison between normal swapping and zswap compression, and the POWER7+ graph also includes the hardware compression results, showing that the hardware accelerator provides even more performance advantages over software compression alone:
Power7+ performance comparison:
x86 performance comparison:
As you can see, this workload shows up to a 40% performance improvement in some cases after the heap size exceeds available memory when zswap is enabled, and the POWER7+ results show that the hardware accelerator can improve the performance by up to 60% in some cases compared to the baseline performance.
Swap (I/O) comparison
The following graphs show the swap rate comparison between normal swapping and zswap compression, and the POWER7+ graph includes the hardware compression results, showing that the hardware accelerator also reduces the swap rate dramatically. Swap rates are dramatically reduced on both architectures when zswap is enabled, including the POWER7+ hardware compression results.
Power7+ swap I/O comparison:
x86 swap I/O comparison:
The new zswap implementation can improve performance while reducing swap I/O, which can also have positive effects on other partitions that share the same I/O bus. The new POWER7+ on-chip memory compression accelerator can be leveraged to provide performance improvements while still keeping swap I/O very low.
By: Jessica Erber-Stark.
Mel Beckman at PowerITPro recently published a very nice overview of IBM PowerLinux systems. Mel states: "IBM’s PowerLinux brings big-iron reliability and scalability to Linux while still providing cost-competitive, Linux-friendly, entry-level server …"
Check out the article "PowerLinux Pumps Up Linux Apps" on PowerITPro to get Mel's take on:
- Application portability to PowerLinux systems
- Virtualization performance advantages
- PowerLinux 7R2 system summary
- Big Data Analytics solution, using Hadoop
- Power systems features key to Linux users
By: Wainer dos Santos Moschetta.
The Linux Trace Toolkit Next Generation (LTTng) is a toolkit for tracing and visualizing events produced by both the Linux kernel and user-space applications.
Version 2.x offers several improvements over the previous 1.x series, including:
- Introduction of a new trace file format called CTF (Common Trace Format)
- Tracing of user-space applications, beyond the default kernel events
- A new implementation of the ring buffer algorithm
- The ability to attach context information to events
Building & installing
In this section I will show how to build LTTng from source, although it is already delivered in some Linux distributions, for instance, as RPM packages for ppc64 in openSUSE Linux and Fedora 17. So you might want to use a stable version from your chosen distro or just build the latest yourself.
Its source code is versioned in different git trees, one for each component (see Table 1).
Table 1. LTTng source components
userspace-rcu - Library that implements the RCU (Read-Copy-Update) mechanism in user space
lttng-tools - Provides the main client that controls the execution of LTTng
lttng-modules - Kernel modules for tracing kernel events
lttng-ust - Enables tracing in user-space applications
# Cloning LTTng source code
$ git clone git://git.lttng.org/lttng-tools.git
$ git clone git://git.lttng.org/lttng-modules.git
$ git clone git://git.lttng.org/lttng-ust.git
$ git clone git://git.lttng.org/userspace-rcu.git
Requirements to build the components:
- Common to all are GNU Autotools and Libtool
- At the time this post is written, lttng-modules requires kernel >= 2.6.38
- Some components rely on third-party libraries, so take a look at the README file in each component
# Building and installing the liburcu library
# (standard autotools flow is assumed; see the component's README)
$ cd userspace-rcu
$ ./bootstrap && ./configure && make
$ sudo make install
$ sudo ldconfig
# Building and installing lttng-ust
$ cd lttng-ust
$ ./bootstrap && ./configure && make
$ sudo make install
# Building and installing lttng-tools
$ cd lttng-tools
$ ./bootstrap && ./configure && make
$ sudo make install
# Building and installing lttng-modules (plain Kbuild, no configure step)
$ cd lttng-modules
$ make
$ sudo make modules_install
$ sudo depmod -a
There is a post-installation step required to allow non-root users to (transparently) start the LTTng daemon for monitoring kernel events: those users must be added to the tracing group, as shown below (on Fedora 17):
# Create group if it doesn't exist
$ sudo groupadd -r tracing
# Add <username> to the group
$ sudo usermod -aG tracing <username>
Managing a trace session
LTTng tracing relies on the concept of a session. Table 2 shows the commands that manage a session's lifecycle.
Table 2. Commands to manage a tracing session
create [NAME] - Create a session named NAME. By default, trace files are held in ~/lttng-traces, but that location may be redefined with the -o option
set-session NAME - Switch between sessions, setting the current session to NAME
destroy [NAME] - Destroy the session named NAME. The -a or --all option may be used to destroy all sessions
list [NAME] - Show information about the session named NAME, or list all sessions if NAME is omitted
Below is an example of a tracing session with LTTng that monitors system behavior while an instance of the Firefox browser boots up.
$ sudo lttng-sessiond &
$ lttng create demo_session
Session demo_session created.
Traces will be written in /home/wainersm/lttng-traces/demo_session-20121030-233238
$ lttng start
Tracing started for session demo_session
$ firefox &
$ lttng stop
Waiting for data availability
Tracing stopped for session demo_session
$ lttng destroy demo_session
Session demo_session destroyed
Notice from the example above that lttng-sessiond (the daemon) is started with sudo (that is, as root). The trace session may be started and stopped several times, allowing you to adjust some parameters (for example, adding or removing events and context information).
Managing events in a trace session
The tool is able to trace events emitted by the kernel and by applications, which are made available through several infrastructures such as Kprobes, Ftrace, tracepoints, and the processor PMU (Performance Monitoring Unit). Accordingly, lttng has a set of commands to manage the events to be monitored, as shown in Table 3.
Table 3. Commands to manage events tracing
list [-k] [-u] - List available kernel (-k) and user-space (-u) events
enable-event [options] - Add events to the session. The [options] filter which events should be traced. For example, "-a -k --syscall" is used to add syscall events
disable-event [options] - Remove events from the session. The [options] filter which events should be removed
add-context -t [type] - Attach context information to an event. As of this writing, [type] may be pid, procname, prio, nice, vpid, tid, pthread_id, vtid, ppid, or vppid, as well as available PMU events
The listing below shows how to display all kernel events available for tracing.
$ lttng list -k
timer_init (loglevel: TRACE_EMERG (0)) (type: tracepoint)
timer_start (loglevel: TRACE_EMERG (0)) (type: tracepoint)
timer_expire_entry (loglevel: TRACE_EMERG (0)) (type: tracepoint)
timer_expire_exit (loglevel: TRACE_EMERG (0)) (type: tracepoint)
timer_cancel (loglevel: TRACE_EMERG (0)) (type: tracepoint)
hrtimer_init (loglevel: TRACE_EMERG (0)) (type: tracepoint)
hrtimer_start (loglevel: TRACE_EMERG (0)) (type: tracepoint)
hrtimer_expire_entry (loglevel: TRACE_EMERG (0)) (type: tracepoint)
hrtimer_expire_exit (loglevel: TRACE_EMERG (0)) (type: tracepoint)
hrtimer_cancel (loglevel: TRACE_EMERG (0)) (type: tracepoint)
itimer_state (loglevel: TRACE_EMERG (0)) (type: tracepoint)
itimer_expire (loglevel: TRACE_EMERG (0)) (type: tracepoint)
lttng_statedump_start (loglevel: TRACE_EMERG (0)) (type: tracepoint)
lttng_statedump_end (loglevel: TRACE_EMERG (0)) (type: tracepoint)
lttng_statedump_process_state (loglevel: TRACE_EMERG (0)) (type: tracepoint)
lttng_statedump_file_descriptor (loglevel: TRACE_EMERG (0)) (type: tracepoint)
lttng_statedump_vm_map (loglevel: TRACE_EMERG (0)) (type: tracepoint)
lttng_statedump_network_interface (loglevel: TRACE_EMERG (0)) (type: tracepoint)
lttng_statedump_interrupt (loglevel: TRACE_EMERG (0)) (type: tracepoint)
signal_generate (loglevel: TRACE_EMERG (0)) (type: tracepoint)
signal_deliver (loglevel: TRACE_EMERG (0)) (type: tracepoint)
signal_overflow_fail (loglevel: TRACE_EMERG (0)) (type: tracepoint)
signal_lose_info (loglevel: TRACE_EMERG (0)) (type: tracepoint)
sched_kthread_stop (loglevel: TRACE_EMERG (0)) (type: tracepoint)
sched_kthread_stop_ret (loglevel: TRACE_EMERG (0)) (type: tracepoint)
sched_wakeup (loglevel: TRACE_EMERG (0)) (type: tracepoint)
sched_wakeup_new (loglevel: TRACE_EMERG (0)) (type: tracepoint)
The example below shows how to add all kernel events for tracing in a session:
$ lttng enable-event -a -k
All kernel events are enabled in channel channel0
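Context types from Table 3 can then be attached in the same session; a small sketch:
# Record the process name and PID with every kernel event
$ lttng add-context -k -t procname -t pid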
Analyzing gathered data
Usually a tracing session generates a huge amount of data, which makes it unreadable by humans. The LTTng project therefore provides some visualization tools to ease analysis of the traced data, as follows:
- Babeltrace
A library and command-line tool able to read and convert trace data stored in different formats (including CTF).
- LTTV Viewer
A graphical (GTK+) visualizer able to analyze trace data produced by LTT, but it does not currently support CTF.
- Eclipse LTTng
Eclipse LTTng is a set of plug-ins for visualizing data gathered by LTTng. It is currently maintained by a very active community.
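For a quick text dump without any GUI, babeltrace can pretty-print the session directory recorded earlier:
# Convert the CTF trace to human-readable text
$ babeltrace ~/lttng-traces/demo_session-20121030-233238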
Below is a screenshot of the Eclipse view with a loaded LTTng trace file.
- LTTng project - https://lttng.org
- Eclipse LTTng plug-in - http://wiki.eclipse.org/Linux_Tools_Project/LTTng2/User_Guide
By: Timothy Noonan.
Recently, Red Hat announced that Red Hat Enterprise Linux 6 conforms with the USGv6 Host profile. See Red Hat Ready to Serve U.S. Government with IPv6 Conformity. Likewise, we can assert that IBM POWER7 Systems™, as well as SUSE Linux Enterprise Server 11, have met the National Institute of Standards and Technology’s USGv6 evaluation requirements.
Why is this important? U.S. government agencies are migrating to Internet Protocol Version 6 (IPv6), as the pool of IPv4 addresses is being depleted. The move to IPv6 also allows scalability of government networks to take advantage of new technologies such as cloud computing. The National Institute of Standards and Technology (NIST) provides the technical standards and testing program, USGv6, to certify products as conforming to IPv6. This certification is required to be considered by the U.S. government for new IT purchases. POWER7 Systems, Red Hat Enterprise Linux 6, and SUSE Linux Enterprise Server 11 conform with the USGv6 profile, making PowerLinux™ a trusted platform.
To achieve conformity, IBM POWER7 Systems, Red Hat Enterprise Linux 6, and SUSE Linux Enterprise Server 11 underwent rigorous testing by the University of New Hampshire’s InterOperability Laboratory (UNH-IOL). UNH-IOL is one of two accredited third-party labs approved for USGv6 testing. Testing included addressing and security protocol requirements. For specific UNH-IOL test results for POWER7 Systems, Red Hat Enterprise Linux 6, and SUSE Linux Enterprise Server 11, see the following:
· IBM: https://www.iol.unh.edu/services/testing/ipv6/usgv6tested.php?company=2443&type
· Red Hat: https://www.iol.unh.edu/services/testing/ipv6/usgv6tested.php?company=6164&type
· SUSE: https://www.iol.unh.edu/services/testing/ipv6/usgv6tested.php?company=105&type
By: Breno Leitão.
This tutorial explains how to create a RAID device on PowerLinux machines using an array of disks. The step-by-step tutorial covers identifying the disks, formatting them, combining them into a RAID array, creating a partition and, finally, creating a file system on that partition.
PowerLinux machines support a RAID (Redundant Array of Independent Disks) card: a device that combines a set of physical disks into a logical unit to achieve better performance and more data redundancy. A RAID array can also be created by the operating system (known as software-based RAID), which consumes CPU cycles on the machine to manage and control the disk array. On the other hand, a RAID card, such as the one embedded in PowerLinux machines, offers hardware-based RAID, meaning that operations on the disk array are offloaded to the RAID card; no CPU cycles are spent managing the disks, making it more efficient than software-based RAID solutions.
The RAID adapters on PowerLinux machines support several different RAID protection levels. Depending on the protection level, you get different benefits, such as potentially higher data transfer rates, lower latency, and data redundancy compared to a single big disk. You might also want to combine these benefits in the same disk array, which is also possible depending on the RAID protection level.
Using RAID is usually a trade-off between disk space and redundancy: depending on the RAID protection level, part of the disk space is used to store redundant data and is therefore not available for general use. The real space available to users varies from 50% to 100% of the total disk space.
The RAID protection levels supported by most PowerLinux RAID adapters are:
RAID 0: In this configuration, a block of data is striped across the different disks in the array, so read/write operations can happen in parallel across the disks. There is no fault tolerance: if a disk fails, all the data is lost. This level usually improves data throughput.
Requires at least 1 disk. In a single-disk RAID 0 configuration, no striping occurs.
RAID 1: At this level, the data is written to 2 disks at the same time. Because both disks hold the same data, a read operation is served by the disk with the lower latency. If one disk fails, the data is preserved on the other disk; once the failed disk is replaced, the array is rebuilt. As expected, this level improves data redundancy.
Requires at least 2 disks. Note: The PowerLinux RAID adapters refer to this RAID level as RAID 10.
RAID 5: At this level, data and parity bits are spread across all the disks. If one disk fails, all the data is still available, since the original data can be reconstructed using parity data from the other disks. If more than one disk fails, the data is corrupted. (After a disk failure, operations may be slower, because data on the lost disk must be reconstructed from parity.) One disk's worth of capacity is consumed for the array's redundancy information.
Requires at least 3 disks.
RAID 6: The same as RAID 5, but up to 2 disks can fail and the data will still be preserved. Two disks' worth of capacity is consumed for the array's redundancy information.
Requires at least 4 disks.
RAID 10: This level combines the best concepts of RAID 1 and RAID 0. In RAID 10, data is striped across one set of hard disks, and those disks are mirrored to another set of hard disks, so you get very good throughput as well as data redundancy.
Requires at least 2 disks for mirroring and striping. In a two-disk RAID 10 configuration, no striping occurs.
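As a worked example of the space trade-off, consider six 300 GB disks (illustrative numbers, not a specific product configuration): RAID 0 yields the full 1.8 TB (100%); RAID 10 yields 900 GB (50%, since half the disks mirror the other half); RAID 5 yields 1.5 TB (one disk's worth reserved for parity); and RAID 6 yields 1.2 TB (two disks' worth reserved).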
RAID card on PowerLinux
PowerLinux machines come with an embedded RAID controller that, on 7R1 machines, supports up to 6 SAS or SSD disks and RAID levels 0, 5, 6, and 10. RAID 1 is also supported as a subset of RAID 10, since the RAID controller allows you to create a RAID 10 with just two disks in the array. In that case, because there are not enough disks to both mirror and stripe, the data simply gets mirrored on both disks instead of striped, which is exactly what RAID 1 does. So the card also supports, in a different form, RAID 1.
On Linux, the device is listed as a PCI-E device named "IBM Obsidian-E PCI-E SCSI controller".
To manage this controller, the iprutils package provides a set of tools that help the system administrator create, configure, and delete disks and arrays using the RAID controller.
Among them, iprutils provides the iprconfig application, the tool responsible for configuring the RAID devices on your machine and the one covered below.
The device driver
The device driver for the PowerLinux RAID controller is named ipr.ko. It is part of the mainline Linux kernel and comes with all the Linux distros supported on PowerLinux, so it is recommended to always use the latest supported kernel version from your distro to get the best out of your PowerLinux machine.
Using iprconfig tool
The iprconfig tool is very easy to use. It is a text-based (TUI) application that helps you list and configure the RAID controller and the disks on your system. iprconfig also lets you check the controller log and upgrade the card firmware. An example of the iprconfig screen can be seen in Figure 1.
From here on, a step-by-step tutorial shows how to format a set of disks, combine them into a RAID array, create a partition over that array and, then, create a file system on that partition. This is an easy process that should take less than 30 minutes. For this tutorial, we are going to create a RAID 5 device, meaning that the array will have data and parity striped across the disks.
You can also drive iprconfig from the command line by passing the parameters you want. For example, to see which RAID levels are supported on a controller (sg5), use the following command:
# iprconfig -c query-supported-raid-levels sg5
0 5 10 6
Formatting a disk in RAID mode
To use a disk as part of a disk array, it needs to be formatted specifically for RAID, also known as advanced function mode. If a disk is not formatted properly, you cannot add it to a RAID array. To format a disk in advanced function mode, follow these steps:
Launch the iprconfig tool on a console.
Select the menu Work with disk arrays (as shown in Figure 1).
To do so, press 2 and then Enter.
Select Format device for RAID function.
To do so, press 5 and then Enter.
Then select the disks you want to format (all of those that will be part of the disk array) and continue (as shown in Figure 2).
To select the disks, use the up/down arrows and press 1 on the devices you want.
Wait until the disks are formatted, as shown in Figure 3. (It takes a few minutes.)
Figure 3: Disks being formatted
Creating a RAID disk array
As explained above, before a disk can be added to a RAID array, it must be formatted in RAID mode. Once the disks are formatted in RAID mode, they become available to be added to a disk array, and the RAID device can be created. You can create as many RAID devices as you want and give each a set of disks. Let's go through the process of creating an array device. When creating a RAID array, it is recommended to format the devices for RAID as described in the previous step and then create the disk array following the steps below without exiting the iprconfig tool. Exiting the tool will result in the loss of the knowledge that the disks have just been zeroed, and it will take longer for the array to initialize.
Launch the iprconfig tool.
Select Work with disk arrays, as shown in Figure 1.
Select Create a disk array.
Select the controller you want (You might have just one controller), as shown in Figure 4.
Select the disk that will be part of the array, as shown in Figure 5
Press '1' over the disks you want to select.
Select the RAID type, as shown in Figure 6.
Go into the Protect Level and press 'c' to change the RAID level.
Select the disks that will be part of the disk array.
It may take a while for the array to be created.
Figure 4: Selecting the RAID controller
Figure 6: Select the RAID type
Start using the RAID array
Once the RAID array is created, it becomes a block device like any other on the system. You can create a partition on it, make a file system, and start using it. The next steps help you create a partition and a file system on the array. In this tutorial, I create just one partition using the whole array and format it with the ext4 file system, as shown below:
Figure 8: Creating an ext4 file system on the partition over the RAID device
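For reference, the equivalent command-line steps might look like the sketch below; the device name /dev/sda is an assumption, so check lsblk or dmesg for the block device your array actually appears as:
# Label the array, create one partition spanning it, and format it
$ sudo parted /dev/sda mklabel gpt
$ sudo parted /dev/sda mkpart primary 0% 100%
$ sudo mkfs.ext4 /dev/sda1
$ sudo mount /dev/sda1 /mnt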
Once the file system is created on the partition, you can use it as a traditional file system, that is, mount it on a directory and start using it. All RAID operations, such as managing, striping, checksumming, or mirroring the data, are offloaded to the RAID card and happen transparently.
By: Bill Buros.
Across developerWorks, there are a number of related communities, blogs, and individuals focused on technologies that make things easier and simpler for customers. The most recent example is a blog with notes from the IBM Systems Electronic Support team.
The team posted an entry on the YUM tools for PowerLinux servers, which nicely explains some of the background work that goes on with offerings like this. Check it out! There's some interesting news about work in progress and what to look for.
By: Bill Buros.
An interesting new user group is launching in Austin, TX this spring. Ben Collins provides an introduction over on Linux.com for the "founding meeting" coming up in March 2013. We're looking forward to seeing more of what's happening around the broader arena of POWER processors, and specifically with Linux.
"The PLUG's founding meeting aims to bring together some of the top companies, developers and users to a common place to see, touch and discuss the future of Power platforms. By bringing together engineers and speakers to discuss what Power Architecture and Linux are doing in this day and age of embedded systems and cloud computing, along with on-site products to enable and dazzle attendees, the goal is about awareness and building community around this famed processor. "
The PowerLinux Users Group (PLUG) should be a great place to network with other participants in the industry...
William Mapp is driving some interesting things in this area.
Several of us here in Austin will be participating and learning. I'll post updates, news, and announcements here on the community on a regular basis. If you're in the Austin area, check it out!
By: Bill Buros.
The Linux Performance customer profiling utility
(lpcpu) has been updated with some enhancements and clarifications for easier use. The package is available for download here.
The package is designed to capture performance information for analysis in one script invocation, gathering the profiler and system information into a packaged tarball.
The modifications in this update include:
- Add capture of sysctl.conf to see explicit modifications of sysctl values
- Update script termination processes to not create an essentially empty tarball package when the requested profilers are not installed
- Add new lseth utility processing to the scripts
- Update and clarify error messages for the sysstat package (which has iostat, mpstat, and sar profilers).
The package can format (graphed in charts) many of the data outputs from common performance tools such as iostat, meminfo, mpstat, sar, and vmstat. By simply unpacking the lpcpu.tar.bz2 package on an x86 system, together with the generated profiler tarball, the information is presented in a format that makes it easier to "see" a lot of data.
On my Linux system, I have a web server set up, so I unpack and process data in the /var/www/html directory. First I download and unpack the latest lpcpu.tar.bz2 package on my x86 system. Separately, I have gathered the performance data from another test system.
$ cd /home/wmb
$ tar -jxf lpcpu.tar.bz2        # unpack the lpcpu scripts into /home/wmb/lpcpu
$ cd /var/www/html/lpcpu/
$ tar -jxf lpcpu_data.mytestsystem.default.2013-01-25_1232.tar.bz2   # unpack the collected data under the web root
$ cd lpcpu_data.mytestsystem.default.2013-01-25_1232/
$ ./postprocess.sh /home/wmb/lpcpu/    # post-process, pointing at the lpcpu scripts
This generates a summary.html file which can be opened in a browser window.
We recommend you give it a try. Feel free to ask questions on the message board. Thanks as always to Karl Rister, Tom Lendacky, and others who are continuing to contribute changes, enhancements, ideas and updates to the package.
By: Anirban Chatterjee.
This month, the PowerLinux team is announcing the biggest technology change in PowerLinux servers
since we launched, with the availability of our POWER7+ chips on the platform.
POWER7+ is more than just a speed bump on our POWER7
processors. Our hardware teams have
worked hard to increase the flexibility of the platform, bringing
balanced performance increases while keeping other factors like energy
consumption at bay. Some examples:
- We've doubled the memory capacity in servers like the 7R1 and 7R2. We've also doubled the number of virtual
machines you can allocate to a single processor core. This means we’ve dramatically increased
the system’s flexibility when it comes to deploying virtualized workloads
… in many cases, this will eliminate memory as the gating factor, allowing
users to drive utilization rates even higher and boost system efficiency.
- We've reduced the feature size in the chips from 45 nm to 32 nm. This is not just a simple die shrink,
though … with every shrink, the chip team has to work even harder to
ensure the computational and thermal stability of the chip while driving
higher clock speeds. In PowerLinux
servers like the 7R2, the new chips now top out at 4.2 GHz.
- Because we have more available chip real estate now that we’ve shrunk the die
size, we’ve bumped up the L3 cache from 4 MB to 10 MB. This significantly boosts performance in
workloads that are memory dependent, like Java and big data applications.
- Other feature additions to POWER7+ allow us to improve chip reliability and
boost energy savings. We’ve added
self-healing capabilities and automatic processor reinitialization to
increase system robustness, and we’ve introduced a new energy saving mode
that saves 45% more energy than before when the processor is idle.
The new performance capabilities afforded by POWER7+ enable
some pretty interesting possibilities when it comes to reducing costs. For example, we’ve found that people
typically need just two dual-socket (16 core) PowerLinux 7R2s to do what it
would take three dual-socket (16 core) Xeon servers to do. Given the already competitive pricing on the
7R2s, this means that you can potentially save north of 40% on your costs of acquisition by choosing PowerLinux.
These changes make PowerLinux an ideal platform for the most
critical workloads your business runs today, like your customer facing web
applications, or your ERP system.
Customers like Kwik Fit (PDF) and IT Informatik (PDF) are already realizing the benefits. Click the links to read the
case studies on these customers.
But PowerLinux is also a great platform for today’s growth
workloads, like development and deployment of mobile and web applications. To make it easier for businesses to create
and launch these types of client experiences, we’re introducing a new solution for WebSphere mobile and web applications that leverages the lightweight
WebSphere Liberty Profile software. This
is a light, easily reconfigured web app environment that makes it simple for
developers to test and deploy applications.
As 2013 progresses, we'll continue to bring you more
announcements that improve the PowerLinux platform's ability to reduce costs while improving
efficiency, enabling new and growth workloads, and giving you a better overall experience.
By: Maynard Johnson and Beth Taylor.
Finding performance bottlenecks in applications that you develop can be a daunting task. But with the right tools and a little guidance, it's easier than you might think. OProfile is a performance analysis tool set for Linux systems. A new collection of topics has been published in the Linux for IBM Systems Information Center to help application developers get started with OProfile on Power Systems™ servers running Linux. Getting started with OProfile on PowerLinux
introduces the new operf
profiler available with OProfile 0.9.8. The topics also give helpful usage tips for OProfile's "legacy" profiling tool, opcontrol
. In most cases, you'll find that operf
is a much simpler alternative to opcontrol
and other profilers available.
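As a taste, a minimal operf session might look like the sketch below (the binary name is hypothetical):
# Profile one run of the application, then summarize samples by symbol
$ operf ./myapp
$ opreport --symbols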
The information center topic collection includes the simplest path for installing and setting up OProfile, taking advantage of IBM® Advance Toolchain for PowerLinux™. It also includes basic scenarios for dealing with problems such as lost samples caused by kernel throttling, and buffer overflow.
The SourceForge OProfile project web site
is another good source for getting help and information. Resources there include FAQs, an OProfile user manual, mailing list archives, and the IRC channel. As lead maintainer for the OProfile open source software project, I (Maynard Johnson) am watching the mailing list and IRC channel to make sure that your OProfile questions are being answered. I also review community-submitted patches and handle bug reports. So please be sure to give OProfile a try and keep in touch on the PowerLinux Community message board
with any questions that you have.
By: Wainer dos Santos Moschetta.
The IBM SDK for PowerLinux provides you with an all-in-one solution for
developing software on PowerLinux servers. It integrates the IBM Eclipse
SDK with several open source and IBM tools in a single IDE (Integrated
Development Environment), allowing easy development, building, debugging,
analysis, and packaging of software for PowerLinux servers.
PowerLinux SDK 1.3.0 is now available. It provides many enhancements that ease remote development, plus new features for the migration and performance analysis tools. Please check out the complete "what's new" list here.
The new version may be downloaded from here.
For more information, documentation, and assistance, visit the IBM SDK for PowerLinux landing page here.