Unfolding a memory-related issue using performance tools for AIX
A case study
Upon the launch of a specific application, we noticed that the memory was getting completely drained. This application was the only process running on the system. There were also issues doing manual transactions, as there were frequent timeouts due to no response from the server.
Problem analysis and resolution
The initial thought was that the memory drain was related to the application. However, after using the performance tools, we changed our minds.
In Indian mythology, there is a reference to a medicine called Sanjeevini, which can cure almost any ailment. topas is the Sanjeevini of performance tools: it gives an overview of all of the system resources and is used as a starting point for performance analysis. It provided the first stepping stone for attacking this problem. When we first observed the problem, topas showed the total percentage of computational memory as 99%. This was also the time when the team was observing issues while doing manual transactions. The team was performing test runs related to only one application.
So, as a next step of investigation, the application was stopped, and topas was used once again to check the status of computational memory. This time the computational memory was 78%, an alarming figure considering no other application was running, and it provided a new direction to our train of thought.
The other tools that were of great help in the investigation were svmon, vmstat, and vmo.
The svmon command is a performance-measurement tool that captures and analyzes the snapshot of virtual memory.
The svmon -G command displays the following global memory report, which shows the used and free sizes of real and virtual memory in the system.
Figure 1. svmon global memory report
This memory report shows that out of 1998848 pages (page size=4K) of total memory, there were 1996881 pages in use and 1967 pages free.
However, although the number of free pages in the above report is only 1967, that alone does not imply a memory constraint or memory bottleneck, because to improve I/O performance, AIX® tries to use the maximum amount of free memory for file caching if it is not explicitly requested by the application or kernel. Moreover, the report states that out of the total paging space size of 3145728 pages, the Inuse paging space is 99556 pages.
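As a quick sanity check, the page counts quoted from the svmon -G report can be converted to megabytes (a minimal sketch, assuming 4KB pages and the counts shown above):

```shell
# Convert the svmon -G page counts (4KB pages) from the report into MB.
page_kb=4
total_pages=1998848
inuse_pages=1996881
echo "total:  $(( total_pages * page_kb / 1024 )) MB"   # total:  7808 MB
echo "in use: $(( inuse_pages * page_kb / 1024 )) MB"   # in use: 7800 MB
```

Note that the 7808MB total matches the physical memory figure that the vmstat report discussed later also shows.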
The svmon -P command displays the memory usage statistics for all processes.
Figure 2. Process memory utilization report
The process memory utilization report shows that a Java™ process has an Inuse memory of 166690 pages. Upon adding the Inuse memory used by all the processes in this report, we observed that the sum total of memory in use by the different processes was significantly less than the total memory of the system. This observation was also an indication that memory was not the limiting factor.
Another performance-monitoring tool, vmstat, was used for reporting statistics about kernel threads, virtual memory, disks, and CPU activity.
# vmstat 2 10
Figure 3. vmstat report
This vmstat report shows that there is approximately 114MB of free memory. Moreover, there are no page-outs getting reported. However, the last five entries do state that there was one blocked thread and that there is some I/O wait happening in the system.
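The vmstat report's fre column is given in 4KB frames, so a small conversion recovers the free-memory figure in MB (a sketch; the frame count of 29184 is an assumed value consistent with the ~114MB quoted above):

```shell
# vmstat reports the 'fre' column in 4KB frames; convert it to MB.
# 29184 is an assumed frame count matching the ~114MB noted above.
fre_frames=29184
echo "free: $(( fre_frames * 4 / 1024 )) MB"   # free: 114 MB
```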
The performance parameters were also checked using the vmo command, which displays and adjusts the Virtual Memory Manager tunable parameters.
Figure 4. vmo command output
While observing the output of the vmo command, we noticed that lgpg_regions is set to 256 and lgpg_size is 16777216 bytes (that is, 16MB). AIX treats large pages as pinned memory and does not provide paging support for them. The data of an application that is backed by large pages remains in physical memory until the application completes. In our case, this means that 256 pages of 16MB each are pinned and set reserved.
If you look back at Figure 1, the output of svmon -G, you can see that for a large page size of 16MB the pool size is 256, and because large page memory is pinned, those 256 pages of 16MB each cannot be paged out. Looking at Figure 2, the output of the svmon -P command, you can see that the first line of the output has a last column named 16MB whose value is 'N'. This means that the process with PID 266372 is not using 16MB pages; that is, the application is not making use of the large pages that were reserved through the vmo command. Similarly, when the other running processes were checked in the same report, we observed that none of them was using large page support. Hence, 256 × 16MB = 4096MB was simply blocked and not used at all by any running process.
From Figure 3, you can see that the total physical memory is 7808MB. As is clear from the above analysis, 4096MB of that is reserved for large pages, which implies that all the running applications can use only 7808 - 4096 = 3712MB of memory. This large chunk of blocked, unused memory was the reason memory was getting exhausted completely.
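The arithmetic above can be checked directly from the two vmo tunables (a minimal sketch using the values from this case study):

```shell
# Memory pinned for large pages, and what remains for the 4KB pool.
lgpg_regions=256
lgpg_size_mb=16            # lgpg_size=16777216 bytes = 16MB
total_mb=7808              # total physical memory from Figure 3
reserved_mb=$(( lgpg_regions * lgpg_size_mb ))
echo "reserved:  ${reserved_mb} MB"                   # reserved:  4096 MB
echo "remaining: $(( total_mb - reserved_mb )) MB"    # remaining: 3712 MB
```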
Hence, you should either not block memory for large pages or ensure that the application actually makes good use of it.
Configuring the large pages
The application or the system can be configured to use large pages.
Configuring the application for large pages
The blpdata flag is used with the ldedit command to enable an executable file to request large pages. More details regarding it can be found in the Related topics section.
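As a sketch of what this looks like on the command line (the executable path is a placeholder, not from the case study):

```shell
# Mark an executable so that it requests large pages for its data.
# /path/to/app is a placeholder for the real application binary.
ldedit -b lpdata /path/to/app

# To clear the flag again:
ldedit -b nolpdata /path/to/app
```

This is a configuration fragment for an AIX system; it has no effect elsewhere.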
Configuring the system for large pages
By default, the system does not allocate any memory to the large page physical memory pool. The vmo command can be used to configure the size of the large page physical memory pool using the lgpg_regions and lgpg_size options.
The LDR_CNTRL environment variable can be set so that an application's data and heap segments use large pages.
More details regarding the usage of the vmo command for configuring large pages and the LDR_CNTRL variable can be found in the Related topics section.
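Putting the two steps together, a system-side configuration might look like the following sketch (the pool size of 64 regions is illustrative, not a recommendation; size it to the observed workload):

```shell
# Reserve a large page pool, effective at the next boot:
# 64 regions of 16MB each (16777216 bytes). Illustrative values only.
vmo -r -o lgpg_regions=64 -o lgpg_size=16777216

# Ask the loader to back the application's data and heap segments
# with large pages before starting it:
LDR_CNTRL=LARGE_PAGE_DATA=Y
export LDR_CNTRL
```

This is an AIX configuration fragment; vmo -r stages the change for the next reboot rather than applying it immediately.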
In the above case study, changing lgpg_regions to a suitable value and making the application use the large pages resolved the memory problem and increased the overall system performance as well.
Deciding the value for lgpg_regions
A general recommendation cannot be given for the value of lgpg_regions, or for other performance-related tuning parameters; however, a suitable value can be decided by identifying the workload on the system. In the above case study, when the LDR_CNTRL variable was exported, the application started making use of large pages. After this, the vmstat command was used again.
# vmstat -l 10
This command displays the large page statistics at 10-second intervals.
Figure 5. Large page statistics
This report has a section for large pages, which provides the detailed statistics related to large pages. The two fields titled alp and flp correspond to active large pages and free large pages, respectively. For this case study, the value of alp never went beyond 80. So, the judicious way of utilizing memory was to lower the value of lgpg_regions from 256 to a value of, say, 100. This returned a significant chunk of memory to the 4K memory pool, which increased the available memory for the other running applications and hence improved the overall performance of the system. In short, tuning the parameters after identifying the workload of the system is the key to the efficient usage of system resources.
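The payoff of the retuning can be quantified the same way as before (a minimal sketch using the case-study values of 256 old regions and 100 new regions of 16MB each):

```shell
# Memory returned to the 4KB pool by lowering lgpg_regions 256 -> 100.
old_regions=256
new_regions=100
page_mb=16
echo "returned: $(( (old_regions - new_regions) * page_mb )) MB"   # returned: 2496 MB
```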
The main purpose of large page usage is to improve system performance for high-performance computing applications, or for any memory-access-intensive application that uses large amounts of virtual memory. Large pages are useful, but they should be used only in specific scenarios: they are a special-purpose performance improvement feature and are not recommended for general use.
- The Performance management guide discusses Large pages in AIX.
- The AIX Performance Tools Handbook discusses AIX performance tools.
- AIX 6.1 information center is your source for technical information about the AIX operating system.