IBM Support

SEAPROC Consuming High CPU on VIOS

Troubleshooting


Problem

High CPU consumption by seaproc on Virtual I/O Server (VIOS).

Symptom

seaproc consuming high CPU on VIOS.
This technote discusses the most frequently seen causes of high CPU consumption by seaproc.

Cause

seaproc is a CPU-intensive process. The CPU used by the seaproc process is directly related to network load: when network traffic passes through the Shared Ethernet Adapter (SEA), seaproc CPU utilization is expected to increase. Therefore, if seaproc consumes CPU intermittently during periods of high network utilization, that may be considered normal behavior.
SEA processing is performed entirely by the CPU, which makes it processor intensive. Streaming workloads with large packets, such as file transfers or data backup/restore, generate a lower interrupt load than workloads that generate many small packets. Hence, high seaproc CPU utilization is expected during peak network transfer hours, particularly when the traffic consists of many small packets.

Environment

PowerVM VIOS 2.2 or higher

Diagnosing The Problem

To monitor CPU utilization on the VIOS, log in as the padmin user and run the topas command. Watch closely, because the process list is reordered continually to keep the process with the highest CPU% at the top.
If seaproc has the highest CPU%, run the seastat command while seaproc utilization is high to determine which client(s) are driving up CPU utilization. This requires SEA accounting to be enabled; it is disabled by default.
$ chdev -dev ent4 -attr accounting=enabled
$ seastat -d ent4|tee seastat.ent4.out
where ent4, in this example, is the SEA bridging traffic for the client LPAR (hostname, virt01) that is driving up CPU utilization.
Sample output:
...
==============================================================
Advanced Statistics for SEA
Device Name: ent4
...
==============================================================
MAC: 06:CE:C5:B3:27:04
----------------------

VLAN: 111
VLAN Priority: 0
Hostname: virt01.dfw.ibm.com
IP: 9.20.175.151

Transmit Statistics:                   Receive Statistics:
--------------------                   -------------------
Packets: 194                           Packets: 2243
Bytes: 20814                           Bytes: 313421
...

$ chdev -dev ent4 -attr accounting=disabled
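
When the SEA bridges many clients, the saved seastat output can be long. The following is a minimal sketch, not an IBM-provided tool, that ranks clients by total bytes (transmit plus receive). The function name rank_sea_clients is hypothetical, and the parsing assumes the "MAC:", "Hostname:" and "Bytes: ... Bytes: ..." layout shown in the sample output above. It needs only POSIX sh, awk and sort, so it can be run on any system to which the saved file has been copied.

```shell
# Rank SEA clients by total bytes (transmit + receive) from saved seastat output.
# Assumption: each client block uses the MAC:/Hostname:/Bytes: layout from the
# sample output in this technote; other layouts are not handled.
rank_sea_clients() {
    awk '
        $1 == "MAC:"      { mac = $2; host = "-" }   # start of a new client block
        $1 == "Hostname:" { host = $2 }
        $1 == "Bytes:" && $3 == "Bytes:" {
            # $2 is transmit bytes, $4 is receive bytes
            printf "%.0f %s %s\n", $2 + $4, mac, host
        }
    ' "$@" | sort -rn
}
```

For example, rank_sea_clients seastat.ent4.out lists one line per client (total bytes, MAC, hostname) with the busiest client first.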

Next, determine whether the client(s) have high network traffic at the time seaproc is consuming high CPU, to ascertain whether the CPU usage is justified.
If the client network traffic is found to be minimal or low, evaluate the most common probable causes discussed next.

Resolving The Problem


Probable Cause #1--Known issues


    1. seaproc consuming much CPU
    2. SEA thread lock contention prevents scale up
      Fixed in 2.2.1.3 and 2.2.1.4 with IV07193.
    3. SEA config failure may cause VIOS to hang
      Fixed in 2.2.1.8 with IV48671; and in 2.2.3.0 and 2.2.3.3 with IV45320.
    4. perfprovider might cause performance impact on VIOS nodes of SSP Cluster
      Fixed in 2.2.3.4, 2.2.3.60, 2.2.3.70, and 2.2.4.0 or higher with IV60982.
    5. Slow mbufs release causes high CPU.
      Fixed in VIOS 2.2.3.60 and 2.2.4.0 or higher with IV75682.

    Recommendation
    IBM recommends updating the VIOS to the ioslevel containing the APAR. VIOS fixes are available at Fix Central.

Probable Cause #2--Over-utilized VIOS resources


Probable Cause #3--large_send and large_receive are enabled on the underlying physical adapter used by the SEA but disabled on the SEA itself

    Recommendation
    Most physical adapters used by the SEA have largesend/large_receive enabled. In such cases, it is recommended to also enable these attributes on the SEA to improve network throughput and reduce CPU load.
    With largesend/large_receive enabled on the SEA, the VIOS client can transmit and receive large packets, which are segmented to fit the MTU size by the real adapter instead of by the kernel. This reduces CPU load and increases network throughput.

    Checking VIOS settings

    To list SEAs:
    $ lsdev -type adapter|grep -i shared


    To determine the real adapter used by the SEA and attribute values for largesend, large_receive:
    $ lsdev -dev <SEA_ent#> -attr
    In this example, largesend is already enabled (1), and large_receive is disabled (no).

    To list the largesend, large_receive attribute values of the physical adapter (real_adapter) used by the SEA, run:
    $ lsdev -dev <real_adapter_ent#> -attr|grep large
    The output may vary depending on the adapter type.




    Next is an example of how to change the SEA attribute(s) of an SEA configured in failover auto mode. There are a couple of ways to do this. The difference is that one requires the VIOS to be rebooted (Option 1) for the change to take effect. It is recommended to first make the change on the Backup SEA since it is not bridging any network traffic.

    Option 1 - Requires VIOS reboot for change to take effect
    On Backup SEA
    Run chdev command using -perm option as shown below. This will cause the change to be made permanently in the ODM, and the change will take effect (in the kernel) in the next reboot.
      $ chdev -dev <SEA_ent#> -attr large_receive=yes -perm   (to enable large_receive)
      $ chdev -dev <SEA_ent#> -attr largesend=1 -perm         (to enable largesend)
      $ shutdown -restart

    Next, repeat the above commands on the Primary SEA. Unlike a reboot of the Backup VIOS, a reboot of the Primary VIOS automatically causes a failover to the Backup SEA when the VIOS goes down, and an automatic failback when the Primary SEA is Available again.

    Note 1: If the SEA failover mode is set to sharing, both VIO Servers are expected to be bridging the client's network. So it makes no difference which VIOS you start with since a failover/failback will automatically take place during each VIOS reboot.

    Note 2: If the SEA in question is a standalone SEA in a single-VIOS environment, a maintenance window will need to be scheduled so that all clients can be shut down prior to rebooting the VIOS.

      Option 2 - Requires SEA to be put in Defined state
      If the SEA is configured in failover (auto or sharing) mode, the SEA attributes can be changed without a VIOS reboot. However, this will require the SEA to be put in Defined state first. Consequently, if there is a network interface on top of the SEA, it will need to be removed prior to putting the SEA in Defined state. As previously discussed in Option 1, if the failover mode is set to auto, it is best practice to implement the change on the Backup SEA first. The following example uses SEA failover configured in auto mode.

      On Backup SEA
      To check whether the SEA has a network interface, run:
      $ lstcpip

      In this case, SEA, ent4, has a network interface (en4), which should be removed in order to avoid a "Device busy" error when changing the SEA attribute.

      To remove the network interface, run:
      $ chdev -dev <SEA_en#> -attr state=down
      $ chdev -dev <SEA_en#> -attr state=detach

      At this point the SEA is ready to be put in Defined state to change the attribute(s):
      $ rmdev -dev <SEA_ent#> -ucfg                           (to put SEA in Defined state)
      $ chdev -dev <SEA_ent#> -attr large_receive=yes         (to enable large_receive)
      $ chdev -dev <SEA_ent#> -attr largesend=1               (to enable largesend)
      $ cfgdev -dev <SEA_ent#>                                (to make SEA back Available)
      $ lsdev -type adapter|grep -i shared                    (to ensure SEA is back Available)

      To recreate the network interface on the SEA, run cfgassist or mktcpip command:
      $ cfgassist                                             (then select VIOS TCP/IP Configuration, then the network interface)
      or
      $ mktcpip -hostname <VIOS_hostname> -interface <SEA_en#> -inetaddr <IP> -netmask <network_mask> -gateway <default_gateway> -nsrvaddr <Name_server_IP> -nsrvdomain <domain_name> -start

      Once the change is implemented on the Backup SEA, fail over the Primary SEA to the Backup SEA in order to make the change on the Primary SEA, and then fail back using the chdev command after the change has been implemented.

      On Primary SEA
      $ chdev -dev <SEA_ent#> -attr ha_mode=standby   (to initiate SEA failover to Backup SEA)
      Repeat the procedure used on the Backup SEA to put the SEA in Defined state and change the SEA attributes.
      $ chdev -dev <SEA_ent#> -attr ha_mode=auto      (to fail back to primary SEA)


    Probable Cause #4--Mismatched MTU size on VIOS vs client(s)

    Recommendation


    The MTU size on the VIO clients must be equal to or lower than the MTU size on the VIO servers. To check MTU sizes:
    $ lstcpip          (On VIOS)
    # netstat -in      (On AIX)


    Probable Cause #5--Virtual Ethernet Adapter (VEA) used by SEA has small buffer overruns

    Recommendation/Example


    Check Receive Buffers of VEAs used by the SEA using entstat command for the SEA, i.e.
    $ entstat -all ent9|more   or
    $ entstat -all ent9|tee <filename>

    For example, the output below shows that VEA ent9 of SEA ent19 has small buffer overruns: the Max Allocated value for Small buffers has reached the Max Buffers limit of 2048.

    Receive Information
      Receive Buffers
        Buffer Type              Tiny    Small   Medium    Large     Huge
        Min Buffers               512      512      128       24       24
        Max Buffers              2048     2048      256       64       64
        Allocated                 513     2043      128       24       24
        Registered                511     1338      128       24       24
        History
       >>>Max Allocated           532   >>2048<<    128       24       24
          Lowest Registered       502      448      128       24       24

    The commands to fix it are:
    $ chdev -perm -dev ent9 -attr min_buf_small=2048
    $ chdev -perm -dev ent9 -attr max_buf_small=4096
    $ chdev -perm -dev ent9 -attr min_buf_tiny=1024
    $ chdev -perm -dev ent9 -attr max_buf_tiny=2048
    $ shutdown -restart      (reboot for changes to take effect)
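
The comparison of Max Allocated against Max Buffers can also be done mechanically on saved entstat output. The following is a minimal sketch under stated assumptions: the function name check_vea_buffers is hypothetical, and the parsing relies on the "Buffer Type", "Max Buffers" and "Max Allocated" rows appearing as in the example above. It is not an IBM-provided tool.

```shell
# Report buffer types whose historical Max Allocated reached the Max Buffers limit.
# Assumption: rows are laid out as in the Receive Buffers example in this technote.
check_vea_buffers() {
    awk '
        { gsub(/[<>]/, "") }    # drop any >>> <<< highlight markers, as in the example
        $1 == "Buffer" && $2 == "Type"  { for (i = 3; i <= NF; i++) name[i] = $i }
        $1 == "Max" && $2 == "Buffers"  { for (i = 3; i <= NF; i++) limit[i] = $i }
        $1 == "Max" && $2 == "Allocated" {
            for (i = 3; i <= NF; i++)
                if ((i in limit) && $i + 0 >= limit[i] + 0)
                    printf "%s: max allocated %d reached limit %d\n", name[i], $i, limit[i]
        }
    ' "$@"
}
```

For example, after saving the output with entstat -all ent9|tee <filename>, running check_vea_buffers <filename> prints one line for each buffer type that hit its limit, and prints nothing when no limit was reached.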


    Probable Cause #6--Downlevel firmware

    Recommendation


    Review firmware level for the machine type using the FLRT website.


    Probable Cause #7--Hypervisor Send Failures

    Recommendation


    Check SEA for Hypervisor Send Failures like the ones below by running
    $ entstat -all <SEA_ent#>|more
      Sample output
      ...
      --------------------------------------------------------------
      Statistics for adapters in the Shared Ethernet Adapter ent6
      --------------------------------------------------------------
      VLAN Ids :
          ent2: 1
      Real Side Statistics:
          Packets received: 78958212158
          Packets bridged: 78934548717
          Packets consumed: 96851114
          Packets fragmented: 0
          Packets transmitted: 68123622548
          Packets dropped: 68507

      On the Virtual ent2 adapter we see:
      Hypervisor Send Failures: 7459806
        Receiver Failures: 7459806
        Send Errors: 0
      Hypervisor Receive Failures: 0
      ...

      ...
      --------------------------------------------------------------
      Statistics for adapters in the Shared Ethernet Adapter ent7
      --------------------------------------------------------------

      VLAN Ids :
          ent3: 10
      Real Side Statistics:
          Packets received: 1047857
          Packets bridged: 840586
          Packets consumed: 1045144
          Packets fragmented: 0
          Packets transmitted: 620004
          Packets dropped: 68507

      On the Virtual ent3 adapter we see:
      Hypervisor Send Failures: 525495    
        Receiver Failures: 525495       
        Send Errors: 0
      Hypervisor Receive Failures: 0
      ...
    Note: It is a good idea to compare those with the packets dropped on physical ent adapters which may indicate a network outage external to the server.
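
On a VIOS with several SEAs, the relevant counters can be pulled out of saved entstat output with a short filter. This is a minimal sketch, assuming the "Statistics for adapters in the Shared Ethernet Adapter entN" and "Hypervisor Send Failures: N" lines appear as in the sample output above; the function name report_send_failures is hypothetical.

```shell
# List SEAs whose virtual adapters report a non-zero Hypervisor Send Failures count.
# Assumption: line formats match the entstat sample output in this technote.
report_send_failures() {
    awk '
        BEGIN { sea = "(unknown SEA)" }
        /Statistics for adapters in the Shared Ethernet Adapter/ { sea = $NF }
        /Hypervisor Send Failures:/ {
            n = $NF + 0
            if (n > 0) printf "%s: %.0f hypervisor send failures\n", sea, n
        }
    ' "$@"
}
```

For example, entstat output saved for each SEA can be passed as files: report_send_failures <filename1> <filename2>.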


    Probable Cause #8--CPU folding is enabled

    Recommendation


    CPU folding is not supported on VIOS. To check the current value, run:
    $ oem_setup_env
    # schedo -L|grep vpm_fold_policy|awk '{print $2}'
    This returns a numeric value of 1 (enabled) or 4 (disabled, which is the default value starting at VIOS 2.1).
    or
    # schedo -FL|more
    Check that vpm_fold_policy has 4 under the CUR column:
    NAME                  CUR   DEF    BOOT   MIN    MAX    UNIT      TYPE
         DEPENDENCIES
    ----------------------------------------------------------------------
    ...
    ----------------------------------------------------------------------
    vpm_fold_policy       4     1      4      0      15                  D
    ----------------------------------------------------------------------
    ...
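
For scripted checks across several VIOS, the CUR column can be tested directly. This is a minimal sketch, assuming the schedo output has the column layout shown above (NAME first, CUR second); the function name folding_disabled is hypothetical.

```shell
# Succeed (exit 0) only if vpm_fold_policy is present and its CUR value is 4,
# which this technote describes as the disabled setting.
# Assumption: input is schedo -L or schedo -FL output with CUR in the second column.
folding_disabled() {
    awk '
        $1 == "vpm_fold_policy" { found = 1; cur = $2 }
        END { exit !(found && cur == 4) }
    ' "$@"
}
```

For example, from the root shell: schedo -FL | folding_disabled && echo "CPU folding is disabled".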


    Probable Cause #9--Large network configuration

    Recommendation


    If the VIOS in question has multiple SEAs but not all are in use, remove all unused network adapters. Then check whether CPU utilization goes down or performance improves. Alternatively, because unused interfaces may hog CPU, configure an IP address on each unused interface, or configure them with the address 0.0.0.0.


    Probable Cause #10--SEA accounting is enabled.

    Recommendation


    Disable SEA accounting. To check SEA attribute, run
    $ lsdev -dev ent4 -attr|grep accounting (where ent4 is the SEA)
    accounting    enabled  Enable per-client accounting of network statistics

    To disable it:
    $ chdev -dev ent4 -attr accounting=disabled


    When all of the above have been ruled out
    To expedite resolution, ensure that all of the above probable causes are carefully evaluated and ruled out. If the problem continues at that point, or for Problem Source Identification, gather perfpmr data, contact your local IBM SupportLine Representative, and request to speak to the Performance team for initial problem determination.
    Additionally, VIOS snap data may be useful depending on perfpmr findings.


    Document Information

    Modified date:
    19 February 2022

    UID

    isg3T1023505