Performance tuning

Performance optimizations are strongly dependent on your workload and on your hardware and software environment.

The suggestions that follow might or might not be suitable for your setup. Verify any settings against a suitable benchmark to avoid adverse tuning effects.
Tip: Use ethtool to verify that intended settings, like hardware offloads, are active. Also use ethtool to explore possible optimizations with transient settings before making them persistent through distribution tools.

Hardware offloads

By default, hardware offloads are enabled for checksums of both inbound and outbound packets and for TCP segmentation (TSO).

Use the ethtool command with the -k option to verify that the offloads are enabled.

Example:
# ethtool -k eno0
Features for eno0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
... 

In the command output, [fixed] labels settings that you cannot change.
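
If a benchmark indicates that a particular offload does not help your workload, you can switch it off with the -K option. The interface name eno0 and the offload name tso in the following example are placeholders; substitute the values for your setup.

Example:
# ethtool -K eno0 tso off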

Receive Packet Steering

For Linux® instances with multiple CPUs, you can use Receive Packet Steering (RPS) to distribute the processing of incoming packets evenly across a set of CPUs that you specify.

By keeping caches hot, RPS can increase performance, especially for workloads that open numerous connections and transfer small packets.

This setting applies to the network interfaces of all directly attached PCI functions, including PCI functions that are passed through to KVM guests with VFIO.
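
As an illustration, assuming a network interface eno0 with a receive queue rx-0, you can set the CPU mask for RPS through sysfs. The mask 3 selects CPUs 0 and 1; choose a mask that matches your CPU topology and verify the effect with a benchmark.

Example:
# echo 3 > /sys/class/net/eno0/queues/rx-0/rps_cpus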

For more information about RPS with PCI network adapters, see Exploring the performance of network adapters for Linux on IBM Z® [PDF].

Receive Flow Steering

Receive Flow Steering (RFS) is an extension of RPS. When selecting CPUs, RFS takes into account where the application that consumes an inbound packet is running.
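
As a minimal sketch, assuming a network interface eno0 with a single receive queue, you can enable RFS by sizing the global flow table and the per-queue flow count. The value 32768 is an illustrative starting point, not a tuned recommendation; with multiple receive queues, divide the global value across the queues.

Example:
# sysctl -w net.core.rps_sock_flow_entries=32768
# echo 32768 > /sys/class/net/eno0/queues/rx-0/rps_flow_cnt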

For more information about RFS, see https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_tuning_guide-networking-configuration_tools#sect-Red_Hat_Enterprise_Linux-Performance_Tuning_Guide-Configuration_tools-Configuring_Receive_Flow_Steering_RFS.

Enable Striding RQ

Striding RQ helps to use inbound buffers more efficiently, for example for inbound streaming workloads.

Example:
# ethtool --set-priv-flags eno0 rx_striding_rq on
# ethtool --show-priv-flags eno0
Private flags for eno0:
...
rx_striding_rq : on
... 
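
If a benchmark shows no benefit for your workload, you can switch the private flag off again.

Example:
# ethtool --set-priv-flags eno0 rx_striding_rq off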

Interrupt Moderation

Interrupt Moderation controls how long the adapter waits, and how many packets it collects, before raising an interrupt. The settings balance CPU consumption against latency and throughput. For most workloads, the default is a good compromise.

You can modify the waiting time with the rx-usecs and tx-usecs settings. You can modify the buffer count with the rx-frames and tx-frames settings.

Example:
# ethtool -c eno0
Coalesce parameters for eno0:
Adaptive RX: on TX: on
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 8
rx-frames: 128
...
# ethtool -C eno0 rx-usecs 4095 rx-frames 65535 tx-usecs 4095 tx-frames 65535

Larger values save CPU cycles by waiting longer and potentially collecting more inbound packets with each interrupt. Larger values also increase latency. The default, Adaptive RX, is optimized for latency. Increase the waiting times and buffer counts only if you want to save CPU cycles at the expense of latency.
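
If manual settings do not pay off, you can return to adaptive interrupt moderation. The interface name eno0 is an example.

Example:
# ethtool -C eno0 adaptive-rx on adaptive-tx on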

For more information about interrupt moderation, see https://support.mellanox.com/s/article/understanding-interrupt-moderation.