Network workload

The network workload was generated by using the network performance tool uperf.

Description of uperf

From www.uperf.org: uperf is a network performance tool that supports modelling and replay of various networking patterns ... uperf represents the next generation benchmarking tools (like filebench) where instead of running a fixed benchmark or workload, a description (or model) of the workload is provided and the tool generates the load according to the model. By distilling the benchmark or workload into a model, you can now do various things like change the scale of the workload, change different parameters, change protocols, and so on, and analyze the effect of these changes on your model. You can also study the effect of interleaving CPU activity, or think times or the use of SSL instead of TCP among many other things.

uperf profiles

uperf uses a profile in XML format to define the desired characteristics of the workload to be run. The following example profile describes a test that sends a 1-byte request and receives a 1-byte response over a single connection:

<?xml version="1.0"?> 
<profile name="TCP_RR"> 
  <group nprocs="1"> 
    <transaction iterations="1"> 
      <flowop type="connect" options="remotehost=10.12.37.2 protocol=tcp tcp_nodelay"/> 
    </transaction> 
    <transaction duration="300"> 
      <flowop type="write" options="size=1"/> 
      <flowop type="read" options="size=1"/> 
    </transaction> 
    <transaction iterations="1"> 
      <flowop type="disconnect" />
    </transaction>         
   </group> 
</profile>

For more details about all the fields available in the uperf profile, refer to the www.uperf.org Web page.

As described above, uperf supports a wide variety of options and is highly configurable, making it possible to model or simulate practically any desired network behavior.

Workload Configurations

The test methodology divides the workloads into two high-level categories: transactional and streaming.

Workload categories

A transactional workload consists of two parts: a request sent by the client, followed by a response received from the server. This Request-and-Response (RR) pattern is typical of what web servers see as users interact with websites through web browsers. The payload sizes for these RR patterns are relatively small.
  • Requests are typically in the range from a few bytes to simulate mouse-clicks up to a few hundred bytes to represent larger URLs or form data entered and sent by a user.
  • Responses are typically in tens of kilobytes (KB) to deliver web pages and up to a few megabytes (MB) to deliver the embedded images typically associated with most web content.
  • The ratio of RR (send/receive) payload sizes is typically in the 1:100 to 1:1,000 range. With ratios in this range, the workload is considered bi-directional, which is what makes it transactional.

Streaming workloads (STR) tend to simulate the load characteristics that many enterprise or SMB servers experience when supporting operations such as backup/restore, large file transfers, and other content delivery services. Although streaming workloads also consist of a request and a response, they are considered uni-directional because the request-to-response ratio can be 1:1,000,000 or higher: a small request can trigger responses that are many gigabytes in size.

Workload types

Each workload category, Request-and-Response (RR) and Streaming (STR), is further divided into separately defined workload types, each with discrete characteristics.

RR testing will be split into three workload types:

  1. Small packet / high transactional workloads, exclusively 1 byte requests and 1 byte responses
  2. Nominal payload size transactional workloads, 200 byte requests and 1000 byte responses
  3. Large payload transactional workloads, 1000 byte requests and 30 KB responses
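In uperf profile terms, these payload sizes translate directly into the size options of the write and read flowops. For example, the nominal workload type (200-byte request, 1000-byte response) could be described with a transaction along these lines (a sketch following the TCP_RR example shown earlier; the duration value is illustrative):

```xml
<!-- Nominal RR workload type: 200-byte request, 1000-byte response -->
<transaction duration="300">
  <flowop type="write" options="size=200"/>
  <flowop type="read" options="size=1000"/>
</transaction>
```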

STR testing will be split into two workload types:

  1. Streaming reads, 30 KB payloads
  2. Streaming writes, 30 KB payloads
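Expressed as uperf flowops, the direction of a streaming workload type is determined by whether the client writes or reads the 30 KB payload. A streaming write test, for instance, might use a transaction like the following (a sketch; the k size suffix is assumed to be accepted, as in uperf's example profiles):

```xml
<!-- Streaming write workload type: client streams 30 KB payloads -->
<transaction duration="300">
  <flowop type="write" options="size=30k"/>
</transaction>
```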

The intention is that these workload types should map reasonably closely to the specific characteristics of user workloads.

Simulating Users

In addition to the 5 basic workload types (3 RR and 2 STR), it is also necessary to simulate the effects of users. This means generating the load of a single user and scaling it up to reflect the load of many users. In uperf this can be achieved by running each workload one or more times concurrently. uperf has several ways to multiply workload concurrency. The method used for the tests in this paper is an optional parameter in the group statement.
  • The parameter nprocs= specifies the number of concurrent processes to create.
  • Each process executes the transactions defined in the group.
  • If all the defined transactions are considered to represent the activity of a single user, then the nprocs value effectively specifies the number of simulated users.
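For example, raising nprocs in the group statement of the earlier TCP_RR profile is all that is needed to go from 1 to 10 simulated users (a sketch, with the transaction bodies elided):

```xml
<!-- 10 concurrent processes, each running the group's transactions:
     effectively 10 simulated users -->
<group nprocs="10">
  <!-- connect, request/response, and disconnect transactions as before -->
</group>
```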

Workload tests

For each of the 5 workload types (3 RR and 2 STR) described in the previous topic Workload types, 4 discrete levels of concurrency were chosen to simulate increasing load from additional users (1, 10, 50, and 250 users), resulting in a total of 20 different tests.

Later in this paper (starting with Figure 1), various graphs showing performance data are included. Each graph lists each of the 20 different tests on the X-axis. The names for each of the tests have the form:

For transaction tests:
{category}{1c}-{requestsize}x{responsesize}--{users}

Where {category} is RR for the transactional Request-and-Response tests, {1c} indicates that 1 uperf client is used for each uperf server, {requestsize} is the number of bytes sent by the client to the server, {responsesize} is the number of bytes in the server's response to the client, and {users} is the total number of concurrent users (or processes) generating the overall load.

For example: rr1c-200x1000--10 describes a Request-and-Response test in which each of 10 concurrent users sends a 200 byte request and receives a 1000 byte response.

For streaming tests:
{category}-{read|write}x{payloadsize}--{users}

Where {category} is STR for streaming tests, {read|write} denotes the direction of data flow relative to the client, {payloadsize} is the number of bytes in each datagram, and {users} is the total number of concurrent users (or processes) generating the overall load.

For example: str-readx30k--50 describes a streaming read test in which each of 50 concurrent users reads 30 KB datagrams.

uperf pairs

The uperf implementation uses a Master/Slave model. The Master and Slave relationship can also be thought of as Client and Server. When uperf runs, the Client (Master) initiates communication with the Server (Slave). The Client invocation includes a parameter specifying the test definition profile. After connecting, the Client sends the Server a version of the test definition.

Since uperf requires both roles, client and server, we chose to assign each role to a separate KVM guest. Each uperf client has a unique uperf server. We named the association of a KVM guest client with its KVM guest server counterpart a uperf pair. A uperf pair is the smallest aggregation of resources and forms the building block used to scale our test configurations. Testing starts with a single uperf pair and then adds more pairs to increase and scale the load, with each step doubling the load of the previous step. The steps used 1, 2, 4, and 8 uperf pairs. All uperf pairs run the exact same workloads and are expected to perform and behave similarly.

In addition, recall from the previous topic Workload tests that for each step in the number of uperf pairs tested, each of the 5 workload types (3 RR and 2 STR) is run. Each workload type is run at 4 different levels of concurrency (1, 10, 50, and 250), where each unit of concurrency equates to a single user.

So as the number of uperf pairs increases, so do the load on the KVM host(s), the number of KVM guests, and the number of simulated users. With 8 uperf pairs, each using two KVM guests (one for the client and one for the server) for a total of 16 KVM guests, the defined workload types each simulate up to 2000 concurrent users (8 pairs × 250 users per pair).

The association of which KVM guests are used to compose a uperf pair differs depending on the KVM host configuration. In the single KVM host configuration, the uperf pairs are assembled from KVM guests running on that same KVM host.

Figure 1. Single KVM host uperf pair to KVM guest mappings

KVM guest 1 and 2 (light grey color) are used to form the first uperf pair. The next uperf pair uses the next sequential KVM guests 3 and 4 (light blue color) and so on.

Figure 2. uperf pair to KVM guest mappings when multiple KVM hosts are used

For the multiple host configuration, the uperf pairs are assembled from KVM guests residing on different KVM hosts. Here, KVM guest 1 (in light grey) on each host is used to form a uperf pair. As with the single KVM host configuration, the next KVM guest from each host (in light blue, yellow, and red) is used to form the next uperf pair, and so on. Unlike the single host configuration, which runs both clients and servers, the multiple host configuration ends up with all the KVM uperf clients running on one KVM host while all the KVM uperf servers run on the other. The goal was to spread each uperf pair across the KVM hosts.

In the multiple host configuration, for test measurement purposes, each host used a separate network interface configured from a separate OSA adapter (see the red connector arrows in Figure 2). This ensured that traffic flowed from one LPAR to the other through the hardware switch, in order to evaluate the network path and behavior seen when communicating with an external system.