



The Linux SSL performance tests show the benefit of a Crypto
Express2 (CEX2) feature configured for SSL acceleration (CEX2A)
in combination with the Central Processor Assist for Cryptographic
Function (CPACF).
Our tests compare the performance with and without System z hardware
support. Throughput numbers for a CEX2 configured as a cryptographic
coprocessor (CEX2C) are included for reference. Note, however, that
a CEX2C is designed and optimized for secure key encrypted transactions
rather than for SSL acceleration.
IBM System z9 supports two cryptographic hardware functions:
the Central Processor Assist for Cryptographic Function (CPACF)
and the Crypto Express2 (CEX2) feature.
- CPACF: Provides support for symmetric ciphers and
secure hash algorithms (SHA) on every central processor.
Hence the potential encryption/decryption throughput
scales with the number of central processors in the
system.
- CEX2: The two PCI-X adapters on a CEX2 feature can each be
configured in one of two ways: either as a cryptographic coprocessor
(CEX2C) for secure key encrypted transactions, or
as a cryptographic accelerator (CEX2A) for Secure Sockets
Layer (SSL) acceleration. A CEX2A works only in clear
key mode. Both adapters can be of the same type, or you
can configure one adapter as CEX2A and the other
as CEX2C.
The SSL communication protocol was designed to provide
secure communication over an open, insecure network
such as the Internet. OpenSSL is an open source
implementation of the SSL protocol. OpenSSL can use
hardware-accelerated cryptographic services to speed
up SSL communication. Other applications or protocols
can use these cryptographic services through different
APIs (for example, PKCS#11).
OpenSSL needs the ibmca engine to communicate
with the interface library (libICA). The
libICA library then communicates either with CPACF or, through the
Linux generic device driver zcrypt, with
a CEX2 adapter (if available). Hence the zcrypt device driver
must be loaded in order to use CEX2 features.
The chart below shows the typical Linux OpenSSL software/hardware
stack for our performance tests.

The SSL protocol establishes a secure network connection
between a client and a server node with an SSL
handshake.
For data encryption, SSL uses a symmetric cipher for performance
reasons. The SSL handshake uses an asymmetric cipher
(RSA), which is computationally very expensive. During the
handshake, client and server choose the symmetric cipher for the
data encryption and exchange the keys for it. The symmetric ciphers
DES, TDES, and AES-128 are supported through the System z
CPACF function.
A CEX2 adapter can be configured as an SSL accelerator (CEX2A),
which gives the best performance in supporting the SSL handshake
(RSA acceleration). An adapter configured as a coprocessor (CEX2C)
speeds up SSL handshakes as well, but is optimized to support
secure key encrypted transactions.
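The division of labor described above can be summarized in a small sketch. It is a simplified model written for illustration only: the function name, its parameters, and the "software" fallback label are assumptions, not part of OpenSSL, ibmca, or libICA.

```python
# Simplified routing model of the stack described in the text.
# All names here are illustrative; this is not an OpenSSL or libICA API.

# Symmetric ciphers that CPACF supports on System z9, per the text.
CPACF_CIPHERS = {"DES", "TDES", "AES-128"}


def executed_by(operation: str,
                cex2_available: bool = False,
                zcrypt_loaded: bool = False,
                cpacf_enabled: bool = False) -> str:
    """Return the component that would process `operation` in this model."""
    if operation == "RSA":
        # RSA work of the handshake can be offloaded to a CEX2 adapter,
        # but only if the zcrypt device driver is loaded.
        if cex2_available and zcrypt_loaded:
            return "CEX2 (via zcrypt)"
        return "software"
    if operation in CPACF_CIPHERS and cpacf_enabled:
        return "CPACF"
    # Anything else falls back to the software implementation.
    return "software"


print(executed_by("RSA", cex2_available=True, zcrypt_loaded=True))
print(executed_by("RSA", cex2_available=True, zcrypt_loaded=False))
print(executed_by("AES-128", cpacf_enabled=True))
```

The second call illustrates why the device driver matters: an installed adapter without zcrypt loaded leaves the RSA work in software.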
The test opens an SSL connection between a client and
a server node using Linux OpenSSL. It exchanges only
a minimal amount of data (40 bytes), and then closes the
connection. After each data exchange, the test scenario
opens a new connection.
This test measures the maximum SSL handshake rate, that
is, the number of new connections per second.
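The following sketch shows one way such a measurement loop could look. It is not the tool used for these tests: it uses Python's standard `ssl` module instead of the OpenSSL test setup, runs a single sequential client rather than parallel connections, and the helper names are made up for illustration.

```python
import socket
import ssl
import time
from typing import Callable


def ssl_connect_once(host: str, port: int) -> None:
    """Open an SSL/TLS connection, exchange 40 bytes, and close it again."""
    context = ssl.create_default_context()  # fresh context, so no session reuse
    with socket.create_connection((host, port)) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(b"x" * 40)
            tls.recv(40)


def handshake_rate(connect_once: Callable[[], None], connections: int) -> float:
    """Run `connections` connect/exchange/close cycles; return cycles per second."""
    start = time.perf_counter()
    for _ in range(connections):
        connect_once()
    elapsed = time.perf_counter() - start
    return connections / elapsed


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a server; replace the
    # lambda with e.g. `lambda: ssl_connect_once(host, port)` for a real test.
    rate = handshake_rate(lambda: time.sleep(0.001), 50)
    print(f"{rate:.0f} cycles/sec")
```

Because every cycle performs a full handshake but transfers almost no data, the measured rate is dominated by the handshake cost.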
Without any cryptographic hardware support, eight parallel
SSL connections reach their throughput limit at about
750 handshakes/sec when four CPUs are used.
With a CEX2C, the single-adapter limit of 1200 handshakes/sec
is reached from eight parallel connections onward.
For a CEX2A, the adapter limit of 3250 handshakes/sec
is reached with 16 parallel connections. This is more than
four times the rate reached without hardware support. Using a second
CEX2A adapter can double the throughput to more than
6000 handshakes/sec per CEX2A feature.
The following chart shows the CPU savings when using
CEX2 cryptographic hardware:

This chart shows the CPU load for the test with 32
parallel connections. Without any cryptographic hardware
support, all four CPUs are busy. With a CEX2A
adapter, however, only two of the four CPUs are busy.
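The ratios behind these statements can be checked with a few lines of arithmetic. The three limits are the numbers quoted in the text; the doubled value is an idealized upper bound, whereas the text reports "more than 6000".

```python
# Handshake limits quoted in the text (handshakes/sec).
no_hw = 750    # no cryptographic hardware support, four CPUs
cex2c = 1200   # one adapter configured as coprocessor
cex2a = 3250   # one adapter configured as accelerator

speedup_cex2c = cex2c / no_hw
speedup_cex2a = cex2a / no_hw
two_adapters_ideal = 2 * cex2a  # idealized doubling with a second CEX2A

print(f"CEX2C vs. no hardware: {speedup_cex2c:.2f}x")
print(f"CEX2A vs. no hardware: {speedup_cex2a:.2f}x")
print(f"Two CEX2A adapters (ideal): {two_adapters_ideal} handshakes/sec")
```

The CEX2A ratio of about 4.3 corresponds to the "more than four times" in the text, and it is achieved while only two of the four CPUs are busy.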
Starting with version 2.1.0, the Linux generic cryptographic
device driver (zcrypt) provides a configurable polling
thread. The polling thread continuously queries the cryptographic
adapter (CEX2) for finished cryptographic
requests.
To display the version of the currently loaded crypto
device driver, enter
cat /proc/driver/z90crypt
Configuring the polling thread:
- In older distributions that ship device driver
version 2.1.0, the polling thread is enabled by default.
- Starting with SLES10 SP2 and RHEL 5.2, the polling
thread is disabled by default in order to save CPU
costs when performing cryptographic operations under
z/VM.
- To enable the polling thread, enter
modprobe z90crypt [options] poll_thread=1
for monolithic modules, or
modprobe ap [options] poll_thread=1
for discrete modules.
- To disable the polling thread, enter
modprobe z90crypt [options] poll_thread=0
for monolithic modules, or
modprobe ap [options] poll_thread=0
for discrete modules.
- When running as a guest under z/VM, APAR VM64440
is recommended (available for z/VM 5.2 and 5.3).
The results show that enabling the polling thread makes the best
use of the cryptographic adapter. This advantage is especially
visible in the ranges where the load on the adapter is below
its limit of 3250 handshakes/sec. When the cryptographic
adapter is not processing cryptographic requests, the
polling thread sleeps.
When the polling thread is disabled, the handshake
rate may decrease, depending on the number of parallel
SSL connections. This is because finished cryptographic
requests are then fetched only on the Linux kernel timer
interrupt, which occurs every 1/100 s. A single SSL connection
can therefore complete at most one handshake per timer interval,
so its maximum rate ends up at 100 handshakes/sec.
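The timer-bound limit can be written down as a small model. The single-connection value and the adapter limit come from the text; scaling the bound linearly with the number of parallel connections is an assumption made here for illustration, not a measured result.

```python
HZ = 100              # kernel timer interrupts per second (one every 1/100 s)
ADAPTER_LIMIT = 3250  # CEX2A limit in handshakes/sec, as quoted in the text


def max_rate_without_polling(parallel_connections: int) -> int:
    """Upper bound on handshakes/sec when results are fetched only on timer ticks.

    Assumes every connection has one request outstanding and completes at
    most one handshake per tick, capped by the adapter limit.
    """
    return min(parallel_connections * HZ, ADAPTER_LIMIT)


for n in (1, 8, 16, 64):
    print(f"{n:2d} connections: at most {max_rate_without_polling(n)} handshakes/sec")
```

Under this model the penalty of a disabled polling thread is largest for few connections and disappears once enough parallel connections saturate the adapter.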
To make optimal use of a cryptographic adapter
(especially when the adapter is not fully utilized),
we recommend turning on the zcrypt polling thread.
For Linux guests under z/VM, APAR VM64440 changes the
z/VM poll rate window from "10-1.5ms" to "2-1.5ms".
This ensures the effectiveness of the Linux zcrypt polling
thread.
The chart below demonstrates the performance improvement
with the polling thread enabled, showing the maximum
achievable number of SSL handshakes. The less the cryptographic adapter
is utilized, the greater the benefit of the polling
thread.

However, enabling the polling thread causes additional CPU
cost. This has to be considered especially for Linux
guests under z/VM. These costs arise only while
there are outstanding cryptographic requests and
the polling thread is active.

- IBM System z9 (2094-S18)
- LPAR with 4 CPUs
- IBM System Storage DS8300 (2107-922)
- 1 Crypto Express2 (CEX2) Feature
- SLES10 SP1 including
- Linux generic zcrypt device driver 2.1.0
- OpenSSL 0.9.8a
- Interface Library (libICA) 1.3.7
- OpenSSL HW engine support ibmca 1.3.7
The Central Processor Assist for Cryptographic Function
(CPACF) operates synchronously with its corresponding central
processor. On System z hardware, each central processor has its own CPACF.
Support for the secure hash algorithms (SHA-1 and SHA-256) is
available immediately, but you have to enable the symmetric
cipher support for CPACF explicitly. CPACF fully supports
the symmetric ciphers DES, TDES, and AES-128.
Performance tests show that enabling CPACF significantly
improves the performance of secure SSL communication.
The chart below shows the throughput improvements with CPACF
enabled for the most common symmetric ciphers:
- All ciphers fully supported by CPACF (DES, TDES, AES-128)
perform nearly three times better.
- Ciphers that are not supported improve as well, because
CPACF also accelerates the SHA algorithms.
All tests are based on a single client/server SSL connection
with a data exchange of 1000 KB.

CPACF saves 80% of the CPU cost for DES and AES-128 cipher encryption.
For TDES, CPACF saves as much as 90% of the CPU cost.
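To put these percentages into perspective, a saving can be converted into a reduction factor. The helper below is an illustrative sketch; only the percentages are taken from the text.

```python
def times_cheaper(cpu_saving: float) -> float:
    """Convert a fractional CPU saving (0.80 = 80%) into a cost reduction factor."""
    return 1.0 / (1.0 - cpu_saving)


# CPU savings quoted in the text.
for cipher, saving in [("DES", 0.80), ("AES-128", 0.80), ("TDES", 0.90)]:
    print(f"{cipher}: {saving:.0%} saving = {times_cheaper(saving):.1f}x less CPU")
```

An 80% saving means the encryption needs about one fifth of the CPU time, and a 90% saving about one tenth.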

- IBM System z9 (2094-S18)
- IBM System Storage DS8300 (2107-922)
- LPAR with 4 CPUs
- SLES10 SP1 including
- Linux generic zcrypt device driver 2.1.0
- OpenSSL 0.9.8a
- Interface Library (libICA) 1.3.7
- OpenSSL HW engine support ibmca 1.3.7
Our tests with different data sizes show that the best performance
is achieved with the cryptographic services CPACF and CEX2A combined.
Within each secure connection, the test exchanges 20KB.
The single client/server SSL connection uses the symmetric
cipher AES-128.
Compared to a system without cryptographic hardware
support, the major improvement is achieved through the
CEX2A. The SSL handshake is the most expensive
part of this test: the smaller the amount of data transferred
over an SSL connection, the more important
fast processing of the SSL handshake becomes.

The following chart shows how the CPU costs are reduced
depending on the cryptographic hardware support used.

Within each secure connection, the test exchanges 1000KB.
The single client/server SSL connection uses the symmetric
cipher AES-128.
In this test, the major improvement comes from CPACF,
because the most expensive part of this test
is the exchange of the encrypted data.
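Why the dominant cost shifts between the 20 KB and the 1000 KB test can be illustrated with a toy cost model. The two unit costs below are hypothetical values chosen only to show the effect; they are not measurements from these tests.

```python
# Hypothetical unit costs in arbitrary units (NOT measured values).
HANDSHAKE_COST = 1.0  # CPU cost of one SSL handshake
COST_PER_KB = 0.01    # CPU cost of encrypting and transferring 1 KB


def handshake_share(data_kb: float) -> float:
    """Fraction of the total connection cost that is spent in the handshake."""
    data_cost = COST_PER_KB * data_kb
    return HANDSHAKE_COST / (HANDSHAKE_COST + data_cost)


for size_kb in (20, 1000):
    print(f"{size_kb:4d} KB: handshake is {handshake_share(size_kb):.0%} of the cost")
```

With small transfers the handshake dominates, so accelerating RSA (CEX2A) helps most; with large transfers the symmetric encryption dominates, so CPACF helps most.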


- IBM System z9 (2094-S18)
- IBM System Storage DS8300 (2107-922)
- LPAR with 4 CPUs
- SLES10 SP1 including
- Linux generic zcrypt device driver 2.1.0
- OpenSSL 0.9.8a
- Interface Library (libICA) 1.3.7
- OpenSSL HW engine support ibmca 1.3.7
The s390 in-kernel crypto functions use CPACF supported ciphers
and secure hash algorithms to implement the interfaces defined
by the Linux in-kernel cryptographic API.
To use the s390 specific in-kernel
crypto algorithms, the Linux kernel must be configured accordingly. The
s390 crypto support can be built into the kernel or built as
modules. To do this, select the algorithms under 'Cryptographic options' in the
kernel configuration menu. Both the s390 variants and the
generic algorithms can be configured
at the same time. Linux chooses the s390 variant of an algorithm when the
CPACF feature supports it. Otherwise, Linux chooses the
generic variant.
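On a running system, the registered implementations are listed in `/proc/crypto`, and the kernel crypto API prefers the implementation with the highest priority for each algorithm. The sketch below parses text in that format and reports the preferred driver per algorithm. The sample text, including its driver names and priority values, is illustrative and not taken from the test system.

```python
# Illustrative excerpt in /proc/crypto format (values are assumptions).
SAMPLE_PROC_CRYPTO = """\
name         : sha1
driver       : sha1-s390
priority     : 300

name         : sha1
driver       : sha1-generic
priority     : 100

name         : aes
driver       : aes-generic
priority     : 100
"""


def preferred_drivers(proc_crypto_text: str) -> dict:
    """Map each algorithm name to the driver with the highest priority."""
    best = {}
    for block in proc_crypto_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        name = fields["name"]
        driver = fields["driver"]
        priority = int(fields["priority"])
        if name not in best or priority > best[name][1]:
            best[name] = (driver, priority)
    return {name: driver for name, (driver, _) in best.items()}


print(preferred_drivers(SAMPLE_PROC_CRYPTO))
```

On a real system, the same function could be fed the contents of `/proc/crypto`, for example with `open("/proc/crypto").read()`.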
The following table shows the in-kernel crypto usable ciphers
and secure hash functions:
| Machine type | Supported crypto algorithms |
| z890/z990 | DES, TDES; SHA-1 |
| System z9 | DES, TDES; AES-128; SHA-1, SHA-256 |
| System z10 | DES, TDES; AES-128, AES-192, AES-256; SHA-1, SHA-256, SHA-384, SHA-512 |
IPsec is an extension of the IP protocol that provides
security for IP and upper-layer protocols.
IPsec uses in-kernel cryptographic algorithms. The IPsec
protocols use secure hash algorithms and symmetric ciphers.
The charts below show results for three different
network workload types with a secure IP connection between
the client and server.
The workload types are:
| rr200x1k | emulates an online transaction |
| crr64x8k | emulates a website request |
| rr200x32k | emulates a database query |
For each workload type three sequences were measured:
| no IPsec | IPsec is not used |
| IPsec - no CPACF | HW support for in-kernel cryptography is disabled in the kernel |
| IPsec - CPACF | HW support for in-kernel cryptography is enabled in the kernel |
This chart shows normalized transaction rates. With
IPsec, securing the network connection adds overhead,
which decreases the number of
achievable transactions. Enabling the HW support for
in-kernel crypto compensates for a large part of this decrease.

Along with the decrease in transactions, the response times
get worse as well. The response time is up to 7 times
longer when the cryptographic algorithms must be calculated
in software.

Calculating in-kernel cryptographic operations in software
with the generic algorithms leads to CPU costs that are
13 times higher than with CPACF.
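Expressed as a saving, the factor of 13 can be compared directly with percentage figures. The helper name is illustrative; only the factor comes from the text.

```python
def cpu_saving(cost_factor: float) -> float:
    """Convert 'N times higher CPU cost in software' into the saving with hardware."""
    return 1.0 - 1.0 / cost_factor


# Software costs 13 times as much CPU as CPACF, per the text.
print(f"CPACF saves about {cpu_saving(13):.0%} of the CPU cost")
```

A factor of 13 corresponds to a saving of roughly 92% of the CPU cost for the in-kernel cryptographic operations.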
- IBM System z9 (2094-S18)
- IBM System Storage DS8300 (2107-922)
- LPAR with 4 CPUs
- Linux kernel 2.6.24
- ipsec-tools-0.6.7