


Cryptographic support

developerWorks


Technical overview
SSL handshake performance tests *Updated* June 2008
CPACF performance tests
CEX2 and CPACF performance tests
In-kernel crypto exploiting CPACF *New* June 2008




The Linux SSL performance tests show the benefit of a Crypto Express2 Feature (CEX2) configured for SSL acceleration (CEX2A) in combination with the Central Processor Assist for Cryptographic Function (CPACF).

Our tests compare the performance with and without System z hardware support. The throughput numbers for a CEX2 configured as a cryptographic coprocessor (CEX2C) are also available for reference. Note, however, that a CEX2C is designed and optimized for secure key encrypted transactions rather than for SSL acceleration.

Technical overview
 
System z hardware

For the use of cryptographic services, IBM System z9 supports the cryptographic hardware functions Central Processor Assist for Cryptographic Function (CPACF) and Crypto Express2 (CEX2) Feature.

CPACF
CPACF provides support for symmetric ciphers and secure hash algorithms (SHA) on every central processor. Hence the potential encryption/decryption throughput scales with the number of central processors in the system.
CEX2
The two PCI-X adapters on a CEX2 feature can be configured in two ways: either as a cryptographic coprocessor (CEX2C) for secure key encrypted transactions, or as a cryptographic accelerator (CEX2A) for Secure Sockets Layer (SSL) acceleration. A CEX2A works only in clear key mode.

Both adapters can be of the same type, or you can configure one adapter as CEX2A and the other as CEX2C.

SSL acceleration

The SSL communication protocol was designed to provide secure communication over an open, insecure network such as the Internet. OpenSSL is an open source implementation of the SSL protocol. OpenSSL can use the hardware-accelerated cryptographic services to speed up SSL communication. Other applications or protocols can use these cryptographic services through different APIs (for example, PKCS#11).

OpenSSL needs the ibmca engine to communicate with the interface library (libICA). The libICA library then communicates either directly with CPACF or, through the Linux generic device driver zcrypt, with a CEX2 (if available). Hence the zcrypt device driver must be loaded in order to use CEX2 features.

The chart below shows the typical Linux OpenSSL software/hardware stack for our performance tests.

Linux SSL software and hardware stack




SSL handshake performance tests

The SSL protocol for secure network connections establishes a connection between a client and a server node with an SSL handshake.

For data encryption, SSL uses a symmetric cipher for performance reasons. For the SSL handshake process, an asymmetric cipher (RSA) is used, which is very cost intensive. The SSL handshake selects the symmetric cipher for the data encryption and exchanges the keys for it. The symmetric ciphers DES, TDES, and AES-128 are supported through the System z CPACF function.

A CEX2 adapter can be configured as an SSL Accelerator (CEX2A), which gives best performance in supporting the SSL handshake (RSA acceleration). An adapter configured as Coprocessor (CEX2C) speeds up SSL handshakes as well, but is optimized to support secure key encrypted transactions.

Test scenario

The test opens an SSL connection between a client and a server node using Linux OpenSSL. It exchanges only the minimum data portion (40B), and then closes the connection. After each data exchange, the test scenario opens a new connection.

This test measures the maximum SSL handshake rate (that is, number of connections) per second.
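A measurement loop of this kind can be sketched as follows. This is a minimal sketch in Python and not the tool used for the tests described here; the function names, the timeout, and the assumption that the server echoes the 40-byte payload are all hypothetical choices made for illustration.

```python
import socket
import ssl
import time
from typing import Callable


def tls_connect_once(host: str, port: int, payload: bytes = b"x" * 40) -> None:
    """Open a new TLS connection, exchange a small payload, and close it."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        # No cached session is passed in, so each call does a full handshake.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(payload)
            # Assumption: the server answers with a reply of the same size.
            tls.recv(len(payload))


def measure_handshake_rate(connect_once: Callable[[], None],
                           duration_s: float) -> float:
    """Call connect_once back to back for duration_s seconds.

    Returns the number of completed connections per second.
    """
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        connect_once()
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed
```

A call such as `measure_handshake_rate(lambda: tls_connect_once("server.example", 4433), 30.0)` would give the rate for one connection; running several such loops at the same time corresponds to the parallel connections in the results below.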

Results

Crypto Express2 - SSL handshakes

Without any cryptographic hardware support, eight parallel SSL connections reach their throughput limit at about 750 handshakes/sec when four CPUs are used.

With a CEX2C, the single-adapter limit of 1200 handshakes/sec is reached at eight parallel connections.

For a CEX2A, the adapter limit of 3250 handshakes/sec is reached with 16 parallel connections. This is more than four times the rate reached without hardware support. Using a second CEX2A adapter can double the throughput to more than 6000 handshakes/sec per CEX2 feature.
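The ratios behind these statements can be recomputed from the quoted limits. The sketch below uses only the three numbers given in the text; note that the limits were reached at different numbers of parallel connections, so this compares peak rates, and the names `LIMITS` and `speedup` are illustrative.

```python
# Handshake rate limits quoted in the text (handshakes/sec).
LIMITS = {
    "no hardware support": 750,
    "CEX2C": 1200,
    "CEX2A": 3250,
}


def speedup(config: str, baseline: str = "no hardware support") -> float:
    """Ratio of a configuration's handshake limit to the baseline limit."""
    return LIMITS[config] / LIMITS[baseline]


for name in LIMITS:
    print(f"{name}: {speedup(name):.2f}x")
```

The CEX2C comes out at about 1.6 times and the CEX2A at about 4.3 times the software-only limit.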

The following chart shows the CPU savings when using CEX2 cryptographic hardware:

Crypto Express2 - CPU load for SSL handshakes

This chart shows the CPU load for the test with 32 parallel connections. Without any cryptographic hardware support, all four CPUs are busy. When a CEX2A adapter is available, only two of the four CPUs are busy.

Polling thread in the zcrypt device driver

Starting with version 2.1.0, the Linux generic cryptographic device driver (zcrypt) provides a configurable polling thread. The polling thread continuously queries the cryptographic adapter (CEX2) for finished cryptographic requests.

To list the version of the currently loaded crypto device driver, enter

  • cat /proc/driver/z90crypt

Configuring the polling thread:

  • For the older distributions that ship device driver version 2.1.0, the polling thread is enabled by default.
  • Starting with SLES10 SP2 and RHEL 5.2, the polling thread is disabled by default in order to save CPU costs when performing cryptographic operations under z/VM.
  • To enable the polling thread, enter
    • modprobe z90crypt [options] poll_thread=1 for monolithic modules
    • modprobe ap [options] poll_thread=1 for discrete modules
  • To disable the polling thread, enter
    • modprobe z90crypt [options] poll_thread=0 for monolithic modules
    • modprobe ap [options] poll_thread=0 for discrete modules
  • When running as a guest under z/VM, APAR VM64440 is recommended (available for z/VM 5.2 and 5.3).

The results show that enabling the polling thread makes the best use of the cryptographic adapter. The advantage is especially visible in the ranges where the adapter is not driven to its limit of 3250 handshakes/sec. When the cryptographic adapter is not processing cryptographic requests, the polling thread sleeps.

When the polling thread is disabled, the handshake rate may decrease, depending on the number of parallel SSL connections. This is because finished cryptographic requests are then fetched only on the Linux kernel timer interrupt, which occurs every 1/100 s. Therefore the maximum rate for a single SSL connection ends up at 100 handshakes/sec.
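The 100 handshakes/sec figure follows directly from the timer frequency. The sketch below turns this into a simple upper-bound model. It is an assumption-laden illustration and not a result of the measurements: it assumes that each connection has exactly one request outstanding and completes at most one handshake per timer tick, and the function name is hypothetical.

```python
def polled_rate_ceiling(timer_hz: float, connections: int,
                        adapter_limit: float) -> float:
    """Upper bound on handshakes/sec when replies are fetched only on timer ticks.

    Assumption (not taken from the measurements): every connection has one
    request outstanding and can finish at most one handshake per tick.
    The result is capped at the adapter limit.
    """
    return min(timer_hz * connections, adapter_limit)


# Timer interrupt every 1/100 s, CEX2A limit of 3250 handshakes/sec.
for n in (1, 8, 16, 32, 64):
    print(n, polled_rate_ceiling(100, n, 3250))
```

Under this model a single connection is capped at 100 handshakes/sec, and the cap only stops mattering once enough connections run in parallel to reach the adapter limit.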

For an optimal exploitation of a cryptographic adapter (especially when the adapter is not fully utilized), it is recommended to turn on the zcrypt polling thread. For Linux guests under z/VM, APAR VM64440 changes the z/VM poll rate window from "10-1.5ms" to "2-1.5ms". This ensures the effectiveness of the Linux zcrypt polling thread.

The chart below demonstrates the performance improvement with the polling thread enabled, showing the possible maximum of SSL handshakes. The less the cryptographic adapter is utilized, the greater the benefit of the polling thread.

Possible maximum of SSL handshakes

However, enabling the polling thread causes a CPU cost overhead. This has to be considered especially for Linux guests under z/VM. These costs are incurred only while there are outstanding cryptographic requests and the polling thread is active.

CPU cost overhead

Test environment
  • IBM System z9 (2094-S18)
  • LPAR with 4 CPUs
  • IBM System Storage DS8300 (2107-922)
  • 1 Crypto Express2 (CEX2) Feature
  • SLES10 SP1 including
    • Linux generic zcrypt device driver 2.1.0
    • OpenSSL 0.9.8a
    • Interface Library (libICA) 1.3.7
    • OpenSSL HW engine support ibmca 1.3.7



CPACF performance tests

The Central Processor Assist for Cryptographic Function (CPACF) runs synchronously with its corresponding central processor. Hence, each central processor has its own CPACF on System z hardware. Support for the secure hash algorithms (SHA-1 and SHA-256) is immediately available, but you have to enable the symmetric cipher support for CPACF explicitly. CPACF fully supports the symmetric ciphers DES, TDES, and AES-128.

Performance tests show that enabling CPACF significantly improves the performance of secure SSL communication.

The chart below shows the throughput improvements with enabled CPACF for the most common symmetric ciphers:

  • All fully CPACF supported ciphers (DES, TDES, AES-128) perform nearly three times better.
  • Ciphers which are not supported improve as well, because CPACF is also backing SHA algorithms.

All tests are based on a single client/server SSL connection, with 1000KB data exchange.

System z9 CPACF cipher support - Normalized throughput

CPACF saves 80% of the CPU cost for DES and AES-128 cipher encryption. For TDES, CPACF saves as much as 90% of the CPU cost.
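A percentage saving can be restated as a cost ratio: if CPACF saves a fraction s of the CPU cost, the remaining cost is (1 - s) of the software cost, so software is 1 / (1 - s) times as expensive. The short sketch below applies this to the two savings quoted above; the function name is illustrative.

```python
def cost_ratio(saving: float) -> float:
    """Software CPU cost divided by CPACF CPU cost for a fractional saving."""
    if not 0.0 <= saving < 1.0:
        raise ValueError("saving must be in the range [0, 1)")
    return 1.0 / (1.0 - saving)


for cipher, saving in (("DES", 0.80), ("AES-128", 0.80), ("TDES", 0.90)):
    print(f"{cipher}: software costs about {cost_ratio(saving):.0f}x the CPU")
```

An 80% saving corresponds to roughly a fifth of the CPU cost per unit of throughput, and a 90% saving to roughly a tenth.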

System z9 CPACF cipher support - CPU cost per throughput

Test environment
  • IBM System z9 (2094-S18)
  • IBM System Storage DS8300 (2107-922)
  • LPAR with 4 CPUs
  • SLES10 SP1 including
    • Linux generic zcrypt device driver 2.1.0
    • OpenSSL 0.9.8a
    • Interface Library (libICA) 1.3.7
    • OpenSSL HW engine support ibmca 1.3.7



CEX2 and CPACF performance tests

Our tests with different data sizes show the best performance benefits with the cryptographic services CPACF and CEX2A.

Test scenario with small data sizes

Within each secure connection, the test exchanges 20KB. The single client/server SSL connection uses the symmetric cipher AES-128.

Results with small data sizes

Compared to a system without cryptographic hardware support, the major improvement is achieved through the CEX2A, because the SSL handshake is the most cost-intensive portion of this test. The smaller the amount of data transferred over an SSL connection, the more important fast processing of the SSL handshake becomes.

 

System z Crypto Express2 Support - Normalized throughput for small data sizes

The following chart indicates the reduced CPU costs depending on the used cryptographic hardware support.

System z Crypto Express2 Support - CPU cost per throughput for small data sizes

Test scenario with large data sizes

Within each secure connection, the test exchanges 1000KB. The single client/server SSL connection uses the symmetric cipher AES-128.

Results with large data sizes

In this test the major improvement is caused by CPACF, because the most cost intensive portion of this test is the exchange of the encrypted data.

System z Crypto Express2 Support - Normalized throughput for large data sizes

System z Crypto Express2 Support - CPU cost per throughput for large data sizes

Test environment
  • IBM System z9 (2094-S18)
  • IBM System Storage DS8300 (2107-922)
  • LPAR with 4 CPUs
  • SLES10 SP1 including
    • Linux generic zcrypt device driver 2.1.0
    • OpenSSL 0.9.8a
    • Interface Library (libICA) 1.3.7
    • OpenSSL HW engine support ibmca 1.3.7



In-kernel crypto exploiting CPACF

The s390 in-kernel crypto functions use CPACF supported ciphers and secure hash algorithms to implement the interfaces defined by the Linux in-kernel cryptographic API.

To use the s390-specific in-kernel crypto algorithms, the Linux kernel must be configured accordingly. The s390 crypto support can be built into the kernel or built as modules. To do so, select the corresponding entries under 'Cryptographic options' in the kernel configuration menu. Both the s390 variants and the generic algorithms can be configured at the same time. Linux chooses the s390 variant of an algorithm when the CPACF feature supports it. Otherwise, Linux chooses the generic variant.
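On a running system, the registered implementations are listed in /proc/crypto, and when an algorithm is requested by name the kernel crypto API picks the implementation with the highest priority. The sketch below parses text in that format and reports the preferred driver. The sample text, including the driver names and priority values, is illustrative and was not captured from a real system; the function names are hypothetical.

```python
from typing import Dict, List

# Illustrative sample in the style of /proc/crypto (assumed values).
SAMPLE_PROC_CRYPTO = """\
name         : aes
driver       : aes-s390
module       : aes_s390
priority     : 300

name         : aes
driver       : aes-generic
module       : aes_generic
priority     : 100

name         : md5
driver       : md5-generic
module       : kernel
priority     : 0
"""


def parse_proc_crypto(text: str) -> List[Dict[str, str]]:
    """Split /proc/crypto style text into one dict per registered algorithm."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries


def preferred_driver(entries: List[Dict[str, str]], name: str) -> str:
    """Return the highest-priority driver registered for an algorithm name.

    Raises ValueError if no implementation of that name is registered.
    """
    candidates = [e for e in entries if e.get("name") == name]
    best = max(candidates, key=lambda e: int(e.get("priority", "0")))
    return best["driver"]


entries = parse_proc_crypto(SAMPLE_PROC_CRYPTO)
print(preferred_driver(entries, "aes"))
```

To inspect a real system, the sample text can be replaced with the contents of /proc/crypto, for example `parse_proc_crypto(open("/proc/crypto").read())`.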

The following table shows the in-kernel crypto usable ciphers and secure hash functions:

Machine type    Supported crypto algorithms
z890/z990       DES, TDES; SHA-1
System z9       DES, TDES; AES-128; SHA-1, SHA-256
System z10      DES, TDES; AES-128, AES-192, AES-256; SHA-1, SHA-256, SHA-384, SHA-512

 

Results using IPsec

IPsec is an extension of the IP protocol providing security to IP and upper-layer protocols.

IPsec uses in-kernel cryptographic algorithms. The IPsec protocols use secure hash algorithms and symmetric ciphers.

The charts below show results for three different network workload types with a secure IP connection between the client and server.

The workload types are:

  • rr200x1k: emulates an online transaction
  • crr64x8k: emulates a website request
  • rr200x32k: emulates a database query

For each workload type three sequences were measured:

  • no IPsec: IPsec is not used
  • IPsec - no CPACF: HW support for in-kernel cryptography is disabled in the kernel
  • IPsec - CPACF: HW support for in-kernel cryptography is enabled in the kernel

This chart shows normalized transaction rates. When IPsec is used, securing the network connection adds overhead, which decreases the number of achievable transactions. Enabling the HW support for in-kernel crypto offsets a large part of this decrease.

Normalized transaction rates

Along with the decrease in transactions, the response times get worse as well. The response time is up to 7 times longer when the cryptographic algorithms must be calculated in software.

Response times for software-calculated cryptographic algorithms

Calculating in-kernel cryptographic operations in software with the generic algorithms leads to 13 times higher CPU costs compared to CPACF.

CPU cost for software-calculated cryptographic algorithms

Test environment
  • IBM System z9 (2094-S18)
  • IBM System Storage DS8300 (2107-922)
  • LPAR with 4 CPUs
  • Linux kernel 2.6.24
  • ipsec-tools-0.6.7




Team
Please address any comments to the performance team: linux390@de.ibm.com