Overview of the PCIe3 CAPI and PCIe3 FPGA compression accelerator adapters
The PCIe3 Field Programmable Gate Array (FPGA) Compression Accelerator Adapter (EJ12/EJ13; CCIN 59AB) is the first FPGA accelerator adapter that IBM released. It is the predecessor of the PCIe3 Coherent Accelerator Processor Interface (CAPI) Compression Accelerator Adapter (EJ1A/EJ1B; CCIN 2CF0).
The two cards have similar characteristics, but the PCIe3 CAPI Compression Accelerator Adapter provides lower latency, higher throughput, and a significant reduction in CPU load.
Both adapters can be installed in one system, but a single application cannot use both adapter types at the same time. Each application must be configured to use either the EJ12/EJ13 adapter or the EJ1A/EJ1B adapter; a mixed configuration within one application is not possible. For example, application A can use two EJ12 adapters and application B can use two EJ1A adapters, but application C cannot use one EJ12 and one EJ1A adapter. Using the same card type is preferred because the load can then be distributed between the adapters automatically.
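The one-adapter-type-per-application rule can be sketched as a small helper that prepares the environment for a launched program. This is a hedged illustration only: the `ZLIB_ACCELERATOR` variable and the library path come from the scp transcript later in this document, the `GENWQE` value is an assumption made by analogy with the `-AGENWQE` option shown there, and the helper name `accelerated_env` is hypothetical.

```python
import os

# Library path and variable name are taken from the scp transcript in this
# document; the "GENWQE" value is an assumption (by analogy with -AGENWQE).
ACCEL_LIBZ = "/usr/lib/genwqe/libz.so.1"
ADAPTER_TYPES = {"CAPI", "GENWQE"}


def accelerated_env(adapter_type, base_env=None):
    """Return an environment that pins one application to one adapter type."""
    if adapter_type not in ADAPTER_TYPES:
        # A mixed or unknown selection is rejected up front.
        raise ValueError(f"unknown adapter type: {adapter_type!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["ZLIB_ACCELERATOR"] = adapter_type
    env["LD_PRELOAD"] = ACCEL_LIBZ
    return env


# Application A uses the predecessor adapter type, application B uses CAPI.
env_a = accelerated_env("GENWQE", base_env={})
env_b = accelerated_env("CAPI", base_env={})
```

Such a dictionary could be passed as the `env` argument of `subprocess.run` when starting each application, so that every process sees exactly one adapter type.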
The accelerated GZIP cards implement the well-defined, open standard DEFLATE compressed data format, which is used in zlib, gzip, Java, and many other applications. Within the gzip and zip file formats, it has become the standard for compressed data exchange.
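To illustrate how one DEFLATE stream sits inside different containers, the following minimal sketch uses the software zlib module from the Python standard library. It does not use the adapter. The helper name `deflate` and the sample payload are illustrative; the `wbits` values (-15 for raw DEFLATE, 15 for the zlib wrapper, 31 for the gzip wrapper) are the usual zlib conventions.

```python
import gzip
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress data with DEFLATE; wbits selects the container format."""
    comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return comp.compress(data) + comp.flush()


payload = b"example payload " * 4096

raw_stream = deflate(payload, -15)   # raw DEFLATE, no header or trailer
zlib_stream = deflate(payload, 15)   # zlib header and Adler-32 trailer
gzip_stream = deflate(payload, 31)   # gzip header and CRC-32 trailer

# Every container decodes back to the same bytes.
assert zlib.decompress(raw_stream, wbits=-15) == payload
assert zlib.decompress(zlib_stream, wbits=15) == payload
assert gzip.decompress(gzip_stream) == payload

print(len(raw_stream), len(zlib_stream), len(gzip_stream))
```

The three outputs differ only in their framing, which is why a DEFLATE engine can serve gzip, zlib, and zip-style consumers alike.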
The high compression bandwidth of the card significantly reduces the latency of a single compression job, and its aggregate throughput allows compression to keep pace with common I/O traffic.
As a result, the card reduces the amount of data that is stored or sent over the network, with no impact, or even a positive impact, on most I/O traffic. It makes good standard compression practical in cases where the software overhead was previously too high.
The adapters are well suited to the following use cases:
- Storage or transmission of large amounts of data, on average more than 100 MB/s
- Expensive storage with high storage bandwidth, where the compression ratio of the accelerator, compared to fast software compression, yields significant savings
- Applications with a high average throughput of data to be compressed
- High peak throughput of data that software compression cannot keep up with
- Individual compression streams that require low latency and are difficult to parallelize across many CPUs
- Cases where the standard DEFLATE compression format is required for interchange, as used in gzip, zlib, zip, or jar. Software compression methods such as LZ4 or LZS, which offer high bandwidth on CPUs at a lower compression ratio, are not an option in that case.
To achieve the best performance gain, use data blocks that are larger than 64 kB, or buffer smaller blocks before sending them to the hardware. The accelerated zlib library also has a selectable built-in buffering feature. For details about slot priorities and placement rules, see PCIe adapter placement and select the system you are working on.
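The block-size guidance above can be sketched in application code as follows. This is a minimal illustration that uses the standard Python zlib module; it is not the buffering feature of the accelerated zlib library, and it has not been verified against the adapter. The class name `BufferedCompressor` and the `calls` counter are hypothetical, and the 64 KiB threshold simply mirrors the 64 kB figure in the text.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # threshold chosen to match the 64 kB guidance


class BufferedCompressor:
    """Collects small writes and hands them to zlib in large blocks."""

    def __init__(self, block_size: int = BLOCK_SIZE):
        self._comp = zlib.compressobj(wbits=31)  # gzip container
        self._pending = bytearray()
        self._block_size = block_size
        self.calls = 0  # number of compress() calls actually issued

    def write(self, chunk: bytes) -> bytes:
        """Queue a chunk; compress only once a full block has accumulated."""
        self._pending += chunk
        out = b""
        while len(self._pending) >= self._block_size:
            block = bytes(self._pending[:self._block_size])
            del self._pending[:self._block_size]
            out += self._comp.compress(block)
            self.calls += 1
        return out

    def finish(self) -> bytes:
        """Compress any remainder and close the stream."""
        out = b""
        if self._pending:
            out += self._comp.compress(bytes(self._pending))
            self.calls += 1
            self._pending.clear()
        return out + self._comp.flush()


bc = BufferedCompressor()
parts = [bc.write(b"x" * 1024) for _ in range(1000)]  # 1000 writes of 1 KiB
parts.append(bc.finish())
compressed = b"".join(parts)
print(bc.calls)  # 16 compress calls instead of 1000
```

Fewer, larger requests reduce the per-call overhead, which is the effect the block-size recommendation aims for.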
The adapter provides the following features and benefits:
- High-throughput compression that saves storage and I/O bandwidth with little or no overhead
- CPU offload: the CAPI interface adds negligible software load, which frees CPU cores for higher-value computation or licensed software
- Lower power consumption, because the CPU-intensive compression is offloaded to an FPGA
- zlib/gzip standard format, widely used for data interchange
- Up to 2 GB/s compression and decompression throughput
- Speedup of 4x to 30x achievable
- Compression ratio near that of software zlib/gzip
Usage Example
$ time genwqe_gzip -ACAPI -B0 -c linux-3.17.tar > linux-3.17.capi.tar.gz
real 0m1.409s
user 0m0.032s
sys 0m0.164s
$ time genwqe_gzip -AGENWQE -B0 -c linux-3.17.tar > linux-3.17.genwqe.tar.gz
real 0m1.425s
user 0m0.024s
sys 0m0.188s
$ time gzip -c linux-3.17.tar > linux-3.17.sw.tar.gz
real 0m17.392s
user 0m16.600s
sys 0m0.112s
$ du -ch linux.tar
2.3G linux.tar
2.3G total
$ time ZLIB_ACCELERATOR=CAPI LD_PRELOAD=/usr/lib/genwqe/libz.so.1 scp -C linux.tar tul3:
linux.tar
100% 2339MB 83.5MB/s 00:28
real 0m28.097s
user 0m15.832s
sys 0m4.108s
$ time scp -C linux.tar tul3:
linux.tar
100% 2339MB 22.5MB/s 01:44
real 1m43.848s
user 1m42.100s
sys 0m2.284s
Where:
The test setup is an IBM Tuleta 8247-22L with 20 CPU cores and 8 threads each (average frequency: 3.694 GHz), with the PCIe3 CAPI Compression Accelerator Adapter and the predecessor hardware accelerator installed.
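The wall-clock ("real") times in the transcripts above can be reduced to speedup factors with simple arithmetic. The following sketch only restates the measured values from this document; the helper names `to_seconds` and `speedup` are illustrative.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a 'XmY.YYYs' time from the transcripts into seconds."""
    return minutes * 60 + seconds


def speedup(baseline: float, accelerated: float) -> float:
    """Ratio of baseline time to accelerated time."""
    return baseline / accelerated


# "real" times copied from the transcripts above
gzip_capi = to_seconds(0, 1.409)
gzip_genwqe = to_seconds(0, 1.425)
gzip_software = to_seconds(0, 17.392)
scp_capi = to_seconds(0, 28.097)
scp_software = to_seconds(1, 43.848)

print(f"gzip, CAPI vs software:   {speedup(gzip_software, gzip_capi):.1f}x")    # 12.3x
print(f"gzip, GenWQE vs software: {speedup(gzip_software, gzip_genwqe):.1f}x")  # 12.2x
print(f"scp -C, CAPI vs software: {speedup(scp_software, scp_capi):.1f}x")      # 3.7x
```

The scp factor is smaller than the gzip factor because the scp run also includes network transfer and encryption, which the accelerator does not speed up.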