The following describes how to enhance throughput of the IBM PCIe Cryptographic Coprocessor when using the IBM Common Cryptographic Architecture application programming interface. There are characteristics of your CCA host application program that can affect performance and throughput of the coprocessor. To better understand how to enhance throughput, consider these areas before designing your CCA applications:
- Multi-threading and multi-processing
- Caching of AES, DES, and PKA keys
Multi-threading and multi-processing
IBM cryptographic coprocessors are hardware security modules (HSMs). An IBM PCIe HSM can process multiple CCA requests simultaneously. This is because the HSM has several independent hardware elements that enable it to multi-process. These elements include the RSA engine, DES engine, CPU, random number generator, and PCIe communications interface. All of these elements work independently, and each can process a part of a different CCA verb at the same time. By being able to work on several CCA API calls at the same time, the HSM can keep some or all of its hardware elements busy. Keeping all of the hardware elements busy maximizes overall system throughput.
In order for an application to take advantage of the multi-processing capability of the coprocessor, the host system must send multiple CCA requests at the same time. These requests need to be sent without waiting for each one to finish before sending the next one. One option is to have several independent host application programs all using the coprocessor at the same time. The best way to accomplish this is to design CCA host application programs that are multi- threading. Every thread would independently send CCA requests to the HSM, such as when a Web server application starts a new thread for each request that it receives over the network. This results in enhanced utilization and maximum system throughput.
Incoming requests are automatically managed, and it is not possible to overload the HSM.
Caching AES, DES, and PKA keys
The HSM can optionally keep ready-to-use keys in a memory cache. Using cached keys enhances throughput, but can have an adverse affect. Here is an explanation about how caching keys enhances throughput, followed by how using cache can cause problems.
When a wrapped key is used, it incurs the overhead of being validated and unwrapped. The hardware security module (HSM) can optionally store a fin ite number of unwrapped symmetric or asymmetric keys in a memory cache, safely stored and ready to use within the encapsulated secure module. When caching keys is activated, any retained RSA key that is used is also stored in cache. Even though retained keys are never wrapped, uncached retained keys incur the overhead of being retrieved from internal flash EPROM memory. As a result of using key caching, applications that reuse a common set of keys can run much faster than those that use different keys for each transaction. Typical CCA applications use a common set of AES, DES, and PKA keys. This makes caching very effective at improving throughput. It should be noted that PKA public keys and uncached keys that are in the clear are not cached because they have very low overhead.
To improve performance, the CCA implementation provides caching of key records obtained from key storage within the CCA host code. However, the host cache is unique for each host process. Caching can be a problem if different host processes access the same key record. An update to a key record caused in one process does not affect the contents of the key cache held for other processes. To avoid this problem, caching of key records within the key storage system can be suppressed so that all processes will access the most current key records. To suppress caching of key records, use the export system command to set the environment variable CSUCACHE to 'N' or 'n' or to any string that begins with either of those letters.