It's hard to believe that almost 20 YEARS AGO, in 1993, Hursley Labs unleashed upon the IT World a revolutionary new product. Known then as Message Queue Manager (MQM) for MVS/ESA, it earned its reputation as a robust piece of messaging middleware which continues to thrive decades later thanks to a rich set of product features. The popularity of WebSphere MQ, its usability, reliability, and longevity are the result of several factors that came together perfectly, not only because of research conducted internally at IBM but because of the strong customer influence that helped to guide all of the iterations of this middleware even up to now. With the financial markets needing just such a product in the early days, it became clear that, among the needed function, security would be a priority. With billions of monetary transactions being processed and traversing networks across the globe, nothing could be left to chance. Implementation of support for SSL (Secure Sockets Layer) would provide a safety net along the wire (once messages left their resident queues), since the product was, early on, looking at over 80 different platforms on which it could run. How secure an MQ network can be today is limited only by the sky, unlike when version 1.2 did not even provide client connections. As support for such connections came to be, followed by other things such as WebSphere MQ clustering, it became paramount that there be a way to secure messages in transit. WebSphere MQ 5.3 was then announced to offer this support.
So, how generally then does SSL work and why would I need it so much? Say that travel agency CorpAgent, Inc. books a group of executives for a conference simultaneously, making use of the parallelism in MQ to reserve cars, hotel rooms, and flights for all parties. Confidential information entered into the message queuing network will include personal names, corporate credit card information, general flight logistics, and other types of biographical information which, if entering the network unencrypted, would be susceptible to eavesdropping. This would represent a security breach which could eventually lead to undesirable outcomes such as stolen proprietary data or identity theft. Thus, the need for SSL to prevent such outcomes becomes clear. SSL will carry such data securely over an otherwise insecure network. The general level and type of security is controlled by the administrator of the SSL configuration. If CorpAgent, Inc. must connect to three different service providers (one for flights, another for hotels, and a third for car rentals) then the administrator can decide (with great granularity) which security algorithms will encrypt any travel request messages sent. The administrator will be able to control whether authentication takes place for only some of the parties involved in the transaction or for all of them. The administrator will also control whether data across the wire is encrypted at all. If CorpAgent's requirement is only that it authenticates who the other side is, then the admin can decide that message data across the wire is unlikely to be eavesdropped upon and thus can flow in the clear.
But in order to determine how secure an SSL connection should be made, we have to understand all of the possible threats that can occur while data moves between the SSL endpoints. Of these, threats to data integrity, authentication, and confidentiality are the most damaging. Integrity of the bytes on the wire ensures that the corporate credit information CorpAgent sends with its requests is not altered in transit. SSL makes use of a MAC (or message authentication code) in order to track application records. These records, each now associated with a unique sequence number, can be checked to confirm that what CorpAgent thinks it sent is what the remote side received. If the remote side can not reconcile the MAC then an error is raised, which can be a sign that the integrity of the data is no longer good. This may be the result of a man-in-the-middle (MITM) attack, which seeks to intercept data as it moves between its intended endpoints. It's interesting to note that, while in this case the man-in-the-middle attack has affected data integrity, this would only be possible because the second threat (to authentication) has taken place first. Since the man-in-the-middle was able to impersonate each endpoint, neither end has any idea it isn't communicating with the intended entity. Client or server authentication (through the use of certificates stored on keyrings or in keystores) is intended to eliminate third parties that would engage in such eavesdropping.
Lastly, a successful man-in-the-middle attack will have destroyed the confidentiality of this communication since the attacker will be able to intercept all messages on this conversation, leaving the two endpoints vulnerable for as long as this insertion goes undetected.
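To make the integrity check a little more concrete, here is a minimal Python sketch of a MAC computed over a sequence number plus the record data. It is a simplification and not the exact SSL/TLS construction (the real record MAC also covers the record type, version, and length), and the function names and key value are made up for illustration.

```python
import hashlib
import hmac
import struct


def record_mac(mac_key: bytes, seq_num: int, payload: bytes) -> bytes:
    """Compute an HMAC over the 64-bit sequence number followed by the payload."""
    message = struct.pack(">Q", seq_num) + payload
    return hmac.new(mac_key, message, hashlib.sha1).digest()


def verify_record(mac_key: bytes, seq_num: int, payload: bytes, received_mac: bytes) -> bool:
    """Return True only if the received MAC matches what we compute locally."""
    expected = record_mac(mac_key, seq_num, payload)
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected, received_mac)


# Hypothetical values, for illustration only
key = b"shared-mac-key-from-handshake"
booking = b"card=4111-XXXX;flight=CA123"
mac = record_mac(key, 7, booking)
```

With this sketch a receiver rejects a record whose bytes were changed in transit, and also a record replayed under a different sequence number, because the sequence number is part of what gets authenticated.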
So, how do we deal with these threats to secure communication? How do we first authenticate that the party CorpAgent thinks they've connected to is actually the Flight Airline Reservation server, and not the MITM? We begin with the SSL negotiation process, which executes across a WebSphere MQ channel that has been defined on a queue manager to use a particular CipherSpec which can be used for authentication. Because, in our example, CorpAgent is initiating all of the requests to connect to a server, this always places CorpAgent into the role of the SSL client. SSL clients are not required by protocol to be authenticated; however, WebSphere MQ will initiate such authentication if the security admin configures the channel to do so. On the other hand, SSL servers are ALWAYS authenticated, and so when CorpAgent connects to the Flight Airline server, if SSL is in place then the negotiation process will always check to see if the Flight server is in fact truly that. To implement these checks, a digitally-signed certificate is created and placed into the keyring associated with the Channel Initiator where CorpAgent's channel is defined. The queue manager itself also defines a keyword referencing this keyring which, through a series of RACDCERT commands, will contain not only the personal certificate used to authenticate the client end of the channel but should also contain the full chain of digital certificates which must be checked in order to confirm the authenticity of that personal certificate, that is, any signing CA certificates which vouch for it. While third-party certificate authorities (such as Verisign, or Thawte) can be used to establish this chain-of-trust, CA certificates created using RACF (or any other external security manager product) will accomplish this just as well. Whether a third-party CA is used depends on the nature of the enterprise.
If CorpAgent's personal certificate will leave the internal network and travel around the world then it would make more sense to use third-party CA certificates to sign it. This is because outside of the CorpAgent network the CA certificates created under RACF will be unknown and so the personal certificate will fail authentication. There is a cost associated with using third-party CAs and so this should be considered when determining which route to take.
Now that CorpAgent has set up its keyring we turn to the Flight Airline server. Since servers are always authenticated we had better have a similar configuration on the SSL server side or the secure connection will never set up. The only major difference between the client and server SSL set-up may be that the personal certificate representing the Flight server is a different certificate identified by a differing label name (serial number, etc). There is no reason why the server's personal certificate can't also be signed by the same CA certificate which signed the client's personal certificate. In fact, this simplifies administration. Whichever CAs have been used to sign the respective personal certificates, it's paramount that the signing CA be accessible (in some way) to the same platform that the personal certificate (it has signed) will reside on; otherwise authentication will not complete successfully.
With personal certificates now in place on both server and client, we are set up for mutual authentication of both endpoints. If we've signed both personal certificates with a third-party CA then when the SSL channel on CorpAgent is started a standard exchange of SSL certificates will flow between CorpAgent and the Flight server so that each can validate the other's authenticity. Once done, the channels between the two will end up in a running state. Depending on the CipherSpec negotiated between the endpoints, any data that flows across the wire will be encrypted using a session key that was derived during the previous certificate exchange. If the certificates used are only intended to provide authentication, then any CipherSpec of the NULL variety can be used. CipherSpecs that are NULL will therefore not provide message privacy.
SSLv3 and TLSv1 make use of 4 sub-protocols for all SSL connections.
To establish connections the Handshake protocol is used. It establishes the following:
Session identifier is established to determine the session state
Peer certificate [can be NULL]
Compression method [used to compress data prior to encryption]
Cipherspec [sets cryptographic attributes]
To signal changes with reference to the cipher strategy used, the Change CipherSpec protocol is used.
The Alert Protocol allows alert messaging to be used to signal important conditions and whether they are severe or not.
Record Protocol is used to encrypt application data before it's sent to the TCP/IP stack's transport layer.
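As a rough illustration of how these sub-protocols show up on the wire, every SSLv3/TLSv1 record begins with a 5-byte header: a one-byte content type, a two-byte version, and a two-byte length. The Python sketch below decodes that header. The content type and version numbers are the standard ones, but the function name and the sample bytes are invented for this example.

```python
import struct

# Standard record-layer content types
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}

# Protocol versions covered by this discussion
VERSIONS = {(3, 0): "SSLv3", (3, 1): "TLSv1.0"}


def parse_record_header(header: bytes):
    """Decode the 5-byte record header into (content type, version, length)."""
    if len(header) < 5:
        raise ValueError("a record header is 5 bytes long")
    ctype, major, minor, length = struct.unpack(">BBBH", header[:5])
    return (
        CONTENT_TYPES.get(ctype, "unknown"),
        VERSIONS.get((major, minor), "other"),
        length,
    )


# Hypothetical first bytes of a captured flow: a TLSv1.0 handshake record of 5 bytes
sample = bytes([22, 3, 1, 0, 5])
parsed = parse_record_header(sample)
```

When reading a trace, that first byte is a quick way to tell whether a given flow belongs to the Handshake, Change CipherSpec, or Alert protocol, or is application data.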
Traces will show that, as expected, SSL flows associated with WebSphere MQ will use all of these sub-protocols, and the typical connection establishment will see each endpoint signal Hello to the other. This is followed by the mandatory transmission of the server certificate (to CorpAgent in this case) and, if the client authentication flag has been set to REQUIRED on MQ's RECEIVER channel definition, the client CorpAgent will also have its certificate authenticated by the Flight server. Regardless of whether the client authentication flag is OPTIONAL or REQUIRED, the client certificate will still always flow to the server, but it is not necessarily used to authenticate the client.
The <ClientKeyExchange>, <CertificateVerify>, and <ChangeCipherSpec> flows will be sent to the server and represent the last 3 flows as part of the Handshake protocol before the connection will be considered to have successfully negotiated the secure attributes of the connection. Depending on the cipher chosen the <ClientKeyExchange> may contain nothing at all, or it may pass a pre-master key or public key. If flowed, the pre-master key is encrypted using the public key of the server's certificate. As its name implies, the pre-master key is used to generate the master key. This process is part of creation of the symmetric (session) encryption key which is used to encrypt data that will flow once negotiation completes.
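For the curious, the step from pre-master key to master key can be sketched in Python. This follows the TLS 1.0 pseudo-random function as I understand it from RFC 2246 (an MD5-based stream XORed with a SHA-1-based stream); SSLv3 uses a different construction, and nothing here is taken from the WebSphere MQ code itself. All input values below are dummy placeholders.

```python
import hashlib
import hmac


def p_hash(hash_name: str, secret: bytes, seed: bytes, length: int) -> bytes:
    """Expand secret and seed into `length` bytes using chained HMACs."""
    out = b""
    a = seed  # A(0) is the seed
    while len(out) < length:
        a = hmac.new(secret, a, hash_name).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hash_name).digest()
    return out[:length]


def tls10_prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.0 PRF: P_MD5 over the first half of the secret XOR P_SHA1 over the second."""
    half = (len(secret) + 1) // 2
    s1, s2 = secret[:half], secret[len(secret) - half:]
    md5_stream = p_hash("md5", s1, label + seed, length)
    sha1_stream = p_hash("sha1", s2, label + seed, length)
    return bytes(x ^ y for x, y in zip(md5_stream, sha1_stream))


def master_secret(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """Derive the 48-byte master secret from the pre-master secret and both Hello randoms."""
    return tls10_prf(pre_master, b"master secret", client_random + server_random, 48)


# Dummy placeholder inputs, for illustration only
pre_master = bytes(range(48))
client_random = b"C" * 32
server_random = b"S" * 32
master = master_secret(pre_master, client_random, server_random)
```

Because the random values from both Hello flows are mixed in, each endpoint arrives at the same master key without that key ever crossing the wire, and a new session yields a different key even if the pre-master value were somehow repeated.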
When CorpAgent sends <CertificateVerify> to the Flight server this allows Flight server to confirm that CorpAgent does in fact have possession of the private key for its own certificate and thus it's highly likely it is the true owner of it. As a point, private keys should never be used by any entity other than the entity for which the certificate was created. These keys should neither be exported elsewhere nor shared in any other way since it's clear that once this key has been compromised then all of the protections of SSL are lost. Certificates thought to be so compromised should be added to Certificate Revocation Lists (CRLs) so that any other client or server which receives such a certificate can determine whether the certificate is generally considered suspect or not. Private keys are used to decrypt data which is encrypted (in this case using Flight server's public key); so should CorpAgent compromise its private key, then any entity with that key can now decrypt data not intended for it. Private keys must be kept highly protected. While historically LDAP servers were used by WebSphere MQ on all platforms to facilitate certificate revocation status, in recent levels of the code Online Certificate Status Protocol (OCSP) can be used. In this scenario after the server's personal certificate has been authenticated the client may choose to check the certificate using OCSP as well as make an optional check against the CRL depending on how it has been configured. If mutual authentication is in effect this will afford the server the same opportunities to check for any certificate revocations.
Returning to the flows, for the next to last, CorpAgent will flow <ChangeCipherSpec> which indicates everything else that now flows will be authenticated information and if a non-NULL CipherSpec was negotiated upon, then it will be encrypted to that CipherSpec as well. <Finished> flows from the client and it will be encrypted pursuant to the same CipherSpec.
Flight server will return a like <ChangeCipherSpec> and <Finished> and the secure connection will have been established.
From our discussion, the key algorithms WebSphere MQ employs to encrypt data would be symmetric; however, the techniques used to generate the shared secret key are all asymmetric. Why the difference? Because this avoids any problems associated with attempting to distribute the associated keys. Once those asymmetric techniques have been used to create the shared secret key, each side is able to use that shared secret key for the life of the session to perform message encryption (and by the way, symmetric encryption takes less time to perform than asymmetric encryption).
The WebSphere MQ InfoCenter topic "An Overview of the SSL Handshake" discusses any idiosyncrasies that the MQ handshake may exhibit when compared to most SSL implementations. Essentially the process is as described above with any differences only being related to technique in order to establish the secure connection with an agreed upon shared secret key.
The roadblocks to successfully using Secure Sockets Layer with WebSphere MQ are essentially known and many technotes exist to provide remedial action. The case of the chain-of-trust being incomplete is probably the most common but easily solved. Once certificates on a z/OS queue manager are added and connected to the RACF database, the Channel Initiator will learn of the contents of the keyring as it starts up. While it's true on all platforms that a true refresh of the keyring will be done when the REFRESH SECURITY TYPE(SSL) command is issued, this does not hold for z/OS MQ in all cases. Specifically, when certificates are pending expiration and are renewed, the same labelname is kept in place. Because a change in labelname is the only way that MQ can differentiate between certificates, a REFRESH in this case will not cause the renewed certificate to be used. As the Channel Initiator continues to run it will keep using the old certificate that will soon expire. Once this happens there will be no recourse but to cycle the Channel Initiator so that (as mentioned above) it can read anew all of the certificates that have actually changed in the keyring. Bottom line: A REFRESH is not always a refresh.
Network considerations can also delay establishment of a secure connection (while, if the same channel has its CipherSpec removed, it will start successfully but as a non-secure channel). This can be because of the greatly increased size of the transmission units that must be able to pass through each network hop when SSL is to be used. These increased sizes are associated with the certificate data to be exchanged. Take the example of a small network where CorpAgent's TCP/IP stack must pass all TCP/IP packets through an intermediate hop (known as the GateWay) and then the GateWay will pass any of those packets on to the Flight Server network hop. In our example the client certificate that must pass is 1500 bytes long. The TCP/IP profile maximum transmission unit (MTU) size has been set in CorpAgent's stack to be 1500 bytes. This means that stack is willing to send 1460 bytes of real payload into the network (and this data will have 20 bytes of IP header plus 20 more bytes of TCP header tacked onto the front of it. These headers ensure the proper routing of these packets through the network to their final destination at Flight Server). But what if GateWay is configured to only allow an MTU of 576 to pass even though Flight Server has the ability to handle all 1500 bytes? What happens? Well, when Flight Server sends its flow <CertificateRequest> to CorpAgent, the WebSphere MQ on Flight Server will put up a RECEIVE and wait for that client certificate to arrive. CorpAgent will send all 1500 bytes (because it's configured to send a maximum size of 1500). If the Don't Fragment (DF) bit is turned on in the IP header of the packet (containing the certificate) then GateWay will receive all 1500 bytes. Since it knows it can only pass 576, the lion's share (924) will be discarded. This is per protocol since TCP/IP is supposed to retransmit any data for which it does not eventually receive an acknowledgement.
Meanwhile, Flight Server gets those 576 bytes (but because 924 are missing its RECEIVE will never be posted complete and the secure connection will not successfully establish itself). On the CorpAgent side the stack there will have gone into retransmission mode for the portion of the packet that has never been acknowledged. Once the retransmission limit is exhausted the connection will time out and be blown away.
There are a few possible steps that could allow connection establishment. Setting the MTU size on CorpAgent to 576 should ensure that it won't ever send a transmission unit that can't pass (in any network!) since 576 is a magic number. The DF bit becomes a non-issue because nothing will ever need fragmentation. If the network changes then this static adjustment might need to be revisited, but for this simple scenario it would serve fine.
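The arithmetic behind this static fix is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the plain 20-byte IP and 20-byte TCP headers used in the example (no options), and the function names are my own.

```python
import math

IP_HEADER = 20   # bytes, assuming no IP options
TCP_HEADER = 20  # bytes, assuming no TCP options


def tcp_payload(mtu: int) -> int:
    """Bytes of real payload that fit in one packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def segments_needed(data_len: int, mtu: int) -> int:
    """Number of packets needed to carry data_len bytes at the given MTU."""
    return math.ceil(data_len / tcp_payload(mtu))


CERT_SIZE = 1500  # the client certificate size from the example

payload_at_1500 = tcp_payload(1500)                  # 1460
payload_at_576 = tcp_payload(576)                    # 536
packets_at_1500 = segments_needed(CERT_SIZE, 1500)   # 2
packets_at_576 = segments_needed(CERT_SIZE, 576)     # 3
```

So at an MTU of 576 the 1500-byte certificate simply goes out as three smaller packets of at most 536 payload bytes each, every one of which fits through GateWay without any need for fragmentation.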
More complex changes could include making use of the ICMP protocol to dynamically learn about sizing throughout the network. No changes to the MTUs on any stack's profile would need to be made, but the PATHMTUDISCOVERY keyword would need to be added (and ICMP activated on every hop along intended routes) in order to make the network topology more responsive to changes in the size of transmission units. How does it work? When a connection is initially established between any two endpoints within the TCP/IP network, those endpoints are responsible for determining how much data they can effectively pass. The endpoints will send relatively large (based on MTU size) packets in the direction of the other endpoint. When a transmission unit is too large to pass, the hop which can not pass it will generate a type-3 ICMP packet indicating so. That packet will flow back to the originator of the test packet, who will continue to drop the size of the test packet that it had previously sent until no ICMP packets are returned. It is at this point that the maximum transmission unit that can be passed becomes known. This too makes the DF bit a non-issue since fragmentation is no longer required. The activation of ICMP is crucial if this is the solution. PATHMTUDISCOVERY also has to be set on at both endpoints or the solution will not be reliable.
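The discovery loop can be simulated in a few lines of Python. This is purely a toy model with no real network traffic: each hop is represented only by the MTU it will pass, and the sketch assumes the returned ICMP message tells the sender the MTU of the hop that refused the packet (which is how I understand RFC 1191 to work). The function name and the hop lists are invented for illustration.

```python
def discover_path_mtu(hop_mtus, initial_size):
    """Shrink the probe size until every hop on the path passes it.

    Returns (path MTU found, number of ICMP "too big" replies seen).
    """
    size = initial_size
    icmp_replies = 0
    while True:
        # First hop along the path that cannot pass a packet of this size
        blocking_mtu = next((mtu for mtu in hop_mtus if mtu < size), None)
        if blocking_mtu is None:
            # No ICMP came back, so this size fits end to end
            return size, icmp_replies
        icmp_replies += 1
        size = blocking_mtu  # retry at the size the hop reported


# CorpAgent -> GateWay -> Flight Server, as in the example above
result = discover_path_mtu([1500, 576, 1500], 1500)
```

In the CorpAgent scenario a single ICMP reply from GateWay is enough for the sender to settle on 576, after which the certificate flows in packets that every hop can pass.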
The last point I wanted to just pass over goes back to the elusive chain-of-trust which, if set up properly, has always worked well. In fact, even if the chain-of-trust had been previously incomplete, in some cases this did not prevent authentication from taking place. Certificates that, sometimes, did not exist in the local keyring could still be accessible from the remote side if sent. This served to complete the chain and secure connection establishment was the result; however, in z/OS 1.12 the letter of the law was followed. Certificate chains which were essentially incomplete (but had allowed authentication before) were now failing at the new level of System SSL. Symptoms in MQ would include receipt of the message CSQX633E indicating that local checks against received certificates had failed. While, previously, missing intermediate certificates would be made available to complete the chain, this was no longer the case. The code was then updated to allow such configurations to behave as expected and OA35143 helped to document these cases. While the vast majority of cases like this are related to procedural errors in how the chain is set up, this defect is important to mention. With such APARs applied, the RACDCERT command can be used to interrogate the contents of the z/OS keyring. The validity of the chain can then be established by comparing the ISSUED BY and SUBJECT fields of each certificate in the chain. You should always be able to work your way backwards, using the ISSUED BY field of the personal certificate (from the remote side) in order to find a corresponding signer whose SUBJECT value is the same as that ISSUED BY value. Continue by iteration to follow this chain until you reach a certificate whose ISSUED BY and SUBJECT values are the same. Once there, you have reached the root. If you can not reach the root then the chain is incomplete.
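That backwards walk is simple enough to sketch in Python. The certificates here are plain dictionaries holding only the two fields being compared, and the names and distinguished names are hypothetical; real checking also verifies signatures, validity dates, and revocation, none of which this sketch attempts.

```python
def build_chain(personal, keyring):
    """Follow issuer -> subject links from a personal certificate to the root.

    Returns the list of certificates from personal to root,
    or None if a signer is missing and the chain is incomplete.
    """
    by_subject = {cert["subject"]: cert for cert in keyring}
    chain = [personal]
    current = personal
    # A root certificate is one whose issuer and subject are the same
    while current["issuer"] != current["subject"]:
        signer = by_subject.get(current["issuer"])
        if signer is None or signer in chain:
            return None  # missing signer (or a loop): chain is incomplete
        chain.append(signer)
        current = signer
    return chain


# Hypothetical certificates, for illustration only
root = {"subject": "CN=Example Root CA", "issuer": "CN=Example Root CA"}
intermediate = {"subject": "CN=Example Issuing CA", "issuer": "CN=Example Root CA"}
flight = {"subject": "CN=FlightServer", "issuer": "CN=Example Issuing CA"}

complete = build_chain(flight, [root, intermediate])
incomplete = build_chain(flight, [root])  # intermediate not connected to the keyring
```

The second call returns None because the intermediate CA is not in the keyring, which mirrors the kind of incomplete chain that fails authentication.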
In the best of all worlds the chain-of-trust should be complete where personal certificates reside within the same keyrings as those CAs that sign them (or at the very least those CAs must somehow be accessible). With a good configuration the benefits of authentication, message privacy, message integrity, and an intact certificate authentication chain can be realized.
One day we'll have to blog about all the other sorts of certificates out there (like SITE, self-signed, etc) or why we might use a public key infrastructure to secure our connections instead of something like a BlockIP exit. After all, which one is really better? Well, I think it's a case of making sure to use the right tool for the job at hand. Cheers, and I hope your connections are always free from the man-in-the-middle.