The emergence of cloud computing as an alternative deployment strategy for IT systems presents many opportunities, yet it challenges traditional notions of data security. Now that data security regulations are developing teeth, information technology professionals are perplexed as to how to take advantage of cloud computing while proving compliance with regulations designed to protect sensitive information. (For example, TJX and Heartland Payment Systems together paid more than US$220M in fines and settlements for breaches of 94 million and 130 million credit-card numbers in 2007 and 2009, respectively. These represent the largest publicly disclosed breaches of sensitive data from computer systems.)
There are many approaches to the problem; the two extremes are:
- To not use the cloud at all.
- To embrace cloud computing completely.
In my opinion, the optimal solution lies somewhere in the middle, with sensitive data secured and managed within controlled zones while non-sensitive data lives in clouds. This allows companies to prove compliance with data security regulations while using clouds, private or public, to the maximum extent possible.
In this article, I'll describe how a specific type of web application architecture optimizes IT investments by using cloud computing while complying with data security regulations.
The traditional web application architecture
Conceptually, web applications are simple. The browser — representing the client side of a client-server connection — displays a form and requests data from the user. The server is represented by a software program executing on some web application server. Upon the user submitting the form, the server program receives and processes the information and returns a response based on the outcome of the transaction. This interaction is shown in Figure 1.
Figure 1. Standard web application architecture
While the model can get complex depending on what tasks the applications perform, there is a common feature among them: The web form must identify the Uniform Resource Locator (URL) of the server so the browser knows where to send the form data when the user submits the form for processing.
For a majority of applications, users typically interact with the same server throughout the transaction. However, depending on many factors, the browser may be redirected to different servers and thus, to different URLs, for some parts of the transaction. And of course, users are shielded from the complexities of redirection, allowing them to perceive the transaction as seamless. More often than not, the redirections are to the same domain even if they are to different servers.
In some e-commerce applications, the browser may be redirected to a payment processor's site, where the payment is processed, and then redirected back to the original site to conclude the transaction. The advantage for the e-commerce site is that it does not have to build and maintain infrastructure for the payment-processing part of the transaction. This redirection is shown in Figure 2.
Figure 2. With redirection to a payment processor
Disadvantages of the current mode of IT investments
There are many disadvantages to how IT investments are currently made. Taking a typical e-commerce application as an example, here is what a company is responsible for in the current mode of IT investments:
- It must procure physical resources (compute, storage, and network) for all functions of the application:
  - Customer registration
  - Product management
  - Purchase transactions
  - Payment processing
- It must ensure redundancy of the computing infrastructure for business continuity — usually doubling the infrastructure investment.
- It must secure the entire infrastructure. Since most sites do not distinguish between sensitive and non-sensitive data, the security framework usually applies to all components of the infrastructure and data. This represents a misallocation of resources, since non-sensitive information does not need the same degree of protection as sensitive information. (In the last few years, because of PCI-DSS [Payment Card Industry Data Security Standard], sites do distinguish between a "PCI zone" and a "non-PCI zone," and between "PCI data" and "non-PCI data." The PCI zone and data typically receive more attention and investment from a security standpoint than their non-PCI counterparts. While this might be considered a form of optimization, the non-PCI zone still sits within the network perimeter of the site, so the company is still spending more money protecting data than it would if the application were designed with the architecture described in this article.)
This mode of investing has remained unchanged for the last 40 years. While the capital outlay per investment has come down dramatically from the days of the mainframe, an application that must serve hundreds of thousands of users still requires a sizable capital outlay despite availability of commodity servers and open source software.
Emergence of cloud changes investment
The emergence of cloud-computing technology — especially public clouds — dramatically changes how such IT investments can be made. It is no longer necessary to make large, risky investments up front and depreciate those investments over the course of many years. With smaller outlays, companies can set up exactly the IT services they need and pay for only what they use. The economic impact of this change cannot be overstated; new businesses can come to market on significantly smaller budgets.
As significant as this change will be to delivering and managing IT services, the burden of securing sensitive data cannot be outsourced. While the work may be contractually delegated to a third party, the responsibility for ensuring compliance with security regulations remains with the owner of the data.
As such, I think that architects and designers of web applications will find the model described in this article useful for meeting their compliance obligations while taking full advantage of the power of cloud computing.
Meet Regulatory Compliant Cloud Computing
Business transactions consist of a mix of sensitive and non-sensitive data. What is deemed sensitive, as well as the ratio of non-sensitive to sensitive data, varies depending on the business and the type of transaction.
My estimate, however, is that for the vast majority of businesses the ratio of non-sensitive to sensitive data will approximate 4:1. Given this, the efficiency of IT investments can be improved by computing, storing, and managing sensitive data within regulated zones inside a secure perimeter, while all non-sensitive data is computed, stored, and managed in public clouds.
I call this architecture Regulatory Compliant Cloud Computing (RC3): A model of computing in which business transactions span across regulated zones and public clouds. Sensitive data is encrypted, tokenized, and managed in the regulated zone within the secure perimeter of an enterprise (or a delegated outsourced company) while all non-sensitive data resides in the public cloud. This is shown in Figure 3.
Figure 3. The Regulatory Compliant Cloud Computing architecture model
Next, we'll examine data classification in the RC3 architecture, then take a look at how data transactions from various industry scenarios work within the RC3 structure.
Data classification for RC3
A prerequisite for building RC3 applications is to classify data into three categories. This is necessary so that applications can be designed to deal with the data accordingly, and to simplify communication between business units and the technical people who develop and support IT services.
Let's look at the RC3 classifications.
Class 1/C1: Consists of sensitive and regulated data. This is data whose disclosure to the public would result in fines, potential lawsuits, and loss of goodwill to the breached entity. Examples of Class 1 data are credit card numbers, social security numbers, bank account numbers, and other data of this type.
Class 2/C2: Consists of sensitive but non-regulated data. This is data that is not regulated but whose disclosure to the public would be detrimental to a company and/or result in some loss of goodwill to the breached entity. Examples of Class 2 data are employee salaries; sales figures for specific product lines; and the name, gender, and age of a customer.
Class 3/C3: Consists of non-sensitive data; in other words, any data that is not C1 or C2. Examples of Class 3 data are product descriptions and images.
It should be noted that data classification can be fluid: When sensitive data is tokenized in a well-designed encryption-and-key-management (EKM) system, it is effectively rendered non-sensitive. In this case, even C1/C2 data can be classified as C3 after it has undergone encryption and tokenization.
Based on these classifications, companies adopting RC3 will ensure the following:
- All C1 data will be processed and stored in regulated zones, within a secure network perimeter. These zones will prove they are compliant with applicable data security regulations. C1 data tokens (sensitive data that has been encrypted and replaced with tokens) may be stored in public clouds.
- All C2 data will be processed in secure, but not necessarily regulated, zones. C2 data tokens may be stored in public clouds.
- All C3 data may be processed and stored in public clouds.
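The placement rules above can be written down as a small policy function. The following is only a minimal sketch: the names `DataClass`, `Zone`, and `allowed_storage_zones` are hypothetical, and it assumes that a regulated zone also qualifies as a secure zone for C2 data.

```python
from enum import Enum


class DataClass(Enum):
    C1 = "sensitive, regulated"
    C2 = "sensitive, non-regulated"
    C3 = "non-sensitive"


class Zone(Enum):
    REGULATED = "regulated zone"
    SECURE = "secure zone"
    PUBLIC_CLOUD = "public cloud"


def allowed_storage_zones(data_class, tokenized=False):
    """Return the zones in which data of this class may be stored.

    Assumption for this sketch: a regulated zone is at least as
    protected as a secure zone, so C2 data may live in either.
    """
    if data_class is DataClass.C3 or tokenized:
        # C3 data, and tokens standing in for C1/C2 data, may live anywhere.
        return frozenset({Zone.REGULATED, Zone.SECURE, Zone.PUBLIC_CLOUD})
    if data_class is DataClass.C2:
        return frozenset({Zone.REGULATED, Zone.SECURE})
    # Untokenized C1 data stays in the regulated zone.
    return frozenset({Zone.REGULATED})
```

An application could call a function like this before every write, for example refusing to send a record to the public cloud unless `Zone.PUBLIC_CLOUD` is in the returned set.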
Applications must be written to deal with this separation of data; but the web-application architecture — specifically the ability to redirect the browser to targeted servers — lends itself to support this model.
The next sections describe a few examples of transactions in different industry sectors. The model, however, can be applied to any industry that faces similar challenges.
An e-commerce RC3 transaction
For the e-commerce RC3 transaction example, depicted at a high level, I've described the scenario using the Java™ application model, but the model is not exclusive to Java and can be easily duplicated in the .NET framework or in a scripting language such as PHP or Ruby. Additionally, while the examples might show the use of Amazon Web Services (AWS), this is merely for illustration; the model is easily duplicated in any public cloud infrastructure such as Azure, vCloud, or IBM® SmartCloud.
The regulated zone consists of a company demilitarized zone (DMZ) and a secure zone (SECZ). A web application server resides in the DMZ receiving connections from users on the Internet. It communicates with a database server and an enterprise key management infrastructure (EKMI) in the SECZ. The EKMI is responsible for encryption, tokenization, and key management for all C1 and C2 data. The EKMI is expected to have implemented all controls necessary to satisfy data security regulations. All communications are over TLS/SSL.
The public cloud zone (PBZ) consists of a web application server and a data store. The web application server receives connections from users on the Internet, as well as web service requests from the web application server in the company DMZ. All communications are over TLS/SSL. Web service requests from the company DMZ to the public cloud are further secured by SSL ClientAuth for mutual authentication between endpoints.
This type of transaction follows these steps.
- The user registers as a customer in the regulated zone and is assigned a unique Customer ID (CID), which is treated as C3 data. The customer's name and contact information are designated as C2 data, while the customer's order details are designated as C3 data. C2 data is encrypted, tokenized, and stored in the EKMI. All C3 data is stored in the PBZ and is transmitted over the client-authenticated SSL link along with session-related data for this transaction. See Step 1 in Figure 4.
Figure 4. Steps in an RC3 e-commerce transaction
- The user's browser is then redirected to the PBZ, where most of the transaction is processed:
  - Reviewing a list of products.
  - Determining their price and availability.
  - Adding selected products to the cart.
  - Providing shipping instructions.
  - Entering any other non-payment-related data.
The request headers carry session tokens assigned by the web application server in the DMZ; this allows transaction data in the PBZ to be correlated to the same transaction in the regulated zone. See Step 2 in Figure 4.
- When ready to check out, the user's browser is redirected to the company DMZ server, where the user submits a credit card for payment. Upon confirming the transaction, the sensitive C1 data is encrypted, tokenized, and stored in the EKMI. The resulting token, which can now be treated as C3 data, is stored in the PBZ through a client-authenticated web service request. See Step 3 in Figure 4.
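The three steps can be simulated in a few lines to show which zone ends up holding which data. This is a rough sketch, not the article's implementation: `TokenVault` and `PublicZone` are hypothetical stand-ins for the EKMI and the PBZ, the vault keeps values in memory instead of encrypting them under managed keys, and the card number is a commonly used test value.

```python
import secrets


class TokenVault:
    """Stand-in for the EKMI: swaps sensitive values for random tokens.

    A real EKMI would encrypt each value under managed keys; this sketch
    only keeps an in-memory map to show where the data lives.
    """

    def __init__(self):
        self._values = {}

    def tokenize(self, value):
        token = secrets.token_urlsafe(16)
        self._values[token] = value
        return token

    def detokenize(self, token):
        return self._values[token]


class PublicZone:
    """Stand-in for the PBZ data store; it only ever receives C3 data and tokens."""

    def __init__(self):
        self.orders = {}  # session token -> order record

    def push_session(self, session_token, customer_id):
        # Called from the DMZ over the client-authenticated link (step 1).
        self.orders[session_token] = {"cid": customer_id, "cart": [], "payment_token": None}

    def add_to_cart(self, session_token, product):
        # Called on behalf of the user's browser after redirection (step 2).
        if session_token not in self.orders:
            raise PermissionError("unknown session token")
        self.orders[session_token]["cart"].append(product)

    def record_payment_token(self, session_token, payment_token):
        # Called from the DMZ once the card number has been tokenized (step 3).
        self.orders[session_token]["payment_token"] = payment_token


vault = TokenVault()
pbz = PublicZone()

# Step 1: registration in the regulated zone; C2 contact data goes to the vault.
customer_id = "CID-1001"
name_token = vault.tokenize("Alice Example")
session_token = secrets.token_urlsafe(16)
pbz.push_session(session_token, customer_id)

# Step 2: shopping happens in the PBZ, correlated by the session token.
pbz.add_to_cart(session_token, "widget")

# Step 3: checkout in the regulated zone; only the token reaches the PBZ.
card_token = vault.tokenize("4111111111111111")
pbz.record_payment_token(session_token, card_token)
```

In this sketch the PBZ never receives the card number or the customer's name, only the CID, the cart contents, and an opaque token that is useless without access to the vault.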
Some security notes about the e-commerce transaction:
- Compliance with data security regulations is proven by the fact that sensitive and regulated data is encrypted and stored by the EKMI in the secure zone.
- The PBZ does not store any credential information for the user. User authentication is performed in the regulated zone, a valid session token is assigned to this user, and the user's browser is redirected to the PBZ for further processing.
- Communications between the DMZ and PBZ are only in one direction, from the DMZ to the PBZ. The PBZ never communicates with servers in the regulated zone; if the application is designed appropriately, there is no need to do so. This ensures that any compromise in the PBZ never spills over into the regulated zone.
- Servers from the regulated zone communicate with the PBZ only over SSL client-authenticated web services. This avoids the need to store any authentication credentials in the PBZ. (SSL client authentication only requires that a valid and trusted digital certificate be stored on the target machine to authenticate a client connection. The client, however, must possess the private key that corresponds to the digital certificate and participate in the SSL client-auth protocol.)
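As an illustration of the last note, a server in the PBZ could be configured to refuse any connection that does not present a trusted client certificate. The sketch below uses Python's standard `ssl` module; the function name `build_pbz_server_context` and its parameters are hypothetical, and the file paths are left to the caller.

```python
import ssl


def build_pbz_server_context(certfile=None, keyfile=None, trusted_client_ca=None):
    """Build a server-side TLS context that requires client certificates.

    All three arguments are file paths supplied by the deployer. They are
    optional here only so the sketch can be exercised without real
    certificate files.
    """
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Reject any peer that does not present a certificate we can verify.
    context.verify_mode = ssl.CERT_REQUIRED
    if certfile:
        # The PBZ server's own certificate and private key.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if trusted_client_ca:
        # Only certificates that chain to this CA (the DMZ's issuer) are accepted.
        context.load_verify_locations(cafile=trusted_client_ca)
    return context
```

Note that the only material the PBZ holds for authenticating the DMZ is the trusted certificate; the matching private key stays in the regulated zone.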
A healthcare RC3 transaction
This example, depicted at a high level, is similar to the e-commerce transaction, except that it goes further by showing how large BLOBs (binary large objects) of unstructured data, such as an X-ray image, can also be stored in the PBZ while proving compliance. We're assuming that basic information about the patient was already created prior to this transaction.
This type of transaction follows these steps.
- A technician at an X-ray lab authenticates herself to servers in the regulated zone of a hospital and establishes a session. If new patient data needs to be created, this is done in the regulated zone where a Patient ID (PID) is assigned. Some elements of the patient's demographic data are designated as C1/C2 data; as such, they are encrypted and tokenized by the EKMI. The hospital has the choice of keeping the tokenized C1/C2 data within the controlled zone or storing it in the PBZ using the secure one-way web service into the cloud. See Step 1 in Figure 5.
Figure 5. Steps in an RC3 healthcare transaction
- The technician's browser is redirected to the PBZ where she submits the non-sensitive parts of the transaction, such as:
  - Date and time of the visit.
  - Requesting doctor's identifier and his/her prescription for the test.
  - Attending technician and actions carried out.
  - Any other non-sensitive data.
The application is designed so that this part of the transaction does not carry any C1 or C2 data. See Step 2 in Figure 5.
- When ready to submit the X-ray image and the radiologist's report, the technician's browser is redirected to the regulated zone. The technician uploads the X-ray image and the report, which may be converted to an XML document by the web application. The rather large XML document consists of C1 data which must be secured.
The C1 data is received by the DMZ web application server and sent to a cryptographic engine capable of encrypting large unstructured data. A symmetric key is generated and used to encrypt the document contents. The symmetric key is escrowed in the EKMI, while the encrypted X-ray and report are stored in the PBZ through a secure web service request. See Step 3 in Figure 5.
All security notes that apply to the e-commerce transaction also apply to the healthcare transaction. The only difference between the two transactions is the addition of unstructured data, the X-ray, to the healthcare transaction requiring the use of a specialized engine that is capable of handling the encryption and decryption of large BLOBs.
A manufacturing RC3 transaction
This example shows an engineer in an industrial setting, submitting a sensitive document such as a blueprint with a bill of materials (BOM) to an assembly line for manufacturing.
This type of transaction follows these steps.
- An engineer authenticates to servers in the regulated zone and establishes a session. The engineer is then redirected to the PBZ. A web service request securely transfers session-related information from the SECZ to the PBZ.
In the PBZ, the engineer creates a new transaction that only accepts C3 data into the cloud. The transaction is assigned a unique transaction ID, which is returned to the engineer's browser in the response headers.
Since the transaction is for the creation of a new part by the manufacturing plant, the public part of the transaction accepts the non-sensitive components of the BOM. See Step 1 in Figure 6.
Figure 6. Steps in an RC3 manufacturing transaction
- The engineer's browser is redirected to the SECZ where the sensitive part of the transaction is submitted. This is information such as:
  - The blueprint of the object to be manufactured.
  - The sensitive parts of the BOM.
  - Special instructions about the assembly, if any.
  - Any other sensitive data.
The application is designed so that this part of the transaction carries necessary C1 and C2 data for encryption and tokenization in the SECZ. The encrypted blueprint is saved in the PBZ since it is now desensitized. See Step 2 in Figure 6.
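One way an application might decide which fields of a submission belong to which part of the transaction is to split each record by field classification. The sketch below is illustrative only: the function `split_by_zone`, the field names, and the rule that unclassified fields default to sensitive are all assumptions, not part of the RC3 description.

```python
def split_by_zone(record, field_classes):
    """Split a record into the part bound for the PBZ and the part bound for the SECZ.

    Fields without an explicit classification are treated as sensitive,
    so an unclassified field never reaches the public cloud by accident.
    """
    public_part, sensitive_part = {}, {}
    for field, value in record.items():
        if field_classes.get(field) == "C3":
            public_part[field] = value
        else:
            sensitive_part[field] = value
    return public_part, sensitive_part


# Hypothetical classification of BOM fields for this example.
FIELD_CLASSES = {
    "part_number": "C3",
    "quantity": "C3",
    "blueprint": "C1",
    "alloy_spec": "C2",
}

bom = {
    "part_number": "P-778",
    "quantity": 40,
    "blueprint": "<binary drawing data>",
    "alloy_spec": "proprietary mix",
    "assembly_notes": "torque in sequence",  # unclassified, so kept sensitive
}

public_part, sensitive_part = split_by_zone(bom, FIELD_CLASSES)
```

Here `public_part` would be submitted in the first step, while `sensitive_part` would be submitted to the SECZ for encryption and tokenization in the second.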
All security notes that apply to the previous transactions apply to this transaction too.
To close, with appropriate encryption and key management, it is possible to use public clouds for computing and storing sensitive data while meeting the compliance requirements of data security regulations. The technology to enable this is currently available; what remains is for applications to be designed to take advantage of these capabilities, mainly by applying the appropriate level of security resources to each class of data the application handles.