Java development 2.0: Securing Java application data for cloud computing

Use private-key encryption to protect cloud data

Data security is a serious concern for organizations considering cloud adoption, but in many cases it needn't be. In this installment of Java development 2.0, learn how to use private-key encryption and the Advanced Encryption Standard to secure sensitive application data for the cloud. You'll also get a quick tutorial on encryption strategy, which is important for maximizing the efficiency of conditional searches on distributed cloud datastores.

Andrew Glover, CTO, App47

Andrew Glover is a developer, author, speaker, and entrepreneur with a passion for behavior-driven development, Continuous Integration, and Agile software development. He is the founder of the easyb Behavior-Driven Development (BDD) framework and is the co-author of three books: Continuous Integration, Groovy in Action, and Java Testing Patterns. You can keep up with him at his blog and by following him on Twitter.



24 January 2012


About this series

The Java development landscape has changed radically since Java technology first emerged. Thanks to mature open source frameworks and reliable for-rent deployment infrastructures, it's now possible to assemble, test, run, and maintain Java applications quickly and inexpensively. In this series, Andrew Glover explores the spectrum of technologies and tools that make this new Java development paradigm possible.

In just a few years, cloud computing platforms and services have dramatically changed the landscape of Java™ application development. They've lowered the barriers associated with system maintenance and configuration and have simultaneously decreased the cost and increased the speed of getting software to market. Conceptually, cloud computing makes sense: business managers love the return on investment, and developers love the freedom from infrastructure code. But many shops are still struggling with whether or not to move to cloud platforms.

Data security is one of the main concerns for an organization considering migrating its software to a cloud infrastructure. The more sensitive the data is, the more reason there is for concern. As software developers, we need to understand both the real security risks of cloud computing and the realistic approaches to solving at least some of these concerns.

In this installment of Java development 2.0, I'll explain what makes storing data in the clouds different from storing it on a centralized machine. I'll then show you how to use the Java platform's built-in private-key encryption standards and utilities to reasonably secure your data, even if it is stored on a distributed cloud datastore. Finally, I'll demonstrate a strategic approach to encryption, using the conditions of the query as a baseline for whether or not to encrypt your data.

Securing cloud data

Cloud computing doesn't exactly introduce new data security concerns; in most cases, it just amplifies them. Putting data in the cloud potentially exposes it to a larger audience, which is usually a good thing. But if the data exposed is meant to be private, or only conditionally accessed, then the results could be catastrophic. The fundamental issue with cloud computing is that it removes entrusted data from a developer or sys-admin's immediate control. Rather than being stored and managed locally, data in the cloud is stored on distributed devices that could be located anywhere, and conceivably accessed by anyone.

Data privacy in the EU

The European Union's approach to data privacy on cloud platforms is much stricter than that of the United States: personal data belonging to a citizen of the EU (such as a French citizen's medical record) must reside on servers existing inside the EU.

Even if your company can live with a decentralized, faraway datastore, you'll want your applications in the cloud to run with a modicum of data security. When you start to think about data security, two important questions arise:

  • Is the data secure during transit?
  • Is the data secure at rest?

Data in transit relates to how data passes from one location to another one; that is, which communication technology and infrastructure you're using. Data at rest relates to how — and how well — your data is stored. If, for example, you store user names and passwords in a database without encrypting them, then your data at rest is not secure.

For securing data in transit over the web, it's common to use HTTPS. This is HTTP with the data encrypted as it travels between browsers and servers. Another advantage of HTTPS is its ubiquity: Most developers have configured Apache, Tomcat, or Jetty to use HTTPS.

Encryption is also the common mechanism for securing data at rest, and cloud computing doesn't change that. While encryption can be esoteric, you only need to know some basic things about it in order to reasonably secure your application data. And once your data is secure, it really doesn't matter whether you serve it up locally or via a cloud platform or datastore.


Private-key encryption

Encryption is the process of transforming plain, human-readable text into unreadable text. You do this with a cryptographic algorithm, also known as a cipher. The encrypted text is decrypted back into readable text via a key, which is essentially a password. Encryption works to secure information by making it unreadable by anyone lacking the key.

Two types of key-based encryption are used in computing: public-key encryption and private-key encryption. Public-key encryption is the most common technique for securing data in transit; in fact, it underlies HTTPS transaction security. This form of encryption requires two keys that form a public-private key pair: the public key encrypts data and the private key decrypts it. The public key can be safely distributed, while the private key must remain under the control of an administrator. Public-key encryption makes it easy to share encrypted information.
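To make the two-key idea concrete, here is a minimal sketch that uses the standard Java security classes. The class name PublicKeySketch and its method names are hypothetical, and the 2048-bit RSA key size and OAEP padding are assumptions chosen for illustration rather than anything prescribed by this article.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

// Hypothetical demo class; names are illustrative only.
final class PublicKeySketch {

    // Assumed transformation: RSA with OAEP padding.
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Generates a fresh public-private key pair.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Anyone holding the public key can encrypt.
    static byte[] encrypt(PublicKey publicKey, String plaintext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, publicKey);
            return cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Only the holder of the private key can decrypt.
    static String decrypt(PrivateKey privateKey, byte[] ciphertext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, privateKey);
            return new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] ciphertext = encrypt(pair.getPublic(), "Acme Life, LLC");
        System.out.println(decrypt(pair.getPrivate(), ciphertext));
    }
}
```

Because RSA can only encrypt a small amount of data per operation, a sketch like this suits short values; it is meant to show the roles of the two keys, not to be a bulk-encryption recipe.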

Private keys and privacy

Regardless of the encryption algorithm you use, you must ensure that your private key is secure. Your pass phrase should meet high security standards and should never be stored in clear text — especially not in the cloud. Fortunately, the Java platform's security infrastructure creates reasonably complex keys and lets you secure them in the Java platform keystore.

In private-key encryption, data is encrypted and decrypted using a single private key. This type of encryption makes it hard to share encrypted data with a third party because both the sender and receiver must use the same key. If that key is compromised, then all of the encrypted information is compromised. Private-key encryption is highly effective if the data being encrypted doesn't need to be shared with other parties, so that the key can at all times be kept under tight control.
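The single-key requirement can be sketched as follows, assuming two parties that have somehow exchanged the same raw AES key bytes. The class name SharedKeySketch and its methods are hypothetical; the sketch uses the plain "AES" transformation only because that is what the listings in this article use.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class; names are illustrative only.
final class SharedKeySketch {

    // Creates a random 128-bit AES key and returns its raw bytes.
    static byte[] newKeyBytes() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey().getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The "sender" side: encrypts with the shared key.
    static byte[] encrypt(byte[] keyBytes, String plaintext) {
        return run(Cipher.ENCRYPT_MODE, keyBytes, plaintext.getBytes(StandardCharsets.UTF_8));
    }

    // The "receiver" side: must rebuild exactly the same key to decrypt.
    static String decrypt(byte[] keyBytes, byte[] ciphertext) {
        return new String(run(Cipher.DECRYPT_MODE, keyBytes, ciphertext), StandardCharsets.UTF_8);
    }

    private static byte[] run(int mode, byte[] keyBytes, byte[] input) {
        try {
            SecretKey key = new SecretKeySpec(keyBytes, "AES");
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(mode, key);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sharedKey = newKeyBytes();
        byte[] ciphertext = encrypt(sharedKey, "Andy");
        System.out.println(decrypt(sharedKey, ciphertext));
    }
}
```

Anyone who obtains the key bytes can call decrypt, which is why the key has to stay under tight control.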

Private-key encryption is an effective means of securing application data that will be stored and transmitted via a cloud infrastructure. Because the encryption key remains in the control of an administrator or application creator, cloud providers and other potential eavesdroppers do not have uncontrolled access to that data.


Encrypting a Java application

You can choose from a variety of options for securing Java applications, including the standard Java platform libraries. You also have a range of encryption standards and packages to choose from. For the following examples, I'll use the core Java libraries and AES, or Advanced Encryption Standard. I'll use a private key to both encrypt plain text and decrypt ciphertext, which is plain text that has been encrypted. I like AES because it's standardized by the United States government (as NIST's FIPS 197) and approved by the National Security Agency.

For maximum flexibility and for ease of testing, I'll create some cryptography interfaces and associated implementation classes that simply wrap core Java classes. I will then show you how to use these classes to securely persist and even query data in cloud datastores like Amazon's SimpleDB or even MongoHQ's MongoDB.

In Listing 1, I define a simple generic cryptography interface that defines two methods for encrypting and decrypting data. This interface will serve as a front for various algorithms; that is, my implementation classes will use a particular cipher like AES.

Listing 1. A cryptography interface
package com.b50.crypto;

public interface Cryptographical {
 String encrypt(String plaintext);
 String decrypt(String ciphertext);
}

With my Cryptographical interface I can either encrypt plain text or decrypt ciphertext. Next, in Listing 2, I'll use the Java Security API to create another interface that represents a key:

Listing 2. A key interface
package com.b50.crypto;

import java.security.Key;

public interface CryptoKeyable {
  Key getKey();
}

As you can see, my CryptoKeyable interface just serves as a wrapper for the Java platform's core Key type.

If you're using AES encryption, then the raw bytes generated when you encrypt plain text will need to be Base64-encoded, or at least that's so if you want to use them in web requests (for example, with SimpleDB domains). Thus, I'll Base64-encode every encrypted value after encrypting it, and decode it again before decrypting it.
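A small sketch shows why the encoding step matters. It uses java.util.Base64, which assumes Java 8 or later; the class name Base64Sketch, its helper methods, and the sample bytes are hypothetical stand-ins for real cipher output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; names are illustrative only.
final class Base64Sketch {

    // Turns arbitrary bytes into text that is safe to put in a request or document.
    static String toText(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recovers the exact original bytes from the encoded text.
    static byte[] toBytes(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Arbitrary bytes standing in for cipher output; not valid UTF-8 text.
        byte[] raw = {(byte) 0xF0, 0x0C, (byte) 0x89, (byte) 0xC5, (byte) 0x8C, (byte) 0x95};

        // Treating raw bytes as a String directly can silently alter them.
        String lossy = new String(raw, StandardCharsets.UTF_8);
        System.out.println("direct String round trip intact: "
                + Arrays.equals(raw, lossy.getBytes(StandardCharsets.UTF_8)));

        // Base64 round-trips the bytes exactly.
        String encoded = toText(raw);
        System.out.println("Base64 round trip intact: "
                + Arrays.equals(raw, toBytes(encoded)));
    }
}
```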

My Cryptographical implementation class for AES, shown in Listing 3, handles not only AES encryption but also Base64 encoding and decoding:

Listing 3. An AES implementation of my Cryptographical interface
package com.b50.crypto;

import javax.crypto.Cipher;
import javax.crypto.NoSuchPaddingException;
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.Key;
import java.security.NoSuchAlgorithmException;
import java.util.Base64; // requires Java 8 or later; replaces the internal sun.misc classes

public class AESCryptoImpl implements Cryptographical {

 private final Key key;
 private final Cipher ecipher;
 private final Cipher dcipher;

 private AESCryptoImpl(Key key) throws NoSuchAlgorithmException,
   NoSuchPaddingException, InvalidKeyException {
  this.key = key;
  // "AES" alone selects the provider's default mode and padding
  // (AES/ECB/PKCS5Padding on the standard Oracle/OpenJDK provider).
  this.ecipher = Cipher.getInstance("AES");
  this.dcipher = Cipher.getInstance("AES");
  this.ecipher.init(Cipher.ENCRYPT_MODE, key);
  this.dcipher.init(Cipher.DECRYPT_MODE, key);
 }

 // Factory method: wraps the low-level security exceptions in a CryptoException.
 public static Cryptographical initialize(CryptoKeyable key) throws CryptoException {
  try {
   return new AESCryptoImpl(key.getKey());
  } catch (NoSuchAlgorithmException e) {
   throw new CryptoException(e);
  } catch (NoSuchPaddingException e) {
   throw new CryptoException(e);
  } catch (InvalidKeyException e) {
   throw new CryptoException(e);
  }
 }

 // Encrypts the UTF-8 bytes of the plain text, then Base64-encodes the result.
 public String encrypt(String plaintext) {
  try {
   return Base64.getEncoder().encodeToString(
     ecipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8)));
  } catch (Exception e) {
   throw new RuntimeException(e);
  }
 }

 // Base64-decodes the ciphertext, then decrypts it back into a UTF-8 String.
 public String decrypt(String ciphertext) {
  try {
   return new String(dcipher.doFinal(Base64.getDecoder().decode(ciphertext)),
     StandardCharsets.UTF_8);
  } catch (Exception e) {
   throw new RuntimeException(e);
  }
 }
}
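Listing 3 and the test cases that follow refer to two types, CryptoException and AESCryptoKey, whose source isn't shown in this article. A minimal sketch of what they might look like follows; both bodies are assumptions made so the listings can compile, not the original classes. The CryptoKeyable interface is repeated from Listing 2 (without the package declaration) only so that the sketch stands on its own.

```java
import java.security.Key;

// Repeated from Listing 2 so that this sketch compiles on its own.
interface CryptoKeyable {
    Key getKey();
}

// Hypothetical implementation: simply holds a java.security.Key.
final class AESCryptoKey implements CryptoKeyable {

    private final Key key;

    public AESCryptoKey(Key key) {
        if (key == null) {
            throw new IllegalArgumentException("key must not be null");
        }
        this.key = key;
    }

    @Override
    public Key getKey() {
        return key;
    }
}

// Hypothetical checked exception that wraps the underlying security exception.
class CryptoException extends Exception {

    private static final long serialVersionUID = 1L;

    public CryptoException(Throwable cause) {
        super(cause);
    }
}
```

In the article's own layout these types would presumably live in the com.b50.crypto package as public classes in separate files.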

The Java KeyStore

Next, let's think about the encryption key. The Java platform's core libraries can be used to create strong encryption keys; however, these methods will always produce a new randomly generated key. So, if you create a key using the Java KeyGenerator class, you'll need to store that key for future use (that is, until you decide to decrypt the text encrypted with that key). For this, you can use the Java platform KeyStore utility and corresponding classes.

The KeyStore API enables you to save a key to a password-protected binary file, called a keystore. I can test keys in Java with a few test cases. First, I create two instances of a Key and show that each one produces a different encrypted String, as shown in Listing 4:

Listing 4. Simple encryption using two different keys
@Test
public void testEncryptRandomKey() throws Exception {
 SecretKey key = KeyGenerator.getInstance("AES").generateKey();
 Cryptographical crypto = AESCryptoImpl.initialize(new AESCryptoKey(key));
 String enc = crypto.encrypt("Andy");
 Assert.assertEquals("Andy", crypto.decrypt(enc));

 SecretKey anotherKey = KeyGenerator.getInstance("AES").generateKey();
 Cryptographical anotherInst = AESCryptoImpl.initialize(new AESCryptoKey(anotherKey));
 String anotherEncrypt = anotherInst.encrypt("Andy");
 Assert.assertEquals("Andy", anotherInst.decrypt(anotherEncrypt));

 Assert.assertFalse(anotherEncrypt.equals(enc));
}

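Next, in Listing 5, I demonstrate that a given key instance always yields the exact same encrypted text for a corresponding String. (This holds because the plain "AES" transformation used in Listing 3 is deterministic by default; a cipher mode that takes a random initialization vector would produce different ciphertext on every call.)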

Listing 5. A private key corresponds to a single string
@Test
public void testEncrypt() throws Exception {
 SecretKey key = KeyGenerator.getInstance("AES").generateKey();

 KeyStore ks = KeyStore.getInstance("JCEKS");
 ks.load(null, null);
 KeyStore.SecretKeyEntry skEntry = new KeyStore.SecretKeyEntry(key);
 ks.setEntry("mykey", skEntry, 
   new KeyStore.PasswordProtection("mykeypassword".toCharArray()));
 // try-with-resources closes the stream even if store() throws
 try (FileOutputStream fos = new FileOutputStream("agb50.keystore")) {
  ks.store(fos, "somepassword".toCharArray());
 }

 Cryptographical crypto = AESCryptoImpl.initialize(new AESCryptoKey(key));
 String enc = crypto.encrypt("Andy");
 Assert.assertEquals("Andy", crypto.decrypt(enc));

 //alternatively, read the keystore file itself to obtain the key

 Cryptographical anotherInst = AESCryptoImpl.initialize(new AESCryptoKey(key));
 String anotherEncrypt = anotherInst.encrypt("Andy");
 Assert.assertEquals("Andy", anotherInst.decrypt(anotherEncrypt));

 Assert.assertTrue(anotherEncrypt.equals(enc));
}

What I encrypt with a particular key, I need to decrypt with the same key. Using the Java KeyStore is a convenient and safe way to store my keys.

Notes about the keystore

The keystore code in Listing 5 is boilerplate, but it serves to demonstrate a few things:

  • The keystore has a name.
  • The file stored is protected by a password.
  • The keystore can store more than one key.
  • Each key inside the store has an associated password.

For this test case, I decided to create a new keystore each time I ran the test. I could just as easily have opened up an existing keystore for each new test. If I wanted to use an existing keystore, I would have needed to know its password, as well as the password to access a particular key.
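Listing 5 only writes the keystore, and the later listings call a getKeyStoreEntry helper that isn't shown. As a rough sketch of how a key might be saved and then read back with the standard KeyStore API, consider the following. The class name KeyStoreSketch, its method names, and its parameters are assumptions for illustration; note that both the store password and the per-key password are needed to get the key back.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.crypto.SecretKey;

// Hypothetical demo class; names are illustrative only.
final class KeyStoreSketch {

    // Writes a new JCEKS keystore file holding one secret key under the given alias.
    static void save(Path file, char[] storePassword, String alias,
                     char[] keyPassword, SecretKey key) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(null, null); // start with an empty keystore
            ks.setEntry(alias, new KeyStore.SecretKeyEntry(key),
                    new KeyStore.PasswordProtection(keyPassword));
            try (OutputStream out = Files.newOutputStream(file)) {
                ks.store(out, storePassword);
            }
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Opens an existing keystore file and returns the entry stored under the alias.
    static KeyStore.SecretKeyEntry load(Path file, char[] storePassword,
                                        String alias, char[] keyPassword) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            try (InputStream in = Files.newInputStream(file)) {
                ks.load(in, storePassword);
            }
            return (KeyStore.SecretKeyEntry) ks.getEntry(alias,
                    new KeyStore.PasswordProtection(keyPassword));
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A getKeyStoreEntry helper could then be a one-line call to load with the file name, alias, and passwords supplied from configuration rather than hardcoded.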

The key is everything when it comes to encryption. It doesn't matter how strong the underlying cipher is; if my key is compromised, then my data is exposed. This also means ensuring that the keystore and its associated pass phrases are always secure. (For instance, in a production application I wouldn't hardcode the passwords as I did for the purpose of demonstration in Listing 5.)


Cloud cryptography

When you encrypt data, you change its properties. For example, an encrypted integer no longer responds meaningfully to integer comparison. Consequently, it's important to think through how, why, and under what circumstances you'll eventually query for data stored in the cloud. The good news is that in many cases, the data you wish to keep private will be of different business value than data you wish to manipulate: encrypting an account's name or some personal information about the account holder makes sense, but encrypting an account balance might not (because who will care about an account balance that can't be tied back to a person?).

Querying with encryption

Encrypted data is easily searched for when doing exact matches such as: "Find me all accounts named 'foo' (where 'foo' is encrypted)." But that doesn't work natively for conditional queries like: "Find me all accounts whose overdue balance is greater than $450 where $450 is encrypted."

For instance, let's imagine that I use a simple cipher that reverses character order and adds the character i to the end of a string. In this case, the string foo would become oofi and 450 would become 054i. If the name value in the table was encrypted using this simple cipher, I could easily query by exact matches, such as "select * from table where name = 'oofi'". But a comparison of the encrypted value of 450 would be a different beast altogether: "select * from table where amount > 054i" isn't quite the same as "select * from table where amount > 450".
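The toy cipher just described is small enough to write out. The class name ToyCipherSketch is hypothetical, and the cipher is of course not secure; the sketch only illustrates that exact matches survive encryption while ordering does not.

```java
// Hypothetical demo class for the reverse-and-append-i cipher described above.
final class ToyCipherSketch {

    // Reverses the characters and appends the letter i.
    static String encrypt(String plaintext) {
        return new StringBuilder(plaintext).reverse().append('i').toString();
    }

    // Drops the trailing i and reverses the characters back.
    static String decrypt(String ciphertext) {
        String withoutSuffix = ciphertext.substring(0, ciphertext.length() - 1);
        return new StringBuilder(withoutSuffix).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(encrypt("foo")); // oofi
        System.out.println(encrypt("450")); // 054i

        // Exact matches still work: the same input always gives the same output.
        System.out.println(encrypt("foo").equals(encrypt("foo"))); // true

        // Ordering does not: 460 > 451, but "064i" sorts before "154i".
        System.out.println(encrypt("460").compareTo(encrypt("451")) > 0); // false
    }
}
```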

In order to do a data comparison in this case, I might have to do some decryption in the application — that is, I'd need to select all data from a table, decrypt the amount field, and then perform the comparison. Being unable to rely on the underlying datastore for this activity means that my filtering probably won't be as fast as it would be with the datastore. Given that I want to maximize efficiency, I should think through what data I want to encrypt, and how I want to encrypt it. Encrypting with future queries in mind is a good way to improve the overall efficiency of the program.
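The decrypt-then-compare approach might look roughly like the sketch below. A Map stands in for the rows fetched from the datastore, and the class name ClientSideFilterSketch, the method names, and the sample account IDs and amounts are all made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical demo class; names and data are illustrative only.
final class ClientSideFilterSketch {

    static SecretKey newKey() {
        try {
            return KeyGenerator.getInstance("AES").generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String encrypt(SecretKey key, String plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return Base64.getEncoder().encodeToString(
                    cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decrypt(SecretKey key, String ciphertext) {
        try {
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(Cipher.DECRYPT_MODE, key);
            return new String(cipher.doFinal(Base64.getDecoder().decode(ciphertext)),
                    StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // rows maps an account ID to its encrypted amount, as if every row had been
    // selected from the datastore. Each amount is decrypted before comparing.
    static List<String> accountsOver(SecretKey key, Map<String, String> rows, int threshold) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            int amount = Integer.parseInt(decrypt(key, row.getValue()));
            if (amount > threshold) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("account_01", encrypt(key, "120"));
        rows.put("account_02", encrypt(key, "980"));
        rows.put("account_03", encrypt(key, "451"));
        System.out.println(accountsOver(key, rows, 450)); // [account_02, account_03]
    }
}
```

Every row has to be fetched and decrypted before the comparison can run, which is the cost the paragraph above warns about.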

It's easy to encrypt an account name in MongoDB and search by its encrypted name, as shown in Listing 6:

Listing 6. Encryption with MongoDB
@Test
public void encryptMongoDBRecords() throws Exception {
 KeyStore.SecretKeyEntry pkEntry = getKeyStoreEntry();
 Cryptographical crypto = 
   AESCryptoImpl.initialize(new AESCryptoKey(pkEntry.getSecretKey()));

 DB db = getMongoConnection();
 DBCollection coll = db.getCollection("accounts");

 BasicDBObject encryptedDoc = new BasicDBObject();
 encryptedDoc.put("name", crypto.encrypt("Acme Life, LLC"));
 coll.insert(encryptedDoc);


 BasicDBObject encryptedQuery = new BasicDBObject();
 encryptedQuery.put("name", crypto.encrypt("Acme Life, LLC"));

 DBObject result = coll.findOne(encryptedQuery);
 String value = result.get("name").toString();
 Assert.assertEquals("Acme Life, LLC", crypto.decrypt(value));
}

My first step in Listing 6 is to use the getKeyStoreEntry method to read an existing keystore. I next obtain a connection to a MongoDB instance, which in this case happens to live in the cloud over at MongoHQ. I then grab a link to the accounts collection (what an RDBMS programmer would call the accounts table) and proceed to insert a new account record with its corresponding name encrypted. Finally, I search for that same record by encrypting my search string (where name equals an encrypted "Acme Life, LLC").

The record in MongoDB would look something like what's shown in Listing 7. (Note that your encrypted "Acme Life, LLC" string would be different from mine because you would have a different key.)

Listing 7. An encrypted MongoDB document
{
 _id : "4ee0c541300484530bf9c6fa",
 name : "f0wJxYyVhfH0UkkTLKGZng=="
}

I've left the document's field key (name) unencrypted, but I could have encrypted that as well. If I did so, my corresponding queries would simply need to reflect the change. I could have also encrypted the collection name. Direct String comparisons will work whether or not the values are encrypted, as long as both sides of the comparison are encrypted with the same key.

This strategy isn't limited to MongoDB implementations. For instance, I could execute roughly the same test case with SimpleDB, as shown in Listing 8:

Listing 8. A SimpleDB encryption test case
@Test
public void testSimpleDBEncryptInsert() throws Exception {

 KeyStore.SecretKeyEntry pkEntry = getKeyStoreEntry();
 Cryptographical crypto = 
   AESCryptoImpl.initialize(new AESCryptoKey(pkEntry.getSecretKey()));

 AmazonSimpleDB sdb = getSimpleDB();
 String domain = "accounts";
 sdb.createDomain(new CreateDomainRequest(domain));

 List<ReplaceableItem> data = new ArrayList<ReplaceableItem>();

 String encryptedName = crypto.encrypt("Acme Life, LLC");

 data.add(new ReplaceableItem().withName("account_02").withAttributes(
  new ReplaceableAttribute().withName("name").withValue(encryptedName)));

 sdb.batchPutAttributes(new BatchPutAttributesRequest(domain, data));

 String qry = "select * from " + SimpleDBUtils.quoteName(domain) 
   + " where name = '" + encryptedName + "'";

 SelectRequest selectRequest = new SelectRequest(qry);
 for (Item item : sdb.select(selectRequest).getItems()) {
  Assert.assertEquals("account_02", item.getName());
 }
}

Here I followed the same steps from my MongoDB example: I read from an existing keystore, obtained a connection to Amazon's SimpleDB, and then inserted an account record whose name attribute was encrypted. Finally, I looked up the account by name, using the encrypted value as its key.


In conclusion

While cloud computing promises to make your data accessible to a wide audience, there's plenty that you can do to protect sensitive data. In this article, I've shown you how to use Java platform libraries to secure data at rest in a cloud datastore such as MongoDB or SimpleDB. Private-key encryption keeps data security in the hands of the administrator of that data. And storing private keys in your Java KeyStore keeps them manageable and safe. There's only one password for accessing a private key, and one thing you never want to do is store that password in clear text anywhere near the cloud.

Stored data that has been encrypted does behave differently from clear-text data in the context of some searches. Exact matches will work fine, but conditional queries that involve non-exact matches can be a headache. The solution here lies in how you handle (or do not handle) the comparison. Always think through what you'll encrypt with your intended queries in mind. Encrypting all of your data could be overkill, so give consideration to what is being queried and how.
