Metadata, scanning and security in the cloud

Systems are always susceptible to scanning; this is the reality of the information age. We are all public and global citizens, and our data transcends geographies, so it is our responsibility to use systems and technology properly. To do that, we need to understand how the technology we use works and take advantage of the tools at our disposal to make life easier. Metadata aims to help us categorize all of this information so that we can find and use it.

There has been much debate recently about the government collecting metadata and using it in its investigations, but the general public knows very little about what metadata is and what it reveals about a person.

We have been using metadata for a long time, in several different ways, to improve the systems used to provide services. For example, we use geolocation information to provide relevant information in augmented reality applications, to determine the correct filters to use on digital camera pictures, to change the time on your devices so you are not late to a meeting when traveling, and so on.

Defining metadata

But what exactly is metadata? In essence, metadata is data about data: descriptive information, much like the tags you use to classify information or group objects. It can be entered manually (by users registering in a system for a service) or gathered automatically (like the information a cell phone sends to the towers about the channel, tower, protocol, signal strength and so on). It may include how the data is formatted, when it was collected or created, who collected it and with what device.
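To make the idea of "data about data" concrete, here is a minimal Python sketch of my own (it is not taken from any particular product). It creates a throwaway note file and then describes it using only what the operating system records about it. The `describe_file` helper and the note text are hypothetical names chosen for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone

def describe_file(path):
    """Return metadata about a file without reading its contents."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }

# Hypothetical note file, created only for this illustration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("Meet me at noon.")
    note_path = handle.name

note_metadata = describe_file(note_path)
os.remove(note_path)  # clean up the temporary file

print(note_metadata)
```

The file is never opened for reading inside `describe_file`, yet the result still says how large the note is and when it was last changed. That is automatically gathered metadata in its simplest form.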

The interesting piece here is determining if the metadata is personally identifiable information (PII) or not, and since there is a big debate on this topic, I will just say that it depends on what data you are collecting.

Metadata, like any other label, categorizes the data, which means that if your indexing system requires personal details, the metadata will contain PII. Regardless of this, with enough linked metadata, a system can profile a user and know him or her in extreme detail.

This opens up the next debate topic: Is this good or bad? The short answer is that it is neither. If used properly, metadata lets service providers tailor their output to the way you (or your metadata profile) prefer to be reached, based on your likes and dislikes, how you use your devices and technology, and so on. This is how most search engines prioritize the results shown to you, and how relevant alerts reach you as pop-ups (like sales on your favorite items and targeted ads).

The dark side of metadata is that if you link enough of it together, you can build a very detailed profile of a person and even identify them directly. The method for doing this is simple, but you would still need either a large data sample or the right data sources (metadata repositories) to generate a profile. Unfortunately, most people have freely shared this information in public forums and on social media, or through the metadata embedded in pictures taken with digital equipment (including EXIF, IPTC and XMP metadata, which fortunately you can strip from the files before posting online).
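The linking step really is simple. As a rough sketch, assume three unrelated sources that all happen to record the same device identifier. Everything in this example (the records, the `device_id` field and the `link_by_key` helper) is made up for illustration, and real profiling systems are far more elaborate.

```python
from collections import defaultdict

# Hypothetical metadata records from unrelated sources.
records = [
    {"source": "photo_exif", "device_id": "cam-42",
     "gps": (19.43, -99.13), "taken": "2014-03-01T08:05"},
    {"source": "forum_post", "device_id": "cam-42", "username": "jdoe"},
    {"source": "wifi_log", "device_id": "cam-42", "network": "CafeCentral"},
    {"source": "forum_post", "device_id": "tab-07", "username": "asmith"},
]

def link_by_key(records, key):
    """Merge all records that share the same value for `key` into one profile."""
    profiles = defaultdict(dict)
    for record in records:
        profile = profiles[record[key]]
        for field, value in record.items():
            if field not in (key, "source"):
                profile[field] = value
        # Remember which sources contributed to this profile.
        profile.setdefault("sources", []).append(record["source"])
    return dict(profiles)

profiles = link_by_key(records, "device_id")
print(profiles["cam-42"])
```

None of the three records for `cam-42` says much on its own, but once merged the profile holds a username, a location, a timestamp and a frequently used network.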


Recently, many people have talked about using encryption to keep this data secure, but the reality is that encryption hides the content of your message, not the metadata of your communication. This is where private cloud solutions come into play.

Since most people use open systems to communicate, let’s use the example of email. Email is the electronic equivalent of using a postcard, so we won’t go too deep into this, as you are probably already aware that it is scanned by spam filters, antivirus and even junk mail rules from the destination.

Let’s say you want the digital equivalent of sending a letter. Now you require encryption, which is the digital envelope. This makes your data (the information inside your email) harder to read, because a key is required to open the letter. You can’t rule out a determined hacker getting in, much as someone could steam open an envelope, but you make it a hassle to do so. Still, your metadata is open and will identify you, the recipient, the message size, your sending device and its attributes. Just remember that the email header is metadata, so anything you see in the header is sent in a format that the systems routing the email can easily understand. It still carries a lot of PII that you are sending through a public system (in the case of free email), and most companies will use this data to generate revenue (selling marketing data, targeting ads and so on).
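Here is a small Python sketch of the envelope idea, using the standard library `email` module. The message is entirely hypothetical: the addresses, the `X-Mailer` value and the placeholder text standing in for ciphertext are assumptions for illustration, and no real encryption is performed.

```python
from email import message_from_string

# Hypothetical message: the body stands in for ciphertext,
# while the headers remain plain text.
raw_message = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly numbers
Date: Mon, 06 Jan 2014 09:30:00 -0600
X-Mailer: ExampleMail 1.0

-----BEGIN PGP MESSAGE-----
hQEMA5placeholderciphertextonly
-----END PGP MESSAGE-----
"""

message = message_from_string(raw_message)

# Any system relaying the message can read these fields without a key.
visible_metadata = {name: value for name, value in message.items()}
body = message.get_payload()

print(visible_metadata)
print(len(body), "characters of opaque body")
```

Even with an unreadable body, the parsed headers still name the sender, the recipient, the sending time and the mail client. With common email encryption schemes the subject line also travels in the clear.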

Now if you really need to be secure, your only other option is to use a system where the metadata, or the type of metadata being requested, is used internally and not shared. This is where cloud collaboration solutions come into play. Inside your company, the recipient and sender information is used only to route the data from one point to the other. In larger enterprises, this may span several countries, using encrypted virtual private networks (VPNs) and other technologies that help ensure data stays private and metadata is used properly. Think of it like using a private carrier for your mail.

The same thing applies to all types of metadata. Without it we would not be able to tune services and respond to user needs proactively. Being conscious of what data exists in the metadata you use will help you make a sound decision when choosing your cloud-enabled services. You should also be aware of what the metadata you provide is used for, so read the fine print and work with your cloud service provider to get the solution that best matches your security needs.

How do you secure your data in the cloud? Stay tuned to Thoughts on Cloud for my next post about this topic.
