When it comes to cyber-security, it is crucial both to plan the infrastructural security of the network (with firewalls, IDS, IPS, SIEM …) and to minimize the number of exploitable vulnerabilities in applications (with application security scanners). Beyond this, to ensure the security of the information owned by an organization, it is still critical to protect the data itself from external and internal threats.
Data Security is a hot topic. It refers to applying measures that prevent unauthorized access to computers, web pages or databases, ensuring the confidentiality of the information. The information must be protected from corruption as well, guaranteeing its integrity and availability.
We can identify three main drivers for Data Security:
- the risks related to a potential loss/breach of data;
- privacy issues;
- compliance and regulations.
Data risks: not only data breach
Many information security regulations have been defined to make organizations adopt security policies, procedures and controls. Being compliant with regulations and standards like PCI DSS, SOX, HIPAA, ISO/IEC and the upcoming GDPR is critical in cyber-security.
However, Data Security policies are not applied only for compliance purposes; there are real risks to keep in mind. The greatest fear for the information owner is the data breach, when the data is leaked and its spread becomes unmanageable. Data loss and data corruption are other risks, especially when a back-up of the information is not available.
In the big data age, many organizations tend to store huge amounts of data even if it is not immediately needed, in order to take advantage of it in the future. This is the so-called “dark data”, and it represents a risk when sensitive or personal information is stored.
Indeed, privacy issues must be considered when stored data are “personal” or “sensitive”. Any information directly or indirectly related to an identifiable person is personal. Sensitive data is instead a special category of personal data, where the information is related to matters such as the data subject’s racial origin, ethnic origin, health, political opinions or sexual life. Special protection for this kind of information must be implemented due to the higher risks related to its possible loss.
Data classification
When an organization approaches Data Security, the first step is to understand what data types are available and where this data is located. This is the classification and discovery phase, when all the data sources are analyzed to find out what information is stored and how it is stored. Many data categories can be identified according to different terms and levels, but the main categorization is between structured and unstructured data.
Structured data refers to information with a high degree of organization, usually stored in relational databases, while unstructured data refers to everything else without such a degree of organization (files, audio, text, social media posts …). Searching for information in unstructured data is harder, due to its nature and to the less mature technologies available.
The difference between structured and unstructured data is also reflected in the choice of the database. Traditional SQL databases (SQL Server, MySQL, IBM Db2 or Oracle) are still the most used for storing structured data, but the rise of NoSQL databases for unstructured data must be considered in Data Security strategies.
Data protection
When an organization knows what it stores, the next step is the protection of the information. Anonymization is crucial in data protection because it makes it possible to hide the data subject’s identity. Anonymization is often addressed through obfuscation methodologies.
Obfuscation techniques are used to anonymize data, but also simply to hide parts of the information. The four main obfuscation techniques are:
- Redaction, when a piece of data is blacked out (often used for credit cards, where the first digits are replaced with special characters such as * or #);
- Masking, when the authentic information is replaced with unauthentic information maintaining the data structure and format (often used for testing applications);
- Tokenization, a non-mathematical substitution of the original value with a token. The relation between original value and token is a mapping that is stored separately and must itself be protected;
- Encryption, where the information is encoded with a key and only authorized parties can access the data with that key. Many encryption techniques and algorithms are currently known to protect both data in transit and data at rest.
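The first three techniques in the list above can be illustrated with a minimal Python sketch. The function and class names (`redact`, `mask`, `TokenVault`) are hypothetical and chosen only for this example, and the in-memory dictionary stands in for whatever protected store a real tokenization system would use. Encryption is left out because it is best delegated to a vetted cryptographic library rather than written by hand.

```python
import random
import secrets
import string


def redact(card_number: str, visible: int = 4, fill: str = "*") -> str:
    """Redaction: black out the leading digits, keep only the last `visible`."""
    hidden = max(len(card_number) - visible, 0)
    return fill * hidden + card_number[hidden:]


def mask(value: str, rng: random.Random) -> str:
    """Masking: replace each character with a fake one of the same kind,
    so that the structure and format of the value are preserved."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # separators such as "-" or " " are kept as they are
    return "".join(out)


class TokenVault:
    """Tokenization: random tokens with no mathematical relation to the
    original value; the mapping is kept in this (illustrative) in-memory vault."""

    def __init__(self) -> None:
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


if __name__ == "__main__":
    print(redact("4111111111111111"))
    print(mask("AB-1234", random.Random()))
    vault = TokenVault()
    token = vault.tokenize("4111111111111111")
    print(token, "->", vault.detokenize(token))
```

Note the difference in reversibility: redacted and masked values cannot be turned back into the original, while a token can, but only by someone with access to the vault.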
Besides data obfuscation, it is crucial to keep a record of who is accessing the data sources, which operations are performed on the data and where the accesses come from. Detailed database activity monitoring, based on the logs of the database servers, can verify whether the implemented access control policies are respected and can highlight possible abnormal behaviors. Thanks to database activity monitoring, it is also faster to detect a possible data breach and react accordingly.
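As a rough illustration of this kind of monitoring, the sketch below checks access events against a simple policy. Everything in it is an assumption made for the example: the `AccessEvent` fields, the `POLICY` table, the trusted network range and the application names are hypothetical, and a real product would parse the native audit logs of the database server instead.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class AccessEvent:
    """One (hypothetical) record extracted from a database audit log."""
    user: str
    source_ip: str
    operation: str
    table: str


# Illustrative policy: which operations each user may run on each table.
POLICY = {
    ("report_app", "customers"): {"SELECT"},
    ("billing_app", "payments"): {"SELECT", "INSERT"},
}

# Illustrative list of networks from which accesses are expected.
TRUSTED_NETWORKS = [ip_network("10.0.0.0/8")]


def check_event(event: AccessEvent) -> list:
    """Return a list of findings for one event (empty if nothing is wrong)."""
    findings = []
    allowed = POLICY.get((event.user, event.table), set())
    if event.operation.upper() not in allowed:
        findings.append(
            f"{event.user} is not allowed to {event.operation} on {event.table}"
        )
    address = ip_address(event.source_ip)
    if not any(address in network for network in TRUSTED_NETWORKS):
        findings.append(f"access from untrusted address {event.source_ip}")
    return findings


if __name__ == "__main__":
    events = [
        AccessEvent("report_app", "10.1.2.3", "SELECT", "customers"),
        AccessEvent("report_app", "203.0.113.7", "DELETE", "customers"),
    ]
    for event in events:
        for finding in check_event(event):
            print("ALERT:", finding)
```

The findings could feed a notification system or, in a stricter setup, be used to block the operation.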
Data Security process
Securing and protecting your sensitive data is a long journey, which starts with understanding the issues related to your data and continues through the following steps:
- Data classification: discovery of the sensitive data and of where it is located. Access control and protection measures are based on what is found in this initial phase.
- Data protection: how can I protect my data? Which methodologies are suitable for my environment and my applications? Obfuscation techniques are chosen in this phase to anonymize and encode data.
- Database activity monitoring: monitor and log all the activities on my data sources, in order to react with notification alerts or even by blocking some actions.
- Audit and report: audit the process and report on what is found to the designated person in charge. A report can be based on the sensitive data discovered (for example data that matches a given pattern) or on the potentially malicious operations on the data sources.
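Pattern-based discovery, as mentioned in the last step, can be sketched in a few lines of Python. The example below looks for candidate payment card numbers in free text and keeps only those that pass the Luhn checksum, which reduces false positives. The regular expression and the function names are assumptions made for this illustration; real discovery tools use many more patterns and validation rules.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit starting from the right,
    subtract 9 from results above 9, and check the sum is a multiple of 10."""
    total = 0
    for index, ch in enumerate(reversed(digits)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return the digit strings in `text` that look like valid card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111 on file, ref 1234 5678 9012 3456."
    print(find_card_numbers(sample))
```

In the sample text only the first number passes the checksum; the second one matches the pattern but is discarded, which is the kind of distinction a discovery report should make.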
Author: Ander Schiavella