Recent advances in cloud native technologies and data and AI tools offer great opportunities for developers and data scientists to modernize enterprise IT systems and develop smarter applications.
At the same time, as enterprises continue to optimize for agility and cost of operation, they are increasingly moving select applications to SaaS providers, while keeping their mission-critical systems on-premises for a variety of technical and non-technical reasons.
Secure data integration is critical
The need for data integration and orchestration across these distributed environments is increasing with the rate of fragmentation of the underlying systems. For instance, building a smart sales recommender system that leverages the latest advances in data and AI in the public cloud may require data scientists to integrate customer data from disparate sources: a SaaS sales application (e.g., Salesforce), on-premises systems for order management and financial transactions, and marketing applications in the cloud.
Primary security concerns
Ensuring data protection as it is being accessed, moved, processed, and used across these environments is a major security concern. Managing data security and compliance in hybrid cloud therefore requires understanding and managing the entire data lifecycle — from generation to disposal — as data flows across environments.
Data security is part of a continuum that ranges from tools and processes for low-level infrastructure security to application and end-point security.
When dealing with data security, several requirements must be addressed and consistently managed. These requirements typically span data classification, data protection, and continuous compliance.
Data classification is where the management of the data lifecycle begins — with defining and assigning labels that capture the sensitivity of data, starting with broad classes such as public, private, and restricted. Finer-grained sub-classifications are then introduced depending on data content (sensitive personal information, crown-jewel assets, trade secrets, etc.) or data context (location of the data, applicable regulations, SLAs, etc.).
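As a minimal sketch of this idea (the labels, patterns, and context fields below are hypothetical illustrations, not a standard taxonomy — real programs rely on curated taxonomies and dedicated discovery tools), a classification step might derive a broad class from content and then refine it with context:

```python
import re

# Hypothetical content rules -- a real classifier would use a much richer
# catalog of detectors, not a handful of regular expressions.
CONTENT_RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "restricted"),  # looks like a US SSN
    (re.compile(r"\b\d{16}\b"), "restricted"),             # looks like a card number
    (re.compile(r"@"), "private"),                         # email-like, personal data
]

def classify(value: str, context: dict) -> dict:
    """Assign a broad class from content, then add context sub-labels."""
    label = "public"
    for pattern, cls in CONTENT_RULES:
        if pattern.search(value):
            label = cls
            break
    # Context (regulation, location, SLA) drives the finer sub-classification.
    sublabels = []
    if context.get("regulation") == "GDPR" and label != "public":
        sublabels.append("gdpr-personal-data")
    if context.get("location") == "on-prem":
        sublabels.append("on-prem-origin")
    return {"class": label, "sublabels": sublabels}
```

The key point is that the same value can land in different sub-classes depending on where it lives and which regulations apply to it.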
One of the key challenges in data classification is data sprawl, due to the distributed nature of enterprise IT systems, applications, and end points. While properly classified data across the enterprise is required for compliance, audit, and secure business operations, it is challenging to ensure that every data set — acquired or derived — is properly classified.
Data protection is about defining and implementing a set of processes and methods to secure data while it is being stored, shared, or used. The appropriate security methods generally depend on the classification of the data and the associated regulatory or business requirements.
Encryption is the bedrock of data security. Picking the right encryption method depends on the type of data, the required level of protection, and the intended use of the encrypted data.
Event stream encryption
Recent developments in hybrid data integration using scalable messaging tools like Apache Kafka have enabled data scientists and developers to move large amounts of data — reliably and in a scalable fashion — and build smarter event-driven applications that react to changes in the underlying data sources as they happen. Recent advances in Kafka encryption let users encrypt data both in motion and at rest, at the topic level, while keeping control of their own keys.
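The producer-side idea can be sketched as follows. Everything here is illustrative: the toy hash-based stream cipher stands in for a vetted cipher such as AES-GCM, and the topic key stands in for one fetched from the data owner's key management service rather than generated in place.

```python
import hashlib
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a keystream by hashing key||nonce||counter (toy construction)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_record(key: bytes, value: bytes) -> bytes:
    """Encrypt one record value; the random nonce travels with the ciphertext."""
    nonce = secrets.token_bytes(16)
    ks = _keystream(key, nonce, len(value))
    return nonce + bytes(a ^ b for a, b in zip(value, ks))

def decrypt_record(key: bytes, blob: bytes) -> bytes:
    nonce, ct = blob[:16], blob[16:]
    ks = _keystream(key, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))

# A producer would publish encrypt_record(topic_key, payload) to the topic;
# only consumers holding topic_key can recover the message values.
topic_key = secrets.token_bytes(32)  # in practice, owned and served by a KMS
blob = encrypt_record(topic_key, b'{"order_id": 42, "amount": 99.5}')
assert decrypt_record(topic_key, blob) == b'{"order_id": 42, "amount": 99.5}'
```

Because the broker only ever sees ciphertext, topic data stays protected at rest on the brokers as well as in motion.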
File modular encryption
In other cases, users may need to move data files from one environment to another (e.g., for analytics purposes) and make them available to several users or applications. Instead of generating several encrypted versions of the file for each target user/application, data owners can selectively encrypt different sensitive columns of the same file using different keys and then share only the keys of the relevant columns with the target users/applications. 
Order preserving encryption
There are also situations where users want to move sensitive data from a source database (e.g., on-premises) into a target database (e.g., in a public cloud) to support an application or run analytics without ever decrypting the data. Encryption schemes such as order-preserving encryption (OPE) allow data owners to encrypt and share ordered data with target users, who can then run range queries directly on the encrypted data.
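To make the order-preserving property concrete, here is a deliberately naive encoding: scale each integer, then add a deterministic per-value offset smaller than the scale factor. This is a toy for intuition only — it is not a real OPE construction (schemes in the literature have analyzed security bounds), and even genuine OPE leaks the ordering of values by design.

```python
import hmac
import hashlib

def ope_encrypt(key: bytes, x: int, gap: int = 1_000_000) -> int:
    """Toy order-preserving encoding (NOT a secure OPE scheme).

    Because the HMAC-derived offset is always < gap, the encodings of
    distinct plaintexts sort in the same order as the plaintexts.
    """
    digest = hmac.new(key, x.to_bytes(8, "big", signed=True), hashlib.sha256).digest()
    offset = int.from_bytes(digest[:8], "big") % gap
    return x * gap + offset

key = b"data-owner-secret"
plaintexts = [3, 17, 17, 250, 9999]
ciphertexts = [ope_encrypt(key, x) for x in plaintexts]
# Order is preserved, so the target database can index this column and
# answer range queries over ciphertexts without ever seeing plaintext.
assert ciphertexts == sorted(ciphertexts)
```

A query such as "values between 10 and 300" becomes a range scan between `ope_encrypt(key, 10)` and `ope_encrypt(key, 301)` on the encrypted column.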
Fully homomorphic encryption
Encryption schemes such as OPE are useful in many situations but aren’t suitable for the broad variety of analytics and AI applications that require complex operations on data. For example, training a neural network model requires extensive scalar and tensor calculations on the data. This is where recent developments in fully homomorphic encryption (FHE) come to the rescue. FHE schemes enable broader computation directly on encrypted data, producing encrypted results that, when decrypted, match the result of the same computation on the clear data. 
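Production FHE libraries are substantial, but the core idea of computing on ciphertexts can be tasted with the *partially* homomorphic Paillier scheme, which supports only addition. The sketch below uses toy parameters that are far too small to be secure; it is an illustration of homomorphic computation, not of FHE itself.

```python
import secrets
from math import gcd

# Toy Paillier keypair -- tiny primes, illustrative only, NOT secure.
p, q = 1789, 1867
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1)  # Euler's phi(n) serves as the private exponent here
g = n + 1

def encrypt(m: int) -> int:
    r = secrets.randbelow(n - 2) + 2
    while gcd(r, n) != 1:
        r = secrets.randbelow(n - 2) + 2
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    u = pow(c, lam, n2)
    l = (u - 1) // n                 # the scheme's "L" function
    return (l * pow(lam, -1, n)) % n

# Multiplying ciphertexts mod n^2 adds the underlying plaintexts:
c = (encrypt(20) * encrypt(22)) % n2
assert decrypt(c) == 42  # the sum was computed without decrypting the inputs
```

FHE schemes go much further, supporting both addition and multiplication (and hence arbitrary circuits) on ciphertexts, which is what makes encrypted model training and scoring possible.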
IT hardware vendors are also adding capabilities to their systems that allow better isolation of data and processing. For example, AMD recently announced Secure Encrypted Virtualization (SEV) — an advanced security feature that encrypts VM memory using a dedicated per-VM key that is generated and managed by an embedded security processor.
Tokenization
Tokenization is another approach to data protection that has gained significant traction since its introduction in the early 2000s for protecting sensitive data in credit card transactions. It replaces sensitive data elements with surrogate values so that processing is still possible. Depending on the desired outcome, there are several approaches to tokenization: reversible tokens can be generated so that they can be converted back to their original values, or one-way tokens can be generated to prevent any conversion back.
There are several use cases where a data owner may want to tokenize data before moving it into another untrusted environment or before sharing it with another IT system because of regulatory requirements (e.g., PCI for payments).
Application testing in untrusted development environments (e.g., public clouds) is one such use case: effective testing requires test data that is as similar as possible to real data while remaining secure and compliant.
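The two token types described above can be sketched as follows. The constructions are simplified illustrations: the one-way variant is a keyed hash, and the reversible variant uses an in-memory token vault standing in for the encrypted, hardened vault a real tokenization product would maintain.

```python
import hmac
import hashlib
import secrets

# One-way tokens: a keyed hash. The original value cannot be recovered,
# but equal inputs yield equal tokens, so joins and lookups still work.
ONE_WAY_KEY = secrets.token_bytes(32)

def one_way_token(value: str) -> str:
    return hmac.new(ONE_WAY_KEY, value.encode(), hashlib.sha256).hexdigest()

# Reversible tokens: a random surrogate plus a protected vault mapping
# tokens back to originals (kept in memory here purely for illustration).
_vault = {}

def reversible_token(value: str) -> str:
    token = "tok_" + secrets.token_hex(8)
    _vault[token] = value
    return token

def detokenize(token: str) -> str:
    return _vault[token]

card = "4111111111111111"
t1, t2 = one_way_token(card), reversible_token(card)
assert one_way_token(card) == t1   # deterministic, but irreversible
assert detokenize(t2) == card      # recoverable only via the vault
```

In a test-data scenario, one-way tokens preserve referential integrity across tables (the same card number always maps to the same token) without exposing real card numbers to the development environment.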
Quantum safe encryption
Long-term data retention is also a key requirement for data security and compliance. With recent advances in quantum computing, many researchers anticipate that future quantum computers will be able to break encryption schemes such as RSA and elliptic-curve cryptography. Therefore, it is important to ensure that all sensitive enterprise data in untrusted environments is encrypted with quantum-safe encryption, since adversaries can harvest ciphertexts today and decrypt them once quantum computers mature. This applies to both data at rest and data in transit, especially as sensitive data and encryption keys are transferred over networks using TLS, whose classical key exchanges rely on RSA and elliptic-curve cryptography. The ongoing NIST Post-Quantum Cryptography standardization process aims to release new quantum-safe standards in 2022-2024.
Key lifecycle management
Key lifecycle management is an integral part of any data security solution. As data is encrypted and shared across hybrid environments, and as the number of keys, users, and applications proliferate, it is critical to consistently manage the lifecycle of the encryption keys in a secure, centralized key management system. Data owners can then encrypt sensitive data and manage their own keys. A best data security practice is to use tamper-resistant Hardware Security Modules (HSMs) to generate and manage strong master keys. Those keys can then be used to encrypt the actual keys that are used to encrypt sensitive data objects. 
Continuous security and compliance
Data security is a dynamic challenge because there are always unexpected events that happen in complex systems where both technology and people can be sources of exposure. According to the 2020 Verizon Data Breach Investigation Report, 30% of the data breaches involved internal actors, and 22% of the breaches were due to errors such as misconfigurations and accidental publishing of sensitive data or credentials. Therefore, the data security and compliance posture of any environment needs to be continuously monitored, analyzed, and managed proactively to detect (and possibly predict) potential incidents as early as possible.
Gain insights into threats across hybrid multicloud environments
Continuous monitoring and analysis of the data security posture is absolutely critical in hybrid clouds because of the inherent fragmentation of both the data and the security tools.
As organizations continue to expand their IT in multiple clouds, security teams spend increasingly more time manually integrating, analyzing, and correlating alerts from disparate security tools. Security requires an ongoing, iterative process to contain threats. And threats, by their nature, continually adapt to find gaps in that process.
IBM Cloud Pak® for Security was introduced as a platform for security data integration, built on open technology (Red Hat OpenShift) that runs in any environment.
Through its rich and growing set of connectors and integrated analytics, it provides, among other things, federated search and real-time incident cross-correlation capabilities that equip security teams with better threat intelligence.
Finally, while hermetically sealed security isn't attainable, the enterprise is made more secure by a sound strategy and technology that dictate considered actions at every level, especially as enterprise IT spans hybrid multicloud environments.