Q: What is FHE technology and how does it work?
FHE technology enables you to perform computations on encrypted data. With FHE, you can encrypt numbers and perform operations like multiplication, addition or rotation within an encrypted vector. When the data owner decrypts the result, it should match what the same computation would produce on unencrypted data (within a certain error tolerance). Only the data owner (the person who encrypted the data) has the private key and can decrypt and observe the result.
Q: Does FHE encrypt data in use? Why should we protect data in use?
Yes. FHE keeps data encrypted while it is in use, which helps ensure confidentiality even if the encrypted data is leaked or exfiltrated. Protecting data in use helps eliminate the window of vulnerability inherent in current approaches to data protection. Current security architectures address data at rest and data in transit; few secure data in use.
Data in use is particularly vulnerable because it has to be decrypted to be processed. Anytime you have decrypted data out in the open, it creates a window of vulnerability that malicious entities can exploit to access sensitive workloads.
Q: What does a scheme mean in FHE?
A scheme is a specific algorithmic method or approach to the underlying mathematics being used to perform the homomorphic calculations.
Q: Is FHE quantum resistant and what does quantum resistant mean?
Yes, FHE is considered to be quantum resistant. Quantum resistant technologies enable organizations to protect their data against quantum computers: there is no known advantage in using a quantum processor to break the encryption algorithm. Cryptography is generally built on assumptions about the computational difficulty (or complexity) of solving certain mathematical problems. FHE relies on a different class of difficult problems, such as Learning with Errors, a well-known lattice-based problem widely used in quantum resistant cryptography.
Q: How is FHE different from a Trusted Execution Environment?
With a Trusted Execution Environment (TEE), one can run computations on unencrypted data with a hardware enforced root of trust. A TEE is intended to be secure by design. However, if an application running within it has a security issue (insecure code, application code deployed with malware, a zero day exploit, etc.), your data may be exposed and an attacker might be able to exfiltrate or manipulate it.
But with FHE, data remains encrypted at all times even during data leakage or exfiltration. FHE (which guarantees data confidentiality at all times) and a TEE (which provides data integrity) can be combined to secure your 'crown jewel' sensitive data.
Q: What are low level primitives?
Low level primitives are the base operations that can be performed directly on ciphertext: multiplication, addition, and rotation. All higher level functions are implemented in terms of low level primitives, which makes them homomorphic as well. More complex operations, such as operations on matrices or arrays, are built upon these foundational primitives.
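To make the idea of building on primitives concrete, here is a minimal plaintext sketch. It models a ciphertext as a plain Python list of slots and is not HElayers or HElib code: the function names `add`, `multiply`, `rotate` and `inner_product` are hypothetical, and the slot count is assumed to be a power of two. It shows an inner product built from nothing but the three primitives.

```python
from typing import List


def add(a: List[int], b: List[int]) -> List[int]:
    """Slot-wise addition of two packed vectors."""
    return [x + y for x, y in zip(a, b)]


def multiply(a: List[int], b: List[int]) -> List[int]:
    """Slot-wise multiplication of two packed vectors."""
    return [x * y for x, y in zip(a, b)]


def rotate(a: List[int], k: int) -> List[int]:
    """Cyclically rotate the slots left by k positions."""
    k %= len(a)
    return a[k:] + a[:k]


def inner_product(a: List[int], b: List[int]) -> List[int]:
    """Inner product using only multiply, rotate and add.

    Assumes the number of slots is a power of two. After the loop,
    every slot holds the full sum.
    """
    acc = multiply(a, b)
    shift = len(acc) // 2
    while shift >= 1:
        # Fold the vector onto itself: each pass halves the distance
        # between partial sums that still need to be combined.
        acc = add(acc, rotate(acc, shift))
        shift //= 2
    return acc


print(inner_product([1, 2, 3, 4], [5, 6, 7, 8]))  # [70, 70, 70, 70]
```

The loop runs a number of times that depends only on the vector length, not on the data, so it maps onto a fixed sequence of ciphertext operations.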
Q: Would you use FHE in place of data masking and data obfuscation?
You should use FHE if data masking or data obfuscation interferes with the analytics you want to perform. If this is not the case, then data masking is the better option because computations on data that is not encrypted are faster.
Q: What is the FHE sponsor user program and how do I sign up?
Sponsor users interact directly with designers and developers to improve the FHE user experience and help shape the future of FHE at IBM. IBM is currently accepting sponsor users at no cost to engage with application developers, data scientists, applied AI teams, crypto leads, and executives from enterprises around the world to refine the FHE user experience.
Please reach out to email@example.com for next steps.
Q: What is lattice cryptography and how does it relate to FHE?
Lattice cryptography is a generic term for constructions of cryptographic primitives that involve lattices. The underlying lattice problems are generally hard to solve (and thus the constructions are considered secure), and they are assumed to be quantum resistant: even quantum computers find them hard to solve. FHE schemes are based on lattice cryptography and are considered to be quantum resistant.
Q: What cryptography schemes does FHE use and why would I use one over the other?
There are many different cryptography schemes that FHE uses. The leading FHE schemes are BGV, BFV, TFHE and CKKS. They differ by the type of numbers that they are able to operate on.
For example, BGV, BFV and TFHE operate on integers (e.g. doing a database lookup and checking to see if two numbers are the same) while CKKS is best used with real or complex numbers (e.g. AI/ML applications like neural networks).
Error and Bootstrapping
Q: Why does FHE introduce error and won't it ruin my computational results? What is bootstrapping and how does it relate to error?
The leading FHE schemes rely on a hard problem called Learning with Errors (LWE). LWE-based encryption adds error to the ciphertext to provide security. This error keeps growing as you continue to perform operations on the encrypted data, until at some point you can't continue. In order to continue and preserve the computational results, bootstrapping is required.
Bootstrapping reduces the error in the ciphertext that is intentionally added to keep the scheme protected. This allows you, in principle, to continue to do as many computations as you want. However, this operation is very costly to perform and significantly degrades performance.
In the BGV scheme, the error in the ciphertext is actually hidden from the user. When the user decrypts, they don't see the error at all.
In the CKKS scheme, the error becomes part of the encrypted data itself. When the user decrypts, they don't receive the exact value; they receive the value plus some error.
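The way error is added and then accumulates can be illustrated with a toy LWE-style scheme that encrypts single bits. This is a teaching sketch only: the parameters `Q`, `N` and `B` are hypothetical and far too small to be secure, the function names are made up for this example, and it is not how BGV, CKKS, HElib or HElayers are implemented. It supports only homomorphic addition and does not show bootstrapping.

```python
import random

Q = 2 ** 15  # ciphertext modulus (toy value)
N = 16       # secret key length (toy value, not secure)
B = 4        # fresh error is drawn from [-B, B]

rng = random.Random(0)  # fixed seed so the demo is repeatable


def keygen():
    return [rng.randrange(Q) for _ in range(N)]


def encrypt(secret, bit):
    """Hide the bit behind <a, s> plus a small random error."""
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randint(-B, B)
    b = (sum(x * s for x, s in zip(a, secret)) + e + bit * (Q // 2)) % Q
    return a, b


def phase(secret, ct):
    """b - <a, s> mod Q: the encoded bit plus the accumulated error."""
    a, b = ct
    return (b - sum(x * s for x, s in zip(a, secret))) % Q


def decrypt(secret, ct):
    # Values near Q/2 decode to 1; values near 0 (or Q) decode to 0.
    return 1 if Q // 4 <= phase(secret, ct) < 3 * Q // 4 else 0


def noise(secret, ct):
    """Signed distance from the phase to the nearest valid encoding."""
    p = phase(secret, ct) % (Q // 2)
    return p if p < Q // 4 else p - Q // 2


def add(ct1, ct2):
    """Homomorphic addition: bits add modulo 2, and the errors add too."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q


sk = keygen()
bits = [1, 0, 1, 1, 0, 1]
total = encrypt(sk, bits[0])
for bit in bits[1:]:
    total = add(total, encrypt(sk, bit))

print("decrypted:", decrypt(sk, total), "expected:", sum(bits) % 2)
print("accumulated error:", noise(sk, total), "worst case:", B * len(bits))
```

In this toy, decryption stays correct only while the accumulated error is smaller than `Q // 4`. Each addition can raise the worst-case error by `B`, which is the "error budget" that bootstrapping is meant to refresh.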
Q: What is circuit depth and why is it relevant to FHE?
Circuit depth is the number of operations that you perform in a sequence. It relates to FHE because for schemes that don't support bootstrapping, you have a hard limit on the depth of computations you can perform. For schemes that support bootstrapping, you can implement the operation but it will be costly and it may not be efficient if the circuit depth is too large. In this case, you will have to bootstrap more than once which may cause the computation to be impractical. It's usually much more practical to have a low depth circuit for FHE operations.
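As a rough illustration of why the same result can have very different depths, the sketch below tracks depth on plaintext values. It is a simplification that counts only multiplications in sequence, and the `Tracked` class and both helper functions are hypothetical names invented for this example. It computes x^8 two ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    """A plaintext value that remembers how deep its circuit is."""
    value: int
    depth: int = 0

    def __mul__(self, other: "Tracked") -> "Tracked":
        # A product is one level deeper than its deepest input.
        return Tracked(self.value * other.value,
                       max(self.depth, other.depth) + 1)


def power_sequential(x: Tracked, n: int) -> Tracked:
    """x * x * ... * x, one multiplication after another."""
    result = x
    for _ in range(n - 1):
        result = result * x
    return result


def power_by_squaring(x: Tracked, n: int) -> Tracked:
    """Repeated squaring; n is assumed to be a power of two."""
    result = x
    while n > 1:
        result = result * result
        n //= 2
    return result


x = Tracked(3)
a = power_sequential(x, 8)
b = power_by_squaring(x, 8)
print(a.value, a.depth)  # 6561 7
print(b.value, b.depth)  # 6561 3
```

Both versions return the same value, but the second needs a much shallower circuit, which is the kind of restructuring that helps keep FHE computations practical.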
Q: Does circuit depth relate to memory consumption?
Yes, circuit depth is directly related to memory consumption. Large circuit depths result in an increase in memory consumption.
Q: Does circuit depth relate to compute time?
Yes, circuit depth is directly related to compute time. The deeper the circuit, the more gates will be present and the more time it will take to perform the computation.
IBM HElayers SDK
Q: What is IBM HElayers?
HElayers is a Software Development Kit (SDK) that enables the practical and efficient execution of secure AI workloads using an emerging cryptographic technique called Fully Homomorphic Encryption (FHE). Targeted for use by data scientists and application developers, HElayers enables seamless use of advanced privacy preserving techniques without requiring users to understand the cryptographic underpinnings required to efficiently run FHE workloads.
HElayers is written in C++ and includes a Python API that enables application developers and data scientists to use the power of FHE by supporting a variety of analytics such as Linear Regression, Logistic Regression, and Neural Networks. It has been designed with a layered set of capabilities and coupled with appropriate APIs so users can fully utilize the services provided by the SDK. HElayers is delivered as an open platform capable of using the latest FHE implementations for a given use case. It is enabled with patented optimization and performance boosting innovation for computation, AI innovation and use case requirements that facilitate the practical use of a wide variety of AI workloads over FHE data.
Q: What are the different layers in HElayers?
HElayers has four layers. The first layer is the abstraction layer, which can wrap an underlying HE library and provide a uniform API to access different libraries, so that you can write library-agnostic and scheme-agnostic code (as much as possible). The second layer is the packing algorithms and math layer. This layer contains tools for doing high-level operations such as matrix multiplication and polynomial evaluation. The third layer contains the AI and query tools, which enable inference over multiple types of models: neural networks, decision trees, logistic regression, linear regression, and k-means. The fourth and last layer is the use cases layer. This layer contains complete solutions for several applications. Each solution is in the form of a demo providing an end-to-end usage implementation on some test data.
Q: What is HElib? What is NTL? How are they related?
HElib is an open source FHE library. It is a core library and API that implements IBM's reference implementation of FHE schemes. HElib implements two schemes: BGV and CKKS. To perform some of the number theory calculations to do FHE, we depend on another open source C++ library called NTL (Number Theory Library). NTL computes with modular arithmetic and deals with large numbers.
Q: What programming languages does IBM support for FHE?
Python and C++ API support is available for HElayers.
Q: What are the hardware and software requirements?
FHE can be implemented entirely in software; there is no hardware dependency. The only software that is required is Docker 19 or 20 and URL access to download the SDK.
In most use cases, for the client side, almost any modern PC, laptop or mobile device has sufficient resources to perform FHE tasks. As for the server side, where the intense encrypted computation happens, it depends on the use case. For serious development and prototyping, we recommend at least 32 GB of RAM and more than 4 CPU cores to host a development experience suitable for modern developer test and iteration cycles. Performance of our FHE implementations scales well, meaning that more cores and more RAM lead to better performance.
Q: Is there a trusted hardware requirement for FHE?
No, trusted hardware is not strictly required. However, pairing FHE with a secure runtime could provide data integrity assurances. Because of the malleability property of FHE, anyone with access to the public key could manipulate the encrypted data. Thus, data may be subject to tampering in a completely untrusted environment in some circumstances and under some threat models.
Q: Which Linux distributions are supported today?
Today, we support Ubuntu 20.04.
Q: Is there any special background I should have to be a successful FHE programmer right now?
Today, you may need some additional background in computer science or a related subject to be an effective FHE programmer depending on your use case. If you are a programmer that would like to develop FHE schemes, then you need to be a cryptographer. If you want to be a programmer that works directly with the lower level APIs provided by HElib for free, you need to be familiar with FHE schemes, the special considerations in using them, and the domain that you're working with (e.g. neural networks).
HElayers offers high-level APIs that allow you to solve AI and other problems without requiring background knowledge of FHE. For example, if you're used to working with neural networks in a standard library like Keras, HElayers can provide you with the tools necessary to import and convert your work to homomorphic encryption.
Q: Does FHE require any code or application changes?
Generally, yes, FHE requires code changes. Some things that are easy to do with plaintext, or unencrypted data, are difficult to do with homomorphic encryption schemes. For example, a common function like MAX (taking the maximum of two numbers) is easy to compute using plaintext, but very difficult to compute using encrypted data.
Algorithms need to be changed to make them 'HE friendly' so that they can be decomposed into the three foundational FHE primitives of multiplication, addition and rotation.
For complex operations, if no direct implementation is feasible due to performance requirements, one must substitute the desired function with a polynomial approximation. Over time, the API (Application Programming Interface) library will provide more functions that have been implemented using the three FHE primitives. Our goal is to make the developer experience as comfortable and familiar as possible, while minimizing the number of changes that are needed to implement FHE.
Q: Do you need to approximate computations with polynomials to make it compatible with FHE?
Yes. Most FHE schemes only provide two basic operators: addition and multiplication. This means that everything you compute will have to be a polynomial or approximated using polynomials. The higher the degree of the polynomial, the better the approximation. However, higher-degree polynomials are more expensive to compute. For example, if you have a function involving a square root, division, and MAX operators, it may be very easy to do using plaintext. But for FHE, it will result in a complicated polynomial and will be much slower to compute. This again touches on why, in principle, everything is possible to compute under homomorphic encryption, but in practice it's not always practical.
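Division is a convenient example of this trade-off. The plaintext sketch below approximates 1/x using only addition and multiplication, through the truncated product 1/x = (1 + y)(1 + y^2)(1 + y^4)... with y = 1 - x, which converges for 0 < x < 2. The function name `approx_reciprocal` and the choice of this particular approximation are assumptions made for illustration; the answer above does not say which approximations HElayers uses.

```python
def approx_reciprocal(x: float, iterations: int) -> float:
    """Approximate 1/x for 0 < x < 2 with additions and multiplications only.

    Each extra iteration costs two more multiplications (one squaring,
    one product) and makes the approximation more accurate.
    """
    y = 1.0 - x
    result = 1.0 + y
    for _ in range(iterations - 1):
        y = y * y                    # y^(2^i)
        result = result * (1.0 + y)  # multiply in the next factor
    return result


for n in (1, 2, 3, 4):
    approx = approx_reciprocal(0.5, n)
    print(n, approx, abs(approx - 2.0))
```

For x = 0.5 the error shrinks at every step, but so does the remaining multiplication budget, which is the tension between accuracy and cost described above.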
Q: How do I ensure that sensitive data is encrypted correctly and that the access to this sensitive data is without any security threat?
In cryptography (specifically quantum safe and public key cryptography), algorithms are based on the computational complexity or 'hardness' of certain problems for classical or quantum computers. 'Hardness' means that no known algorithm can solve the problem efficiently. FHE is based on a mathematical problem called Learning with Errors that neither classical nor quantum computers are known to solve efficiently. As with all public key cryptography systems, it is important to keep the secret key protected. The secret key should only be used from a trusted environment and should not be revealed to anyone except the data owner.
Q: Any special challenges with testing and debugging FHE programs?
There are three main challenges with testing and debugging FHE programs. The first is that because encrypted computations require more resources than unencrypted computations, computations are slower. The second is the additional layer of complexity that is created due to the constraints of working with ciphertext of a fixed size. The last challenge is the error (noise) that needs to be managed that is inherent to FHE operations.
In HElayers, we provide tools to help address these challenges. For example, we offer APIs to measure noise and compare the computation in plaintext with a computation done under encryption. This allows you to track how the noise grows and manage the noise.
Q: What types of AI/ML models do we support today?
We currently support decision trees, logistic regression, linear regression, neural networks (fully connected and convolutional layers), Support Vector Machine (SVM), nearest neighbor, and basic statistics like mean.
Q: What types of activation layers do we support today?
For activation layers, we support polynomial functions. This is something that is not commonly used outside of FHE, but it's something that usually needs to change when trying to make a neural network 'HE friendly.'
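As a small plaintext illustration, the sketch below runs one fully connected layer followed by either ReLU or a square activation. The square function x^2 is chosen here only because it is a simple polynomial; the answer above does not specify which polynomials HElayers uses, and the helper names, weights and inputs are all made up for this example.

```python
from typing import List


def dense(inputs: List[float], weights: List[List[float]],
          biases: List[float]) -> List[float]:
    """One fully connected layer: only additions and multiplications."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values: List[float]) -> List[float]:
    """Standard activation; needs a comparison, which is hard under FHE."""
    return [max(0.0, v) for v in values]


def square_activation(values: List[float]) -> List[float]:
    """Polynomial activation: a single multiplication per value."""
    return [v * v for v in values]


inputs = [1.0, -2.0]
weights = [[1.0, 1.0], [2.0, -1.0]]
biases = [0.0, 1.0]

hidden = dense(inputs, weights, biases)
print(hidden)                     # [-1.0, 5.0]
print(relu(hidden))               # [0.0, 5.0]
print(square_activation(hidden))  # [1.0, 25.0]
```

The two activations give different outputs, which is why swapping one for the other changes the network rather than just its implementation.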
Q: What types of SQL commands do we support today?
There are some SQL statements that we are able to support like SELECT, INSERT, UPDATE, DELETE. There are some commands that are quite challenging in terms of performance like JOIN and LIKE operations.
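One way a filtered query can be expressed with only additions and multiplications is sketched below on plaintext integers. It models something like SELECT SUM(value) WHERE key = target by turning the equality test into a 0/1 mask using Fermat's little theorem. This is an illustrative assumption, not a description of how HElayers implements SQL; the modulus `P` and the function names are hypothetical.

```python
from typing import List

P = 17  # small prime plaintext modulus (hypothetical toy value)


def is_equal(a: int, b: int) -> int:
    """Return 1 if a == b (mod P), else 0, using only +, - and * mod P.

    Fermat's little theorem: d^(P-1) is 1 mod P for any d that is not
    a multiple of P, and 0 when d is 0.
    """
    d = (a - b) % P
    power = 1
    for _ in range(P - 1):  # fixed iteration count, independent of the data
        power = (power * d) % P
    return (1 - power) % P


def select_sum_where(keys: List[int], values: List[int], target: int) -> int:
    """Sum the values whose key matches target, modulo P, without branching."""
    total = 0
    for k, v in zip(keys, values):
        # Every row is processed; non-matching rows are multiplied by 0.
        total = (total + is_equal(k, target) * v) % P
    return total


print(select_sum_where([3, 5, 3, 7], [2, 4, 6, 1], 3))  # 8
```

Every row is touched regardless of whether it matches, which gives a feel for why operations that compare many pairs of rows, such as JOIN, become expensive.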
Q: What can I program with FHE? What can’t I program with FHE?
In principle, the 'Fully' in Fully Homomorphic Encryption means that you can compute any function that you want. However, in practice, some computations are very difficult to the point of being impractical.
FHE computation is most similar to a hardware circuit; we get the data, we run it through a sequence of gates and each gate performs an operation. Because the data remains encrypted and we are not able to see it in the clear, it is difficult to perform branching with FHE (e.g. if the data meets a certain condition, do X, else do Y). As a result, the data must run through all of the gates.
Q: What programming constructs are not available on FHE in terms of control flow?
There is no control flow for FHE: the input arrives and it runs through a circuit of gates. You can't branch at all: no ifs and no loops. It's like one big circuit.
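A common way to work within this constraint is to compute both outcomes and blend them with arithmetic. The plaintext sketch below shows the idea; `select` and `transform` are hypothetical names for this example, and the condition is assumed to already be available as a 0 or 1 value.

```python
def select(cond: int, if_true: int, if_false: int) -> int:
    """Arithmetic replacement for `if_true if cond else if_false`.

    cond must be 0 or 1. Both inputs are always evaluated, which is
    what running the data through all of the gates means.
    """
    return cond * if_true + (1 - cond) * if_false


def transform(x: int, cond: int) -> int:
    # Plaintext logic being modelled: x + 10 if cond else x * 2
    return select(cond, x + 10, x * 2)


print(transform(5, 1))  # 15
print(transform(5, 0))  # 10
```

The cost is that the work for both branches is always paid, even though only one result is kept.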
Q: What is the performance and speed of FHE now?
Performance depends on the use case and the corresponding implementation. Transactional workloads are generally higher in latency than batch workloads. For many applications, FHE is fast enough now to be used. This is especially the case when FHE enables workloads that could not have been computed in an automated fashion before (due to privacy or data sharing concerns).
The time scales and requirements for things like online transaction fraud detection and medical image analysis are quite different. In the former, we would demand performance of hundreds or thousands of transactions per second, while in the latter a response in a few minutes or hours might be acceptable.
Today, FHE can perform tasks like logistic regression and neural networks with three or four layers in less than a second. When these neural networks become more complicated and have more layers, latency is on the order of minutes. For example, we are able to support a neural network called AlexNet with a latency on the order of minutes, while our competitors take about a day and a half.
Q: Are there any special CPUs that are needed?
No special CPUs or hardware are strictly needed to perform FHE operations. However, FHE is computationally intensive, and faster processors with larger cache sizes and memory capacity will result in better performance.
Q: What is the slowdown in terms of CPU I should expect for FHE vs non FHE operations?
It depends on the use case. Generally, encrypted computations require more resources than unencrypted computations. Overhead can be as low as a factor of two, or as high as a factor of 10 (or even larger without careful optimization). If you are processing in batches, you'll get much better performance because the operations are SIMD (Single Instruction, Multiple Data). Note that when you are able to work with batches, you are not just looking at latency, but rather at the amortized times and throughput of an overall system design, which can lead to better performance.
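The difference between latency and amortized time comes down to simple arithmetic, sketched below. The numbers are hypothetical and not measurements, and the model assumes that one encrypted operation costs the same no matter how many slots of the ciphertext are filled and that ciphertexts are processed one after another.

```python
import math


def amortized_seconds_per_item(latency_seconds: float, slots: int,
                               items: int) -> float:
    """Average time per item when items are packed into ciphertext slots."""
    ciphertexts = math.ceil(items / slots)  # full batches plus any remainder
    return ciphertexts * latency_seconds / items


# Hypothetical figures: 2 seconds per encrypted operation, 8192 slots.
print(amortized_seconds_per_item(2.0, 8192, 1))     # 2.0
print(amortized_seconds_per_item(2.0, 8192, 8192))  # 0.000244140625
```

A single item still waits the full latency, but a full batch spreads that same cost across every slot.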
Q: Is FHE Turing complete?
FHE is Turing complete because, in theory, it can perform any function that is computable.