IBM Federated Learning

Federated Learning provides the tools for training a model collaboratively, using a federated set of secure data sources. The data sources are never moved or combined, but they each contribute to training and improving the quality of the common model.

Tech preview notice

This is a technology preview and is not supported for use in production environments.

Attention: The tech preview of Federated Learning experiments is deprecated and support for these experiments might be removed in a future refresh of Cloud Pak for Data 3.5. To use the fully supported Federated Learning component, upgrade to Cloud Pak for Data 4.0 or higher. For details on upgrading, see Upgrading Cloud Pak for Data.

Federated Learning is appropriate for any situation where parties want to leverage their data without sharing it. For example, an aviation alliance might want to model how a global pandemic impacts airline delays. Each participating party in the federation can use its own data to train a common model without ever moving or sharing that data, preserving data privacy and security and improving practicality. The resulting model can be deployed to provide more accurate predictions when scoring data, giving each member of the alliance better results and insights.

This illustration shows how federated parties send training results to train the common model without sharing their data with each other. The aggregator manages updates to the model.

Federated Learning concept overview
Figure 1: Given the query (Q), each party computes a reply (R) from its own local data (D) and sends the reply back to the aggregator, which fuses the results into a single Federated Learning model (F).

Federated Learning provides the means to:

Discover different parties for federation
Configure and deploy a Federated Learning experiment
Connect multiple parties to the aggregator of the experiment to share training results

When to use Federated Learning

Federated Learning allows secure model training for large enterprises when the training uses heterogeneous data from different sources. The focus is to enable sites with large volumes of data in different formats, of different quality, and under different constraints to collect, clean, and train on that data at enterprise scale. Another key feature is that Federated Learning allows you to train on large datasets without transferring the data to a centralized location, which reduces data privacy risk and computational complexity.

Terminology

Admin: The master user who configures the Federated Learning experiment by specifying how many parties are allowed and which frameworks to use, and who sets up the Remote Training System. The admin starts the Federated Learning experiment and sees it through to the end.
Party: Also called an agent, a party represents a separate user who contributes a source of data to the collaborative training. For data security, Federated Learning ensures that training occurs without raw data being shared across the parties.
Aggregator: The aggregator fuses the model results from the parties to build one model.
Fusion method: The algorithm that is used to combine the results that the parties return. A training algorithm for federated learning usually contains two parts:
- A fusion module on the aggregator side issues queries to the parties and fuses the ModelUpdate sent by each party to update the global model within each round.
- A local training module on the party side invokes operations such as training local models and updating and constructing a ModelUpdate based on the latest local model parameters or the party's metrics, depending on the model that is trained.
Data handler: In IBM Federated Learning, data handlers are used to load and pre-process data. They also help to ensure that data collected from multiple sources is formatted uniformly for training.
Remote Training System: An asset that stores a connection to a remote server that hosts a party's data contribution.
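
The two parts of a fusion method described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IBM Federated Learning implementation: the ModelUpdate dataclass, the local_training, fuse, and run_rounds functions, the linear regression model, and the sample-weighted averaging rule are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ModelUpdate:
    """Hypothetical container for a party's reply; not the IBM class."""
    weights: np.ndarray  # parameters after local training
    num_samples: int     # number of local rows used for training


def local_training(global_weights, X, y, lr=0.1, epochs=5):
    """Party side: refine the global weights on local data only.

    Assumes a linear regression model trained with plain gradient descent.
    """
    w = global_weights.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return ModelUpdate(weights=w, num_samples=len(y))


def fuse(updates):
    """Aggregator side: sample-weighted average of the parties' weights."""
    total = sum(u.num_samples for u in updates)
    return sum(u.weights * (u.num_samples / total) for u in updates)


def run_rounds(parties, num_features, rounds=20):
    """Each round: query every party, then fuse the replies into the global model."""
    global_weights = np.zeros(num_features)
    for _ in range(rounds):
        # Only ModelUpdate objects leave a party; the raw (X, y) stay local.
        updates = [local_training(global_weights, X, y) for X, y in parties]
        global_weights = fuse(updates)
    return global_weights
```

Here parties is a list of (X, y) pairs of numpy.ndarray values, one pair per party. In a real deployment each pair would live on a separate remote system and only the update would travel to the aggregator; the single-process loop is a simplification for readability.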

How to use Federated Learning

For quick, hands-on, step-by-step guidance on how to run Federated Learning, see the Federated Learning Tutorial.

For the high-level steps to get started with Federated Learning, see Creating the Federated Learning experiment.

Additional resources

Federated Learning provides a list of helper functions to facilitate the data preparation process. See the API documentation to learn more about interacting with Federated Learning APIs. All of the current helper functions assume that the input data is of type numpy.ndarray.
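
Because the helper functions expect numpy.ndarray input, a data handler typically ends by returning NumPy arrays. The following is a minimal sketch of that idea, assuming a hypothetical SimpleDataHandler class with a get_data method; it does not subclass or reproduce any class from the Federated Learning API, and the standardization step and train/test split are illustrative choices only.

```python
import numpy as np


class SimpleDataHandler:
    """Hypothetical data handler; the names here are not from the IBM API."""

    def __init__(self, features, labels, test_fraction=0.2):
        # Convert whatever the party supplies (lists, arrays) to numpy.ndarray.
        self.features = np.asarray(features, dtype=np.float32)
        self.labels = np.asarray(labels)
        self.test_fraction = test_fraction

    def preprocess(self, features):
        """Standardize each column so every party feeds the model the same scale."""
        mean = features.mean(axis=0)
        std = features.std(axis=0)
        std[std == 0] = 1.0  # avoid dividing by zero for constant columns
        return (features - mean) / std

    def get_data(self):
        """Return ((x_train, y_train), (x_test, y_test)) as numpy.ndarray values."""
        x = self.preprocess(self.features)
        n_test = int(round(len(x) * self.test_fraction))
        split = len(x) - n_test
        return (x[:split], self.labels[:split]), (x[split:], self.labels[split:])
```

Each party would run its own handler against its local data, so the uniform formatting happens before training and no raw rows leave the party.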