IBM Federated Learning – machine learning where the data is

Training sophisticated AI models depends on large, high-quality datasets. In enterprises, this data may be spread across different Clouds, application silos, and data centers in different countries and subsidiaries, which makes it difficult to combine and analyze. Data in different locations may also be subject to different regulatory and privacy requirements. Bringing data together into a single repository for training is often not possible or practical. One method to address this is federated learning, a distributed machine learning process in which different parties collaborate to jointly train a machine learning model without sharing their training data with the other parties. However, federated learning has so far been difficult to deploy in enterprise computing environments.

IBM Research recently announced the community edition of a framework for federated learning at the ICML Federated Learning workshop, in a keynote by Rania Khalaf. IBM Federated Learning focuses on enterprise use cases such as integrating data silos, protecting customer privacy, meeting regulatory compliance requirements, and handling large amounts of data in different locations. In an enterprise setting, parties to a federated learning process are typically data centers, Cloud instances from different providers, or edge services that host data from machines, trucks or other equipment in the field. IBM Federated Learning provides an architecture that works with enterprise networking and security requirements and integrates well with current machine learning libraries such as Keras, TensorFlow, scikit-learn, and RLlib. It also has simple APIs for developing federated learning algorithms and for integrating advanced privacy and secure multi-party computation (SMC) approaches.

IBM Federated Learning uses an aggregator that coordinates the federated learning process and fuses the local training results into a common model, as shown in Figure 1. An aggregator A and parties P1 to P3 collaborate to train a model, a neural network in this case. Each party has data that does not leave its Cloud instance or edge device during training. The aggregator queries the parties (Q) for their local results. Each party performs local training and then responds with its update (R1, R2 and R3), which is a vector of weights in the case of a neural network. The aggregator then fuses the results from all parties into a merged model, for example by averaging the model weights. This fused model is then distributed to the parties for the next round of training. Details about the execution model of the federated learning process can be found in the white paper [1] and on the IBM Federated Learning Web site.
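The query, reply and fuse cycle described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the IBM Federated Learning API: the names `fuse_by_averaging`, `run_round` and `make_party` are hypothetical, parties are simulated as in-process functions rather than remote endpoints, and local training is replaced by a toy step that moves the weights toward a party-specific target.

```python
from typing import Callable, List, Sequence

Weights = List[float]
# A party is modeled as a function: global weights in, locally trained weights out.
Party = Callable[[Weights], Weights]


def fuse_by_averaging(updates: Sequence[Weights]) -> Weights:
    """Aggregator side: element-wise mean of the parties' weight vectors."""
    if not updates:
        raise ValueError("need at least one party update")
    dim = len(updates[0])
    if any(len(u) != dim for u in updates):
        raise ValueError("all updates must have the same length")
    return [sum(column) / len(updates) for column in zip(*updates)]


def run_round(global_weights: Weights, parties: Sequence[Party]) -> Weights:
    """One round: query every party (Q), collect replies (R_i), fuse them."""
    replies = [party(global_weights) for party in parties]
    return fuse_by_averaging(replies)


def make_party(local_target: Weights, step: float = 0.5) -> Party:
    """Hypothetical stand-in for local training on private data.

    The private data is represented only by `local_target`; it never
    leaves this closure, only the updated weights are returned.
    """
    def local_training(global_weights: Weights) -> Weights:
        return [w + step * (t - w) for w, t in zip(global_weights, local_target)]
    return local_training


parties = [make_party([1.0, 0.0]), make_party([3.0, 0.0]), make_party([5.0, 6.0])]
weights = run_round([0.0, 0.0], parties)
print(weights)  # [1.5, 1.0]
```

Calling `run_round` repeatedly, each time with the weights returned by the previous call, corresponds to the successive training rounds in which the fused model is sent back to the parties.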

Many different federated learning algorithms can be implemented on this execution model. IBM Federated Learning comes with out-of-the-box support for different model types, including neural networks, SVMs, decision trees, and linear as well as logistic regressors and classifiers, and for many of the machine learning libraries that implement them. Neural networks are typically trained locally, and the aggregator performs the model fusion, which is often a lightweight operation compared to the local model training. For traditional machine learning models such as decision trees or gradient boosted trees, this can be different: tree rebalancing or “boosting” a tree ensemble is typically performed at the aggregator. In a decision tree, queries ask each party how many of its data elements fall into each leaf node, and the aggregator then rebalances the tree depending on the answers. In this case the main computational activity is in the aggregator, whereas the computation at each party is quite lightweight. As a result, federated versions of popular machine learning algorithms are often implemented as a pair: a local part and an aggregator part.
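To make the split between a lightweight local part and a heavier aggregator part concrete, here is a small sketch of count-based split selection for a decision tree. It is a simplified illustration, not the algorithm shipped with IBM Federated Learning: the function names, the record layout and the use of Gini impurity as the split criterion are all assumptions made for this example, and the parties are simulated in-process.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A record is (feature values, class label); records stay with their party.
Record = Tuple[Dict[str, str], str]


def party_counts(records: Sequence[Record], feature: str) -> Counter:
    """Local part: count records per (feature value, label).

    Only these counts are returned to the aggregator, never the records.
    """
    return Counter((features[feature], label) for features, label in records)


def gini_after_split(counts: Counter) -> float:
    """Aggregator part: weighted Gini impurity of the leaves a split creates."""
    total = sum(counts.values())
    per_value: Dict[str, Counter] = {}
    for (value, label), n in counts.items():
        per_value.setdefault(value, Counter())[label] += n
    impurity = 0.0
    for label_counts in per_value.values():
        size = sum(label_counts.values())
        gini = 1.0 - sum((n / size) ** 2 for n in label_counts.values())
        impurity += (size / total) * gini
    return impurity


def choose_split(parties: Sequence[Sequence[Record]], features: List[str]) -> str:
    """Aggregator part: query all parties per feature, sum counts, pick the purest split."""
    scores = {}
    for feature in features:
        merged: Counter = Counter()
        for records in parties:  # in a real deployment this would be a remote query
            merged.update(party_counts(records, feature))
        scores[feature] = gini_after_split(merged)
    return min(scores, key=scores.get)


party_1 = [({"color": "red", "size": "s"}, "yes"), ({"color": "blue", "size": "s"}, "no")]
party_2 = [({"color": "red", "size": "l"}, "yes"), ({"color": "blue", "size": "l"}, "no")]
print(choose_split([party_1, party_2], ["size", "color"]))  # color
```

Each party only runs a single counting pass over its data, while the aggregator merges the counts and evaluates every candidate split, which mirrors the division of work described above.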

Fusion algorithms for neural networks can take different forms. Federated averaging, as discussed above, is universally applicable but not always very effective. Algorithms that treat model fusion as a matching problem [2,3] or that take the structure of a neural network into account [4] often converge faster on a common model, or can even merge a good model in a one-time fusion. IBM Federated Learning comes with a library of fusion algorithms supporting different use cases. Regulatory requirements often call for more advanced privacy protections, in particular to prevent training data from being reverse-engineered, either by the aggregator from parties’ model updates or from the resulting final model. Learning algorithms such as naïve Bayes [5] can implement differential privacy mechanisms to manage privacy budgets efficiently. Advanced SMC schemes can be used with IBM Federated Learning, as showcased in [6] and [7] with homomorphic encryption and in [8] with a functional encryption approach. A convenient crypto API enables users to change the cryptographic approach without changing the machine learning program.
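The idea of swapping the protection scheme without touching the learning code can be illustrated with a small pluggable interface. Everything in this sketch is hypothetical and is not the IBM Federated Learning crypto API: `UpdateProtection`, `Plaintext`, `ZeroSumMasking` and `federated_average` are names invented here. The masking scheme is a toy stand-in for SMC, not homomorphic or functional encryption, and for simplicity all masks are generated in one place, which a real protocol would not do.

```python
import random
from typing import List, Protocol, Sequence

Weights = List[float]


class UpdateProtection(Protocol):
    """Hypothetical interface: how a party protects an update, how they are fused."""
    def protect(self, party_index: int, update: Weights) -> Weights: ...
    def fuse(self, protected: Sequence[Weights]) -> Weights: ...


def mean(vectors: Sequence[Weights]) -> Weights:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


class Plaintext:
    """No protection: the aggregator sees every update as-is."""
    def protect(self, party_index: int, update: Weights) -> Weights:
        return list(update)

    def fuse(self, protected: Sequence[Weights]) -> Weights:
        return mean(protected)


class ZeroSumMasking:
    """Toy masking: each party adds a random mask, and the masks sum to zero.

    A single masked update reveals little on its own, but the masks cancel
    in the sum, so the fused average is unchanged up to rounding error.
    """
    def __init__(self, num_parties: int, dim: int, seed: int = 0) -> None:
        if num_parties < 2:
            raise ValueError("masking needs at least two parties")
        rng = random.Random(seed)
        masks = [[rng.uniform(-100.0, 100.0) for _ in range(dim)]
                 for _ in range(num_parties - 1)]
        # The last mask is chosen so that all masks add up to zero.
        masks.append([-sum(column) for column in zip(*masks)])
        self._masks = masks

    def protect(self, party_index: int, update: Weights) -> Weights:
        return [u + m for u, m in zip(update, self._masks[party_index])]

    def fuse(self, protected: Sequence[Weights]) -> Weights:
        return mean(protected)


def federated_average(updates: Sequence[Weights], scheme: UpdateProtection) -> Weights:
    """The learning code: identical no matter which scheme is plugged in."""
    protected = [scheme.protect(i, u) for i, u in enumerate(updates)]
    return scheme.fuse(protected)


updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(federated_average(updates, Plaintext()))
print(federated_average(updates, ZeroSumMasking(num_parties=3, dim=2)))
```

Both calls yield the same average up to floating-point rounding, while `federated_average` itself never changes. That separation is the design point: the security choice lives behind the interface, and the machine learning code depends only on `protect` and `fuse`.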

While IBM Federated Learning supports this wide range of federated learning algorithms, security and privacy approaches, and machine learning libraries, it is designed to keep this complex process manageable in an enterprise. This is facilitated by its modular architecture, shown in Figure 2.

This enables a machine learning team to focus on the machine learning aspects of the problem, using the libraries best suited to it, while a systems team chooses the right networking and deployment configuration and a security team chooses the right cryptographic approach when needed. This keeps the learning curve flat for fast deployment and fits well with the typical operations model of an enterprise.

IBM Federated Learning also makes it easy for researchers to design and try out new federated algorithms with little effort and benchmark them against the library of existing ones that comes with IBM Federated Learning. New machine learning libraries can be integrated, and researchers can try out novel SMC approaches using this framework, focusing on the specific experiments rather than needing to rebuild the whole stack.


  1. “IBM Federated Learning: an Enterprise Framework White Paper V0.1,” H. Ludwig, N. Baracaldo, G. Thomas et al., 2020.
  2. “Statistical model aggregation via parameter matching,” M. Yurochkin, M. Agarwal, S. Ghosh, K. Greenewald, and N. Hoang. In Advances in Neural Information Processing Systems, 2019.
  3. “Federated Learning with Matched Averaging,” H. Wang, M. Yurochkin, Y. Sun, D. Papailiopoulos, and Y. Khazaeni. In International Conference on Learning Representations, 2020.
  4. “Model fusion with Kullback–Leibler divergence,” S. Claici, M. Yurochkin, S. Ghosh, and J. Solomon. In International Conference on Machine Learning, 2020.
  5. “Differential Privacy Library.”
  6. “A Hybrid Approach to Privacy-Preserving Federated Learning,” S. Truex, N. Baracaldo, A. Anwar, T. Steinke, H. Ludwig, R. Zhang, and Y. Zhou. In AISec, 2019.
  7. “Secure Model Fusion for Distributed Learning Using Partial Homomorphic Encryption,” C. Liu, S. Chakraborty, and D. Verma. In PADG, 2019.
  8. “HybridAlpha: An Efficient Approach for Privacy-Preserving Federated Learning,” R. Xu, N. Baracaldo, Y. Zhou, A. Anwar, and H. Ludwig. In AISec, 2019.


Sr. Manager AI Platforms Department, IBM Research

Nathalie Baracaldo

Manager of AI Security and Privacy Solutions, IBM Research

Gegi Thomas

Senior Engineer, IBM Research
