IBM Federated Learning – machine learning where the data is

Training sophisticated AI models depends on large, high-quality datasets. In enterprises, this data may be spread across different clouds, application silos, data centers in different countries, and subsidiaries, making it difficult to combine and analyze. Data in different locations may also be subject to different regulatory and privacy requirements. Bringing data together into a single repository for training is often not possible or practical. One way to address this is federated learning, a distributed machine learning process in which different parties collaborate to jointly train a machine learning model without sharing their training data with each other. To date, however, federated learning has been difficult to deploy in the computational environments of enterprises.

IBM Research recently announced the community edition of its framework for federated learning at the ICML Federated Learning workshop, in a keynote by Rania Khalaf. IBM Federated Learning focuses on enterprise use cases such as integrating data silos, dealing with customer privacy, regulatory compliance, and large amounts of data in different locations. In an enterprise setting, parties to a federated learning process are typically data centers, cloud instances from different providers, or edge services that host data from machines, trucks, or other equipment in the field. IBM Federated Learning provides an architecture that works with enterprise networking and security requirements, integrates well with current machine learning libraries such as Keras, TensorFlow, scikit-learn, and RLlib, and has simple APIs for federated learning algorithm development as well as for the integration of advanced privacy and secure multi-party computation (SMC) approaches.

IBM Federated Learning uses an aggregator that coordinates the federated learning process and fuses the local training results into a common model, as illustrated in Figure 1. An aggregator A and parties P1 to P3 collaborate to train a model, in this case a neural network. Each party holds data that never leaves its cloud instance or edge device in the course of the training process. The aggregator queries the parties (Q) for their local results. Each party performs local training and then responds with its update (R1, R2, and R3), a vector of weights in the case of a neural network. The aggregator then fuses the results from the parties into a merged model, for example by averaging the model weights. This fused model is distributed back to the parties for the next round of training. Details about the execution model of the federated learning process can be found in the white paper [1] and on the IBM Federated Learning website.
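The query/response rounds described above can be sketched in a few lines of plain Python. This is a minimal simulation of federated averaging (not IBM Federated Learning's actual API): three parties each run a few steps of local gradient descent on a linear model, and the aggregator fuses their weight vectors with a data-size-weighted average. The function names and the linear-model stand-in are illustrative assumptions.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One party's local training: a few epochs of gradient descent
    on a linear model (a stand-in for an arbitrary local learner)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fuse(replies, counts):
    """Aggregator step: weighted average of the party replies R1..Rn,
    weighted by how much data each party holds."""
    total = sum(counts)
    return sum(w * (n / total) for w, n in zip(replies, counts))

# Simulate aggregator A and parties P1..P3, each with private local data.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
parties = []
for _ in range(3):
    X = rng.normal(size=(50, 3))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    parties.append((X, y))  # this data never leaves the party

global_w = np.zeros(3)
for _ in range(20):  # training rounds
    replies = [local_update(global_w, X, y) for X, y in parties]  # R1..R3
    global_w = fuse(replies, [len(y) for _, y in parties])        # merged model
```

After 20 rounds the fused model recovers the underlying weights closely, even though no party ever shared its raw data, only weight vectors.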

Many different federated learning algorithms can be implemented on this execution model. IBM Federated Learning comes with out-of-the-box support for different model types (neural networks, SVMs, decision trees, and linear and logistic regressors and classifiers) and for many machine learning libraries that implement them. Neural networks are typically trained locally, and the aggregator performs the model fusion, which is often a lightweight operation compared to the local model training. For traditional machine learning models such as decision trees or gradient-boosted trees, the balance can be different: tree rebalancing or "boosting" a tree ensemble is typically performed at the aggregator. For a decision tree, queries ask each party how many of its data elements fall into each leaf node, and the aggregator rebalances the tree based on the answers. In this case the main computational activity is in the aggregator, while the computation at each party is quite lightweight. In many cases, then, federated versions of popular machine learning algorithms are implemented as a pair: a local part and an aggregator part.
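The decision-tree case makes the local/aggregator split concrete. In this hedged sketch (illustrative names, not IBM Federated Learning's tree protocol), each party's reply is just a table of per-leaf label counts over its local data, and the aggregator does the heavier work of merging the counts and assigning leaf labels:

```python
from collections import Counter

def leaf_counts(data, route):
    """Party-side reply: per-leaf label counts over local data only.
    `route` maps a sample to a leaf id in the shared tree structure."""
    counts = {}
    for x, label in data:
        counts.setdefault(route(x), Counter())[label] += 1
    return counts

def fuse_counts(replies):
    """Aggregator-side fusion: sum the per-leaf counts from all parties
    and pick a majority label per leaf. The main computation happens
    here, while each party's counting pass is lightweight."""
    merged = {}
    for reply in replies:
        for leaf, counter in reply.items():
            merged.setdefault(leaf, Counter()).update(counter)
    return {leaf: c.most_common(1)[0][0] for leaf, c in merged.items()}

# Toy shared tree: leaf 0 if x < 0, else leaf 1; two parties with local data.
route = lambda x: 0 if x < 0 else 1
p1 = [(-1.0, "a"), (-2.0, "a"), (3.0, "b")]
p2 = [(-0.5, "b"), (2.0, "b"), (4.0, "b")]
labels = fuse_counts([leaf_counts(p1, route), leaf_counts(p2, route)])
```

Here leaf 0 sees labels a, a from one party and b from the other, so the aggregator assigns it the majority label "a"; leaf 1 is unanimously "b".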

Fusion algorithms for neural networks can take different forms. Federated averaging, as discussed above, is universally applicable but not always very effective. Algorithms that treat model fusion as a matching problem [2, 3] or consider the structure of a neural network [4] often converge faster on a common model, or can even merge a good model in a one-time fusion. IBM Federated Learning comes with a library of fusion algorithms supporting different use cases. Regulatory requirements often call for more advanced privacy protections, in particular to prevent the aggregator from re-engineering training data out of parties' model updates or out of the resulting final model. Learning algorithms such as naïve Bayes can implement differential privacy mechanisms to efficiently manage privacy budgets [5]. Advanced SMC schemes can also be used with IBM Federated Learning, as showcased in [6] and [7] with homomorphic encryption and in [8] with a functional encryption approach. A convenient crypto API enables users to change the cryptographic approach without changing the machine learning program.
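To illustrate the kind of protection a differential privacy mechanism adds (a generic Gaussian-mechanism sketch, not IBM Federated Learning's crypto API or its naïve Bayes implementation), a party can clip its model update and add calibrated noise before replying, so the aggregator cannot reconstruct training data from any single reply. The parameter names `clip_norm` and `noise_mult` are illustrative assumptions:

```python
import numpy as np

def clip_and_noise(update, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Party-side privacy step (simplified Gaussian mechanism):
    clip the update's L2 norm to bound any one party's influence,
    then add Gaussian noise scaled to that bound."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / norm)  # now ||clipped|| <= clip_norm
    noise = rng.normal(0.0, noise_mult * clip_norm, size=update.shape)
    return clipped + noise

rng = np.random.default_rng(1)
raw = np.array([3.0, 4.0])          # L2 norm 5.0, gets clipped to norm 1.0
private = clip_and_noise(raw, rng=rng)
```

The noise scale trades accuracy for privacy; tracking how much "privacy budget" each such noisy release spends is exactly what the budget-management machinery referenced above is for.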

While IBM Federated Learning supports this wide range of federated learning algorithms, security and privacy approaches, and machine learning libraries, it is designed to make this complex process manageable in an enterprise. This is facilitated by its modular architecture, shown in Figure 2.

This enables a machine learning team to focus on the machine learning aspects of the problem, using the libraries best suited to it, while a systems team chooses the right networking and deployment configuration and a security team chooses the right cryptographic approach when needed. This keeps the learning curve flat for fast deployment and allows IBM Federated Learning to fit well into an enterprise's typical operations model.

IBM Federated Learning also makes it easy for researchers to design and try out new federated learning algorithms with little effort and to benchmark them against the library of algorithms it ships with. New machine learning libraries can be integrated, and researchers can try out novel SMC approaches using the framework, focusing on their specific experiments rather than rebuilding the whole stack.
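The pluggability described above can be sketched with a tiny registry pattern. This is a hypothetical interface, not IBM Federated Learning's actual extension API, but it shows the idea: a researcher drops in a new fusion strategy and benchmarks it against an existing one without touching the rest of the stack.

```python
import numpy as np

FUSION_REGISTRY = {}

def register(name):
    """Decorator registering a fusion algorithm under a name, so the
    rest of the (hypothetical) stack can select it by configuration."""
    def wrap(fn):
        FUSION_REGISTRY[name] = fn
        return fn
    return wrap

@register("average")
def fed_avg(replies):
    # Baseline: plain federated averaging of the party replies.
    return np.mean(replies, axis=0)

@register("median")
def coord_median(replies):
    # A researcher's alternative: the coordinate-wise median is
    # robust to a single outlier (e.g. faulty or malicious) party.
    return np.median(replies, axis=0)

replies = [np.array([1.0, 2.0]), np.array([3.0, 2.0]), np.array([100.0, 2.0])]
avg = FUSION_REGISTRY["average"](replies)   # pulled far off by the outlier
med = FUSION_REGISTRY["median"](replies)    # stays at a sensible value
```

Benchmarking then amounts to running the same parties and data through each registered strategy and comparing the fused models.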


  1. “IBM Federated Learning: an Enterprise Framework White Paper V0.1”, H. Ludwig, N. Baracaldo, G. Thomas et al., 2020.
  2. “Statistical model aggregation via parameter matching” M. Yurochkin, M. Agarwal, S. Ghosh, K. Greenewald, and N. Hoang. In Advances in Neural Information Processing Systems, 2019.
  3. “Federated Learning with Matched Averaging” H. Wang, M. Yurochkin, Y. Sun, D. Papailiopoulos, and Y. Khazaeni. In International Conference on Learning Representations, 2020.
  4. “Model fusion with Kullback–Leibler divergence”, S. Claici, M. Yurochkin, S. Ghosh, and J. Solomon. In International Conference on Machine Learning, 2020.
  5. “Differential Privacy Library”.
  6. “A Hybrid Approach to Privacy-Preserving Federated Learning”, S. Truex, N. Baracaldo, A. Anwar, T. Steinke, H. Ludwig, R. Zhang, and Y. Zhou, in AISec 2019.
  7. “Secure Model Fusion for Distributed Learning Using Partial Homomorphic Encryption”, Changchang Liu, Supriyo Chakraborty, Dinesh Verma, in PADG 2019.
  8. “HybridAlpha: An Efficient Approach for Privacy-Preserving Federated Learning” Runhua Xu, Nathalie Baracaldo, Yi Zhou, Ali Anwar and Heiko Ludwig, in AISec 2019.


Sr. Manager AI Platforms Department, IBM Research

Nathalie Baracaldo

Manager of AI Security and Privacy Solutions, IBM Research

Gegi Thomas

Senior Engineer, IBM Research
