Federated learning is a decentralized approach to training machine learning (ML) models. Each node across a distributed network trains a global model using its local data, with a central server aggregating node updates to improve the global model.
Artificial intelligence (AI) models require massive volumes of data. These datasets are typically centralized in a single location for model training, which creates the risk that any personally identifiable information (PII) they contain is exposed during transmission or storage.
Federated learning helps address these concerns as sensitive information remains on the node, preserving data privacy. It also allows for collaborative learning, with varied devices or servers contributing to the refinement of AI models.
Federated learning involves 4 main stages:
● Initialization
● Local training
● Global aggregation
● Iteration
Federated learning starts with initializing a global machine learning model on a central server. This model is the basis from which the federated learning process begins.
The central server distributes the global model to connected client nodes, which can be other servers or edge devices such as smartphones and Internet of Things (IoT) devices. It also relays relevant information, including configuration variables such as hyperparameters and the number of epochs or complete passes through the training data.
Upon receiving the global model and all the necessary details, each client node proceeds with training. The training process is akin to standard neural network training, except that client nodes train the model using only their on-device or local data.
When they’ve completed the specified number of epochs, client nodes transmit the updated model parameters or gradients to the central server—no fully trained local models or raw data are sent back.
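A minimal sketch of this client-side step is shown below, assuming a simple linear-regression model trained with plain gradient descent in NumPy. The function name and hyperparameter values are illustrative, not part of any specific framework.

```python
import numpy as np

def local_update(global_weights, X, y, epochs=5, lr=0.01):
    """Train the received global model on local data; return updated weights and sample count."""
    w = global_weights.copy()
    for _ in range(epochs):
        preds = X @ w
        grad = X.T @ (preds - y) / len(y)   # gradient of 1/2 * mean squared error
        w -= lr * grad
    return w, len(y)                        # only parameters and a sample count leave the device
```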
The central server aggregates all the client node updates. There are different forms of aggregation, but a common method is federated averaging, which calculates the weighted average of all updates. These combined updates are then incorporated into the global model.
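A minimal sketch of federated averaging is shown below, assuming each client returns a pair of updated weights and a local sample count, as in the sketch above.

```python
import numpy as np

def federated_average(client_results):
    """Weighted-average client updates by the number of local training samples."""
    total_samples = sum(n for _, n in client_results)
    return sum(w * (n / total_samples) for w, n in client_results)

# Example: a client with 300 samples influences the average 3x more than one with 100
clients = [(np.array([1.0, 0.0, 2.0]), 100),
           (np.array([0.5, 1.0, 1.0]), 300)]
print(federated_average(clients))
```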
The central server again distributes the new global model to connected client nodes, and the federated learning process repeats until the model reaches full convergence or is fully trained.
Federated learning can vary based on the structure of datasets or the nature of client nodes. It’s typically classified into these categories:
● Cross-device
● Cross-silo
● Horizontal
● Vertical
Cross-device federated learning uses devices with volatile connectivity and limited computing resources, such as mobile phones and IoT devices. This type of federated learning needs to account for unreliable network connections, and because client nodes can only handle small datasets, many devices will usually be required for local training.1
E-commerce companies, for example, can train a recommendation engine on user data across multiple devices to deliver more personalized product recommendations.1
Unlike the cross-device approach, cross-silo federated learning entails a limited number of servers or data centers with stable connectivity and computational resources powerful enough to store and process huge volumes of data. Client nodes are treated as silos holding personal data that must not leave the system or be shared externally due to privacy concerns.1
Cross-silo federated learning can be valuable in industries such as finance and healthcare. For instance, a consortium of hospitals can train a shared model on their own patient data to enhance the diagnosis or prediction of certain diseases. Similarly, a coalition of banks can train a common machine learning algorithm using their own transaction records to improve fraud detection.1
In horizontal federated learning, client node datasets share the same features or structure but have different samples. For instance, clinics can train a shared analytical model because each one has the same variables for their clinical trial data but distinct values for the patients involved in the trials.
Conversely, vertical federated learning involves client node datasets that share the same samples but have different structures or features. For example, a retailer and a bank might enter into a partnership for more personalized customer offers. They can train a common recommendation engine because they share many of the same customers but hold different purchasing and financial information about them.
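The toy illustration below shows the two data layouts using pandas; the column and ID names are made up for the example.

```python
import pandas as pd

# Horizontal: both clinics record the same features, but for different patients.
clinic_a = pd.DataFrame({"patient_id": [1, 2], "age": [54, 61], "dose_mg": [10, 20]})
clinic_b = pd.DataFrame({"patient_id": [3, 4], "age": [47, 39], "dose_mg": [15, 10]})

# Vertical: the retailer and the bank hold different features about the same customers.
retailer = pd.DataFrame({"customer_id": [1, 2], "purchases": [12, 3]})
bank = pd.DataFrame({"customer_id": [1, 2], "credit_score": [710, 640]})
```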
The decentralized nature of federated learning offers these key advantages:
● Efficiency
● Enhanced data privacy
● Improved compliance
Federated learning eliminates the need to access or transfer large datasets. This leads to decreased latency and a reduction in the required bandwidth for training machine learning models.
The privacy-preserving architecture of federated learning systems means that sensitive data never leaves a device. This helps minimize the risk of cyberattacks or data breaches.
Most federated learning systems also implement privacy-enhancing techniques such as differential privacy and secure multiparty computation (SMPC) to further boost data privacy.
Differential privacy adds noise to model updates before transmitting them to the central server, while SMPC allows the central server to carry out secure aggregation computations on encrypted model updates. These methods make it difficult to reverse engineer or distinguish which client node contributed an update, strengthening data security.
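The sketch below shows one common way a client might apply differential privacy to an update: clip the update’s norm to bound its influence, then add Gaussian noise before transmission. The clip threshold and noise scale are illustrative values, not calibrated to a formal privacy budget.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.1):
    """Clip an update's L2 norm, then add Gaussian noise before sending it to the server."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))  # bound each client's influence
    noise = np.random.normal(0.0, noise_std, size=update.shape)
    return clipped + noise
```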
Because data is kept and processed locally, federated learning can help enterprises comply with data protection regulations. Compliance is crucial for sectors such as finance and healthcare, which handle private data.
Federated learning signifies a transformative shift in training AI models, but it also comes with limitations. Here are some challenges associated with federated learning:
● Adversarial attacks
● Communication overhead
● Heterogeneity
Federated learning is vulnerable to poisoning attacks, where threat actors inject malicious data during local training (data poisoning) or tamper with model updates before transmission (model poisoning) to compromise or corrupt the global model.
Anomaly detection, adversarial training, strict access controls and other security measures can help safeguard against these attacks.
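As one simple sketch of such anomaly detection, the server could discard client updates whose magnitude deviates far from the median before aggregation. The threshold here is illustrative, not a tuned default.

```python
import numpy as np

def filter_suspicious_updates(updates, max_ratio=3.0):
    """Drop updates whose L2 norm is far larger than the median norm."""
    norms = np.array([np.linalg.norm(u) for u in updates])
    median = np.median(norms)
    return [u for u, n in zip(updates, norms) if n <= max_ratio * median]
```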
Regular exchanges between client nodes and the central server can result in substantial bottlenecks. For better communication efficiency, consider strategies such as compressing model updates before transmission, quantizing them to lower precision, or sparsifying them so that only the most essential parts of an update are relayed. These strategies must be balanced against any accompanying decrease in accuracy.
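The sketch below shows top-k sparsification, one of the simpler options: keep only the largest-magnitude entries of an update and zero out the rest before transmission.

```python
import numpy as np

def sparsify_topk(update, k):
    """Keep the k largest-magnitude entries of an update; zero out everything else."""
    sparse = np.zeros_like(update)
    top_idx = np.argsort(np.abs(update))[-k:]   # indices of the k largest entries
    sparse[top_idx] = update[top_idx]
    return sparse
```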
Federated learning’s decentralized design can bolster data diversity, which can help mitigate bias. However, it also means that data is not identically distributed and can be imbalanced: some devices might have more data than others, skewing the global model toward these data-heavy nodes.
A few ways to address this statistical heterogeneity include sampling techniques that account for variation in data distribution, clustering nodes with similar data distributions during model training, and optimization algorithms such as FedProx, which is designed for heterogeneous networks.
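The sketch below captures the core FedProx idea, extending the kind of local gradient-descent update sketched earlier: a proximal term keeps each client’s weights from drifting too far from the global model. The coefficient mu is illustrative.

```python
import numpy as np

def local_update_fedprox(global_weights, X, y, epochs=5, lr=0.01, mu=0.1):
    """Local training with a FedProx proximal term anchoring weights to the global model."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)     # task gradient, as in plain local training
        grad += mu * (w - global_weights)     # gradient of (mu/2) * ||w - w_global||^2
        w -= lr * grad
    return w, len(y)
```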
Systems heterogeneity is also an issue, with devices having different computing capabilities. Adaptive local training can be applied to tailor model training to what each node can handle.
Federated learning holds the promise of helping solve real-world problems, with organizations joining forces even across borders and geographical regions. Here are some industries that can benefit from federated learning:
● Finance
● Healthcare
● Retail and manufacturing
● Urban management
Financial institutions can work together to diversify data for credit risk assessment models, allowing better credit access for underserved groups. They can also use federated learning to provide more personalized banking and investment advice, thereby improving the user experience.
Hospitals and research institutions can train shared deep learning models that aid in drug discovery for rare diseases. Federated learning systems can also assist in finding better treatment strategies and enhancing patient outcomes for underrepresented communities.
Retailers can use federated learning to track sales and inventory across multiple locations without revealing any customer data, allowing them to optimize stock levels and reduce waste. Meanwhile, manufacturers can aggregate data from different parts of the supply chain to optimize logistics.
Smart cities can take advantage of federated learning to glean insights from the myriad devices and sensors scattered around urban areas while keeping resident data private. These insights can be used to better direct traffic, for instance, or to monitor environmental conditions such as air and water pollution.
Implementing federated learning for real-world applications can be complex, but several frameworks exist to train models on decentralized data and streamline server and client workflows. Here are some popular federated learning frameworks:
● Flower
● IBM Federated Learning
● NVIDIA FLARE
● OpenFL
● TensorFlow Federated
Flower is an open source framework for collaborative AI and data science. It can be used to craft federated AI systems with numerous connected clients. It’s compatible with most machine learning frameworks and interoperable with various hardware platforms and operating systems.
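As a minimal sketch of what a Flower client can look like, the example below assumes Flower’s NumPyClient interface from the flwr Python package; the “model” is just a toy weight vector, the training step is a placeholder, and the sample count is hard-coded for illustration.

```python
import flwr as fl
import numpy as np

class ToyClient(fl.client.NumPyClient):
    def __init__(self):
        self.weights = np.zeros(10)              # toy "model": a single weight vector

    def get_parameters(self, config):
        return [self.weights]

    def fit(self, parameters, config):
        self.weights = parameters[0] + 0.01      # placeholder for a real local training step
        return [self.weights], 100, {}           # 100 stands in for the local sample count

    def evaluate(self, parameters, config):
        loss = float(np.mean(parameters[0] ** 2))  # placeholder loss
        return loss, 100, {}

fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=ToyClient())
```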
IBM Federated Learning is a framework for federated learning in enterprise environments. It works with various machine learning algorithms, including decision trees, Naïve Bayes classifiers, neural networks and reinforcement learning.
IBM Federated Learning also comes with a rich library of fusion methods for combining model updates and supports various fairness techniques to help combat AI bias.
NVIDIA FLARE (Federated Learning Application Runtime Environment) is an open source and domain-agnostic software development kit for federated learning.
It has built-in training and evaluation workflows, privacy-preserving algorithms and learning algorithms such as federated averaging and FedProx. NVIDIA FLARE also has management tools for orchestration and monitoring.
OpenFL is a Python-based open source federated learning framework originally created by Intel and now under The Linux® Foundation. OpenFL works with deep learning frameworks such as PyTorch and TensorFlow. Its security features include differential privacy and support for hardware-based trusted execution environments.
TensorFlow Federated (TFF) is an open source framework developed by Google for machine learning on decentralized data. TFF’s application programming interfaces (APIs) are divided into 2 layers:
● Federated Learning API is the high-level layer that facilitates implementing federated learning tasks such as training or evaluation using existing machine learning models.
● Federated Core API is the low-level layer for building new federated learning algorithms.
1 Cross-silo and cross-device federated learning on Google Cloud, Google Cloud, 3 June 2024.