
Creating a Federated Learning experiment

Learn how to create a Federated Learning experiment to train a machine learning model with data from multiple users, each with their own data source.

Tech preview: This is a technology preview and is not yet supported for use in production environments.

Note: Because Federated Learning is a collaborative approach to training a common model from a set of remote data sources, some of the steps are done by the admin and others by the contributing parties. Unless otherwise noted, the steps in this topic are performed by the admin, who creates the experiment and coordinates the training.

Before you begin

Before you begin, make sure that you have:

  • An IBM Watson Studio project.
  • An untrained model file, in pickled or SavedModel format. The untrained model file is a “blank slate” of your model framework that is trained in the Federated Learning experiment. For example code that generates an untrained model, see Scikit-learn model configuration and TensorFlow 2 model configuration. The model and generated components must be compressed into a zip file. For an example, see this untrained model file.

Tip: When you build your experiment, you choose your model’s framework, fusion method, and hyperparameters, and you must customize a data handler. You can review the options in Choosing your framework, fusion method, and hyperparameters and in Data handler examples.
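The packaging step can be sketched as follows, assuming the untrained model is any picklable Python object (for example, an unfitted scikit-learn estimator). The function name `package_untrained_model` and the file names are hypothetical, chosen for illustration only. Follow the linked model configuration topics for the exact layout that your framework expects.

```python
import pickle
import zipfile
from pathlib import Path


def package_untrained_model(model, out_dir,
                            pickle_name="untrained_model.pickle",
                            zip_name="untrained_model.zip"):
    """Pickle `model` and compress the pickle into a zip file.

    The function and file names are illustrative assumptions, not names
    that Federated Learning requires. Returns the path to the zip file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Serialize the untrained ("blank slate") model object.
    pickle_path = out_dir / pickle_name
    with open(pickle_path, "wb") as f:
        pickle.dump(model, f)

    # Compress the pickle, storing it at the root of the archive.
    zip_path = out_dir / zip_name
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(pickle_path, arcname=pickle_name)
    return zip_path
```

With scikit-learn installed, a call such as `package_untrained_model(SGDClassifier(), "model_out")` would produce a zip file that contains one pickled, unfitted estimator.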

Overview: Create a Federated Learning experiment

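In the following sections, you will:

  • Start the Federated Learning aggregator (for admin)
  • Connect the remote training parties (for parties)
  • Monitor training progress and deploy the trained model (for admin or parties)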

Note: The following sections describe creating the Federated Learning experiment from the experiment builder. If you create the experiment programmatically, refer to Considerations for building an experiment using the API for details on setting up the environments and loading the required libraries.

Start the Federated Learning aggregator (for admin)

In this section, the admin creates the Federated Learning experiment in three steps: setting up the experiment, creating the remote training system, and starting the experiment.

Step 1: Set up the Federated Learning experiment

In this step, you set up a Federated Learning experiment from an IBM Watson Studio project using the Federated Learning experiment builder.

  1. From the project, click Add to Project > Federated Learning.
  2. Name the experiment and add an optional description and tags.
  3. In the Configure tab, choose the training framework and model type. See Choosing the framework and Choosing the fusion method for a table of frameworks, fusion methods, and their attributes.
  4. Click Select under Model specification and upload the zip file containing your untrained model.
  5. In the Define hyperparameters tab, you can choose hyperparameter options available for your framework and fusion method to specify how to tune your model. For details, see Choosing your framework, fusion method, and hyperparameters.

Step 2: Create the remote training system

In this step, you create the remote training system that specifies the participating parties of the experiment.

  1. From Select remote training system, click Add new systems.
    Screenshot of Remote Training System UI
  2. Complete the fields:
    • Name: A name to identify this specific instance of the Remote Training System.
    • Description: An optional description for more details of the training system.
    • System administrator: The user who has access to run the current Federated Learning experiment.
    • Allowed users: Users who are allowed to interface with the current Federated Learning experiment training run. Note: An admin or party appears in these lists only if they are a collaborator in your project.
    • Tags: Associate keywords with the Remote Training System to make it easier to find.
  3. Click Add. The specific Remote Training System instance is then saved. If you are creating multiple remote training instances, you can repeat these steps.
  4. Click Add systems to save the remote training system as an asset in the Federated Learning system. Note: You can reuse a Remote Training System in future experiments. For example, in the Select remote training system tab, you can select any Remote Training System that you previously created.
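The fields above, and the rule that admins and parties must be project collaborators, can be pictured with a small data structure. This is a hypothetical in-memory sketch, not the product's API: the class `RemoteTrainingSystem` and the method `missing_collaborators` are invented names for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RemoteTrainingSystem:
    """Hypothetical mirror of the Remote Training System form fields."""
    name: str
    system_administrator: str
    allowed_users: List[str]
    description: str = ""
    tags: List[str] = field(default_factory=list)

    def missing_collaborators(self, project_collaborators: Set[str]) -> List[str]:
        """Return the admin and allowed users who are not project collaborators.

        Order is preserved and duplicates are reported once.
        """
        missing = []
        for person in [self.system_administrator, *self.allowed_users]:
            if person not in project_collaborators and person not in missing:
                missing.append(person)
        return missing
```

An empty result from `missing_collaborators` means that everyone named in the definition can be selected in the form.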

Step 3: Start the experiment

In this step, the Federated Learning experiment will generate a party connector script that you must send to the participating parties.

  1. Click Review and create to view the settings of your current Federated Learning experiment. Then, click Create.
    Screenshot of Review and Create Experiment UI
  2. The Federated Learning experiment will be in Pending status while the aggregator is starting. When the aggregator starts, the status will change to Setup – Waiting for remote systems.

Connecting the remote training parties (for parties)

In this section, the parties must complete all the steps to configure and run the party connector script to train the model by using their data.


  1. Ensure that you are running in a Conda environment. If you are not, run the following code to set up Conda:
     # Create a new conda environment
     conda create -n fl_env python=3.7.10
     # Activate env
     conda activate fl_env
     # Install Watson Machine Learning client
     pip install --upgrade ibm-watson-machine-learning

    Note: If you are running in a Watson Studio or Cloud Pak for Data project, you do not need to set up Conda.

  2. Ensure that you have the proper libraries set up with this code:

     pip install \
     tensorflow-cpu==2.4.2 \
     keras==2.2.4 \
     torch==1.7.1 \
     scikit-learn==0.23.2 \
     numpy \
     scipy \
     environs \
     parse \
     websockets==8.1 \
     jsonpickle==1.4.1 \
     pandas \
     pytest \
     pyYAML \
     requests \
     pathlib2 \
     psutil \
     setproctitle \
     tabulate \
     lz4 \
     opencv-python \
     gym \
     cloudpickle==1.3.0 \
     image

    Note: For more detail on setting up the environment to run Federated Learning, see the Federated Learning API Samples.

  3. Ensure that your data is loaded and pre-processed. Each party must implement a data handler class in their environment, in the same directory as the data set and the party connector script. For a general template of a data handler, see the data handler template.

    This is an example of a data handler for the MNIST data set.

     import numpy as np

     # imports from ibmfl
     from ibmfl.data.data_handler import DataHandler
     from ibmfl.exceptions import FLException


     class MnistKerasDataHandler(DataHandler):
         """
         Data handler for MNIST dataset.
         """

         def __init__(self, data_config=None, channels_first=False):
             self.file_name = None
             # data_config loads anything inside the info part of the data section.
             if data_config is not None:
                 # this example assumes the local dataset is in .npz format, so it searches for it.
                 if 'npz_file' in data_config:
                     self.file_name = data_config['npz_file']
             self.channels_first = channels_first

             if self.file_name is None:
                 raise FLException('No data file name is provided to load the dataset.')
             else:
                 try:
                     data_train = np.load(self.file_name)
                     self.x_train = data_train['x_train']
                     self.y_train = data_train['y_train']
                     self.x_test = data_train['x_test']
                     self.y_test = data_train['y_test']
                 except Exception:
                     raise IOError('Unable to load training data from path '
                                   'provided in config file: ' +
                                   self.file_name)

         def get_data(self):
             """
             Gets pre-processed mnist training and testing data.

             :return: training and testing data
             :rtype: tuple
             """
             return (self.x_train, self.y_train), (self.x_test, self.y_test)

         def preprocess_data(self):
             """
             Preprocesses the training and testing dataset.

             :return: None
             """
             num_classes = 10
             img_rows, img_cols = 28, 28

             # Reshape the images to match the channel ordering of the model.
             if self.channels_first:
                 self.x_train = self.x_train.reshape(self.x_train.shape[0], 1, img_rows, img_cols)
                 self.x_test = self.x_test.reshape(self.x_test.shape[0], 1, img_rows, img_cols)
             else:
                 self.x_train = self.x_train.reshape(self.x_train.shape[0], img_rows, img_cols, 1)
                 self.x_test = self.x_test.reshape(self.x_test.shape[0], img_rows, img_cols, 1)

             print('x_train shape:', self.x_train.shape)
             print(self.x_train.shape[0], 'train samples')
             print(self.x_test.shape[0], 'test samples')

             # convert class vectors to binary class matrices
             self.y_train = np.eye(num_classes)[self.y_train]
             self.y_test = np.eye(num_classes)[self.y_test]

  4. Log in to the Watson Studio project and click the Federated Learning experiment.
    Screenshot of View Setup Information
  5. Click View setup information and download the party connector script for each party by clicking the “down arrow” icon. Optional: Alternatively, you can download the yml configuration file and send it to the party to run instead of the party connector script. For details of this method, see Party yml configuration.
  6. Each party must configure the party connector script and provide valid credentials. For example:

       from ibm_watson_machine_learning import APIClient
       from ibmfl.party.party import Party

       wml_credentials = {}

       # There are three ways to validate credentials.
       # 1. With username and password
       # wml_credentials = {
       #  "username": "<username>",
       #  "password": "<password>",
       #  "instance_id" : "openshift",
       #  "url": "<url>",
       #  "bedrock_url": "<bedrock_url>", # Optional: This field is only needed if the service's attempt to generate `bedrock_url` automatically failed. If your credentials do not contain any errors, you can manually retrieve and add bedrock_url with this command: oc get routes -n ibm-common-services. It is the first string under the resulting HOST/PORT column.
       #  "version": "4.0"
       # }
       # 2. With API key
       # wml_credentials = {
       #  "username": "<username>",
       #  "apikey": "<apikey>",
       #  "instance_id" : "openshift",
       #  "url": "<url>",
       #  "bedrock_url": "<bedrock_url>", # Optional: This field is only needed if the service's attempt to generate `bedrock_url` automatically failed. If your credentials do not contain any errors, you can manually retrieve and add bedrock_url with this command: oc get routes -n ibm-common-services. It is the first string under the resulting HOST/PORT column.
       #  "version": "4.0"
       # }
       # 3. With token
       # To retrieve the IAM token, you must run these commands:
       # 1. Run: oc get routes -n ibm-common-services, and retrieve the bedrock URL, which is the first string under the resulting column of HOST/PORT.
       # 2. Retrieve the temporary IAM (Identity and Access Management) access token by using an API call:
       # curl -k -X POST -H "Content-Type: application/x-www-form-urlencoded;charset=UTF-8" \
       # -d "grant_type=password&username=<username>&password=<password>&scope=openid" \
       # <bedrock-url>/idprovider/v1/auth/identitytoken
       # 3. Request the bearer token by using the IAM access token:
       # curl -k -X GET '<Zen-url>/v1/preauth/validateAuth' \
       # -H 'username: admin' \
       # -H 'iam-token: <iam-token>'
       # wml_credentials = {
       #  "token": "<token>",
       #  "instance_id" : "openshift",
       #  "url": "<url>",
       #  "bedrock_url": "<bedrock_url>", # Optional: This field is only needed if the service's attempt to generate `bedrock_url` automatically failed. If your credentials do not contain any errors, you can manually retrieve and add bedrock_url.
       #  "version": "4.0"
       # }

       wml_client = APIClient(wml_credentials)

       party_config = {
         "aggregator": {
           "ip": ""
         },
         "connection": {
           "info": {
             "id": "fb81d844-fbed-4f72-afd8-030917d6bc8f"
           }
         },
         # Supply the name of the data handler class and path to it.
         # The info section may be used to pass information to the
         # data handler.
         # For example,
         #   "data": {
         #     "name": "MnistSklearnDataHandler",
         #     "path": "example.mnist_sklearn_data_handler",
         #     "info": {
         #       "npz_file": "./example_data/example_data.npz"
         #     },
         #   },
         "data": {
           "name": "<data handler>",
           "path": "<path to data handler>",
           "info": {
             "<file type of local dataset>": "<path to data>"  # path to data file
           }
         },
         "local_training": {
           "name": "XGBRegressorLocalTrainingHandler",
           "path": ""
         },
         "protocol_handler": {
           "name": "PartyProtocolHandler",
           "path": ""
         }
       }

       p = Party( config_dict = party_config, token = "Bearer " + wml_client.wml_token  )
       # Start the party so that it connects to the aggregator.
       p.start()

       # Keep the script running until the connection is stopped.
       while not p.connection.stopped:
         pass

  7. Optional: You can set log_level if you have not installed the log_config.yaml file. To see how you can install the log_config.yaml file, see Log level configuration. log_level can take these options as parameters:
    • “DEBUG”: Provides information for diagnosing problems or troubleshooting.
    • “INFO”: Standard log level that provides informative messages to ensure that the application is behaving as expected.
    • “WARNING”: Logs behaviours that are unexpected or indicative of potential problems that do not affect the application’s current run.
    • “ERROR”: Logs when the application hits an issue that prevents the application’s functions from working.
    • “CRITICAL”: Reports errors that might prevent the execution of the application.
      For example, p = Party( config_dict = party_config, token = auth_token, log_level = "INFO").
  8. After configuring and saving the script, each party runs the party connector script, which creates the party with this call:

    p = Party(config_dict = party_config, token = auth_token)
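The nesting of `party_config` is easy to get wrong when editing by hand. As a rough sketch, the following helper assembles the same sections that appear in the party connector script above and checks a `log_level` value against the five listed options. The names `build_party_config` and `check_log_level` are hypothetical helpers, not part of the Federated Learning library, and the values for `local_training` and `protocol_handler` are assumed to be copied from the downloaded script.

```python
VALID_LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def build_party_config(aggregator_ip, connection_id,
                       data_handler_name, data_handler_path, data_info,
                       local_training, protocol_handler):
    """Assemble a party configuration dictionary (illustrative helper).

    `local_training` and `protocol_handler` are dicts with "name" and
    "path" keys, assumed to be copied from the downloaded script.
    """
    for section_name, section in (("local_training", local_training),
                                  ("protocol_handler", protocol_handler)):
        missing = {"name", "path"} - set(section)
        if missing:
            raise ValueError(f"{section_name} is missing keys: {sorted(missing)}")

    return {
        "aggregator": {"ip": aggregator_ip},
        "connection": {"info": {"id": connection_id}},
        "data": {
            "name": data_handler_name,
            "path": data_handler_path,
            # The info section is passed to the data handler as data_config.
            "info": dict(data_info),
        },
        "local_training": dict(local_training),
        "protocol_handler": dict(protocol_handler),
    }


def check_log_level(log_level):
    """Return `log_level` if it is one of the five documented options."""
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(
            f"log_level must be one of {VALID_LOG_LEVELS}, got {log_level!r}")
    return log_level
```

Building the dictionary in one place keeps every brace balanced, and a misspelled log level fails before the party tries to connect.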

Monitor training progress and deploy the trained model (for admin or parties)

After you have completed the configuration for the Federated Learning experiment and each participating party has run the party connector script from the previous step, the experiment starts training automatically. As the experiment trains, you can check the progress of the experiment with the visual graph and table. After the training is complete, you can deploy the model and test it with the new data.

Check the progress of the experiment

You can see a visualization of the training progress. For each round of training, you can view the four stages of training:

  • Sending model: Federated Learning sends the model data to each party.
  • Training: Each party trains the model locally on its own data.
  • Receiving models: When local training is complete, each party sends its results back to the Federated Learning experiment.
  • Aggregating: The aggregator combines the results from all parties to produce a final model.
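The four stages can be illustrated with a toy round in plain Python. This is a minimal sketch under stated assumptions, not the product's implementation: the model is a two-parameter line `y = w*x + b`, local training is one pass of gradient descent, and aggregation is a sample-weighted average (a FedAvg-style fusion chosen only for illustration; the actual fusion method is the one you select for the experiment). All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Weights = List[float]
Sample = Tuple[float, float]


def local_train(weights: Weights, data: Sequence[Sample], lr: float = 0.1) -> Weights:
    """Training stage: one gradient-descent pass for y = w*x + b on local data."""
    w, b = weights
    for x, y in data:
        error = (w * x + b) - y
        w -= lr * error * x
        b -= lr * error
    return [w, b]


def aggregate(updates: List[Tuple[Weights, int]]) -> Weights:
    """Aggregating stage: average the party weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    size = len(updates[0][0])
    return [sum(wts[i] * n for wts, n in updates) / total for i in range(size)]


def run_round(global_weights: Weights,
              party_data: List[Sequence[Sample]]) -> Weights:
    """Run one training round across all parties and return the fused weights."""
    updates = []
    for data in party_data:
        sent = list(global_weights)            # Sending model: each party gets a copy
        trained = local_train(sent, data)      # Training: runs on the party's own data
        updates.append((trained, len(data)))   # Receiving models: only weights come back
    return aggregate(updates)                  # Aggregating: fuse into the next model
```

Note that `run_round` never reads one party's data on behalf of another. Only the trained weights and the sample counts reach the aggregation step.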

View your results

When the training is complete, a chart that displays the model accuracy over each round of training is drawn. Hover over the points on the chart for more information on a single point’s exact metrics.

A Training rounds table shows details for each training round.

Screenshot of View Setup Information

When you are done viewing the results, click Save model to project to save the Federated Learning model to your project.

Deploy your model

After you save your Federated Learning model, you can deploy and score the model like other machine learning models.

See Deploying models.
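As a sketch of what scoring can look like, the following helper builds a request body in the `input_data` format that Watson Machine Learning online deployments accept, with a list of field names and rows of values. The helper name `build_scoring_payload` is hypothetical, and whether your deployed Federated Learning model expects named fields depends on the model, so treat the field names as assumptions.

```python
def build_scoring_payload(fields, rows):
    """Build a scoring request body with one set of fields and values.

    Raises ValueError if any row does not have one value per field.
    """
    fields = list(fields)
    values = []
    for index, row in enumerate(rows):
        row = list(row)
        if len(row) != len(fields):
            raise ValueError(
                f"row {index} has {len(row)} values, expected {len(fields)}")
        values.append(row)
    return {"input_data": [{"fields": fields, "values": values}]}
```

With an authenticated `APIClient`, a payload like this is typically passed to `wml_client.deployments.score(deployment_id, payload)`, where `deployment_id` is a placeholder for the ID of your own deployment.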

Considerations for building an experiment using the API

If you build your experiment entirely programmatically, note these requirements and considerations:

  • In your code, you must set up and load the conda environment.
  • You must install the required libraries.
    You can see how this is done in this Federated Learning sample of building an experiment using the API.

Learn more