Develop a hyperparameter search plugin

Learn how to implement a hyperparameter search plugin that uses the random search algorithm.

Create a search plugin

  1. Create a Python file named optimizer.py, which acts as the entry point for WML Accelerator to call the plugin.
  2. In optimizer.py, create a class named PluginOptimizer to extend the BasePluginOptimizer, for example:
    from plugins.core.logger import logger
    from plugins.core.base_plugin_opt import BasePluginOptimizer
    class PluginOptimizer(BasePluginOptimizer):
        def __init__(self, name, hyper_parameters, **kwargs):
            super(PluginOptimizer, self).__init__(name, hyper_parameters, **kwargs)
        def search(self, number_samples, last_exp_results):
            exp_list = []
            #### implement your algorithm logic here ###
            return exp_list
    
The class name PluginOptimizer must not be changed, and all plugin classes must extend BasePluginOptimizer. Implement your algorithm in the search method, and override the other functions only if necessary.
Table 1. BasePluginOptimizer functions. Detailed description of the BasePluginOptimizer functions in WML Accelerator

__init__ (required: false)
  Parameters:
    • name, string, plugin optimizer name
    • hyper_parameters, list, hyperparameters that need to be tuned
    • kwargs, dict, algorithm parameters
  Return: /
  Description: __init__ is called once when the plugin is initialized. The hyperparameters and algorithm parameters are defined in the task submission REST body and are passed to __init__.

search (required: true)
  Parameters:
    • number_samples, integer, number of hyperparameter candidates requested
    • last_exp_results, list, the execution results of the last suggested hyperparameter sets
  Return: hyper_params, list, suggested hyperparameter sets to run
  Description: You must implement the search function. In each search loop, the dlpd daemon calls search to compute the next hyperparameter candidates and then starts a training workload for each hyperparameter set. After all trainings are done, the training scores (loss/accuracy) are passed to the next round of search.

get_state (required: false)
  Parameters: /
  Return: state_dict, dict, the algorithm states to be saved
  Description: get_state is called automatically AFTER the search function to save the plugin algorithm's internal states. The saved states are passed to the next set_state call for algorithm status recovery (see section 5.2.4 for details).

set_state (required: false)
  Parameters: state_dict, dict, the algorithm states to be recovered
  Return: /
  Description: set_state is called automatically BEFORE the search function to restore the plugin algorithm's internal states (see section 5.2.4 for details).
Note: Avoid using save and restore as function names in the PluginOptimizer class because they already exist. The save and restore functions are reserved for use by BasePluginOptimizer to handle the save and restore logic.
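The call order in the table can be illustrated with a simplified driver loop. Everything here is a hypothetical stand-in (the real base class and daemon are provided by WML Accelerator); only the set_state → search → get_state ordering mirrors the documented behavior.

```python
# A simplified sketch of the call sequence described in Table 1. The base
# class, the daemon loop, and the training step are hypothetical stand-ins.

class BasePluginOptimizer:  # stand-in for plugins.core.base_plugin_opt
    def __init__(self, name, hyper_parameters, **kwargs):
        self.name = name

class PluginOptimizer(BasePluginOptimizer):
    def __init__(self, name, hyper_parameters, **kwargs):
        super(PluginOptimizer, self).__init__(name, hyper_parameters, **kwargs)
        self._counter = 0  # toy internal state

    def search(self, number_samples, last_exp_results):
        # Toy logic: always propose the same learning rate.
        return [{'base_lr': 0.01} for _ in range(number_samples)]

    def get_state(self):
        return {'counter': self._counter}

    def set_state(self, state_dict):
        self._counter = state_dict['counter']

# Driver loop mimicking what the dlpd daemon does each round:
opt = PluginOptimizer('demo', [{'name': 'base_lr'}])
saved_state, results = None, None
for round_no in range(2):
    if saved_state is not None:       # set_state runs BEFORE search,
        opt.set_state(saved_state)    # but is skipped on the first round
    exp_list = opt.search(1, results)
    saved_state = opt.get_state()     # get_state runs AFTER search
    results = [{'id': i, 'score': 0.5, 'hyperparameters': h}
               for i, h in enumerate(exp_list)]  # stand-in training results
```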

Implement random search logic

The random search algorithm defines a search space of hyperparameter values and selects random combinations as the next training candidates. In the search function, you need to parse the hyper_parameters parameter and sample from the value space of each hyperparameter.
  1. In the __init__ function, save the hyper_parameters parameter as an instance variable _hyper_parameters so that it can be used in the search function. Also create an instance variable _exp_history to store all experiment history results. The random search algorithm does not need the experiment history; however, other algorithms, such as Bayesian search, require the history results to compute new experiment candidates.
        def __init__(self, name, hyper_parameters, **kwargs):
            super(PluginOptimizer, self).__init__(name, hyper_parameters, **kwargs)
            
            logger.info("all tuning hyper parameters: \n{}".format(hyper_parameters)) # get all hyper parameters that need to be tuned
            self._hyper_parameters = hyper_parameters
            self._exp_history = []
     
    
    The format of hyper_parameters:
    [
           {
               'name': 'required, string, hyperparameter name, the same name will be used in the config.json so user model can load it',
               'type': 'required, string, one of Range, Discrete',
           'dataType': 'required, string, one of INT, DOUBLE, STR',
               'minDbVal': 'double, required if type=Range and datatype=double',
               'maxDbVal': 'double, required if type=Range and datatype=double',
               'minIntVal': 'int, required if type=Range and datatype=int',
               'maxIntVal': 'int, required if type=Range and datatype=int',
               'discreteDbVal': 'double, list like [0.1, 0.2], required if type=Discrete and dataType=double',
               'discreteIntVal': 'int, list like [1, 2], required if type=Discrete and datatype=int',
           'discreateStrVal': 'string, list like ["1", "2"], required if type=Discrete and datatype=str',
               'power': 'a number value in string format, the base value for power calculation. ONLY valid when type is Range',
               'step': 'a number value in string format, step size to split the Range space. ONLY valid when type is Range',
               'userDefined': 'boolean, indicate whether the parameter is a user defined parameter or not'
           }
    ]
    
    An example output of the above code:
    all tuning hyper parameters:
    [{'name': 'base_lr', 'type': 'Range', 'dataType': 'DOUBLE', 'minDbVal': 0.01, 'maxDbVal': 0.1, 'userDefined': False}]
    
  2. Implement the search function:
        def search(self, number_samples, last_exp_results):
     
            logger.info("last exps results:\n{}".format(last_exp_results))
            if last_exp_results:
                self._exp_history.extend(last_exp_results)
            
            # start random search of the hyper-parameters
            exp_list = []
            for i in range(number_samples):
                hypers = {}
                for hp in self._hyper_parameters:
                    hp_type = hp.get('type')  # avoid shadowing the builtin 'type'
                    if hp_type == "Range":
                        val = self._getRandomValueFromRange(hp)
                    elif hp_type == "Discrete":
                        val = self._getRandomValueFromDiscrete(hp)
                    else:
                        raise Exception("un-supported type {} for random search.".format(hp_type))
                    hypers[hp.get('name')] = val
                exp_list.append(hypers)
                
            logger.info("suggest next exps list:\n{}".format(exp_list))
            return exp_list
    
  3. Continue to implement the _getRandomValueFromRange and _getRandomValueFromDiscrete functions. These helpers use NumPy, so add import numpy as np at the top of optimizer.py:
        def _getRandomValueFromRange(self, hp):
     
            data_type = hp.get('dataType')
            if data_type == "DOUBLE":
                val = hp.get('minDbVal') + np.random.rand() * (hp.get('maxDbVal') - hp.get('minDbVal'))
            elif data_type == "INT":
                # np.random.randint excludes the upper bound, so add 1 to include maxIntVal
                val = np.random.randint(hp.get('minIntVal'), hp.get('maxIntVal') + 1)
            else:
                raise Exception("un-supported data type {} for random range search.".format(data_type))
            
            logger.debug("next {} val: {}".format(hp.get('name'), val))
            return val 
     
        def _getRandomValueFromDiscrete(self, hp):
                    
            data_type = hp.get('dataType')
            if data_type == "DOUBLE":
                vals = hp.get('discreteDbVal')
            elif data_type == "INT":
                vals = hp.get('discreteIntVal')
            else:
                vals = hp.get('discreateStrVal')
            val = vals[np.random.randint(len(vals))]
            
            logger.debug("next {} val: {}".format(hp.get('name'), val))
            return val
    
    An example output of the code in steps 2 and 3, with number_samples=1:
    last exps results:
    [{'id': 0, 'score': 3.593962, 'hyperparameters': {'base_lr': 0.08849518263874222}}]
    next base_lr val: 0.09288991388261642
    suggest next exps list:
    [{'base_lr': 0.09288991388261642}]
    
    Note: The returned exp_list is a list of hyperparameter key-value dictionaries. Each dictionary must include all hyperparameters that need to be tuned; otherwise, an exception is thrown.
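To check the random search logic without WML Accelerator, the snippets from steps 1 to 3 can be assembled into a self-contained sketch. BasePluginOptimizer is stubbed here as a hypothetical stand-in, logger calls are dropped, and the discrete lookup is simplified; the hyperparameter descriptor matches the earlier example output.

```python
import numpy as np

class BasePluginOptimizer:  # hypothetical stand-in for plugins.core.base_plugin_opt
    def __init__(self, name, hyper_parameters, **kwargs):
        pass

class PluginOptimizer(BasePluginOptimizer):
    def __init__(self, name, hyper_parameters, **kwargs):
        super(PluginOptimizer, self).__init__(name, hyper_parameters, **kwargs)
        self._hyper_parameters = hyper_parameters
        self._exp_history = []

    def search(self, number_samples, last_exp_results):
        if last_exp_results:
            self._exp_history.extend(last_exp_results)
        exp_list = []
        for _ in range(number_samples):
            hypers = {}
            for hp in self._hyper_parameters:
                hp_type = hp.get('type')
                if hp_type == "Range":
                    val = self._getRandomValueFromRange(hp)
                elif hp_type == "Discrete":
                    val = self._getRandomValueFromDiscrete(hp)
                else:
                    raise Exception("un-supported type {}".format(hp_type))
                hypers[hp.get('name')] = val
            exp_list.append(hypers)
        return exp_list

    def _getRandomValueFromRange(self, hp):
        if hp.get('dataType') == "DOUBLE":
            return hp.get('minDbVal') + np.random.rand() * (hp.get('maxDbVal') - hp.get('minDbVal'))
        # +1 because np.random.randint excludes the upper bound
        return int(np.random.randint(hp.get('minIntVal'), hp.get('maxIntVal') + 1))

    def _getRandomValueFromDiscrete(self, hp):
        # simplified: picks whichever value list is present
        vals = hp.get('discreteDbVal') or hp.get('discreteIntVal') or hp.get('discreateStrVal')
        return vals[np.random.randint(len(vals))]

# Exercise the plugin with the hyperparameter from the earlier example output:
np.random.seed(0)
opt = PluginOptimizer('random-search',
                      [{'name': 'base_lr', 'type': 'Range', 'dataType': 'DOUBLE',
                        'minDbVal': 0.01, 'maxDbVal': 0.1, 'userDefined': False}])
exps = opt.search(3, None)
```

Each entry of exps is a dictionary with a base_lr value sampled from [0.01, 0.1].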

Handling algorithm parameters

A tuning algorithm can have parameters that users specify when submitting each tuning task. To demonstrate this, a random_seed parameter is added to the random search algorithm.
  1. Specify the random_seed parameter when submitting a task by adding the following configuration to the algoDef part of the REST body:
    "algoDef": {
        …,
        "algoParams": [{
            "name": "random_seed",
            "value": 2
        }]
    }
    
  2. Parse the algorithm parameters in the __init__ function:
        def __init__(self, name, hyper_parameters, **kwargs):
            super(PluginOptimizer, self).__init__(name, hyper_parameters, **kwargs)
            
            logger.debug("all tuning hyper parameters: \n{}".format(hyper_parameters)) # get all hyper parameters that need to be tuned
            self._hyper_parameters = hyper_parameters
            self._exp_history = []
     
            # get all optimizer search parameters that the user passed
            logger.info("all optimizer search parameters: \n{}".format(kwargs))
     
            # get the optimizer parameters; the parameter values are strings
            if kwargs.get('random_seed'):
                self._random_seed = int(kwargs.get('random_seed'))
                np.random.seed(self._random_seed)
       
    
    An example output of the added code:
    all optimizer search parameters:
    {'random_seed': '2'}                                      
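As the example output shows, algorithm parameter values arrive in kwargs as strings. A small sketch of the conversion and the effect of seeding (the kwargs dictionary here is a hypothetical stand-in mirroring that output):

```python
import numpy as np

# Values in algoParams arrive in kwargs as strings, so numeric
# parameters must be converted before use.
kwargs = {'random_seed': '2'}  # mirrors the example output above

seed = int(kwargs.get('random_seed'))
np.random.seed(seed)
first_draw = np.random.rand()

np.random.seed(seed)   # re-seeding reproduces the same draw
second_draw = np.random.rand()
assert first_draw == second_draw
```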
    

Handling algorithm internal states

A user tuning algorithm can have internal variables that are shared and reused between search loops. To demonstrate this capability, the random search algorithm is designed to reuse the random state between search calls. Each time search begins, it first recovers the last random state and then performs the random search based on that state. If random_seed was set previously, the proposed random hyperparameter sequence is expected to be the same.

To achieve this, define get_state and set_state functions that implement the save and restore behavior for the algorithm states. The get_state function is automatically called AFTER the search function to save the plugin algorithm's internal states. If there are previously saved states (that is, it is not the first round of search), the set_state function is automatically called BEFORE the search function to restore them.

The following is an example implementation of the get_state and set_state functions. In get_state, you only need to return a key-value dictionary that includes all states to be saved; the hyperparameter plugin module handles the remaining state persistence logic. In set_state, a previously saved state dictionary is passed in, and you use it to recover the algorithm status.
    def get_state(self):
        return {'rng_state': np.random.get_state()}
    
 
    def set_state(self, state_dict):
        np.random.set_state(state_dict.get('rng_state'))
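As a quick sanity check outside the plugin, saving the NumPy RNG state and later restoring it replays exactly the same random draw, which is what makes a seeded search reproducible across rounds:

```python
import numpy as np

np.random.seed(2)
saved = {'rng_state': np.random.get_state()}   # what get_state would return
first = np.random.rand()                       # a draw after saving the state

np.random.set_state(saved['rng_state'])        # what set_state would restore
replayed = np.random.rand()                    # the same draw, replayed
assert first == replayed
```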

Debugging the plugin algorithm: Check plugin algorithm execution logs

When debugging the plugin algorithm, note the following:
  1. We recommend that you use the plugins.core.logger module to print logs in the plugin algorithm code. Using this module ensures that the logs are printed to the intended location.
  2. If you specified logLevel when installing the plugin algorithm (section 5.1.1), that setting is used. Otherwise, the log level INFO is used.
  3. Make sure the dependent Python module glog is installed in your conda environment.
  4. To use plugins.core.logger:
    from plugins.core.logger import logger
     
    logger.info("This is the INFO log.")
    logger.debug("This is the DEBUG log.")
    
  5. Check the plugin algorithm execution logs:
    • If remote execution mode is disabled, check the plugin algorithm logs in ${EGO_TOP}/dli/${DLI_VERSION}/dlpd/logs/dlpd.log.

    • If remote execution mode is enabled, check the plugin algorithm logs in the corresponding Spark application log under ${SPARK_HOME}/logs/<appID>, or from the cluster management console:
      1. Log on to the cluster management console, select Workload > Instance Group, and click the instance group name.
      2. In the Applications tab, click the application that runs the plugin.
      3. Click the Drivers and Executors tab, and download the executor stderr log.

Debugging the plugin algorithm: Debug plugin algorithm code without submitting HPO task

  1. Create a debug_work_directory folder as your root debug working directory, then put your plugin algorithm scripts under it as shown below. The algo_name folder is your root plugin algorithm directory.
    ${debug_work_directory}
    |-- ${algo_name}
        |-- optimizer.py
    
  2. To submit the debug request, send the request below with your HPO task submission body as the payload. This creates a task_attr.pb file under debug_work_directory for debugging.
    cd ${debug_work_directory}
    curl -k -o task_attr.pb -X POST \
    --header 'Authorization: Bearer <jwt_token>' \
    --header 'Content-Type: application/json' \
    'https://${wmla_console_route}/platform/rest/deeplearning/v1/hypersearch/algorithm/debug' \
    --data '{"hpoName": "hpo-task-name" ...}' # full HPO task submit request body including hyperParams settings 
  3. Source the conda environment, then run the following commands to debug your plugin algorithm scripts:
    > export PYTHONPATH=${debug_work_directory}:$PYTHONPATH
    > export PYTHONPATH=${DLI_SHARE_TOP}/tools/tune:$PYTHONPATH
    > export PYTHONPATH=${DLI_SHARE_TOP}/tools/plugins/core:$PYTHONPATH
    > export HPO_PLUGIN_LOG_LEVEL=DEBUG
    > python ${DLI_SHARE_TOP}/tools/tune/plugins/plugin_launcher.py --attribute_file ${debug_work_directory}/task_attr.pb --number_samples 1 --output_file ${debug_work_directory}/new_exps.pb --algorithm_name ${algo_name} --work_dir ${debug_work_directory}
    
Note:
  • The algo_name parameter in the command must be the same as the name of the plugin algorithm folder under debug_work_directory.
  • If you encounter the "UnboundLocalError: local variable 'opt' referenced before assignment" error when executing the command, remove the task_attr.pb file and rerun step 2.