Develop a hyperparameter search plugin
Learn how to implement a hyperparameter search plugin, using random search as the example algorithm.
Create a search plugin
- Create a Python file named optimizer.py, which acts as the entry point for WML Accelerator to call the plugin.
- In optimizer.py, create a class named PluginOptimizer that extends BasePluginOptimizer, for example:

```python
from plugins.core.logger import logger
from plugins.core.base_plugin_opt import BasePluginOptimizer

class PluginOptimizer(BasePluginOptimizer):
    def __init__(self, name, hyper_parameters, **kwargs):
        super(PluginOptimizer, self).__init__(name, hyper_parameters, **kwargs)

    def search(self, number_samples, last_exp_results):
        exp_list = []
        #### implement your algorithm logic here ###
        return exp_list
```
| Function | Parameters | Return | Description | Required |
| --- | --- | --- | --- | --- |
| __init__ | - name, string, the plugin algorithm name - hyper_parameters, list, the hyperparameters to be tuned - kwargs, the algorithm parameters | / | __init__ is called once when the plugin is initialized. The hyperparameters and algorithm parameters are defined in the task submission REST body and are passed to the __init__ function. | false |
| search | - number_samples, integer, number of hyperparameter candidates requested - last_exp_results, list, the execution results of the last suggested hyperparameter sets | - hyper_params, list, suggested hyperparameter sets to run | You must implement this search function. For each search loop, the dlpd daemon calls the search function to compute the next hyperparameter candidates, then starts a training workload for each hyperparameter set. After all the trainings are done, the training scores (loss/accuracy) are passed to the next round of search. | true |
| get_state | / | - state_dict, dict, the algorithm states to be saved | get_state is automatically called AFTER the search function to save the plugin algorithm's internal states. The saved states are passed to the next set_state call for algorithm status recovery (see section 5.2.4 for details). | false |
| set_state | - state_dict, dict, the algorithm states to be recovered | / | set_state is automatically called BEFORE the search function to restore the plugin algorithm's internal states (see section 5.2.4 for details). | false |
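Taken together, a plugin that implements all four hooks has the following shape. This is a sketch tying the table above to code; only the search function is required:

```python
from plugins.core.logger import logger
from plugins.core.base_plugin_opt import BasePluginOptimizer

class PluginOptimizer(BasePluginOptimizer):
    def __init__(self, name, hyper_parameters, **kwargs):
        # optional: parse algorithm parameters passed in kwargs
        super(PluginOptimizer, self).__init__(name, hyper_parameters, **kwargs)

    def search(self, number_samples, last_exp_results):
        # required: return a list of hyperparameter key-value dicts,
        # one dict per suggested candidate
        return []

    def get_state(self):
        # optional: called AFTER search; return a dict of internal states to save
        return {}

    def set_state(self, state_dict):
        # optional: called BEFORE search (when saved states exist) to restore them
        pass
```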
Note: Do not define functions named save or restore in the PluginOptimizer class, as they already exist. The save and restore functions are reserved for use by BasePluginOptimizer to handle the save and restore logic.
Implement random search logic
- In the __init__ function, save the hyper_parameters parameter as an instance variable _hyper_parameters so that it can be used in the search function. Also create an instance variable _exp_history to store all the experiment history results. You do not need to save the experiment history for the random search algorithm; however, other algorithms, such as Bayesian search, need the history results to compute new experiment candidates.

```python
def __init__(self, name, hyper_parameters, **kwargs):
    super(PluginOptimizer, self).__init__(name, hyper_parameters, **kwargs)
    logger.info("all tuning hyper parameters: \n{}".format(hyper_parameters))
    # get all hyper parameters that need to be tuned
    self._hyper_parameters = hyper_parameters
    self._exp_history = []
```

The format of hyper_parameters:

```python
[
    {
        'name': 'required, string, hyperparameter name; the same name is used in config.json so the user model can load it',
        'type': 'required, string, one of Range, Discrete',
        'dataType': 'required, string, one of INT, DOUBLE, STR',
        'minDbVal': 'double, required if type=Range and dataType=DOUBLE',
        'maxDbVal': 'double, required if type=Range and dataType=DOUBLE',
        'minIntVal': 'int, required if type=Range and dataType=INT',
        'maxIntVal': 'int, required if type=Range and dataType=INT',
        'discreteDbVal': 'double, list like [0.1, 0.2], required if type=Discrete and dataType=DOUBLE',
        'discreteIntVal': 'int, list like [1, 2], required if type=Discrete and dataType=INT',
        'discreateStrVal': "string, list like ['1', '2'], required if type=Discrete and dataType=STR",
        'power': 'a number value in string format, the base value for power calculation; ONLY valid when type is Range',
        'step': 'a number value in string format, the step size to split the Range space; ONLY valid when type is Range',
        'userDefined': 'boolean, indicates whether the parameter is a user-defined parameter'
    }
]
```

An example output of the above code:

```
all tuning hyper parameters:
[{'name': 'base_lr', 'type': 'Range', 'dataType': 'DOUBLE', 'minDbVal': 0.01, 'maxDbVal': 0.1, 'userDefined': False}]
```

- Implement the search function:
```python
def search(self, number_samples, last_exp_results):
    logger.info("last exps results:\n{}".format(last_exp_results))
    if last_exp_results is not None and len(last_exp_results) > 0:
        self._exp_history.extend(last_exp_results)
    # start random search of the hyper-parameters
    exp_list = []
    for i in range(number_samples):
        hypers = {}
        for hp in self._hyper_parameters:
            hp_type = hp.get('type')
            if hp_type == "Range":
                val = self._getRandomValueFromRange(hp)
            elif hp_type == "Discrete":
                val = self._getRandomValueFromDiscrete(hp)
            else:
                raise Exception("un-supported type {} for random search.".format(hp_type))
            hypers[hp.get('name')] = val
        exp_list.append(hypers)
    logger.info("suggest next exps list:\n{}".format(exp_list))
    return exp_list
```

- Continue by implementing the _getRandomValueFromRange and _getRandomValueFromDiscrete functions. These functions use numpy, so add import numpy as np at the top of optimizer.py:

```python
def _getRandomValueFromRange(self, hp):
    data_type = hp.get('dataType')
    if data_type == "DOUBLE":
        val = hp.get('minDbVal') + np.random.rand() * (hp.get('maxDbVal') - hp.get('minDbVal'))
    elif data_type == "INT":
        # note: np.random.randint excludes the high bound
        val = np.random.randint(hp.get('minIntVal'), hp.get('maxIntVal'))
    else:
        raise Exception("un-supported data type {} for random range search.".format(data_type))
    logger.debug("next {} val: {}".format(hp.get('name'), val))
    return val

def _getRandomValueFromDiscrete(self, hp):
    data_type = hp.get('dataType')
    if data_type == "DOUBLE":
        vals = hp.get('discreteDbVal')
    elif data_type == "INT":
        vals = hp.get('discreteIntVal')
    else:
        vals = hp.get('discreateStrVal')
    # pick a random element from the discrete candidate list
    val = vals[np.random.randint(len(vals))]
    logger.debug("next {} val: {}".format(hp.get('name'), val))
    return val
```

An example output of the code in steps 2 and 3, with number_samples=1:

```
last exps results:
[{'id': 0, 'score': 3.593962, 'hyperparameters': {'base_lr': 0.08849518263874222}}]
next base_lr val: 0.09288991388261642
suggest next exps list:
[{'base_lr': 0.09288991388261642}]
```

Note: The returned exp_list is a list of hyperparameter key-value dictionaries. Each dictionary must include all hyperparameters that need to be tuned; otherwise, an exception is thrown.
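Before wiring the plugin into WML Accelerator, you can sanity-check the search logic by exercising the class directly. A minimal local smoke-test sketch, assuming the plugins.core modules are on PYTHONPATH so optimizer.py imports cleanly; the plugin name and hyper_parameters list here are hypothetical:

```python
from optimizer import PluginOptimizer

# hypothetical tuning space: one Range double and one Discrete int parameter
hyper_parameters = [
    {'name': 'base_lr', 'type': 'Range', 'dataType': 'DOUBLE',
     'minDbVal': 0.01, 'maxDbVal': 0.1, 'userDefined': False},
    {'name': 'batch_size', 'type': 'Discrete', 'dataType': 'INT',
     'discreteIntVal': [16, 32, 64], 'userDefined': False},
]

opt = PluginOptimizer('random-search-demo', hyper_parameters)

# first round: no previous results yet
exp_list = opt.search(number_samples=2, last_exp_results=None)
print(exp_list)  # e.g. [{'base_lr': 0.07, 'batch_size': 32}, ...]

# second round: feed back hypothetical scores, mirroring the format
# of the last_exp_results example above
results = [{'id': i, 'score': 1.0, 'hyperparameters': hp}
           for i, hp in enumerate(exp_list)]
print(opt.search(number_samples=2, last_exp_results=results))
```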
Handling algorithm parameters
- Specify the random_seed parameter when submitting a task by adding the following configuration to the algoDef part of the REST body:

```
"algoDef": {
    …,
    "algoParams": [{
        "name": "random_seed",
        "value": 2
    }]
}
```

- Parse the algorithm parameters in the __init__ function:

```python
def __init__(self, name, hyper_parameters, **kwargs):
    super(PluginOptimizer, self).__init__(name, hyper_parameters, **kwargs)
    logger.debug("all tuning hyper parameters: \n{}".format(hyper_parameters))
    # get all hyper parameters that need to be tuned
    self._hyper_parameters = hyper_parameters
    self._exp_history = []
    # get all optimizer search parameters that the user passed
    logger.info("all optimizer search parameters: \n{}".format(kwargs))
    # get optimizer parameters; the parameter values are strings
    if kwargs.get('random_seed'):
        self._random_seed = int(kwargs.get('random_seed'))
        np.random.seed(self._random_seed)
```

An example output of the added code:

```
all optimizer search parameters: {'random_seed': '2'}
```
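Because the seed is applied in __init__, re-creating the optimizer with the same random_seed reproduces the same suggestions. A minimal sketch, using a hypothetical one-parameter search space (note that values from algoParams arrive as strings):

```python
from optimizer import PluginOptimizer

hyper_parameters = [{'name': 'base_lr', 'type': 'Range', 'dataType': 'DOUBLE',
                     'minDbVal': 0.01, 'maxDbVal': 0.1, 'userDefined': False}]

# algoParams values are delivered as strings, hence random_seed='2'
opt_a = PluginOptimizer('demo', hyper_parameters, random_seed='2')
first = opt_a.search(3, None)

opt_b = PluginOptimizer('demo', hyper_parameters, random_seed='2')  # re-seeds numpy
second = opt_b.search(3, None)

assert first == second  # identical suggestions for the same seed
```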
Handling algorithm internal states
A user tuning algorithm can have internal variables that are shared and reused between search loops. To demonstrate this capability, the random search algorithm is designed to reuse the random state between search calls. Each time search begins, it first recovers the last random state and then performs the random search based on that state. If random_seed was set earlier, the proposed random hyperparameter sequence is expected to be the same.
To achieve this, define the get_state and set_state functions to implement the save and restore behavior for the algorithm states. The get_state function is automatically called AFTER the search function to save the plugin algorithm's internal states. If there are previously saved states (that is, it is not the first round of search), the set_state function is automatically called BEFORE the search function to restore the plugin algorithm's internal states.
```python
def get_state(self):
    return {'rng_state': np.random.get_state()}

def set_state(self, state_dict):
    np.random.set_state(state_dict.get('rng_state'))
```
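To see the effect of these hooks outside the framework, you can mimic the daemon's call order yourself. A minimal sketch; the pickle round-trip stands in for whatever persistence the framework actually uses, which is an assumption for illustration:

```python
import pickle
from optimizer import PluginOptimizer

hyper_parameters = [{'name': 'base_lr', 'type': 'Range', 'dataType': 'DOUBLE',
                     'minDbVal': 0.01, 'maxDbVal': 0.1, 'userDefined': False}]

opt = PluginOptimizer('demo', hyper_parameters, random_seed='2')
opt.search(1, None)                     # round 1
saved = pickle.dumps(opt.get_state())   # framework saves states AFTER search

# ... later, e.g. after a daemon restart, with a fresh instance ...
opt2 = PluginOptimizer('demo', hyper_parameters)
opt2.set_state(pickle.loads(saved))     # framework restores states BEFORE search
print(opt2.search(1, None))             # continues the same random sequence
```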
Debugging the plugin algorithm: Check plugin algorithm execution logs
- We recommend that you use the plugins.core.logger module to print logs in the plugin algorithm code. Using this module ensures that the logs are printed to the intended location.
- If you specify logLevel when installing the plugin algorithm (section 5.1.1), that setting is used. Otherwise, the log level INFO is used.
- Make sure the dependent Python module glog is installed in your conda environment.
- To use plugins.core.logger:

```python
from plugins.core.logger import logger

logger.info("This is the INFO log.")
logger.debug("This is the DEBUG log.")
```

- Check the plugin algorithm execution logs:
  - If remote execution mode is disabled, check the plugin algorithm logs in ${EGO_TOP}/dli/${DLI_VERSION}/dlpd/logs/dlpd.log.
  - If remote execution mode is enabled, check the plugin algorithm logs in the corresponding Spark application log under ${SPARK_HOME}/logs/<appID>, or from the cluster management console:
    - Log on to the cluster management console, select Workload > Instance Group, and click the instance group name.
    - In the Applications tab, click the application that runs the plugin.
    - Click the Drivers and Executors tab, and download the executor stderr log.
Debugging the plugin algorithm: Debug plugin algorithm code without submitting HPO task
- Create a debug_work_directory folder as your root debug working directory, then put your plugin algorithm scripts under it as shown below. The algo_name folder is your root plugin algorithm directory.

```
${debug_work_directory}
|-- ${algo_name}
    |-- optimizer.py
```

- To submit the debug request, send the following debug request with your HPO task submission body. This creates a task_attr.pb file under debug_work_directory for debugging.

```
cd ${debug_work_directory}
curl -k -o task_attr.pb -X POST \
  --header 'Authorization: Bearer <jwt_token>' --header 'Content-Type: application/json' \
  'https://${wmla_console_route}/platform/rest/deeplearning/v1/hypersearch/algorithm/debug' \
  --data '{"hpoName": "hpo-task-name" ...}' # full HPO task submit request body, including hyperParams settings
```

- Source the conda environment, then run the following commands to debug your plugin algorithm scripts:

```
export PYTHONPATH=${debug_work_directory}:$PYTHONPATH
export PYTHONPATH=${DLI_SHARE_TOP}/tools/tune:$PYTHONPATH
export PYTHONPATH=${DLI_SHARE_TOP}/tools/plugins/core:$PYTHONPATH
export HPO_PLUGIN_LOG_LEVEL=DEBUG
python ${DLI_SHARE_TOP}/tools/tune/plugins/plugin_launcher.py \
  --attribute_file ${debug_work_directory}/task_attr.pb \
  --number_samples 1 \
  --output_file ${debug_work_directory}/new_exps.pb \
  --algorithm_name ${algo_name} \
  --work_dir ${debug_work_directory}
```
- The algo_name parameter in the command must match the name of the plugin algorithm folder under debug_work_directory.
- If you encounter the "UnboundLocalError: local variable 'opt' referenced before assignment" error when executing the command, try removing the task_attr.pb file and rerunning step 2.