Configure HPO Algorithm¶

Supported algorithms¶

Auptimizer supports a number of different HPO algorithms. The names and descriptions are listed below:

Name	Algorithm
passive	Manually run job (for debug purpose)
random	Random search
sequence	Grid Search
spearmint	Spearmint: Bayesian Optimization based on Gaussian Process
bohb	HpBandSter: Bayesian Optimization and HyperBand
hyperopt	Hyperopt: Bayesian Optimization with Tree of Parzen Estimators (TPE)
hyperband	Hyperband: Multi-armed bandit approach
eas	EAS: Efficient Architecture Search by Network Transformation (Illustration purpose)

Use python -m aup.init to set up the experiment configuration interactively.

For finer control, advanced users can change the configuration manually by directly modifying the experiment.json file.

Configuration details¶

Below we cover the most common pieces. For requirements related to specific algorithms, please refer to the respective documentation.

The general structure of the configuration file is as follows:

{
 "proposer": "random",
 "n_samples": 10,
 "random_seed": 1,
 "script": "auto.py",
 "parameter_config": [
   {
     "name": "x",
     "range": [
       -5,
       5
     ],
     "type": "float"
   }
 ],
 "resource": "cpu",
 "resource_args": {
   "save_model": true
 },
 "job_failure": {
   "job_retries": 3,
   "ignore_fail": true
 },
 "n_parallel": 3,
 "target":"min",
 "workingdir:"./"
}

Name	Default	Explanation
proposer	random	hpo method used to propose new hyperparameter values (see below for full list)
n_samples	10	number of jobs to run
script		script to run
n_parallel	1	number of parallel jobs
job_retries	0	number of retries for failed jobs.
ignore_fail	False	whether to continue the experiment if a job fails.
target	max	search for max or min
resource		type of resource to run the experiment, [cpu, gpu, aws, node, passive]
parameter_config	{}	hyperparameter specification (see below)
workingdir	“./”	path to run the script, important for running jobs remotely (SSH/AWS)
resource_args	{}	other parameters to enable features like tracking intermediate results, saving best model, etc (see below)

for parameter_config:

Name	Content
name	name of the hyperparameter variable. Must be the same as used in the training script
range	[min, max] or a list of values
type	float, int, choice types are supported

Minor modifications or changes may be required for each algorithm. These options can be found at the corresponding API pages under aup.Proposer package (see API links below).

for resource_args:

Name	Default	Explanation
save_model	False	whether to save the best performing model (see below)
multi_res_labels	None	a list of additional results to be tracked, e.g. [“flops”, “param”]
track_intermediate_results	False	if true, intermediate results during training epoches will be tracked
early_stop	None	parameters related to early stopping strategies

For details of the early_stop parameter and how to apply early stopping strategies to HPO experiments, please refer to Early Stopping.

resource_args can also include SSH/AWS specific parameters, please refer to Additional runtime configuration for Node/AWS for more details.

Note:

If job_failure is not specified, the experiment will stop whenever a job fails.
For job_retries, preferance is given to a different resource, if multiple resources are available.
For ignore_fail, currently [BOHB, EAS, Hyperband] proposers do not support experiment continuation upon job failure.

Additional functionalities¶

Track intermediate results¶

This feature allows the user to save and track multiple intermediate results at different points during the HPO experiment. Auptimizer still uses the final result as the main result for the HPO algorithm, but saves the intermediate records in the database under the table intermediate_results.

Usage¶

The feature can be used by adding the following parameter to the experiment configuration file:

"resource_args": {
  "track_intermediate_results": true
 }

Then in the training script, aup.print_result(res) should be placed where the user wants the results to be tracked:

def main(*args, **kwargs):
    # model and data preparation
    for epoch in range(n_epochs):
        # training for one epoch
        aup.print_result(res)

In the above example, the intermediate results are returned every epoch. The result at last epoch is regarded as the main result for the user script and is then used by the HPO algorithm.

The intermediate results will be shown on the dashboard if tracked.

Note: It is possible to use multiple results feature in conjunction with intermediate results to track multiple intermediate results as well.

Save the best model¶

This feature allows the user to save the best performing model after running the HPO experiment. This is achieved by running the training script again using the best hyperparamters obtained during HPO the experiment. The model, by default, will be saved to path aup_models/models_<eid>/<user_defined_model_path>.

Usage¶

In order to use this feature, please add the following parameter to the experiment configuration file:

"resource_args": {
  "save_model": true
 }

Depending on whether the @aup_args decorator is used, the training script needs the following additional modifications.

If @aup_args is used, the user needs to define a funtion to save the model, and register this function with aup_save_model. We suggest using this approach if running the experiment on remote machines (SSH/AWS) to be able to correctly locate and retrieve the model saved on the remote machine.

Please see the example below:

# define a function "save_model(model)" to save the model to a user-defined path
def save_model(model):
    os.makedirs('model_train')
    model.save('./model_train/mnist.h5')

@aup_args
def main(*args, **kwargs):
    # training code
    ...
    # register the model saving function with model as argument
    aup.aup_save_model(save_model, model)

    ...

If @aup_args is not used, the user needs to manually check whether the save_model parameter is True in the job’s configuration. The main function should also take save_model and folder_name as arguments. Please see the example below:

def main(*args, **kwargs, save_model=False, folder_name=None):
    # training code
    ...
    if save_model is True:
        # manually locate the path for saving the model
        # this is important if running on remote machines
        path = os.path.join('aup_models', folder_name)

        if os.path.exists('aup_models') is False:
            os.makedirs('aup_models')

        if os.path.exists(path) is True:
            shutil.rmtree(path)

        os.makedirs(path)
        os.chdir(path)

        model.save('./model_train/mnist.h5')
    ...

Return multiple results¶

This feature allows the user to save and track multiple secondary results along with the primary result for the HPO experiment. Auptimizer still uses the main result for the HPO algorithm, but saves the secondary results in the database under the table multiple_results. There is no upper limit on how many secondary results the user can track.

Usage¶

The feature can be used by adding the following parameter to the experiment configuration file:

"resource_args": {
  "multi_res_labels": ["x", "y"]
}

In the above configuration file, x and y are the secondary results the user wants to track and record. The user script would then return the results as a list including the primary result res along with the secondary parameters as follows:

@aup_args
def HPO():
  res = calculate_results()
  return [res, x, y]

In the above example, res is the primary result which is always placed at the first index of the returned list, which will be used by the HPO algorithm. The remaining results are matched directly with the list provided in multi_res_labels. Hence, the length of the returned list from user script is 1 + length of multi_res_labels parameter.

Note: It is possible to use multiple results feature in conjunction with intermediate results to track multiple intermediate results as well.

Pause and resume jobs¶

Serial: optimize parameters by running jobs sequentially
Parallel: optimize parameters by running jobs in parallel
Pause: pause and save current HPO status
Resume: resume previously paused HPO process

Algorithm	Documentation	Serial	Parallel	Pause (save)	Resume
Random	`aup.Proposer.RandomProposer`	✓	✓	✓	✓
Sequence	`aup.Proposer.SequenceProposer`	✓	✓	✓	✓
Passive	`aup.EE.Resource.PassiveResourceManager`	✓	✓	✓	✓
Spearmint	`aup.Proposer.SpearmintProposer`	✓	✓	❌	❌
Hyperopt	`aup.Proposer.HyperoptProposer`	✓	✓	❌	❌
Hyperband	`aup.Proposer.HyperbandProposer`	✓	✓	❌	❌
BOHB	`aup.Proposer.BOHBProposer`	✓	✓	❌	❌
EAS	`aup.Proposer.EASProposer`	✓	❌	❌	❌