Resource Managers

aup.EE.Resource.AbstractResourceManager

Abstract interface for resource managers.

Use get_resource_manager() to create the manager object for a given resource type.

For the supported resource types, see Set up environment.

APIs

class AbstractResourceManager(connector, n_parallel, *args, **kwargs)[source]

Bases: abc.ABC

Create a resource to run jobs.

Parameters

connector (AbstractConnector) – Connector to database

append_interm_res(jid, interm_res)[source]
append_multiple_results(jid, irid, eid, scores)[source]
early_stop_daemon_fun()[source]
finish(status='FINISHED')[source]

Finish up the resource allocation.

Parameters

status (str) – status of the experiment

Returns

Max/Min result in experiment (job id, score)

Return type

None | [int, float]

finish_job(jid, score, status=None)[source]

Finish one job

Parameters
  • jid (int) – job ID

  • score (float | None) – score of the job

get_available(username, rtype, rid_blacklist=None)[source]

Get an available resource to run a job.

Parameters
  • username (str) – username for job running

  • rtype (str) – resource type

  • rid_blacklist ([int]) – resource ids to ignore

Returns

one resource ID selected at random from the available resources

Return type

int
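The selection described above can be sketched as follows. This is only an illustration, not the aup implementation: the helper name pick_available and the in-memory free_rids list are hypothetical stand-ins for the database lookup done through the connector, and returning None when nothing is free is an assumption.

```python
import random


def pick_available(free_rids, rid_blacklist=None, rng=random):
    """Return one randomly chosen resource ID, or None if none qualifies.

    free_rids and this helper are hypothetical; they are not part of aup.
    """
    blacklist = set(rid_blacklist or [])
    # Keep only the free resources that the caller did not blacklist.
    candidates = [rid for rid in free_rids if rid not in blacklist]
    if not candidates:
        return None  # assumption: "nothing available" is signalled with None
    return rng.choice(candidates)


rid = pick_available([1, 2, 3, 4], rid_blacklist=[2, 4])
print(rid)  # either 1 or 3
```

Passing rid_blacklist lets a caller skip resources that recently failed without removing them from the pool.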

is_job_stopped(jid)[source]

Returns whether or not a specific job stop is pending

Parameters

jid (int) – job ID

Returns

whether or not the given job ID is in the list of pending job stops

Return type

bool

log_error_message(msg)[source]
refresh()[source]

Method for refreshing timers/variables etc

abstract run(job, rid, exp_config, call_back_func, **kwargs)[source]

Job running implemented for the specific resource manager. It is called by run_job().

Parameters
  • job (Job) – a job object

  • rid (int) – resource id returned from get_available().

  • exp_config (BasicConfig) – experiment configuration

  • call_back_func (function object) – function to trigger after job finished

run_curve_fitting(interm_res, c_jid, step, comp_fn, curve_fitting_threshold, best_val)[source]
run_job(job, rid, exp_config, call_back_func, **kwargs)[source]

Job running interface; it is called by aup.EE.Experiment.

It is a wrapper for run().

Parameters
  • job (Job) – Job to run

  • rid (int) – resource ID

  • exp_config (BasicConfig) – experiment configuration

  • call_back_func (function object) – call back function to update result
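The relationship between run_job() and run() can be pictured with a small self-contained sketch. The classes below are hypothetical stand-ins, not the aup classes. The callback signature (score, job) is an assumption made for illustration, and the real wrapper may do extra bookkeeping before delegating.

```python
from abc import ABC, abstractmethod


class SketchResourceManager(ABC):
    """Hypothetical stand-in for the abstract base; not the aup class."""

    def run_job(self, job, rid, exp_config, call_back_func, **kwargs):
        # Shared entry point: delegate to the resource-specific run().
        return self.run(job, rid, exp_config, call_back_func, **kwargs)

    @abstractmethod
    def run(self, job, rid, exp_config, call_back_func, **kwargs):
        """Resource-specific execution, supplied by each subclass."""


class EchoResourceManager(SketchResourceManager):
    """Toy subclass that 'runs' a job by scoring it immediately."""

    def run(self, job, rid, exp_config, call_back_func, **kwargs):
        score = float(len(str(job)))  # placeholder for real work
        call_back_func(score, job)    # assumed callback signature
        return score


results = []
manager = EchoResourceManager()
manager.run_job(
    "job-1",
    rid=0,
    exp_config={},
    call_back_func=lambda score, job: results.append((job, score)),
)
print(results)
```

Because run() is abstract, the base class cannot be instantiated directly; each resource type supplies its own run() while callers only use run_job().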

set_last_multiple_results(eid, jid)[source]
stop_job(jid)[source]

Stop a job for early stopping strategies

Parameters

jid (int) – job ID
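One way to picture how stop_job() and is_job_stopped() fit together is a set of pending stop requests. The sketch below is hypothetical: the StopTracker class and its clear() helper are not part of aup, and the real manager may store this state differently.

```python
class StopTracker:
    """Hypothetical tracker of job IDs with a pending stop request."""

    def __init__(self):
        self._pending = set()

    def stop_job(self, jid):
        # Record the request; the running job is expected to poll for it.
        self._pending.add(jid)

    def is_job_stopped(self, jid):
        # True while a stop request for this job ID is still pending.
        return jid in self._pending

    def clear(self, jid):
        # Assumed helper: forget the request once the job has ended.
        self._pending.discard(jid)


tracker = StopTracker()
tracker.stop_job(3)
print(tracker.is_job_stopped(3), tracker.is_job_stopped(4))
```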

suspend()[source]

Suspend job upon request

get_resource_manager(resource, connector, n_parallel, auppath='.aup', **kwargs)[source]

Get resource manager for a specific resource type

Parameters
  • resource (str) – gpu or cpu type resource

  • connector (AbstractConnector) – database connector

  • n_parallel (int) – how many parallel jobs to be run

  • auppath (str) – aup environment folder

Returns

resource manager

Return type

AbstractResourceManager
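A factory of this shape is often implemented as a lookup from the resource string to a manager class. The sketch below shows that pattern with hypothetical classes and a hypothetical function name, get_manager_sketch. It covers only the "cpu" and "gpu" strings named in the parameter description and is not the aup implementation. Raising ValueError for unknown types is an assumption.

```python
class SketchCPUManager:
    """Hypothetical CPU manager; extra keyword arguments are ignored."""

    def __init__(self, connector, n_parallel, **kwargs):
        self.connector = connector
        self.n_parallel = n_parallel


class SketchGPUManager(SketchCPUManager):
    """Hypothetical GPU manager that also remembers the environment folder."""

    def __init__(self, connector, n_parallel, auppath=".aup", **kwargs):
        super().__init__(connector, n_parallel, **kwargs)
        self.auppath = auppath


# Maps the resource string to the class that handles it.
_REGISTRY = {"cpu": SketchCPUManager, "gpu": SketchGPUManager}


def get_manager_sketch(resource, connector, n_parallel, auppath=".aup", **kwargs):
    try:
        cls = _REGISTRY[resource]
    except KeyError:
        # Assumption: unknown resource types are rejected with ValueError.
        raise ValueError(f"unsupported resource type: {resource!r}") from None
    return cls(connector, n_parallel, auppath=auppath, **kwargs)


manager = get_manager_sketch("gpu", connector=None, n_parallel=2)
print(type(manager).__name__, manager.n_parallel, manager.auppath)
```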

aup.EE.Resource.CPUResourceManager

Resource Manager for CPUs on a single machine.

Users can specify an arbitrary number of parallel jobs; there is no real control over resources (yet).

APIs

class CPUResourceManager(connector, n_parallel, *args, **kwargs)[source]

Bases: aup.EE.Resource.AbstractResourceManager.AbstractResourceManager

finish(maximize=True, status='FINISHED')[source]

Finish up the resource allocation.

Parameters

status (str) – status of the experiment

Returns

Max/Min result in experiment (job id, score)

Return type

None | [int, float]
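The documented return value, the best (job id, score) pair or None, can be illustrated with a small helper. This is a sketch under assumptions: best_result and the scores dictionary are hypothetical, the real method reads results through the connector, and skipping jobs whose score is None is an assumption.

```python
def best_result(scores, maximize=True):
    """Return [job_id, score] for the best job, or None if no scores exist.

    scores maps job ID -> score; jobs with a None score are ignored.
    """
    valid = [(jid, s) for jid, s in scores.items() if s is not None]
    if not valid:
        return None
    # Choose max or min depending on the optimization direction.
    pick = max if maximize else min
    jid, score = pick(valid, key=lambda item: item[1])
    return [jid, score]


scores = {1: 0.2, 2: 0.9, 3: None}
print(best_result(scores))
print(best_result(scores, maximize=False))
```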

run(job, rid, exp_config, call_back_func, **kwargs)[source]

Job running implemented for the specific resource manager. It is called by run_job().

Parameters
  • job (Job) – a job object

  • rid (int) – resource id returned from get_available().

  • exp_config (BasicConfig) – experiment configuration

  • call_back_func (function object) – function to trigger after job finished

aup.EE.Resource.GPUResourceManager

Resource manager for a single GPU machine.

It supports:

  1. Multiple Cards

  2. Multiple jobs running on a shared card (no control over GPU resource limit)

APIs

class GPUResourceManager(connector, n_parallel, auppath='.aup', *args, **kwargs)[source]

Bases: aup.EE.Resource.CPUResourceManager.CPUResourceManager

run(job, rid, exp_config, call_back_func, **kwargs)[source]

Job running implemented for the specific resource manager. It is called by run_job().

Parameters
  • job (Job) – a job object

  • rid (int) – resource id returned from get_available().

  • exp_config (BasicConfig) – experiment configuration

  • call_back_func (function object) – function to trigger after job finished

aup.EE.Resource.SSHResourceManager

APIs

class SSHResourceManager(connector, n_parallel, key='node_mapping', auppath='.aup', async_reconnect=30, async_timeout=None, async_run=False, reconn_wait_time=30, max_retries=3, **kwargs)[source]

Bases: aup.EE.Resource.CPUResourceManager.CPUResourceManager

static load_node_mapping(key='node_mapping', auppath='.aup')[source]

Loads ssh configurations from file.

refresh()[source]

Method for refreshing timers/variables etc

run(job, rid, exp_config, call_back_func, **kwargs)[source]

Job running implemented for the specific resource manager. It is called by run_job().

Parameters
  • job (Job) – a job object

  • rid (int) – resource id returned from get_available().

  • exp_config (BasicConfig) – experiment configuration

  • call_back_func (function object) – function to trigger after job finished

parse_hostname(host)[source]

Parse the host name, in the following formats:

  • username@ip or

  • username@ip:port or

  • username@ip ssh_key or

  • username@ip:port ssh_key

Parameters

host (str) – host name string

Returns

username, hostname, port (22 if not given), key (parsed from ~/.ssh/id_rsa if no ssh_key is given)
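The four accepted formats can be handled with plain string splitting, as in the sketch below. It is not the aup function: parse_host_sketch is a hypothetical name, it returns the key file path rather than a loaded key object, and it assumes the key falls back to ~/.ssh/id_rsa when none is given. IPv6 addresses are not handled.

```python
import os


def parse_host_sketch(host, default_key="~/.ssh/id_rsa"):
    """Split 'username@ip[:port] [ssh_key]' into its parts.

    Returns (username, hostname, port, key_path). Unlike the documented
    function, this sketch returns the key *path*, not a loaded key.
    """
    parts = host.split()
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"unrecognised host string: {host!r}")
    address = parts[0]
    # Optional second token is the key file; otherwise use the default.
    key_path = parts[1] if len(parts) == 2 else default_key
    username, sep, location = address.partition("@")
    if not sep or not username or not location:
        raise ValueError(f"expected username@ip, got: {address!r}")
    # Optional ':port' suffix; SSH's standard port 22 is the fallback.
    hostname, sep, port_text = location.partition(":")
    port = int(port_text) if sep else 22
    return username, hostname, port, os.path.expanduser(key_path)


print(parse_host_sketch("bob@10.0.0.6:2222 /tmp/key"))
```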

aup.EE.Resource.AWSResourceManager

APIs

class AWSResourceManager(*args, **kwargs)[source]

Bases: aup.EE.Resource.SSHResourceManager.SSHResourceManager

run(job, rid, *args, **kwargs)[source]

Job running implemented for the specific resource manager. It is called by run_job().

Parameters
  • job (Job) – a job object

  • rid (int) – resource id returned from get_available().

  • exp_config (BasicConfig) – experiment configuration

  • call_back_func (function object) – function to trigger after job finished

aup.EE.Resource.PassiveResourceManager

Lets the user run the script interactively.

  • It supports only one running job at a time.

  • It prints the command on the screen and asks the user to enter the returned value.

APIs

class PassiveResourceManager(connector, *args, **kwargs)[source]

Bases: aup.EE.Resource.AbstractResourceManager.AbstractResourceManager

get_available(username, rtype)[source]

Get an available resource to run a job.

Parameters
  • username (str) – username for job running

  • rtype (str) – resource type

  • rid_blacklist ([int]) – resource ids to ignore

Returns

one resource ID selected at random from the available resources

Return type

int

run(job, rid, exp_config, call_back_func, **kwargs)[source]

Job running implemented for the specific resource manager. It is called by run_job().

Parameters
  • job (Job) – a job object

  • rid (int) – resource id returned from get_available().

  • exp_config (BasicConfig) – experiment configuration

  • call_back_func (function object) – function to trigger after job finished