Resource Managers
aup.EE.Resource.AbstractResourceManager
Abstract interface for resource managers.
Use get_resource_manager()
to create the manager object for a given resource type.
For the supported resource types, see Set up environment.
APIs
-
class AbstractResourceManager(connector, n_parallel, *args, **kwargs)
Bases: abc.ABC
Creates a resource to run jobs.
- Parameters
connector (AbstractConnector) – Connector to database
-
finish(status='FINISHED')
Finish up the resource allocation.
- Parameters
status (str) – status of the experiment
- Returns
the maximum or minimum result in the experiment, as (job id, score)
- Return type
None | [int, float]
-
finish_job(jid, score, status=None)
Finish one job.
- Parameters
jid (int) – job ID
score (float | None) – score of the job in the experiment
-
get_available(username, rtype, rid_blacklist=None)
Get an available resource to run a job.
- Parameters
username (str) – username for job running
rtype (str) – resource type
rid_blacklist ([int]) – resource ids to ignore
- Returns
a resource ID selected at random from all available resource IDs
- Return type
int
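The selection described above (one ID chosen at random, skipping blacklisted IDs) can be sketched as follows. This is a minimal illustration, not aup's implementation: the function name pick_available, the available_rids argument, and returning None when nothing is free are all assumptions.

```python
import random


def pick_available(available_rids, rid_blacklist=None, rng=random):
    """Return one resource ID at random, or None if none is usable.

    Hypothetical helper: the real manager looks resources up through
    its database connector, which is not modelled here.
    """
    blocked = set(rid_blacklist or [])
    # Keep only the IDs that the caller has not asked us to ignore.
    candidates = [rid for rid in available_rids if rid not in blocked]
    if not candidates:
        return None
    return rng.choice(candidates)


# With IDs 1 and 3 blacklisted, only ID 2 can be selected.
chosen = pick_available([1, 2, 3], rid_blacklist=[1, 3])
```

Passing a seeded random.Random instance as rng makes the choice reproducible in tests.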
-
is_job_stopped(jid)
Returns whether a stop is pending for a specific job.
- Parameters
jid (int) – job ID
- Returns
whether or not the given job ID is in the list of pending job stops
- Return type
bool
-
abstract run(job, rid, exp_config, call_back_func, **kwargs)
Job running implemented for the specific resource manager. It is called by run_job().
- Parameters
job (Job) – a job object
rid (int) – resource ID returned from get_available()
exp_config (BasicConfig) – experiment configuration
call_back_func (function object) – function to trigger after the job has finished
-
run_job(job, rid, exp_config, call_back_func, **kwargs)
Job running interface, called by aup.EE.Experiment. It is a wrapper for run().
- Parameters
job (Job) – job to run
rid (int) – resource ID
exp_config (BasicConfig) – experiment configuration
call_back_func (function object) – callback function to update the result
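The relationship between run_job() and run() is a wrapper around an abstract method. The following is a minimal, self-contained sketch of that pattern, not aup's actual code: SketchResourceManager, EchoManager, the dict-based job, the running bookkeeping, and the callback signature (score, jid) are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class SketchResourceManager(ABC):
    """Stand-in for the interface above; not aup's implementation."""

    def __init__(self, n_parallel):
        self.n_parallel = n_parallel
        self.running = {}  # hypothetical bookkeeping: jid -> rid

    def run_job(self, job, rid, exp_config, call_back_func, **kwargs):
        # Shared bookkeeping lives in the wrapper.
        self.running[job["jid"]] = rid
        # Resource-specific work is delegated to the subclass.
        return self.run(job, rid, exp_config, call_back_func, **kwargs)

    @abstractmethod
    def run(self, job, rid, exp_config, call_back_func, **kwargs):
        """Execute the job on the resource identified by rid."""


class EchoManager(SketchResourceManager):
    """Toy subclass that 'runs' a job by doubling its input."""

    def run(self, job, rid, exp_config, call_back_func, **kwargs):
        score = float(job["x"]) * 2  # pretend the job produced a score
        self.running.pop(job["jid"], None)
        call_back_func(score, job["jid"])  # assumed callback signature
        return score


results = []
manager = EchoManager(n_parallel=1)
manager.run_job({"jid": 7, "x": 1.5}, rid=0, exp_config={},
                call_back_func=lambda score, jid: results.append((jid, score)))
```

Callers only ever use run_job(), so a subclass changes where the job runs without changing how the experiment dispatches it.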
-
get_resource_manager(resource, connector, n_parallel, auppath='.aup', **kwargs)
Get the resource manager for a specific resource type.
- Parameters
resource (str) – resource type, gpu or cpu
connector (AbstractConnector) – database connector
n_parallel (int) – number of jobs to run in parallel
auppath (str) – aup environment folder
- Returns
resource manager
- Return type
aup.EE.Resource.CPUResourceManager
Resource manager for CPUs on a single machine.
Users can request an arbitrary number of parallel jobs; there is no real control of resources (yet).
APIs
-
class CPUResourceManager(connector, n_parallel, *args, **kwargs)
Bases: aup.EE.Resource.AbstractResourceManager.AbstractResourceManager
-
finish(maximize=True, status='FINISHED')
Finish up the resource allocation.
- Parameters
status (str) – status of the experiment
- Returns
the maximum or minimum result in the experiment, as (job id, score)
- Return type
None | [int, float]
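One way to read the documented return value, None | [int, float], is as the best-scoring job under the maximize flag. The sketch below shows that selection only. The helper name best_result and its dict input are hypothetical, and the source does not say how finish() actually gathers the scores.

```python
def best_result(scores, maximize=True):
    """Pick the best (jid, score) pair, or None if no job has a score.

    scores is a hypothetical dict mapping job ID -> score, where a
    score of None marks a job that did not report a result.
    """
    finished = [(jid, s) for jid, s in scores.items() if s is not None]
    if not finished:
        return None
    pick = max if maximize else min
    jid, score = pick(finished, key=lambda pair: pair[1])
    return [jid, score]


scores = {1: 0.2, 2: 0.9, 3: None}
top = best_result(scores)                     # highest score wins
low = best_result(scores, maximize=False)     # lowest score wins
```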
-
run(job, rid, exp_config, call_back_func, **kwargs)
Job running implemented for the specific resource manager. It is called by run_job().
- Parameters
job (Job) – a job object
rid (int) – resource ID returned from get_available()
exp_config (BasicConfig) – experiment configuration
call_back_func (function object) – function to trigger after the job has finished
-
aup.EE.Resource.GPUResourceManager
Resource manager for a single GPU machine.
It supports:
- Multiple cards
- Multiple jobs running on a shared card (no control over the GPU resource limit)
APIs
-
class GPUResourceManager(connector, n_parallel, auppath='.aup', *args, **kwargs)
Bases: aup.EE.Resource.CPUResourceManager.CPUResourceManager
-
run(job, rid, exp_config, call_back_func, **kwargs)
Job running implemented for the specific resource manager. It is called by run_job().
- Parameters
job (Job) – a job object
rid (int) – resource ID returned from get_available()
exp_config (BasicConfig) – experiment configuration
call_back_func (function object) – function to trigger after the job has finished
-
aup.EE.Resource.SSHResourceManager
APIs
-
class SSHResourceManager(connector, n_parallel, key='node_mapping', auppath='.aup', async_reconnect=30, async_timeout=None, async_run=False, reconn_wait_time=30, max_retries=3, **kwargs)
Bases: aup.EE.Resource.CPUResourceManager.CPUResourceManager
-
static load_node_mapping(key='node_mapping', auppath='.aup')
Loads SSH configurations from file.
-
run(job, rid, exp_config, call_back_func, **kwargs)
Job running implemented for the specific resource manager. It is called by run_job().
- Parameters
job (Job) – a job object
rid (int) – resource ID returned from get_available()
exp_config (BasicConfig) – experiment configuration
call_back_func (function object) – function to trigger after the job has finished
-
static
aup.EE.Resource.AWSResourceManager
APIs
-
class AWSResourceManager(*args, **kwargs)
Bases: aup.EE.Resource.SSHResourceManager.SSHResourceManager
-
run(job, rid, *args, **kwargs)
Job running implemented for the specific resource manager. It is called by run_job().
- Parameters
job (Job) – a job object
rid (int) – resource ID returned from get_available()
exp_config (BasicConfig) – experiment configuration
call_back_func (function object) – function to trigger after the job has finished
-
aup.EE.Resource.PassiveResourceManager
Lets the user run the script interactively.
It supports only one running job at a time.
It prints the command on the screen and asks the user to enter the resulting value.
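The interaction described above (print the command, then ask the user for the value) can be sketched as follows. This is an illustration under stated assumptions, not aup's code: the function name run_passively, the prompt wording, the retry on unparsable input, and the injectable ask/show callables are all hypothetical.

```python
def run_passively(command, ask=input, show=print):
    """Show a command to the user and return the score they report.

    ask and show default to the built-in input and print; they are
    parameters here only so the sketch can be exercised without a
    terminal.
    """
    show("Please run: " + command)
    while True:
        reply = ask("Result: ")
        try:
            return float(reply)
        except ValueError:
            # Keep asking until the reply parses as a number.
            show("Could not parse %r as a number; try again." % reply)


# Scripted replies stand in for a user typing at the prompt.
scripted = iter(["not a number", "0.75"])
value = run_passively("python train.py",
                      ask=lambda prompt: next(scripted),
                      show=lambda message: None)
```

Because the call blocks until the user answers, a manager built this way can only handle one job at a time, which matches the limitation stated above.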
APIs
-
class PassiveResourceManager(connector, *args, **kwargs)
Bases: aup.EE.Resource.AbstractResourceManager.AbstractResourceManager
-
get_available(username, rtype)
Get an available resource to run a job.
- Parameters
username (str) – username for job running
rtype (str) – resource type
rid_blacklist ([int]) – resource ids to ignore
- Returns
a resource ID selected at random from all available resource IDs
- Return type
int
-
run(job, rid, exp_config, call_back_func, **kwargs)
Job running implemented for the specific resource manager. It is called by run_job().
- Parameters
job (Job) – a job object
rid (int) – resource ID returned from get_available()
exp_config (BasicConfig) – experiment configuration
call_back_func (function object) – function to trigger after the job has finished
-