Agent module

baconian.core.agent

class baconian.core.agent.Agent(name, env: (<class 'baconian.core.core.Env'>, <class 'baconian.envs.env_wrapper.Wrapper'>), algo: baconian.algo.algo.Algo, env_spec: baconian.core.core.EnvSpec, sampler: baconian.common.sampler.sampler.Sampler = None, noise_adder: baconian.common.noise.AgentActionNoiseWrapper = None, reset_noise_every_terminal_state=False, reset_state_every_sample=False, exploration_strategy: baconian.algo.misc.epsilon_greedy.ExplorationStrategy = None, algo_saving_scheduler: baconian.common.schedules.EventScheduler = None)
INIT_STATUS = 'CREATED'
STATUS_LIST = ('CREATED', 'INITED', 'TRAIN', 'TEST')
__init__(name, env: (<class 'baconian.core.core.Env'>, <class 'baconian.envs.env_wrapper.Wrapper'>), algo: baconian.algo.algo.Algo, env_spec: baconian.core.core.EnvSpec, sampler: baconian.common.sampler.sampler.Sampler = None, noise_adder: baconian.common.noise.AgentActionNoiseWrapper = None, reset_noise_every_terminal_state=False, reset_state_every_sample=False, exploration_strategy: baconian.algo.misc.epsilon_greedy.ExplorationStrategy = None, algo_saving_scheduler: baconian.common.schedules.EventScheduler = None)
Parameters:
  • name (str) – the name of the agent instance
  • env (Env) – environment that interacts with agent
  • algo (Algo) – algorithm of the agent
  • env_spec (EnvSpec) – environment specifications: action space and observation space
  • sampler (Sampler) – sampler
  • reset_noise_every_terminal_state (bool) – whether to reset the noise at every terminal state, i.e., after every sampled trajectory
  • reset_state_every_sample (bool) – whether to reset the environment state every time a sample/rollout is performed
  • noise_adder (AgentActionNoiseWrapper) – add action noise for exploration in action space
  • exploration_strategy (ExplorationStrategy) – exploration strategy in action space
  • algo_saving_scheduler (EventScheduler) – scheduler that controls when to save the algorithm during the training process
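Both noise_adder and exploration_strategy inject exploration into the action space. As a minimal, self-contained sketch of the idea (not baconian's actual ExplorationStrategy implementation; the function and its signature are illustrative), an epsilon-greedy strategy over a discrete action space can be modeled as:

```python
import random

def epsilon_greedy(greedy_action, action_space, epsilon):
    """Return a random action with probability epsilon, else the greedy one.

    `greedy_action` is the algorithm's predicted action; `action_space`
    is a plain list of discrete actions (a stand-in for a real ActionSpace).
    """
    if random.random() < epsilon:
        return random.choice(action_space)
    return greedy_action

# With epsilon=0.0 exploration is disabled and the greedy action is returned.
assert epsilon_greedy(2, [0, 1, 2, 3], epsilon=0.0) == 2
```

In practice, epsilon is usually decayed over training (e.g., via a scheduler), so the agent explores broadly at first and exploits later.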
init()

Initialize the algorithm and set the status to 'INITED'.

is_testing

Check whether the agent is testing. Return a boolean value.

Returns: True if the agent is testing, False otherwise
Return type: bool
is_training

Check whether the agent is training. Return a boolean value.

Returns: True if the agent is training, False otherwise
Return type: bool
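The status flow implied by STATUS_LIST can be made concrete with a hypothetical stub (not the real Agent class): the agent starts in 'CREATED', moves to 'INITED' after init(), and is_training/is_testing simply reflect the current status.

```python
class AgentStub:
    """Minimal stand-in mirroring the documented Agent status lifecycle."""
    STATUS_LIST = ('CREATED', 'INITED', 'TRAIN', 'TEST')
    INIT_STATUS = 'CREATED'

    def __init__(self):
        self.status = self.INIT_STATUS

    def init(self):
        # Initialize the algorithm and set the status to 'INITED'.
        self.status = 'INITED'

    @property
    def is_training(self):
        return self.status == 'TRAIN'

    @property
    def is_testing(self):
        return self.status == 'TEST'

agent = AgentStub()
agent.init()
assert agent.status == 'INITED' and not agent.is_training
agent.status = 'TRAIN'
assert agent.is_training and not agent.is_testing
```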
predict(**kwargs)

Predict the action given the state (observation).

Parameters: kwargs – remaining parameters, including the key obs
Returns: predicted action
Return type: numpy.ndarray
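For continuous action spaces, the noise_adder perturbs the deterministic prediction before it is returned. A sketch of that pattern (illustrative only; the real AgentActionNoiseWrapper API may differ, and the [-1, 1] action range here is an assumption):

```python
import numpy as np

def predict_with_noise(policy_fn, obs, noise_scale=0.1, rng=None):
    """Return the policy's action plus Gaussian exploration noise,
    clipped to an assumed action range of [-1, 1]."""
    rng = rng or np.random.default_rng(0)
    action = np.asarray(policy_fn(obs), dtype=np.float64)
    noisy = action + rng.normal(scale=noise_scale, size=action.shape)
    return np.clip(noisy, -1.0, 1.0)

obs = np.zeros(3)
action = predict_with_noise(lambda o: np.array([0.5, -0.5]), obs)
assert action.shape == (2,)
```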
required_key_dict = {}
reset_on_terminal_state()
sample(env, sample_count: int, in_which_status: str = 'TRAIN', store_flag=False, sample_type: str = 'transition') -> (<class 'baconian.common.sampler.sample_data.TransitionData'>, <class 'baconian.common.sampler.sample_data.TrajectoryData'>)

Sample a certain amount of data from the environment.

Parameters:
  • env – environment to sample from
  • sample_count (int) – number of samples
  • in_which_status (str) – environment status, 'TRAIN' by default
  • store_flag (bool) – whether to store the sampled data, False by default
  • sample_type (str) – the type of sample, 'transition' or 'trajectory', 'transition' by default
Returns: sample data from the environment
Return type: a subclass of SampleData: TransitionData or TrajectoryData
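The sample_type flag decides whether sample_count counts individual transitions or whole trajectories. A toy rollout loop (stub environment, not baconian's Sampler or SampleData classes) illustrates the difference:

```python
import random

def toy_env_step(state, action):
    """Toy environment: advance the state; terminate with probability 0.3."""
    next_state = state + action
    done = random.random() < 0.3
    return next_state, 1.0, done

def sample(sample_count, sample_type='transition'):
    """Collect `sample_count` transitions, or `sample_count` trajectories."""
    transitions, trajectories, traj = [], [], []
    state = 0
    while True:
        next_state, reward, done = toy_env_step(state, 1)
        step = (state, 1, reward, next_state, done)
        transitions.append(step)
        traj.append(step)
        state = 0 if done else next_state  # reset on terminal state
        if done:
            trajectories.append(traj)
            traj = []
        if sample_type == 'transition' and len(transitions) >= sample_count:
            return transitions
        if sample_type == 'trajectory' and len(trajectories) >= sample_count:
            return trajectories

random.seed(0)
assert len(sample(5, 'transition')) == 5
assert len(sample(2, 'trajectory')) == 2
```

Note that in trajectory mode the total number of transitions collected varies with episode length; only the trajectory count is fixed.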

store_samples(samples: baconian.common.sampler.sample_data.SampleData)

Store the samples into the memory/replay buffer if the algorithm the agent holds needs to do so, e.g., DQN, DDPG.

Parameters:samples (SampleData) – sample data of the experiment
test(sample_count) → baconian.common.sampler.sample_data.SampleData

Test the agent.

Parameters: sample_count (int) – how many trajectories are used to evaluate the agent's performance
Returns: a SampleData object
train(*args, **kwargs)

Train the agent.

Returns: True if the agent is trained successfully, False if the memory buffer did not have enough data
Return type: bool
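The return value of train drives the usual off-policy loop: keep sampling and storing until the buffer holds enough data, then updates begin to succeed. A stub sketch of that control flow (the min_size threshold and ReplayBufferStub are hypothetical; the real flow uses agent.sample, agent.store_samples, and agent.train):

```python
class ReplayBufferStub:
    """Stand-in replay buffer illustrating the train() return contract."""
    def __init__(self, min_size=10):
        self.data = []
        self.min_size = min_size

    def store(self, samples):
        self.data.extend(samples)

    def train_step(self):
        """Mirror Agent.train: False until enough data, True afterwards."""
        if len(self.data) < self.min_size:
            return False
        # ... perform one update on a minibatch here ...
        return True

buf = ReplayBufferStub(min_size=10)
results = []
for _ in range(4):
    buf.store([0] * 4)          # pretend: 4 new transitions per rollout
    results.append(buf.train_step())
assert results == [False, False, True, True]
```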