Agent module

baconian.core.agent

class baconian.core.agent.Agent(name, env: (<class 'baconian.core.core.Env'>, <class 'baconian.envs.env_wrapper.Wrapper'>), algo: baconian.algo.algo.Algo, env_spec: baconian.core.core.EnvSpec, sampler: baconian.common.sampler.sampler.Sampler = None, noise_adder: baconian.common.noise.AgentActionNoiseWrapper = None, reset_noise_every_terminal_state=False, reset_state_every_sample=False, exploration_strategy: baconian.algo.misc.epsilon_greedy.ExplorationStrategy = None, algo_saving_scheduler: baconian.common.schedules.EventScheduler = None)
INIT_STATUS = 'CREATED'
STATUS_LIST = ('CREATED', 'INITED', 'TRAIN', 'TEST')
__init__(name, env: (<class 'baconian.core.core.Env'>, <class 'baconian.envs.env_wrapper.Wrapper'>), algo: baconian.algo.algo.Algo, env_spec: baconian.core.core.EnvSpec, sampler: baconian.common.sampler.sampler.Sampler = None, noise_adder: baconian.common.noise.AgentActionNoiseWrapper = None, reset_noise_every_terminal_state=False, reset_state_every_sample=False, exploration_strategy: baconian.algo.misc.epsilon_greedy.ExplorationStrategy = None, algo_saving_scheduler: baconian.common.schedules.EventScheduler = None)
Parameters:
  • name (str) – the name of the agent instance
  • env (Env) – environment that interacts with agent
  • algo (Algo) – algorithm of the agent
  • env_spec (EnvSpec) – environment specifications: action space and observation space
  • sampler (Sampler) – sampler
  • reset_noise_every_terminal_state (bool) – whether to reset the noise at every terminal state, i.e., after every sampled trajectory
  • reset_state_every_sample (bool) – whether to reset the environment state every time a sample/rollout is performed
  • noise_adder (AgentActionNoiseWrapper) – add action noise for exploration in action space
  • exploration_strategy (ExplorationStrategy) – exploration strategy in action space
  • algo_saving_scheduler (EventScheduler) – scheduler that controls when to save the algorithm during the training process
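Both noise_adder and exploration_strategy inject exploration into the action space. As a minimal, self-contained sketch of the idea (not baconian's actual ExplorationStrategy implementation; the function and its signature are illustrative), an epsilon-greedy strategy over a discrete action space can be modeled as:

```python
import random

def epsilon_greedy(greedy_action, action_space, epsilon):
    """Return a random action with probability epsilon, else the greedy one.

    `greedy_action` is the algorithm's predicted action; `action_space`
    is a plain list of discrete actions (a stand-in for a real ActionSpace).
    """
    if random.random() < epsilon:
        return random.choice(action_space)
    return greedy_action

# With epsilon=0.0 exploration is disabled and the greedy action is returned.
assert epsilon_greedy(2, [0, 1, 2, 3], epsilon=0.0) == 2
```

In practice, epsilon is usually decayed over training (e.g., via a scheduler), so the agent explores broadly at first and exploits later.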
init()

Initialize the algorithm and set the status to 'INITED'.

is_testing

Check whether the agent is testing. Return a boolean value.

Returns: True if the agent is testing, False otherwise
Return type: bool
is_training

Check whether the agent is training. Return a boolean value.

Returns: True if the agent is training, False otherwise
Return type: bool
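The status flow implied by STATUS_LIST can be made concrete with a hypothetical stub (not the real Agent class): the agent starts in 'CREATED', moves to 'INITED' after init(), and is_training/is_testing simply reflect the current status.

```python
class AgentStub:
    """Minimal stand-in mirroring the documented Agent status lifecycle."""
    STATUS_LIST = ('CREATED', 'INITED', 'TRAIN', 'TEST')
    INIT_STATUS = 'CREATED'

    def __init__(self):
        self.status = self.INIT_STATUS

    def init(self):
        # Initialize the algorithm and set the status to 'INITED'.
        self.status = 'INITED'

    @property
    def is_training(self):
        return self.status == 'TRAIN'

    @property
    def is_testing(self):
        return self.status == 'TEST'

agent = AgentStub()
agent.init()
assert agent.status == 'INITED' and not agent.is_training
agent.status = 'TRAIN'
assert agent.is_training and not agent.is_testing
```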
predict(**kwargs)

Predict the action given the state (observation).

Parameters: kwargs – remaining parameters, including the key obs
Returns: predicted action
Return type: numpy.ndarray
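For continuous action spaces, the noise_adder perturbs the deterministic prediction before it is returned. A sketch of that pattern (illustrative only; the real AgentActionNoiseWrapper API may differ, and the [-1, 1] action range here is an assumption):

```python
import numpy as np

def predict_with_noise(policy_fn, obs, noise_scale=0.1, rng=None):
    """Return the policy's action plus Gaussian exploration noise,
    clipped to an assumed action range of [-1, 1]."""
    rng = rng or np.random.default_rng(0)
    action = np.asarray(policy_fn(obs), dtype=np.float64)
    noisy = action + rng.normal(scale=noise_scale, size=action.shape)
    return np.clip(noisy, -1.0, 1.0)

obs = np.zeros(3)
action = predict_with_noise(lambda o: np.array([0.5, -0.5]), obs)
assert action.shape == (2,)
```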
required_key_dict = {}
reset_on_terminal_state()
sample(env, sample_count: int, in_which_status: str = 'TRAIN', store_flag=False, sample_type: str = 'transition') -> (<class 'baconian.common.sampler.sample_data.TransitionData'>, <class 'baconian.common.sampler.sample_data.TrajectoryData'>)

Sample a certain amount of data from the environment.

Parameters:
  • env – environment to sample from
  • sample_count (int) – number of samples
  • in_which_status (str) – environment status, 'TRAIN' by default
  • store_flag (bool) – whether to store the sampled data, False by default
  • sample_type (str) – the type of sample, 'transition' or 'trajectory', 'transition' by default
Returns: sample data from the environment
Return type: a subclass of SampleData: TransitionData or TrajectoryData
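The sample_type flag decides whether sample_count counts individual transitions or whole trajectories. A toy rollout loop (stub environment, not baconian's Sampler or SampleData classes) illustrates the difference:

```python
import random

def toy_env_step(state, action):
    """Toy environment: advance the state; terminate with probability 0.3."""
    next_state = state + action
    done = random.random() < 0.3
    return next_state, 1.0, done

def sample(sample_count, sample_type='transition'):
    """Collect `sample_count` transitions, or `sample_count` trajectories."""
    transitions, trajectories, traj = [], [], []
    state = 0
    while True:
        next_state, reward, done = toy_env_step(state, 1)
        step = (state, 1, reward, next_state, done)
        transitions.append(step)
        traj.append(step)
        state = 0 if done else next_state  # reset on terminal state
        if done:
            trajectories.append(traj)
            traj = []
        if sample_type == 'transition' and len(transitions) >= sample_count:
            return transitions
        if sample_type == 'trajectory' and len(trajectories) >= sample_count:
            return trajectories

random.seed(0)
assert len(sample(5, 'transition')) == 5
assert len(sample(2, 'trajectory')) == 2
```

Note that in trajectory mode the total number of transitions collected varies with episode length; only the trajectory count is fixed.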

store_samples(samples: baconian.common.sampler.sample_data.SampleData)

Store the samples into the memory/replay buffer if the algorithm the agent holds needs to do so, e.g., DQN, DDPG.

Parameters:samples (SampleData) – sample data of the experiment
test(sample_count) → baconian.common.sampler.sample_data.SampleData

Test the agent.

Parameters: sample_count (int) – how many trajectories are used to evaluate the agent's performance
Returns: a SampleData object
train(*args, **kwargs)

Train the agent.

Returns: True if the agent is trained successfully, False if the memory buffer did not have enough data
Return type: bool
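The return value of train drives the usual off-policy loop: keep sampling and storing until the buffer holds enough data, then updates begin to succeed. A stub sketch of that control flow (the min_size threshold and ReplayBufferStub are hypothetical; the real flow uses agent.sample, agent.store_samples, and agent.train):

```python
class ReplayBufferStub:
    """Stand-in replay buffer illustrating the train() return contract."""
    def __init__(self, min_size=10):
        self.data = []
        self.min_size = min_size

    def store(self, samples):
        self.data.extend(samples)

    def train_step(self):
        """Mirror Agent.train: False until enough data, True afterwards."""
        if len(self.data) < self.min_size:
            return False
        # ... perform one update on a minibatch here ...
        return True

buf = ReplayBufferStub(min_size=10)
results = []
for _ in range(4):
    buf.store([0] * 4)          # pretend: 4 new transitions per rollout
    results.append(buf.train_step())
assert results == [False, False, True, True]
```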