Agent module
baconian.core.agent
class baconian.core.agent.Agent(name, env: (baconian.core.core.Env, baconian.envs.env_wrapper.Wrapper), algo: baconian.algo.algo.Algo, env_spec: baconian.core.core.EnvSpec, sampler: baconian.common.sampler.sampler.Sampler = None, noise_adder: baconian.common.noise.AgentActionNoiseWrapper = None, reset_noise_every_terminal_state=False, reset_state_every_sample=False, exploration_strategy: baconian.algo.misc.epsilon_greedy.ExplorationStrategy = None, algo_saving_scheduler: baconian.common.schedules.EventScheduler = None)
INIT_STATUS = 'CREATED'
STATUS_LIST = ('CREATED', 'INITED', 'TRAIN', 'TEST')
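To illustrate how a status tuple like STATUS_LIST and an initial status like INIT_STATUS can work together, here is a minimal, hypothetical sketch; the StatusHolder class below is invented for the example and is not baconian's internal implementation.

```python
# Hypothetical sketch: a holder that starts at INIT_STATUS and only accepts
# statuses declared in STATUS_LIST, mirroring the class attributes above.

STATUS_LIST = ('CREATED', 'INITED', 'TRAIN', 'TEST')
INIT_STATUS = 'CREATED'

class StatusHolder:
    def __init__(self):
        self._status = INIT_STATUS  # every instance starts as 'CREATED'

    def set_status(self, new_status):
        # Reject any status not declared in STATUS_LIST.
        if new_status not in STATUS_LIST:
            raise ValueError('unknown status: %s' % new_status)
        self._status = new_status

    @property
    def status(self):
        return self._status

holder = StatusHolder()
holder.set_status('INITED')
```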
__init__(name, env: (baconian.core.core.Env, baconian.envs.env_wrapper.Wrapper), algo: baconian.algo.algo.Algo, env_spec: baconian.core.core.EnvSpec, sampler: baconian.common.sampler.sampler.Sampler = None, noise_adder: baconian.common.noise.AgentActionNoiseWrapper = None, reset_noise_every_terminal_state=False, reset_state_every_sample=False, exploration_strategy: baconian.algo.misc.epsilon_greedy.ExplorationStrategy = None, algo_saving_scheduler: baconian.common.schedules.EventScheduler = None)

Parameters:
- name (str) – the name of the agent instance
- env (Env) – the environment the agent interacts with
- algo (Algo) – the algorithm the agent runs
- env_spec (EnvSpec) – environment specification: action space and observation space
- sampler (Sampler) – sampler used to collect data from the environment
- reset_noise_every_terminal_state (bool) – if True, reset the noise after every sampled trajectory
- reset_state_every_sample (bool) – if True, reset the environment state every time a sample/rollout is performed
- noise_adder (AgentActionNoiseWrapper) – adds action noise for exploration in the action space
- exploration_strategy (ExplorationStrategy) – exploration strategy in the action space
- algo_saving_scheduler (EventScheduler) – schedules when to save the algorithm during training
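The exploration_strategy parameter suggests an epsilon-greedy style policy over the action space. As a rough, self-contained sketch of that idea (the function name, action list, and epsilon value below are all invented for illustration and are not baconian's API):

```python
import random

# Hypothetical epsilon-greedy sketch: with probability epsilon pick a random
# action from the action space, otherwise keep the greedy action.
def epsilon_greedy(greedy_action, action_space, epsilon=0.1, rng=random):
    if rng.random() < epsilon:
        return rng.choice(action_space)
    return greedy_action

rng = random.Random(0)              # seeded for reproducibility
actions = [0, 1, 2, 3]
picked = [epsilon_greedy(2, actions, epsilon=0.1, rng=rng) for _ in range(100)]
```

With epsilon=0.1, roughly 90% of the picks stay on the greedy action, the rest explore uniformly.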
init()
Initialize the algorithm, and set the status to 'INITED'.
is_testing
Check whether the agent is testing.
Returns: True if the agent is testing
Return type: bool
is_training
Check whether the agent is training.
Returns: True if the agent is training
Return type: bool
predict(**kwargs)
Predict the action given the state.
Parameters: kwargs – remaining parameters, including the key: obs
Returns: the predicted action
Return type: numpy.ndarray
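Since the constructor takes a noise_adder for exploration, a predict call plausibly perturbs the algorithm's deterministic action with noise. The sketch below is an assumption about that pattern, not baconian's actual predict implementation; the toy policy and noise function are invented for the example.

```python
import numpy as np

# Hypothetical sketch: predict an action from a deterministic policy, then
# optionally add exploration noise to it (continuous action spaces).
def predict_with_noise(policy, obs, noise_fn=None):
    action = policy(obs)
    if noise_fn is not None:
        action = action + noise_fn(action.shape)
    return action

rng = np.random.default_rng(0)
policy = lambda obs: np.tanh(obs)                    # toy stand-in policy
noise = lambda shape: rng.normal(0.0, 0.1, size=shape)  # Gaussian action noise
obs = np.zeros(3)
noisy_action = predict_with_noise(policy, obs, noise)
```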
required_key_dict = {}
reset_on_terminal_state()
sample(env, sample_count: int, in_which_status: str = 'TRAIN', store_flag=False, sample_type: str = 'transition') -> (baconian.common.sampler.sample_data.TransitionData, baconian.common.sampler.sample_data.TrajectoryData)
Sample a certain amount of data from the environment.
Parameters:
- env – the environment to sample from
- sample_count (int) – the number of samples
- in_which_status (str) – the environment status
- store_flag (bool) – whether to store the environment samples, False by default
- sample_type (str) – the type of sample, 'transition' by default
Returns: the data sampled from the environment
Return type: a subclass of SampleData: TrajectoryData or TransitionData
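The two sample_type modes differ in how steps are grouped: 'transition' yields a flat list of steps, while 'trajectory' groups steps into episodes. A minimal sketch of that distinction, using a made-up toy environment rather than baconian's Env interface:

```python
# Hypothetical sketch of 'transition' vs 'trajectory' sampling.

class CountdownEnv:
    """Toy env: state counts down from 3; episode ends when it hits 0."""
    def reset(self):
        self.state = 3
        return self.state
    def step(self, action):
        self.state -= 1
        done = self.state == 0
        return self.state, 1.0, done  # next_state, reward, done

def sample(env, sample_count, sample_type='transition'):
    transitions, trajectories, episode = [], [], []
    s = env.reset()
    for _ in range(sample_count):
        s_next, r, done = env.step(0)
        step = (s, 0, r, s_next, done)
        transitions.append(step)
        episode.append(step)
        if done:                       # episode boundary: close trajectory
            trajectories.append(episode)
            episode = []
            s = env.reset()
        else:
            s = s_next
    return transitions if sample_type == 'transition' else trajectories

steps = sample(CountdownEnv(), 6, 'transition')      # 6 flat transitions
episodes = sample(CountdownEnv(), 6, 'trajectory')   # 2 episodes of 3 steps
```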
store_samples(samples: baconian.common.sampler.sample_data.SampleData)
Store the samples into the memory/replay buffer if the algorithm the agent holds needs to do so, e.g., DQN, DDPG.
Parameters: samples (SampleData) – the sample data of the experiment
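The store-then-sample pattern behind this method is the standard replay buffer used by off-policy algorithms such as DQN and DDPG. Below is a generic sketch of that pattern, assuming a bounded buffer with uniform sampling; the class, capacity, and batch size are illustrative, not baconian's ReplayBuffer API.

```python
from collections import deque
import random

# Hypothetical bounded replay buffer: old transitions fall out when full;
# batches are drawn uniformly at random for off-policy updates.
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self._buf = deque(maxlen=capacity)

    def store_samples(self, samples):
        # samples: an iterable of transitions (s, a, r, s_next, done)
        self._buf.extend(samples)

    def sample_batch(self, batch_size, rng=random):
        return rng.sample(list(self._buf), batch_size)

    def __len__(self):
        return len(self._buf)

buf = ReplayBuffer(capacity=100)
buf.store_samples([(s, 0, 1.0, s + 1, False) for s in range(5)])
batch = buf.sample_batch(2, rng=random.Random(0))
```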
test(sample_count) -> baconian.common.sampler.sample_data.SampleData
Test the agent.
Parameters: sample_count (int) – how many trajectories are used to evaluate the agent's performance
Returns: a SampleData object
train(*args, **kwargs)
Train the agent.
Returns: True if the agent was trained successfully, False if the memory buffer did not have enough data.
Return type: bool
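The boolean return convention above can be sketched as a guard on buffer size: a training step bails out with False until the buffer can fill one batch. The batch size and the placeholder "update" below are assumptions for illustration, not baconian's train signature.

```python
# Hypothetical sketch of train()'s return convention: False when there is
# not enough data buffered for one batch, True after a successful update.
def train(buffer, batch_size=32):
    if len(buffer) < batch_size:
        return False                  # memory buffer too small to train on
    batch = buffer[-batch_size:]      # stand-in for sampling + gradient step
    return True

assert train(list(range(10))) is False   # 10 < 32: not enough data yet
assert train(list(range(64))) is True    # 64 >= 32: one update performed
```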