core
Core AgentOS classes.
- class agentos.core.Agent(**kwargs)
An Agent observes and takes actions in its Environment.
The primary methods on an Agent are:
- Agent.advance() - Takes one action within the Environment as
determined by the Agent’s policy.
- Agent.rollout() - Advances the Agent through one episode within its
Environment, allowing it to gather experience and learn.
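A minimal wiring sketch follows. Because Agent stores whatever kwargs it receives as members, the kwarg names below (environment, policy) are assumptions about what advance() and rollout() expect, not guarantees from this page; CoinFlipEnv and RandomPolicy are hypothetical classes sketched under Environment and Policy below.

    from agentos.core import Agent

    # Hypothetical wiring; the kwarg names are assumptions.
    agent = Agent(environment=CoinFlipEnv(), policy=RandomPolicy())
    agent.advance()                        # take one action in the env
    n = agent.rollout(should_learn=False)  # run one full episode
    print(n, "transitions in this episode")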
- __init__(**kwargs) → None
Sets all the kwargs as members on the class.
- property active_agent_run: agentos.agent_run.AgentRun
- advance() → None
Takes one action within the Environment as dictated by the Policy.
- end_agent_run(print_results: bool = False) → None
- evaluate(num_episodes, should_learn=False, max_transitions=None, backup_dst=None, print_stats=True, parent_run=None) → None
Runs an agent specified by a given [agent_file].
- Parameters
num_episodes – number of episodes to run the agent through
should_learn – boolean, if True we will call policy.improve
max_transitions – If not None, max transitions performed before truncating an episode.
backup_dst – if specified, will print backup path to stdout
print_stats – if True, will print run stats to stdout
parent_run – If set, the AgentRun created by this function will use it as its parent. Otherwise, it will try to use the currently active component run as the parent; failing that, no parent is set.
- Returns
None
- learn(num_episodes, test_every, test_num_episodes, max_transitions=None) → None
Trains an agent by calling its learn() method in a loop.
- rollout(should_learn, max_transitions=None) → int
Generates one episode of transitions and allows the Agent to learn from its experience.
- Parameters
should_learn – if True, then Trainer.improve() will be called every time the Agent advances one step through the environment and the core training metrics (step_count and episode_count) will be updated after the rollout.
max_transitions – If not None, the episode and rollout will be truncated after the specified number of transitions.
- Returns
Number of transitions experienced in this episode.
- start_agent_run(run_type: str, parent: agentos.agent_run.AgentRun) → None
- class agentos.core.Dataset(**kwargs)
The Dataset is responsible for recording the experience of the Agent so that it can be used later for training.
The primary methods on Dataset are:
- add() - Adds a transition into the Dataset.
- next() - Pulls a set of transitions from the Dataset for learning.
- add(prev_obs, action, curr_obs, reward, done, info)
Adds a transition into the Dataset.
- next(*args, **kwargs)
Pulls a set of transitions from the Dataset for learning.
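A minimal in-memory sketch of a Dataset subclass; the list buffer, the batch_size parameter, and the random sampling are illustrative assumptions, not part of the documented interface.

    import random
    from agentos.core import Dataset

    class ReplayBuffer(Dataset):  # hypothetical subclass
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            self.transitions = []

        def add(self, prev_obs, action, curr_obs, reward, done, info):
            # Record one transition for later learning.
            self.transitions.append((prev_obs, action, curr_obs, reward, done, info))

        def next(self, batch_size=32):
            # Sample up to batch_size stored transitions.
            k = min(batch_size, len(self.transitions))
            return random.sample(self.transitions, k)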
- class agentos.core.Environment(**kwargs)
An Env inspired by OpenAI’s gym.Env and DeepMind’s dm_env:
https://github.com/openai/gym/blob/master/gym/core.py
https://github.com/deepmind/dm_env/blob/master/docs/index.md
- __init__(**kwargs)
Sets all the kwargs as members on the class.
- close(mode)
- get_spec()
- render(mode)
- reset()
Resets the environment to an initial state.
- seed(seed)
- step(action)
Perform the action in the environment.
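A minimal sketch of an Environment subclass; the coin-flip dynamics are purely illustrative, and the gym-style (obs, reward, done, info) return from step() is an assumption based on the docstring above.

    import random
    from agentos.core import Environment

    class CoinFlipEnv(Environment):  # hypothetical example env
        def reset(self):
            # Reset to an initial state and return the first observation.
            self.steps = 0
            return 0

        def step(self, action):
            # One timestep: guess the coin; episode ends after 10 steps.
            self.steps += 1
            obs = random.randint(0, 1)
            reward = 1.0 if action == obs else 0.0
            done = self.steps >= 10
            return obs, reward, done, {}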
- class agentos.core.EnvironmentSpec(observations, actions, rewards, discounts)
- actions
Alias for field number 1
- discounts
Alias for field number 3
- observations
Alias for field number 0
- rewards
Alias for field number 2
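Since EnvironmentSpec is a namedtuple, fields can be accessed by name or by position; the placeholder spec values below are illustrative only.

    from agentos.core import EnvironmentSpec

    spec = EnvironmentSpec(
        observations="obs-spec",    # field 0
        actions="action-spec",      # field 1
        rewards="reward-spec",      # field 2
        discounts="discount-spec",  # field 3
    )
    assert spec.actions == spec[1]  # field aliases map to tuple positions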
- class agentos.core.MemberInitializer(**kwargs)
Takes all constructor kwargs and sets them as class members.
For example, if MyClass is a MemberInitializer:
a = MyClass(foo='bar')
assert a.foo == 'bar'
- __init__(**kwargs)
Sets all the kwargs as members on the class.
- class agentos.core.Policy(**kwargs)
Pick next action based on last observation from environment.
Policies are used by Agents to encapsulate any state or logic necessary to decide on a next action given the last observation from an env.
- decide(observation)
Takes an observation and returns the next action to take.
- Parameters
observation – should be in the observation_space of the environments that this policy is compatible with.
- Returns
action to take, should be in action_space of the environments that this policy is compatible with.
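A minimal sketch of a Policy subclass; the binary action set is illustrative (a real policy should return actions from the action_space of its compatible environments).

    import random
    from agentos.core import Policy

    class RandomPolicy(Policy):  # hypothetical example policy
        def decide(self, observation):
            # Ignore the observation and choose a binary action at random.
            return random.randint(0, 1)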
- class agentos.core.Runnable
- run(hz=40, max_iters=None, as_thread=False) → Optional[threading.Thread]
Run an agent, optionally in a new thread.
If as_thread is True, the agent is run in a thread, and the thread object is returned to the caller. The caller may need to call join on that thread, depending on their use case for this agent_run.
- Parameters
hz – Rate at which to call agent’s advance function. If None, call advance repeatedly in a tight loop (i.e., as fast as possible).
max_iters – Maximum times to call agent’s advance function, defaults to None.
as_thread – Set to True to run this agent in a new thread, defaults to False.
- Returns
Either a running thread (if as_thread=True) or None.
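A usage sketch, assuming my_agent is an agent that mixes in Runnable (how Runnable is combined with Agent is an assumption not specified on this page):

    # my_agent: a Runnable agent instance (hypothetical).
    # Call advance() 10 times per second in a background thread.
    thread = my_agent.run(hz=10, max_iters=100, as_thread=True)
    if thread is not None:
        thread.join()  # wait for the run to complete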
- class agentos.core.Trainer(**kwargs)
The Trainer is responsible for improving the Policy of the Agent as experience is collected.
The primary method on the Trainer is the improve() method which gets called for every step taken within the episode. It is up to the particular implementation to decide if this tempo is appropriate for training.
- improve(dataset, policy)
This method updates the policy based on the experience in the dataset.
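A minimal sketch of an improve() implementation; the dataset.next() batch and the policy.update() hook are assumptions (this page documents no update method on Policy).

    from agentos.core import Trainer

    class BatchTrainer(Trainer):  # hypothetical example trainer
        def improve(self, dataset, policy):
            # Pull a batch of experience and hand it to the policy.
            batch = dataset.next()
            if batch:
                policy.update(batch)  # `update` is a hypothetical policy method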
- agentos.core.default_rollout_step(policy, obs, step_num)
The default rollout step function simply calls the policy's decide() function. A rollout step function lets a developer specify the behavior that occurs at every step of the rollout: given a policy and the last observation from the env, decide what action to take next. This usually involves the rollout's policy and may perform learning. It may also involve using, updating, or saving learning-related state, including hyperparameters such as epsilon in epsilon-greedy. You can provide your own function with the same signature as this default if you want more complex behavior at each step of the rollout.
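For example, a 3-parameter step function that adds epsilon-greedy exploration; the decay schedule and binary action set are illustrative assumptions:

    import random

    def epsilon_greedy_step(policy, obs, step_num):
        # Explore with probability epsilon, decaying over the rollout.
        epsilon = max(0.05, 1.0 - 0.01 * step_num)
        if random.random() < epsilon:
            return random.randint(0, 1)  # assumes binary actions
        return policy.decide(obs)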
- agentos.core.rollout(policy, env_class, step_fn=<function default_rollout_step>, max_steps=None)
Perform rollout using provided policy and env.
- Parameters
policy – policy to use when simulating these episodes.
env_class – class to instantiate an env object from.
step_fn – a function to be called at each step of the rollout. The function can have 2 or 3 parameters, and must return an action:
2 parameter definition: policy, observation.
3 parameter definition: policy, observation, step_num.
Default value is agentos.core.default_rollout_step.
max_steps – cap on the number of steps per episode.
- Returns
the trajectory that was followed during this rollout. A trajectory is a named tuple that contains the initial observation (a scalar) as well as the following arrays: actions, observations, rewards, dones, contexts. The ith entry of each array corresponds to the action taken at the ith step of the rollout, and the respective results returned by the environment after taking that action. To learn more about the semantics of these, see the documentation and code of gym.Env.
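A usage sketch, reusing the hypothetical RandomPolicy and CoinFlipEnv classes sketched above:

    from agentos.core import rollout

    traj = rollout(RandomPolicy(), CoinFlipEnv, max_steps=10)
    print(sum(traj.rewards))  # total reward for the episode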
- agentos.core.rollouts(policy, env_class, num_rollouts, step_fn=<function default_rollout_step>, max_steps=None)
- Parameters
policy – policy to use when simulating these episodes.
env_class – class to instantiate an env object from.
num_rollouts – how many rollouts (i.e., episodes) to perform.
step_fn – a function to be called at each step of each rollout. The function can have 2 or 3 parameters. 2 parameter definition: policy, observation. 3 parameter definition: policy, observation, step_num. The function must return an action.
max_steps – cap on number of steps per episode.
- Returns
array with one namedtuple per rollout, each tuple containing the following arrays: observations, rewards, dones, ctxs.
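A usage sketch, again with the hypothetical classes above, computing the mean episode return:

    from agentos.core import rollouts

    trajs = rollouts(RandomPolicy(), CoinFlipEnv, num_rollouts=5, max_steps=10)
    mean_return = sum(sum(t.rewards) for t in trajs) / len(trajs)
    print(mean_return)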