Programming Agents

Contents:

Environments
Policies
Agents: putting it all together
Rollouts

To develop an agent using AgentOS, the most important concepts to understand are Agents, Policies, and Environments (or Envs).

An Agent is an entity that can take actions over time. It must have an environment. It can also have one or more Policies that it uses to make decisions.

Environments 

Environments can be either simulators (e.g., CartPole) or connectors to the real world (e.g., an environment for a chatbot that passes messages back and forth between the agent and other agents, humans, etc.).

AgentOS does not define its own Environment API; instead, it reuses gym.Env. Environments must:

Descend from gym.Env.
Define a function step(action) -> observation, reward, done, context that takes an action and returns the resulting observation, the reward, a done flag, and additional context.
Define action and observation spaces.
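The three requirements above can be sketched with a toy simulator. This is a minimal illustration, not code from AgentOS: to keep it runnable without gym installed, the Env and Discrete classes below are hypothetical stand-ins for gym.Env and gym.spaces.Discrete, and CorridorEnv is an invented example. In a real environment you would subclass gym.Env and use gym.spaces directly.

```python
class Discrete:
    """Hypothetical stand-in for gym.spaces.Discrete: the integers 0..n-1."""

    def __init__(self, n):
        self.n = n

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class Env:
    """Hypothetical stand-in for gym.Env so the sketch runs without gym."""


class CorridorEnv(Env):
    """Toy simulator: walk right along a corridor to reach the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.action_space = Discrete(2)            # 0 = left, 1 = right
        self.observation_space = Discrete(length)  # index of the current cell
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply one action; return (observation, reward, done, context)."""
        assert self.action_space.contains(action), "invalid action"
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


# Usage: always move right until the episode ends.
env = CorridorEnv(length=3)
obs = env.reset()
done = False
while not done:
    obs, reward, done, _ = env.step(1)
```

The reset() method is part of the gym.Env interface even though it is not listed above; it is what supplies the initial observation for an episode.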

Policies 

In AgentOS, a Policy is a function that takes an observation as input and returns an action. Policies must:

Descend from agentos.Policy.
Define the compute_action(observation) -> action function.
Define action and observation spaces.
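As an illustration of this contract, here is a minimal sketch of a policy that picks actions at random. The Policy base class below is a hypothetical stand-in for agentos.Policy (whose exact constructor is not described here), RandomPolicy and its parameters are invented for this example, and plain range objects stand in for gym spaces so that the sketch has no dependencies.

```python
import random


class Policy:
    """Hypothetical stand-in for agentos.Policy."""

    def compute_action(self, observation):
        raise NotImplementedError


class RandomPolicy(Policy):
    """Ignores the observation and picks uniformly among the actions."""

    def __init__(self, num_actions=2, num_observations=5, seed=None):
        # Plain ranges stand in for gym spaces in this sketch.
        self.action_space = range(num_actions)
        self.observation_space = range(num_observations)
        self.rng = random.Random(seed)

    def compute_action(self, observation):
        """Map an observation to an action (here: a random one)."""
        return self.rng.randrange(len(self.action_space))


# Usage: the same seed yields the same sequence of actions.
policy = RandomPolicy(num_actions=3, seed=0)
actions = [policy.compute_action(observation=0) for _ in range(5)]
```

A learned policy would have the same shape; only the body of compute_action() would change.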

Agents: putting it all together 

The architecture and API that AgentOS provides for Agents is minimal by design: different agents should be able to perform very different types of tasks, so the core must stay flexible. Agents are nevertheless expected to be highly sophisticated, which means that most of an agent’s complexity will live outside the core AgentOS abstraction (e.g., the agentos.Agent class).

To be compatible with AgentOS, an agent class must:

Descend from agentos.Agent.
Take an environment class as the first argument to its __init__() function.
Define an instance function called advance() that returns a boolean.

That’s it. It is up to each agent developer how they want to structure the internals of their agent, but from AgentOS’s perspective, the only way that an agent can do anything is via its advance() function.
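A minimal sketch of an agent that satisfies these requirements follows. Several things here are assumptions rather than facts about AgentOS: the Agent base class is a hypothetical stand-in for agentos.Agent, it is assumed to instantiate the environment class in __init__(), and the boolean returned by advance() is assumed to mean "the agent is done". CountdownEnv and ConstantAgent are invented for the example.

```python
class Agent:
    """Hypothetical stand-in for agentos.Agent."""

    def __init__(self, env_class):
        # Assumption: the agent creates its own environment instance.
        self.env = env_class()

    def advance(self):
        raise NotImplementedError


class CountdownEnv:
    """Toy environment (would subclass gym.Env in practice); ends after 3 steps."""

    def __init__(self):
        self.remaining = 3

    def reset(self):
        self.remaining = 3
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining == 0, {}


class ConstantAgent(Agent):
    """Always takes action 0 and keeps a running total of its reward."""

    def __init__(self, env_class):
        super().__init__(env_class)
        self.obs = self.env.reset()
        self.total_reward = 0.0

    def advance(self):
        """Take one step; return True once the episode is over (assumed meaning)."""
        self.obs, reward, done, _ = self.env.step(0)
        self.total_reward += reward
        return done


# Usage: keep advancing the agent until it reports that it is done.
agent = ConstantAgent(CountdownEnv)
steps = 0
done = False
while not done:
    done = agent.advance()
    steps += 1
```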

Guidelines for structuring advance()

We recommend that Agents keep the advance() function as clean and minimal as possible, with code living in other functions that are called within advance(), or even better, in other modules. Agents are intended to be minimal and easy to read, and to be used mostly to import and compose functionality contained in “agent libraries” (see Architecture and Design).
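One way to follow this guideline is sketched below: advance() only wires together helpers and holds no logic of its own. The helpers choose_action() and remember() are hypothetical; in a real project they would live in separate modules and be imported, and ThinAgent would descend from agentos.Agent. They are defined inline here only so that the sketch is self-contained.

```python
def choose_action(observation):
    """Hypothetical decision helper: alternate actions based on the observation."""
    return observation % 2


def remember(memory, observation, action, reward):
    """Hypothetical bookkeeping helper: append one transition to memory."""
    memory.append((observation, action, reward))


class CounterEnv:
    """Toy environment (would subclass gym.Env in practice); ends after 2 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, float(action), self.t >= 2, {}


class ThinAgent:
    """Would descend from agentos.Agent in practice."""

    def __init__(self, env_class):
        self.env = env_class()
        self.obs = self.env.reset()
        self.memory = []

    def advance(self):
        # advance() only composes helpers; the logic lives elsewhere.
        action = choose_action(self.obs)
        next_obs, reward, done, _ = self.env.step(action)
        remember(self.memory, self.obs, action, reward)
        self.obs = next_obs
        return done


agent = ThinAgent(CounterEnv)
while not agent.advance():
    pass
```

Because the decision and bookkeeping logic sit in plain functions, they can be tested and reused without constructing an agent at all.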

Background on agent design

This design is inspired by operating systems in which the core kernel code is kept minimal and most functionality is implemented in libraries (e.g., microkernels and exokernels).

Rollouts 

A rollout, also called an episode, is a concept that comes from Reinforcement Learning. Conceptually, you can think of a rollout as a simulation of an agent advance()-ing through time in order to learn.

Technically, a rollout is a process involving an instance of a Policy and an instance of an Env that proceeds as demonstrated by the following pseudocode:

def rollout(env_class, policy_class):
    """Simplified rollout function.
    See agentos/core.py for the actual implementation."""
    env = env_class()        # get a new instance of the Env
    obs = env.reset()        # initial observation from the env
    policy = policy_class()  # new Policy (assumes a no-argument constructor)
    trajectory = []
    done = False
    while not done:
        action = policy.compute_action(obs)
        obs, reward, done, _ = env.step(action)
        # Record one (action, observation, reward) tuple per step.
        trajectory.append((action, obs, reward))
    return trajectory

As you can see, performing a rollout generates a trajectory, which you can think of as a simulation of how an agent might advance through the given environment, and what rewards it might receive along the way, if it were to use the given policy.

Different types of agents and algorithms might use rollouts for different purposes, but rollouts always consist of the same basic structure.
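One such purpose is evaluating a policy: run several rollouts and average the total reward of each. The sketch below illustrates the idea and does not use agentos.core.rollout(). The names episode_return() and average_return(), the toy StopEnv, and the two policies are all hypothetical, and the environment and policies are assumed to follow the step() and compute_action() conventions described earlier.

```python
def episode_return(env, policy):
    """Run one rollout and return the sum of the rewards received."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy.compute_action(obs)
        obs, reward, done, _ = env.step(action)
        total += reward
    return total


def average_return(env_class, policy_class, num_rollouts=10):
    """Average the episode return over several independent rollouts."""
    returns = [
        episode_return(env_class(), policy_class()) for _ in range(num_rollouts)
    ]
    return sum(returns) / len(returns)


class StopEnv:
    """Toy environment: every step costs 1; action 1 ends the episode."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = action == 1 or self.t >= 5
        return self.t, -1.0, done, {}


class PatientPolicy:
    """Waits until the observation reaches 2 before stopping."""

    def compute_action(self, observation):
        return 1 if observation >= 2 else 0


class ImpatientPolicy:
    """Stops immediately."""

    def compute_action(self, observation):
        return 1


patient_score = average_return(StopEnv, PatientPolicy, num_rollouts=4)
impatient_score = average_return(StopEnv, ImpatientPolicy, num_rollouts=4)
```

In this toy setting the policy that stops immediately scores higher because every step is penalized. With a stochastic environment or policy, the average over more rollouts gives a less noisy estimate.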

Since rollouts are used frequently and have a standard structure, AgentOS includes the agentos.core.rollout() utility function. Note that the pseudocode above is a simplified version of agentos.core.rollout().
