******************
Programming Agents
******************
.. contents:: Contents:
   :depth: 1
   :local:
.. _gym.Env: https://github.com/openai/gym/blob/master/gym/core.py

To develop an agent using AgentOS, the most important concepts to understand
are Agents, Policies, and Environments (or Envs).

An Agent is an entity that can take action over time. It must have an
environment. It can also have one or more Policy that it uses to make
decisions.

.. todo::

   Why is is an agent required to have an Env but not required to have a
   policy? Why not make the Agent API even more minimal and only require it
   have an advance function and leave it up to the agent developer to decide
   how the agent ends up with an env?


Environments
============
Environments can be either simulators (e.g., CartPole) or connectors to the
real world (e.g., an environment for a chatbot that passes messages back and
forth between the agent and other agents, humans, etc.).

AgentOS does not define its own Environment API, instead we reuse `gym.Env`_.
Environments must:

* Descend from ``gym.Env``.
* Define a function ``step(action) -> observation, reward, done, context``
  that takes an action and returns an observation, reward, etc.
* Define action and observation spaces.


Policies
========

In AgentOS, a Policy is a function that takes an observation as input and
returns an action. Policies must:

* Descend from ``AgentOS.Policy``.
* Define the ``compute_action(observation) -> action`` function.
* Define action and observation spaces.

.. todo::
   We should rename ``compute_action()`` to be something more concise and with
   semantics inspired by ``Env.step()``, such as ``decide()``.


Agents: putting it all together
================================
The architeture and API that AgentOS provides for Agents is minimal in order to
provide flexibility, because different agents should be able to perform very
different types of tasks. But it is also expected that Agents will be highly
sophisticated. So then most of the complexity of agents will be outside of
the core AgentOS abstraction (e.g., the ``agentos.Agent`` class).

To be compatible with AgentOS, an agent class must:

* Descend from ``agentos.Agent``.  Agents must take an environment class as the
* first argument to their
  ``__init__()`` function.
* Agents must define a instance function called ``advance()`` that returns a
  boolean.

That's it. It is up to each agent developer how they want to structure the
internals of their agent, but from AgentOS's perspective, the only way that an
agent can do anything is via its ``advance()`` function.


Guidelines for structuring ``advance()``
----------------------------------------
We recommend Agents keep the advance function as clean and minimal as possible,
with code living in other functions that are called with in the advance
function, or even better in other modules. Agent's are intended to be minimal
and easy to read, and mostly be used to import and compose functionality
contained in "agent libraries" (see :doc:`architecture_and_design`).


Background on agent design
--------------------------
This design is inspired by operating systems where the core kernel code is kept
minimal and most functionality is implemented in libraries (cite microkernels,
exakernel).


Rollouts
========
A rollout, also called an episode, is a concept that comes from Reinforcement
Learning. Conceptually, you can think of a rollout as a simulation of an agent
`advance()`-ing through time in order to learn.

Technically, a rollout is a process involving an instance of a
Policy and an instance of an Env that proceeds as demonstrated by the
following pseudocode:

.. code-block:: none

  def rollout(Env_class, Policy_class):
      """Pseudocode implementation of simplified rollout function.
      See agentos/core.py for the actual implementation."""
      env = get new instance of Env
      obs = initial observation from env
      policy = initialize a new Policy
      trajectory = []
      done = False
      until done:
          action = policy.compute_action(obs)
          obs, reward, done, _ = env.step(action)
          trajectory += [action, obs, reward]
      return trajectory

As you can see, performing a rollout generates a ``trajectory``, which you
can think of as a simulation of how an agent might advance through the given
environment, and what rewards it might receive along the way, if it were
to use the given policy.

Different types of agents and algorithms might use rollouts for
different purposes, but rollouts always consist of the same basic structure.

Since rollouts are used frequently and have a standard structure, AgentOS
includes the ``agentos.core.rollout()`` utility function, but **note that the
psuedocode above is a simplified version of** ``agentos.core.rollout()``.