Agent names

agent_i for i in [0, 99]

Action Space

Discrete(3)

Observation Space

Box(0.0, 100.0, (5,), float32)

Reward Space

Box(0.0, 12.881808, (2,), float32)



A Parallel 2-objective environment of the Beach problem domain.

Observation Space

The observation space is a continuous Box of length 5 containing:

  • agent type

  • section id (where the agent is)

  • section capacity

  • section consumption

  • percentage of agents of the agent’s type in the section

Example: [a_type, section_id, section_capacity, section_consumption, %_of_a_of_current_type]
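As an illustrative sketch (not the environment's actual code), the 5-element observation above could be assembled like this; the helper name and the way the type percentage is computed are assumptions:

```python
def make_observation(agent_type, section_id, capacity, consumption, same_type_count):
    """Build one agent's observation:
    [a_type, section_id, section_capacity, section_consumption,
     fraction of agents in the section that share the agent's type]."""
    # Guard against an empty section when computing the type fraction.
    pct_same_type = same_type_count / consumption if consumption > 0 else 0.0
    return [float(agent_type), float(section_id), float(capacity),
            float(consumption), pct_same_type]

obs = make_observation(agent_type=0, section_id=2, capacity=7,
                       consumption=4, same_type_count=3)
```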

Action Space

The action space is a Discrete space [0, 1, 2], corresponding to moving left, moving right, and staying in place, respectively.
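A minimal sketch of how these three actions could update an agent's section; clamping at the beach edges (rather than wrapping around) is an assumption, not taken from the source:

```python
def apply_action(section_id, action, num_sections=6):
    """Map a Discrete action to a new section id.
    0 = move left, 1 = move right, 2 = stay (ordering as described above)."""
    if action == 0:
        return max(section_id - 1, 0)              # clamp at the left edge (assumed)
    if action == 1:
        return min(section_id + 1, num_sections - 1)  # clamp at the right edge (assumed)
    return section_id                              # stay in place
```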

Reward Space

The reward space is a 2D vector containing rewards for two different modes (‘individual’ or ‘team’) for:

  • the occupation level

  • the mixture level

If the mode is ‘individual’, the reward is given for the currently occupied section. If the mode is ‘team’, the reward is summed over all sections.
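For intuition, the classic Beach Problem Domain from the literature scores a section's occupation with a reward that peaks when consumption matches capacity; the exact formula used by this environment is not given here, so the function below is an assumed illustration of the ‘individual’ vs. ‘team’ distinction only:

```python
import math

def occupation_reward(consumption, capacity):
    # Canonical BPD-style occupation score (an assumption, not this
    # environment's verified formula): maximal when consumption == capacity.
    return consumption * math.exp(-consumption / capacity)

def team_reward(consumptions, capacity):
    # 'team' mode: sum the per-section reward over all sections,
    # as described in the text above.
    return sum(occupation_reward(x, capacity) for x in consumptions)
```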

Starting State

The initial positions are drawn uniformly at random over the sections; this can be changed via the ‘position_distribution’ argument. The agent types are likewise randomly distributed according to the ‘type_distribution’ argument.
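The initialization described above can be sketched as follows; the argument names mirror the documented ones, but the sampling code itself is an illustrative guess, not the environment's implementation:

```python
import random

def init_agents(num_agents=100, sections=6,
                type_distribution=(0.3, 0.7), position_distribution=None):
    """Sample initial sections and types for all agents.
    position_distribution=None means uniform over sections, as documented."""
    section_weights = position_distribution or [1 / sections] * sections
    positions = random.choices(range(sections), weights=section_weights,
                               k=num_agents)
    types = random.choices(range(len(type_distribution)),
                           weights=type_distribution, k=num_agents)
    return positions, types

positions, types = init_agents()
```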

Episode Termination

The episode is terminated once num_timesteps is reached (see the arguments below for its default). Agents only receive the reward after the last timestep.

Episode Truncation

Episodes are not truncated; they simply terminate once the maximum number of timesteps is reached.

Arguments


  • ‘num_timesteps (int)’: number of timesteps in the domain. Default: 1

  • ‘num_agents (int)’: number of agents in the domain. Default: 100

  • ‘reward_mode (str)’: the reward mode to use (‘individual’, or ‘team’). Default: individual

  • ‘sections (int)’: number of beach sections in the domain. Default: 6

  • ‘capacity (int)’: capacity of each beach section. Default: 7

  • ‘type_distribution (tuple)’: the distribution of agent types in the domain. Default: 2 types distributed (0.3, 0.7).

  • ‘position_distribution (tuple)’: the initial distribution of agents in the domain. Default: uniform over all sections (None).

  • ‘render_mode (str)’: render mode. Default: None
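The documented arguments and their defaults can be collected in a small config object; the dataclass below is purely illustrative (it is not part of the environment's API) and exists only to restate the defaults in code:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class BeachConfig:
    """Mirrors the documented constructor arguments and defaults (illustrative)."""
    num_timesteps: int = 1
    num_agents: int = 100
    reward_mode: str = "individual"          # 'individual' or 'team'
    sections: int = 6
    capacity: int = 7
    type_distribution: Tuple[float, ...] = (0.3, 0.7)
    position_distribution: Optional[Tuple[float, ...]] = None  # None -> uniform
    render_mode: Optional[str] = None

cfg = BeachConfig()
```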