MO-Beach¶
Agents names |
Action Space | Discrete(3)
Observation Space | Box(0.0, 100.0, (5,), float32)
Reward Space | Box(0.0, 12.881808, (2,), float32)
Import |
A Parallel 2-objective environment of the Beach problem domain.
Observation Space¶
The observation space is a continuous box of length 5 containing:
agent type
section id (where the agent is)
section capacity
section consumption
percentage of agents of the agent’s type in the section
Example:
[a_type, section_id, section_capacity, section_consumption, %_of_a_of_current_type]
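As a minimal sketch of how the five entries could be given names on the consumer side, the helper below wraps a raw observation in a named tuple. `BeachObservation` and `decode_observation` are hypothetical names, not part of the environment, and the sample values are made up for illustration.

```python
from typing import NamedTuple, Sequence


class BeachObservation(NamedTuple):
    """Hypothetical named view over the 5-element observation vector."""

    agent_type: int
    section_id: int
    section_capacity: float
    section_consumption: float
    same_type_share: float  # share of agents of this agent's type in the section


def decode_observation(obs: Sequence[float]) -> BeachObservation:
    """Split a raw observation into named fields, in the documented order."""
    if len(obs) != 5:
        raise ValueError(f"expected 5 values, got {len(obs)}")
    a_type, section_id, capacity, consumption, share = obs
    return BeachObservation(
        int(a_type), int(section_id), float(capacity), float(consumption), float(share)
    )


# Made-up observation, only to show the field order.
decoded = decode_observation([1.0, 3.0, 7.0, 12.0, 0.25])
print(decoded.section_id, decoded.same_type_share)
```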
Action Space¶
The action space is a Discrete space with actions [0, 1, 2], corresponding to moving left, moving right, and staying in place.
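The action encoding above can be sketched as a small lookup. The constant names and `next_section` are hypothetical, and the handling of the two ends of the beach (the agent stays where it is) is an assumption that this page does not specify.

```python
LEFT, RIGHT, STAY = 0, 1, 2

# Change in section index for each documented action.
ACTION_DELTAS = {LEFT: -1, RIGHT: 1, STAY: 0}


def next_section(section_id: int, action: int, num_sections: int) -> int:
    """Return the section reached after taking `action`.

    Assumption: a move past either end of the beach leaves the agent in place.
    """
    if action not in ACTION_DELTAS:
        raise ValueError(f"unknown action: {action}")
    target = section_id + ACTION_DELTAS[action]
    # Clamp the target to the valid section indices 0 .. num_sections - 1.
    return min(max(target, 0), num_sections - 1)
```

For example, with 6 sections, `next_section(2, RIGHT, 6)` gives section 3, while `next_section(0, LEFT, 6)` stays in section 0 under the clamping assumption.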
Reward Space¶
The reward is a 2D vector, computed in one of two modes (‘individual’ or ‘team’), containing rewards for:
the occupation level
the mixture level
If the mode is ‘individual’, the reward is given for the currently occupied section. If the mode is ‘team’, the reward is summed over all sections.
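The difference between the two modes can be illustrated with a short sketch. It assumes the per-section (occupation, mixture) rewards have already been computed; how the environment computes them is not covered here, and `agent_reward` and the sample numbers are hypothetical.

```python
from typing import List, Tuple

Reward = Tuple[float, float]  # (occupation level, mixture level)


def agent_reward(section_rewards: List[Reward], section_id: int, mode: str) -> Reward:
    """Select or aggregate per-section rewards according to the reward mode."""
    if mode == "individual":
        # Only the section the agent currently occupies counts.
        return section_rewards[section_id]
    if mode == "team":
        # Each objective is summed over all sections.
        occupation = sum(r[0] for r in section_rewards)
        mixture = sum(r[1] for r in section_rewards)
        return (occupation, mixture)
    raise ValueError(f"unknown reward mode: {mode}")


# Made-up per-section rewards for a beach with three sections.
per_section = [(1.0, 0.5), (2.0, 0.25), (0.5, 0.0)]
individual = agent_reward(per_section, section_id=1, mode="individual")
team = agent_reward(per_section, section_id=1, mode="team")
```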
Starting State¶
By default, agents are placed uniformly at random over the sections; this can be changed via the ‘position_distribution’ argument. Agent types are assigned randomly according to the ‘type_distribution’ argument, whose default is (0.3, 0.7).
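A minimal sketch of such a starting state, using only the standard library, is shown below. `sample_start` is a hypothetical helper rather than the environment's own reset logic, and treating the two distributions as sampling weights is an assumption.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_start(
    num_agents: int,
    sections: int,
    type_distribution: Sequence[float],
    position_distribution: Optional[Sequence[float]] = None,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Draw an initial section and an agent type for every agent."""
    rng = random.Random(seed)
    # None means a uniform distribution over all sections.
    weights = position_distribution if position_distribution is not None else [1.0] * sections
    if len(weights) != sections:
        raise ValueError("position_distribution needs one weight per section")
    positions = rng.choices(range(sections), weights=weights, k=num_agents)
    types = rng.choices(range(len(type_distribution)), weights=type_distribution, k=num_agents)
    return positions, types


positions, types = sample_start(100, 6, (0.3, 0.7), seed=0)
```

Passing a `position_distribution` such as `(0, 0, 1, 0, 0, 0)` would place every agent in section 2 in this sketch.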
Episode Termination¶
The episode terminates once num_timesteps is reached; the default value is given under Arguments. Agents only receive the reward after the last timestep.
Episode Truncation¶
Episodes are never truncated: the only way an episode ends is by terminating at the maximum number of timesteps.
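The termination and truncation rules can be pictured with a toy loop that does not use the environment at all. `run_episode` is hypothetical, and returning a zero vector before the last timestep is an assumption; this page only states that the reward arrives after the final step.

```python
from typing import List, Tuple

Step = Tuple[int, Tuple[float, float], bool, bool]  # (t, reward, terminated, truncated)


def run_episode(num_timesteps: int, final_reward: Tuple[float, float] = (1.0, 0.5)) -> List[Step]:
    """Toy rollout showing when the reward and the termination flag appear."""
    history: List[Step] = []
    for t in range(1, num_timesteps + 1):
        terminated = t == num_timesteps  # the episode ends only at the last timestep
        truncated = False  # episodes are never truncated
        # Assumption: intermediate steps yield a zero reward vector.
        reward = final_reward if terminated else (0.0, 0.0)
        history.append((t, reward, terminated, truncated))
    return history


steps = run_episode(3)
```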
Arguments¶
‘num_timesteps (int)’: number of timesteps in the domain. Default: 1
‘num_agents (int)’: number of agents in the domain. Default: 100
‘reward_mode (str)’: the reward mode to use (‘individual’ or ‘team’). Default: individual
‘sections (int)’: number of beach sections in the domain. Default: 6
‘capacity (int)’: capacity of each beach section. Default: 7
‘type_distribution (tuple)’: the distribution of agent types in the domain. Default: 2 types with proportions (0.3, 0.7).
‘position_distribution (tuple)’: the initial distribution of agents in the domain. Default: uniform over all sections (None).
‘render_mode (str)’: render mode. Default: None
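One way to keep these arguments together is a small configuration object whose defaults mirror the list above. `BeachConfig` is a hypothetical helper, not part of the library, and its validation rules (for example, that the type distribution sums to 1) are assumptions rather than documented behaviour.

```python
import math
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BeachConfig:
    """Hypothetical container mirroring the documented arguments and defaults."""

    num_timesteps: int = 1
    num_agents: int = 100
    reward_mode: str = "individual"
    sections: int = 6
    capacity: int = 7
    type_distribution: Tuple[float, ...] = (0.3, 0.7)
    position_distribution: Optional[Tuple[float, ...]] = None
    render_mode: Optional[str] = None

    def __post_init__(self) -> None:
        if self.reward_mode not in ("individual", "team"):
            raise ValueError(f"unknown reward_mode: {self.reward_mode}")
        # Assumption: the type distribution is a probability vector.
        if not math.isclose(sum(self.type_distribution), 1.0):
            raise ValueError("type_distribution should sum to 1")
        if (
            self.position_distribution is not None
            and len(self.position_distribution) != self.sections
        ):
            raise ValueError("position_distribution needs one entry per section")


config = BeachConfig(reward_mode="team", sections=4)
kwargs = asdict(config)  # plain dict keyed by the argument names listed above
```

Assuming the constructor accepts the argument names listed above, `kwargs` could then be unpacked as keyword arguments when creating the environment.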