Surround


Agent names	`agent_i for i in [0, 3]`
Action Space	Box(-1.0, 1.0, (3,), float32)
Observation Space	Box([-3. -3. 0. -3. -3. 0. -3. -3. 0. -3. -3. 0. -3. -3. 0.], 3.0, (15,), float32)
Reward Space	Box(-10.0, [ 1. inf], (2,), float32)
Import	`momaland.envs.surround_v0`

A Parallel environment where drones learn how to surround a static target point.

Observation Space

The observation space is a continuous box of length (num_drones + 1) * 3, where each group of 3 values holds the XYZ coordinates of one entity, in this order:

the agent.
the target.
the other agents.

Example: [x_0, y_0, z_0, x_targ, y_targ, z_targ, x_1, y_1, z_1, ..., x_n, y_n, z_n]
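The layout above can be illustrated with a small sketch. It is not the environment's own code: the helper name `build_observation` is hypothetical, and the positions are the default starting state for size = 3 documented on this page.

```python
def build_observation(agent_idx, drone_positions, target_position):
    """Flatten positions into [own xyz, target xyz, other drones' xyz...]."""
    own = list(drone_positions[agent_idx])
    # Every drone except the observing one, kept in index order.
    others = [
        coord
        for i, pos in enumerate(drone_positions)
        if i != agent_idx
        for coord in pos
    ]
    return own + list(target_position) + others


# Default positions documented for size = 3 (4 drones, 1 target).
drones = [[0, 0, 1], [1, 1, 1], [0, 1, 1], [2, 2, 1]]
target = [1, 1, 2.5]

obs = build_observation(1, drones, target)
# (num_drones + 1) * 3 values in total.
assert len(obs) == (len(drones) + 1) * 3
```

With 4 drones this gives the 15 values listed in the Observation Space row of the summary table.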

Action Space

The action space is a 3D speed vector representing the direction in which the agent should move.
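As a rough illustration of how such a vector could move a drone, the sketch below clips the action to the documented [-1, 1] range and keeps the new position inside the observation bounds for a given size. The function name `apply_action` and the `step_scale` factor are assumptions made for this example; the environment's actual dynamics may differ.

```python
def apply_action(position, action, size=3.0, step_scale=0.2):
    """Move a drone by one clipped action and keep it inside the area.

    step_scale is a hypothetical per-step distance factor, not a documented value.
    """
    # Actions live in Box(-1.0, 1.0, (3,)).
    clipped = [max(-1.0, min(1.0, a)) for a in action]
    moved = [p + step_scale * a for p, a in zip(position, clipped)]
    # Observation bounds: x and y in [-size, size], z in [0, size].
    lows = [-size, -size, 0.0]
    return [max(low, min(size, m)) for low, m in zip(lows, moved)]


new_pos = apply_action([0.0, 0.0, 1.0], [1.0, -2.0, 0.5])
```

Here the out-of-range Y component (-2.0) is clipped to -1.0 before the move is applied.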

Reward Space

The reward space is a 2D vector containing rewards for:

Minimizing distance towards the target
Maximizing average distance towards other agents (avoiding collision).
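One way to picture these two objectives is the sketch below. It is an assumption-laden illustration, not the environment's reward function: the name `reward_vector` is hypothetical, the first component is simply the negated distance to the target, and any scaling or collision penalty (such as the -10 lower bound in the summary table) is left out.

```python
import math


def reward_vector(agent_idx, drone_positions, target_position):
    """Return [closeness to target, mean distance to the other drones]."""
    own = drone_positions[agent_idx]
    # Objective 1: a smaller distance to the target gives a higher (less negative) value.
    closeness = -math.dist(own, target_position)
    # Objective 2: a larger average distance to the other drones gives a higher value.
    others = [p for i, p in enumerate(drone_positions) if i != agent_idx]
    separation = (
        sum(math.dist(own, p) for p in others) / len(others) if others else 0.0
    )
    return [closeness, separation]


drones = [[0, 0, 1], [1, 1, 1], [0, 1, 1], [2, 2, 1]]
target = [1, 1, 2.5]
rewards = reward_vector(0, drones, target)
```

The two components pull in different directions once several drones approach the same target, which is what makes the task multi-objective.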

Starting State

When size = 3, the starting positions of the agents are [0, 0, 1], [1, 1, 1], [0, 1, 1], [2, 2, 1], and the target position is [1, 1, 2.5].

Episode Termination

The episode is terminated if one of the following conditions is met:

Two agents collide.
An agent and the target collide.
An agent collides with the ground.
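The three conditions can be sketched as a single check. This is a minimal illustration under stated assumptions: the function name `is_terminated` and the thresholds `collision_radius` and `ground_height` are hypothetical, since this page does not say how close two objects must be to count as colliding.

```python
import math
from itertools import combinations


def is_terminated(drone_positions, target_position,
                  collision_radius=0.2, ground_height=0.2):
    """Return True if any of the three termination conditions holds.

    Both thresholds are hypothetical values chosen for illustration.
    """
    # 1. Two agents collide.
    for a, b in combinations(drone_positions, 2):
        if math.dist(a, b) < collision_radius:
            return True
    for pos in drone_positions:
        # 2. An agent and the target collide.
        if math.dist(pos, target_position) < collision_radius:
            return True
        # 3. An agent collides with the ground (Z is the height).
        if pos[2] < ground_height:
            return True
    return False


drones = [[0, 0, 1], [1, 1, 1], [0, 1, 1], [2, 2, 1]]
target = [1, 1, 2.5]
assert not is_terminated(drones, target)
```

The documented starting state passes this check, as expected for a valid initial configuration.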

Episode Truncation

The episode is truncated when an agent reaches 200 steps.

Arguments

render_mode (str, optional): The mode to display the rendering of the environment. Can be human or None.
size (int, optional): Size of the area sides.
num_drones (int, optional): Number of drones.
init_flying_pos (nparray[float], optional): 2D array in which each row is a (3)-shaped array containing the initial XYZ position of one drone.
init_target_location (nparray[float], optional): A (3)-shaped array for the XYZ position of the target.
target_speed (float, optional): Distance traveled by the target at each timestep
final_target_location (nparray[float], optional): Array of the final position of the moving target
num_intermediate_points (int, optional): Number of intermediate points in the target trajectory
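A hedged usage sketch follows. It assumes the environment module exposes a `parallel_env` constructor and follows the PettingZoo parallel API (`reset`, `step`, `agents`, `action_space`), which is the usual convention but is not stated on this page. The helper `run_random_episode` is hypothetical and takes the environment module as a parameter so that the keyword arguments listed above can be passed through unchanged.

```python
def run_random_episode(env_module, seed=42, **env_kwargs):
    """Run one episode with random actions and return the number of steps.

    Assumes env_module.parallel_env(**env_kwargs) returns a PettingZoo-style
    parallel environment; this is an assumption, not a documented guarantee.
    """
    env = env_module.parallel_env(**env_kwargs)
    observations, infos = env.reset(seed=seed)
    steps = 0
    # env.agents becomes empty once the episode is terminated or truncated.
    while env.agents:
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        # In a multi-objective environment each reward is a vector.
        observations, rewards, terminations, truncations, infos = env.step(actions)
        steps += 1
    env.close()
    return steps
```

With the module from the Import row available as `surround_v0`, a call might look like `run_random_episode(surround_v0, num_drones=4, size=3)`; under the truncation rule above it would return at most 200.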

Credits

The code was adapted from Felten’s source. See also the YouTube video here.