Agent names

agent_i for i in [0, 3]

Action Space

Box(-1.0, 1.0, (3,), float32)

Observation Space

Box([-3. -3. 0. -3. -3. 0. -3. -3. 0. -3. -3. 0. -3. -3. 0.], 3.0, (15,), float32)

Reward Space

Box(-10.0, [ 1. inf], (2,), float32)



A Parallel environment where drones learn how to surround a static target point.

Observation Space

The observation space is a continuous box of length (num_drones + 1) * 3, where each group of 3 values represents the XYZ coordinates of one entity (a drone or the target), in this order:

  • the agent.

  • the target.

  • the other agents.

Example: [x_0, y_0, z_0, x_targ, y_targ, z_targ, x_1, y_1, z_1, ..., x_n, y_n, z_n]
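The ordering above can be illustrated with a minimal sketch. The helper name build_observation is hypothetical and is not part of the environment's API; it only shows how a per-agent vector in the "self, target, others" layout could be assembled from plain position lists.

```python
def build_observation(agent_idx, drone_positions, target_position):
    """Flatten positions into the [self, target, others...] layout.

    Hypothetical helper for illustration only; the environment builds
    its observations internally.
    """
    obs = list(drone_positions[agent_idx])   # the agent's own XYZ
    obs.extend(target_position)              # then the target's XYZ
    for j, pos in enumerate(drone_positions):
        if j != agent_idx:                   # then every other agent, in index order
            obs.extend(pos)
    return obs


# Positions taken from the "Starting State" section of this page.
drones = [[0, 0, 1], [1, 1, 1], [0, 1, 1], [2, 2, 1]]
target = [1, 1, 2.5]

obs_agent_1 = build_observation(1, drones, target)
print(len(obs_agent_1))  # (num_drones + 1) * 3 values
```

With 4 drones the vector has (4 + 1) * 3 = 15 entries, which matches the (15,) shape of the observation Box listed at the top of the page.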

Action Space

The action space is a 3D speed vector representing the direction in which the agent should move.

Reward Space

The reward space is a 2D vector containing rewards for:

  • Minimizing the distance to the target.

  • Maximizing the average distance to the other agents (avoiding collisions).
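As a rough sketch of how such a two-component reward could be computed, assuming plain Euclidean distances: the function reward_vector and its exact formulas are assumptions for illustration, and the environment's actual reward (including any scaling or collision penalties) may differ.

```python
import math


def reward_vector(agent_idx, drone_positions, target_position):
    """Return [closeness, spread] for one agent.

    Hypothetical formulas: closeness is the negative distance to the
    target, spread is the mean distance to the other agents.
    """
    own = drone_positions[agent_idx]
    others = [p for j, p in enumerate(drone_positions) if j != agent_idx]
    closeness = -math.dist(own, target_position)
    spread = sum(math.dist(own, p) for p in others) / len(others)
    return [closeness, spread]


# Positions taken from the "Starting State" section of this page.
drones = [[0, 0, 1], [1, 1, 1], [0, 1, 1], [2, 2, 1]]
target = [1, 1, 2.5]

rewards_agent_0 = reward_vector(0, drones, target)
print(rewards_agent_0)
```

Keeping the two objectives as separate vector entries, rather than summing them, is what makes the environment multi-objective: the trade-off between approaching the target and keeping distance from other agents is left to the learning algorithm.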

Starting State

With size = 3, the initial positions of the agents are [0, 0, 1], [1, 1, 1], [0, 1, 1], [2, 2, 1], while the target position is [1, 1, 2.5].

Episode Termination

The episode is terminated if one of the following conditions is met:

  • Two agents collide.

  • An agent and the target collide.

  • An agent collides with the ground.
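The three conditions above could be checked along the following lines. This is a sketch under stated assumptions: the name is_terminated, the COLLISION_RADIUS value, and the "z <= 0 means ground contact" rule are all hypothetical and are not taken from the environment's source.

```python
import math
from itertools import combinations

COLLISION_RADIUS = 0.2  # hypothetical threshold, not the environment's value


def is_terminated(drone_positions, target_position,
                  collision_radius=COLLISION_RADIUS):
    """Return True if any of the three termination conditions holds."""
    # 1. Two agents collide (any pair closer than the threshold).
    for a, b in combinations(drone_positions, 2):
        if math.dist(a, b) < collision_radius:
            return True
    for pos in drone_positions:
        # 2. An agent and the target collide.
        if math.dist(pos, target_position) < collision_radius:
            return True
        # 3. An agent collides with the ground (assumed to be z <= 0).
        if pos[2] <= 0:
            return True
    return False


# Positions taken from the "Starting State" section of this page.
drones = [[0, 0, 1], [1, 1, 1], [0, 1, 1], [2, 2, 1]]
target = [1, 1, 2.5]
print(is_terminated(drones, target))
```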

Episode Truncation

The episode is truncated when an agent reaches 200 steps.


  • render_mode (str, optional): The mode to display the rendering of the environment. Can be human or None.

  • size (int, optional): Length of the sides of the area.

  • num_drones (int, optional): Number of drones.

  • init_flying_pos (nparray[float], optional): A 2D array with one row per drone, where each row is a (3)-shaped array containing the initial XYZ position of that drone.

  • init_target_location (nparray[float], optional): A (3)-shaped array for the XYZ position of the target.

  • target_speed (float, optional): Distance traveled by the target at each timestep

  • final_target_location (nparray[float], optional): Array of the final position of the moving target

  • num_intermediate_points (int, optional): Number of intermediate points in the target trajectory


The code was adapted from Felten’s source. See also the YouTube video here.