Escort¶
Agents names |
Action Space | Box(-1.0, 1.0, (3,), float32)
Observation Space | Box([-3. -3. 0. -3. -3. 0. -3. -3. 0. -3. -3. 0. -3. -3. 0.], 3.0, (15,), float32)
Reward Space | Box(-10.0, [ 1. inf], (2,), float32)
Import |
A Parallel environment where drones learn how to escort a moving target.
Observation Space¶
The observation space is a continuous box of length (num_drones + 1) * 3,
where each group of 3 values holds the XYZ coordinates of one entity, in this order:
the agent.
the target.
the other agents.
Example:
[x_0, y_0, z_0, x_targ, y_targ, z_targ, x_1, y_1, z_1, ..., x_n, y_n, z_n]
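The ordering above can be illustrated with a small sketch. It is not the environment's own code: `build_observation` is a hypothetical helper, and the positions are the starting positions listed on this page for size = 3.

```python
from typing import List, Sequence

Vec3 = Sequence[float]


def build_observation(agent_idx: int, drones: Sequence[Vec3], target: Vec3) -> List[float]:
    """Flatten positions in the documented order: agent, target, other agents."""
    # Hypothetical helper, written only to illustrate the layout.
    obs = list(drones[agent_idx]) + list(target)
    for i, pos in enumerate(drones):
        if i != agent_idx:
            obs.extend(pos)
    return obs


drones = [[0, 0, 1], [1, 1, 1], [0, 1, 1], [2, 2, 1]]
target = [1, 1, 2.5]

obs = build_observation(1, drones, target)
print(len(obs))  # (4 + 1) * 3 = 15
print(obs[:6])   # [1, 1, 1, 1, 1, 2.5]
```

With four drones and one target, the vector has 15 entries, which matches the (15,) shape shown in the observation space summary.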
Action Space¶
The action space is a 3D speed vector representing the direction in which the agent should move.
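As a rough illustration of how a bounded 3D action could translate into movement, the sketch below clips each component to the documented [-1.0, 1.0] range and scales it into a displacement. The `apply_action` helper and the `max_step` scale are assumptions made for this example; the page does not state how the environment converts actions into positions.

```python
def clip_action(action, low=-1.0, high=1.0):
    """Clamp each component to the documented Box(-1.0, 1.0, (3,)) bounds."""
    return [min(max(a, low), high) for a in action]


def apply_action(position, action, max_step=0.5):
    """Move a drone along the clipped action vector.

    max_step is a hypothetical scale (distance per unit of action),
    not a value taken from the environment.
    """
    clipped = clip_action(action)
    return [p + max_step * a for p, a in zip(position, clipped)]


new_pos = apply_action([0.0, 0.0, 1.0], [2.0, -0.5, 0.0])
print(new_pos)  # [0.5, -0.25, 1.0]
```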
Reward Space¶
The reward space is a 2D vector containing rewards for:
Minimizing the distance to the target.
Maximizing the average distance to the other agents (avoiding collisions).
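A minimal sketch of a two-objective reward along these lines, assuming plain Euclidean distances: the first component is the negative distance to the target and the second is the mean distance to the other agents. `escort_reward` is a hypothetical name, and the documented bounds suggest the real environment scales or offsets these terms in ways this sketch does not try to reproduce.

```python
import math


def escort_reward(agent_idx, drones, target):
    """Return [closeness-to-target term, separation-from-others term]."""
    agent = drones[agent_idx]
    # Objective 1: a smaller distance to the target gives a larger (less negative) value.
    dist_to_target = math.dist(agent, target)
    # Objective 2: a larger mean distance to the other agents gives a larger value.
    others = [math.dist(agent, p) for i, p in enumerate(drones) if i != agent_idx]
    mean_dist_others = sum(others) / len(others) if others else 0.0
    return [-dist_to_target, mean_dist_others]


drones = [[0, 0, 1], [1, 1, 1], [0, 1, 1], [2, 2, 1]]
target = [1, 1, 2.5]
reward = escort_reward(0, drones, target)
print(reward)
```

Keeping the two terms as separate vector entries, rather than summing them, is what makes the problem multi-objective: a learner can weight them differently without changing the environment.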
Starting State¶
With size = 3, the starting positions of the agents are [0, 0, 1], [1, 1, 1], [0, 1, 1], [2, 2, 1],
while the target position is [1, 1, 2.5].
Episode Termination¶
The episode is terminated if any of the following conditions is met:
Two agents collide.
An agent and the target collide.
An agent collides with the ground.
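The three conditions can be sketched as a single check. The `collision_radius` and `ground_z` thresholds below are assumptions for illustration; this page does not state the values the environment uses.

```python
import math
from itertools import combinations


def is_terminated(drones, target, collision_radius=0.2, ground_z=0.0):
    """Return True if any documented termination condition holds.

    collision_radius and ground_z are hypothetical thresholds.
    """
    # Condition 1: two agents collide.
    if any(math.dist(a, b) < collision_radius for a, b in combinations(drones, 2)):
        return True
    # Condition 2: an agent and the target collide.
    if any(math.dist(d, target) < collision_radius for d in drones):
        return True
    # Condition 3: an agent collides with the ground.
    return any(d[2] <= ground_z for d in drones)


start = [[0, 0, 1], [1, 1, 1], [0, 1, 1], [2, 2, 1]]
print(is_terminated(start, [1, 1, 2.5]))  # False
```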
Episode Truncation¶
The episode is truncated when an agent reaches 200 steps.
Arguments¶
render_mode (str, optional)
: The mode to display the rendering of the environment. Can be human or None.
size (int, optional)
: Size of the area sides.
num_drones (int, optional)
: Number of drones.
init_flying_pos (nparray[float], optional)
: 2D array containing the initial coordinates of the agents; each row is a (3)-shaped array with the initial XYZ position of one drone.
init_target_location (nparray[float], optional)
: A (3)-shaped array for the initial XYZ position of the target.
target_speed (float, optional)
: Distance traveled by the target at each timestep.
final_target_location (nparray[float], optional)
: Array containing the final position of the moving target.
num_intermediate_points (int, optional)
: Number of intermediate points in the target trajectory.
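To make target_speed and the waypoint arguments more concrete, here is a sketch of a target that advances a fixed distance per timestep along a list of waypoints. It assumes straight-line motion between points; `step_target`, `follow_path`, and the waypoint values are hypothetical, and the environment may generate and follow its trajectory differently.

```python
import math


def step_target(position, waypoint, target_speed):
    """Move at most target_speed toward waypoint, snapping when within reach."""
    delta = [w - p for p, w in zip(position, waypoint)]
    distance = math.sqrt(sum(d * d for d in delta))
    if distance <= target_speed:
        return list(waypoint)
    scale = target_speed / distance
    return [p + scale * d for p, d in zip(position, delta)]


def follow_path(start, waypoints, target_speed, max_steps=200):
    """Return the positions visited while walking through the waypoints in order."""
    position, path = list(start), [list(start)]
    remaining = [list(w) for w in waypoints]
    while remaining and len(path) <= max_steps:
        position = step_target(position, remaining[0], target_speed)
        path.append(position)
        if position == remaining[0]:
            remaining.pop(0)
    return path


# Hypothetical trajectory: one intermediate point, then the final location.
path = follow_path([1.0, 1.0, 2.5], [[2.0, 1.0, 2.5], [2.0, 2.0, 2.0]], target_speed=0.25)
print(path[-1])  # [2.0, 2.0, 2.0]
```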
Credits¶
The code was adapted from Felten’s source. See also the YouTube video here.