MO-ItemGathering
| | |
|---|---|
| Agents names | |
| Action Space | Discrete(5) |
| Observation Space | Tuple(Discrete(2, start=-2), Box(-2, 3, (8, 8), int64)) |
| Reward Space | Box(0.0, 3.0, (3,), float32) |
| Import | |
A Parallel multi-objective environment of the Item Gathering problem.
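A minimal random-policy rollout, as a sketch of typical usage. The import path, the `parallel_env()` constructor, and the PettingZoo-style Parallel API calls below are assumptions based on MOMAland's usual conventions, not taken from this page:

```python
# Minimal random rollout; the import path and parallel_env() constructor
# are assumed from MOMAland's usual conventions.
from momaland.envs.item_gathering import moitem_gathering_v0

env = moitem_gathering_v0.parallel_env(render_mode=None)
observations, infos = env.reset(seed=42)

while env.agents:
    # One action per live agent, as in the PettingZoo Parallel API.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```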
Observation Space
The observation space is a tuple containing the agent ID (a negative integer) and the 2D map observation, where 0 marks an empty cell, negative integers mark agent positions (by agent ID), and positive integers mark items.
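Continuing from the rollout above, a sketch of unpacking one agent's observation under this Tuple layout (the agent name `agent_0` is hypothetical):

```python
import numpy as np

# Unpack one agent's Tuple observation: (agent ID, 8x8 grid).
agent_id, grid = observations["agent_0"]  # "agent_0" is an assumed name

# 0 = empty cell, negative values = agents, positive values = items.
item_cells = np.argwhere(grid > 0)        # coordinates of remaining items
my_cell = np.argwhere(grid == agent_id)   # this agent's own position
```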
Action Space
The action space is a Discrete space with five actions:
- 0: stay
- 1: up
- 2: down
- 3: left
- 4: right
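An illustrative mapping of these indices to names, followed by a deterministic step for two hypothetical agents (the agent names are assumptions):

```python
# Human-readable names for the Discrete(5) actions listed above.
ACTIONS = {0: "stay", 1: "up", 2: "down", 3: "left", 4: "right"}

# Deterministic step: first agent moves up, second stays.
actions = {"agent_0": 1, "agent_1": 0}  # agent names are illustrative
observations, rewards, terminations, truncations, infos = env.step(actions)
```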
Reward Space
The reward space is a vector containing one reward per item type available in the environment.
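Because rewards are vectors, a common way to feed them to a single-objective learner is linear scalarisation. A sketch with illustrative weights (the weights and agent name are not part of the environment):

```python
import numpy as np

weights = np.array([0.5, 0.3, 0.2])   # one illustrative weight per item type
vector_reward = rewards["agent_0"]    # shape (3,); agent name is hypothetical
scalar_reward = float(np.dot(weights, vector_reward))
```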
Starting State
The initial positions of the agents are given by the entries equal to 1 in the initial map.
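A hypothetical custom initial map, assuming the convention implied above: 1 marks an agent start position, and integers greater than 1 encode the different item types (here 2, 3 and 4 are assumed codes for three objectives):

```python
import numpy as np

# 8x8 map: 0 = empty, 1 = agent start, 2/3/4 = assumed item-type codes.
initial_map = np.zeros((8, 8), dtype=np.int64)
initial_map[0, 0] = 1   # first agent
initial_map[7, 7] = 1   # second agent
initial_map[3, 3] = 2   # item of the first type
initial_map[4, 2] = 3   # item of the second type
initial_map[2, 5] = 4   # item of the third type

env = moitem_gathering_v0.parallel_env(initial_map=initial_map)
```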
Episode Termination
The episode is terminated if all the items have been gathered.
Episode Truncation
The episode is truncated when the maximum number of timesteps is reached.
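In the Parallel API these two stopping conditions surface as separate per-agent flags; a sketch of telling them apart after a step (continuing from the rollout above):

```python
observations, rewards, terminations, truncations, infos = env.step(actions)

if all(terminations.values()):
    print("terminated: all items gathered")
elif all(truncations.values()):
    print("truncated: timestep limit reached")
```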
Arguments
- `num_timesteps`: number of timesteps to run the environment for. Default: 10
- `initial_map`: map of the environment. Default: 8x8 grid, 2 agents, 3 objectives (Källström and Heintz, 2019)
- `randomise`: whether to randomise the map at each episode. Default: False
- `reward_mode`: reward mode for the environment (`individual` or `team`). Default: `individual`
- `render_mode`: render mode for the environment. Default: None
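Putting the arguments together, assuming `parallel_env()` forwards these keywords to the environment constructor:

```python
env = moitem_gathering_v0.parallel_env(
    num_timesteps=50,     # longer episodes than the default 10
    randomise=True,       # re-randomise the map each episode
    reward_mode="team",   # shared team reward instead of individual rewards
    render_mode=None,
)
```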